bayes_spec
Subpackages
Submodules
bayes_spec.base_model
base_model.py - BaseModel definition
Copyright(C) 2024 by Trey V. Wenger; tvwenger@gmail.com This code is licensed under MIT license (see LICENSE for details)
- class bayes_spec.base_model.BaseModel(data: dict[str, SpecData], n_clouds: int, baseline_degree: int = 0, seed: int = 1234, verbose: bool = False)
Bases: ABC
BaseModel defines functions and attributes common to all model definitions.
- add_baseline_priors(prior_baseline_coeffs: dict[str, list[float]] | None = None)
Add baseline priors to the model. The polynomial baseline is evaluated on the normalized data as: baseline_norm = sum_i(coeff[i]/(i+1)**i * spectral_norm**i)
- Parameters:
prior_baseline_coeffs (Optional[dict[str, list[float]]], optional) – Width of normal prior distribution on the normalized baseline polynomial coefficients. Keys are dataset names and values are lists of length baseline_degree+1. If None, use [1.0]*(baseline_degree+1) for each dataset, defaults to None
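As a minimal sketch, the baseline formula quoted above can be evaluated directly with NumPy. The helper name baseline_norm is hypothetical and not part of bayes_spec; it only transcribes the documented formula, with one coefficient per polynomial order (so baseline_degree + 1 coefficients in total).

```python
import numpy as np


def baseline_norm(coeffs: list[float], spectral_norm) -> np.ndarray:
    """Evaluate sum_i(coeff[i] / (i+1)**i * spectral_norm**i).

    Hypothetical helper transcribing the documented formula; not bayes_spec code.
    """
    spectral_norm = np.asarray(spectral_norm, dtype=float)
    result = np.zeros_like(spectral_norm)
    for i, coeff in enumerate(coeffs):
        # Coefficient i is scaled by 1 / (i+1)**i before multiplying spectral_norm**i
        result += coeff / (i + 1) ** i * spectral_norm**i
    return result


# baseline_degree = 2, so three coefficients; here the result equals 1 + x + x**2
x = np.linspace(-1.0, 1.0, 5)
print(baseline_norm([1.0, 2.0, 9.0], x))
```

The (i+1)**i scaling means a prior of the same width on every coefficient corresponds to progressively tighter constraints on the higher-order terms.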
- abstract add_likelihood(*args, **kwargs)
Must be defined in a subclass.
- abstract add_priors(*args, **kwargs)
Must be defined in a subclass.
- property baseline_deterministics: Iterable[str]
Get the deterministic baseline parameter names.
- Returns:
Deterministic baseline parameter names
- Return type:
Iterable[str]
- property baseline_freeRVs: Iterable[str]
Get the free baseline parameter names.
- Returns:
Free baseline parameter names
- Return type:
Iterable[str]
- bic(chain: int | None = None, solution: int | None = None) → float
Calculate the Bayesian information criterion at the mean point estimate.
- Parameters:
chain (Optional[int], optional) – Evaluate BIC for this chain using un-clustered posterior samples. If None, evaluate across all chains using clustered posterior samples, defaults to None
solution (Optional[int], optional) – Evaluate BIC for this solution. If None, use the unique solution, if any. If chain is not None, this parameter has no effect, defaults to None
- Returns:
Bayesian information criterion
- Return type:
float
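For reference, the textbook Bayesian information criterion is BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of data points, and L the likelihood; lower values are preferred. The sketch below assumes bic() follows this standard form with ln(L) taken at the mean point estimate; how bayes_spec counts k and n is not stated here, and bic_formula is a hypothetical helper.

```python
import math


def bic_formula(lnlike: float, n_params: int, n_data: int) -> float:
    """Standard BIC: k * ln(n) - 2 * ln(L).

    Hypothetical helper; lnlike stands in for the log-likelihood evaluated at
    the posterior mean point estimate.
    """
    return n_params * math.log(n_data) - 2.0 * lnlike


# A model with more parameters must raise ln(L) enough to pay the k * ln(n) penalty
bic_one = bic_formula(lnlike=-520.0, n_params=4, n_data=500)
bic_two = bic_formula(lnlike=-500.0, n_params=7, n_data=500)
print(bic_one, bic_two, bic_two < bic_one)
```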
- property cloud_deterministics: Iterable[str]
Get the deterministic cloud parameter names.
- Returns:
Deterministic cloud parameter names
- Return type:
Iterable[str]
- property cloud_freeRVs: Iterable[str]
Get the free cloud parameter names.
- Returns:
Free cloud parameter names
- Return type:
Iterable[str]
- fit(n: int = 1000000, draws: int = 1000, rel_tolerance: float = 0.01, abs_tolerance: float = 0.01, learning_rate: float = 0.001, obj_n_mc: int = 5, start: dict | None = None, **kwargs)
Approximate posterior distribution using Variational Inference (VI).
- Parameters:
n (int, optional) – Number of VI iterations, defaults to 1_000_000
draws (int, optional) – Number of posterior samples to draw, defaults to 1_000
rel_tolerance (float, optional) – Relative parameter tolerance for VI convergence, defaults to 0.01
abs_tolerance (float, optional) – Absolute parameter tolerance for VI convergence, defaults to 0.01
learning_rate (float, optional) – VI learning rate, defaults to 1e-3
obj_n_mc (int, optional) – Number of Monte Carlo gradient samples, defaults to 5
start (Optional[dict], optional) – Starting point, defaults to None
**kwargs – Additional arguments passed to advi.fit()
- graph() → Source
Generate a visualization of the model graph. The output can be displayed inline in a Jupyter notebook or rendered with graph().render('filename').
- Returns:
Graph visualization
- Return type:
graphviz.sources.Source
- property hyper_deterministics: Iterable[str]
Get the deterministic hyper parameter names.
- Returns:
Deterministic hyper parameter names
- Return type:
Iterable[str]
- property hyper_freeRVs: Iterable[str]
Get the free hyper parameter names.
- Returns:
Free hyper parameter names
- Return type:
Iterable[str]
- property labeller: MapLabeller
Get the arviz labeller.
- Returns:
Arviz labeller
- Return type:
azl.MapLabeller
- mean_lnlike(chain: int | None = None, solution: int | None = None) → float
Evaluate mean log-likelihood over posterior samples.
- Parameters:
chain (Optional[int], optional) – Evaluate mean log-likelihood for this chain using un-clustered posterior samples. If None, evaluate across all chains using clustered posterior samples, defaults to None
solution (Optional[int], optional) – Evaluate mean log-likelihood for this solution. If None, use the unique solution, if any. If chain is not None, this parameter has no effect, defaults to None
- Returns:
Mean log-likelihood over posterior samples
- Return type:
float
- null_bic() → float
Evaluate the Bayesian Information Criterion for the null hypothesis (baseline only, no clouds)
- Returns:
Null hypothesis BIC
- Return type:
float
- predict_baseline(baseline_params: dict[str, list[float]] | None = None) → dict[str, list[float]]
Predict the un-normalized baseline model.
- Parameters:
baseline_params (Optional[dict[str, list[float]]], optional) – Dictionary of baseline parameters with which to evaluate the baseline model. Keys are the same as in the model: “baseline_{key}_norm”, where {key} are the supplied datasets. The values are lists of length baseline_degree+1. If None, evaluate the baseline model using the current model state, defaults to None
- Returns:
Un-normalized baseline models for each dataset. Keys are dataset names and values are the un-normalized baseline models.
- Return type:
dict[str, list[float]]
- reset_results()
Reset results and convergence checks.
- sample(init: str = 'advi+adapt_diag', n_init: int = 1000000, chains: int = 4, init_kwargs: dict | None = None, nuts_kwargs: dict | None = None, **kwargs)
Sample posterior distribution using MCMC.
- Parameters:
init (str, optional) – Initialization strategy, defaults to “advi+adapt_diag”
n_init (int, optional) – Number of initialization iterations, defaults to 1_000_000
chains (int, optional) – Number of independent Markov chains, defaults to 4
init_kwargs (Optional[dict], optional) – Keyword arguments passed to init_nuts(), defaults to None
nuts_kwargs (Optional[dict], optional) – Keyword arguments passed to pymc.NUTS(), defaults to None
**kwargs – Additional arguments passed to pymc.sample()
- sample_posterior_predictive(solution: int | None = None, thin: int = 100) → InferenceData
Generate posterior predictive samples
- Parameters:
solution (Optional[int], optional) – Draw posterior predictive samples from this solution index. If None, draw samples from the un-clustered posterior samples, defaults to None
thin (int, optional) – Thin posterior samples by keeping one of every thin samples, defaults to 100
- Raises:
ValueError – No posterior samples
- Returns:
Posterior predictive samples
- Return type:
az.InferenceData
- sample_prior_predictive(samples: int = 50) → InferenceData
Generate prior predictive samples
- Parameters:
samples (int, optional) – Number of prior predictive samples to draw, defaults to 50
- Returns:
Prior predictive samples
- Return type:
az.InferenceData
- sample_smc(**kwargs)
Sample posterior distribution using Sequential Monte Carlo.
- Parameters:
**kwargs – Additional arguments passed to pymc.sample_smc()
- solve(**kwargs)
Identify unique solutions and break the labeling degeneracy. Adds new groups to the trace called solution_{idx} with the label-corrected posterior samples of each unique solution.
- Parameters:
**kwargs – Keyword arguments passed to cluster_posterior()
- property unique_solution: bool
Check if posterior samples suggest a unique solution.
- Raises:
ValueError – No solutions
- Returns:
True if there is a unique solution, False otherwise
- Return type:
bool
bayes_spec.cluster_posterior
cluster_posterior.py - Utilities for clustering posterior samples with Gaussian Mixture Models.
Copyright(C) 2024 by Trey V. Wenger; tvwenger@gmail.com This code is licensed under MIT license (see LICENSE for details)
- bayes_spec.cluster_posterior.cluster_posterior(trace: InferenceData, n_clusters: int, cluster_features: Iterable[str], num_gmm_samples: int = 10000, max_iter: int = 1000, init_params: str = 'random', n_init: int = 10, kl_div_threshold: float = 0.1, seed: int = 1234) → list
Identify unique solutions and break the labeling degeneracy. To do so, we:
(1) fit a Gaussian Mixture Model (GMM) to the posterior samples of each chain individually;
(2) calculate the Kullback–Leibler (KL) divergence (mean log-likelihood ratio) between chains. If the KL divergence is smaller than the given threshold, both chains are part of the same solution; otherwise, each chain belongs to a different solution. The KL divergence is calculated from samples drawn from the fitted GMMs following the Monte Carlo procedure of Hershey & Olson (2007);
(3) solve the labeling degeneracy by identifying the most common order of components among chains in each solution.
- Parameters:
trace (az.InferenceData) – Posterior samples
n_clusters (int) – Number of GMM clusters
cluster_features (Iterable[str]) – Parameter names to use for clustering
num_gmm_samples (int, optional) – Number of samples to generate from Gaussian Mixture Model (GMM), defaults to 10_000
max_iter (int, optional) – Maximum number of GMM iterations, defaults to 1_000
init_params (str, optional) – GMM initialization strategy, defaults to “random”
n_init (int, optional) – Number of GMM initializations, defaults to 10
kl_div_threshold (float, optional) – Kullback-Leibler (KL) divergence threshold, defaults to 0.1
seed (int, optional) – Random seed, defaults to 1234
- Returns:
Solutions, where each element is a dictionary containing posterior samples and other statistics
- Return type:
list
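The Monte Carlo KL test in step (2) can be illustrated with a small, self-contained sketch. It assumes hand-specified 1-D mixtures given as (weights, means, sigmas) tuples instead of GMMs fitted to chain samples, uses the one-sided KL(p || q) estimated from samples of p, and groups chains greedily against the first chain of each solution. All function names are hypothetical; this is not the cluster_posterior implementation.

```python
import numpy as np

# ASSUMPTION: each "GMM" is a 1-D mixture (weights, means, sigmas); the real
# function fits multivariate GMMs to posterior samples of cluster_features.


def gmm_logpdf(x, weights, means, sigmas):
    """Log-density of a 1-D Gaussian mixture at points x."""
    x = np.asarray(x, dtype=float)[:, None]
    weights, means, sigmas = (np.asarray(a, dtype=float) for a in (weights, means, sigmas))
    # log(w_k) + log N(x | mu_k, sigma_k) for every point and component
    log_comp = (
        np.log(weights)
        - 0.5 * np.log(2.0 * np.pi * sigmas**2)
        - 0.5 * ((x - means) / sigmas) ** 2
    )
    # Log-sum-exp over components, for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()


def gmm_sample(rng, n, weights, means, sigmas):
    """Draw n samples: pick a component by weight, then sample from it."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means, dtype=float)[comp], np.asarray(sigmas, dtype=float)[comp])


def mc_kl(p, q, n=100_000, seed=1234):
    """Monte Carlo estimate of KL(p || q) = E_{x~p}[log p(x) - log q(x)]."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(rng, n, *p)
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))


def group_chains(chain_gmms, threshold=0.1):
    """Join the first solution whose reference chain is within threshold, else start a new one."""
    solutions = []  # lists of chain indices; the first index is the reference chain
    for i, gmm in enumerate(chain_gmms):
        for members in solutions:
            if mc_kl(chain_gmms[members[0]], gmm) < threshold:
                members.append(i)
                break
        else:
            solutions.append([i])
    return solutions


chains = [
    ([0.5, 0.5], [-2.0, 2.0], [0.5, 0.5]),
    ([0.5, 0.5], [2.0, -2.0], [0.5, 0.5]),  # same mixture, components swapped
    ([0.5, 0.5], [0.0, 5.0], [0.5, 0.5]),   # a genuinely different mixture
]
print(group_chains(chains))  # [[0, 1], [2]]
```

Chains 0 and 1 differ only in component order, which is exactly the labeling degeneracy: the mixture density is unchanged, so their KL divergence is zero and they land in the same solution. Step (3) would then pick a common component order for that solution.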
bayes_spec.nuts
nuts.py - customize pymc’s NUTS initialization
Copyright(C) 2024 by Trey V. Wenger; tvwenger@gmail.com This code is licensed under MIT license (see LICENSE for details)
- bayes_spec.nuts.init_nuts(model: Model, init: str = 'advi+adapt_diag', n_init: int = 100000, chains: int = 4, rel_tolerance: float = 0.001, abs_tolerance: float = 0.001, learning_rate: float = 0.001, obj_n_mc: int = 5, start: dict | None = None, nuts_kwargs: dict = None, seed: int = 1234, verbose: bool = False) → tuple[list, NUTS]
Custom NUTS initialization.
- Parameters:
model (pm.Model) – Model to initialize
init (str, optional) – Initialization strategy, defaults to “advi+adapt_diag”
n_init (int, optional) – Number of initialization iterations, defaults to 100_000
chains (int, optional) – Number of independent Markov chains, defaults to 4
rel_tolerance (float, optional) – VI relative convergence threshold, defaults to 0.001
abs_tolerance (float, optional) – VI absolute convergence threshold, defaults to 0.001
learning_rate (float, optional) – VI learning rate, defaults to 1e-3
obj_n_mc (int, optional) – Number of Monte Carlo gradient samples, defaults to 5
start (Optional[dict], optional) – Starting point, defaults to None
nuts_kwargs (dict, optional) – Additional keyword arguments passed to pm.NUTS, defaults to None
seed (int, optional) – Random seed, defaults to 1234
verbose (bool, optional) – Verbose output, defaults to False
- Returns:
Initial point and step method
- Return type:
tuple[list, pm.NUTS]
bayes_spec.optimize
optimize.py - Fit spectra with MCMC and determine optimal number of spectral components.
Copyright(C) 2024 by Trey V. Wenger; tvwenger@gmail.com This code is licensed under MIT license (see LICENSE for details)
- class bayes_spec.optimize.Optimize(model_type: Type[BaseModel], *args, max_n_clouds: int = 5, verbose: bool = False, **kwargs)
Bases: object
Optimize class definition
- add_likelihood(*args, **kwargs)
Add likelihood to the models
- Parameters:
*args – Arguments passed to model.add_likelihood()
**kwargs – Keyword arguments passed to model.add_likelihood()
- add_priors(*args, **kwargs)
Add priors to the models
- Parameters:
*args – Arguments passed to model.add_priors()
**kwargs – Keyword arguments passed to model.add_priors()
- property bics: dict[int, float]
Return the Bayesian Information Criteria for the best solution of each model.
- Returns:
BIC for each model, indexed by the number of clouds
- Return type:
dict[int, float]
- fit_all(start_spread: dict[str, Iterable[float]] | None = None, **kwargs)
Fit all models using variational inference.
- Parameters:
start_spread (Optional[dict[str, Iterable[float]]], optional) – Keys are parameter names and values are range, defaults to None
**kwargs – Keyword arguments passed to model.fit()
- property null_bic: float
Evaluate the Bayesian Information Criterion for the null hypothesis (baseline only, no clouds)
- Returns:
Null hypothesis BIC
- Return type:
float
- optimize(bic_threshold: float = 10.0, fit_kwargs: dict | None = None, sample_kwargs: dict | None = None, solve_kwargs: dict | None = None, start_spread: dict[str, Iterable[float]] | None = None, smc: bool = False, approx: bool = True)
Determine the optimal number of clouds by minimizing the Bayesian Information Criterion using MCMC, Sequential Monte Carlo, or Variational Inference. Models are sampled in sequential order starting with n_clouds = 1 until the stopping criteria are met twice in succession. Then, if approx=True, sample the best model using MCMC or SMC and solve the labeling degeneracy. Stopping criteria are:
1. Model did not converge
2. Model has multiple solutions (excluding VI results, which only have one chain)
3. BIC did not improve by more than bic_threshold over the previous model
- Parameters:
bic_threshold (float, optional) – The best_model is the first with BIC within min(BIC)+bic_threshold, defaults to 10.0
fit_kwargs (Optional[dict], optional) – Keyword arguments passed to fit(), defaults to None
sample_kwargs (Optional[dict], optional) – Keyword arguments passed to sample(), defaults to None
solve_kwargs (Optional[dict], optional) – Keyword arguments passed to solve(), defaults to None
start_spread (Optional[dict[str, Iterable[float]]], optional) – Keys are parameter names and values are range, defaults to None
smc (bool, optional) – If True, sample all models using SMC, defaults to False
approx (bool, optional) – If True, approximate all models using VI, defaults to True
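The bic_threshold rule for choosing best_model can be sketched as follows. select_n_clouds is a hypothetical helper, not the Optimize implementation; it assumes "first" means the smallest n_clouds (models run in order from n_clouds = 1) and treats "within" as inclusive. The BIC values in the usage line are made up for illustration.

```python
def select_n_clouds(bics: dict[int, float], bic_threshold: float = 10.0) -> int:
    """Return the smallest n_clouds whose BIC is within min(BIC) + bic_threshold.

    Hypothetical helper mirroring the bic_threshold description; the
    inclusive comparison (<=) is an assumption.
    """
    min_bic = min(bics.values())
    candidates = [n for n, value in bics.items() if value <= min_bic + bic_threshold]
    # Prefer the simplest model that is statistically indistinguishable from the best
    return min(candidates)


# Illustrative BICs indexed by number of clouds (not real results)
bics = {1: 1520.3, 2: 1204.8, 3: 1198.1, 4: 1201.5}
print(select_n_clouds(bics))
```

Here the 3-cloud model has the lowest BIC, but the 2-cloud model is within 10 of it, so the simpler model is preferred.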
- sample_all(start_spread: dict[str, Iterable[float]] | None = None, sample_kwargs: dict | None = None, solve_kwargs: dict | None = None)
Sample posterior distribution of all models using MCMC.
- Parameters:
start_spread (Optional[dict[str, Iterable[float]]], optional) – Keys are parameter names and values are range, defaults to None
sample_kwargs (Optional[dict], optional) – Keyword arguments passed to sample(), defaults to None
solve_kwargs (Optional[dict], optional) – Keyword arguments passed to solve(), defaults to None
- sample_smc_all(sample_kwargs: dict | None = None, solve_kwargs: dict | None = None)
Sample posterior distribution of all models using sequential Monte Carlo.
- Parameters:
sample_kwargs (Optional[dict], optional) – Keyword arguments passed to sample_smc(), defaults to None
solve_kwargs (Optional[dict], optional) – Keyword arguments passed to solve(), defaults to None
bayes_spec.plots
plots.py - Plotting helper utilities.
Copyright(C) 2024 by Trey V. Wenger; tvwenger@gmail.com This code is licensed under MIT license (see LICENSE for details)
- bayes_spec.plots.plot_pair(trace: InferenceData, var_names: list[str], combine_dims: list[str] | None = None, labeller: MapLabeller | None = None, kind: str = 'scatter', reference_values: dict | None = None, kde_kwargs: dict | None = None, scatter_kwargs: dict | None = None, hexbin_kwargs: dict | None = None, reference_values_kwargs: dict | None = None) → Iterable[Axes]
Helper function to generate sample pair plots.
- Parameters:
trace (az.InferenceData) – Samples
var_names (list[str]) – Parameter names to plot
combine_dims (Optional[list[str]], optional) – Dimensions to combine; None is treated as [], defaults to None
labeller (Optional[azl.MapLabeller], optional) – arviz labeller, defaults to None
kind (str, optional) – plot kind, one of “scatter”, “hexbin”, or “kde”, defaults to “scatter”
reference_values (Optional[dict], optional) – highlight reference values, defaults to None
kde_kwargs (Optional[dict], optional) – keyword arguments for arviz.plot_kde(), defaults to None
scatter_kwargs (Optional[dict], optional) – keyword arguments for plt.scatter(), defaults to None
hexbin_kwargs (Optional[dict], optional) – keyword arguments for plt.hexbin(), defaults to None
reference_values_kwargs (Optional[dict], optional) – keyword arguments for plt.scatter(), defaults to None
- Returns:
matplotlib Axes
- Return type:
Axes
- bayes_spec.plots.plot_predictive(data: dict[str, SpecData], predictive: InferenceData) → Iterable[Axes]
Helper function to generate posterior predictive check plots.
- Parameters:
data (dict[str, SpecData]) – Data sets, where the key defines the name of the dataset.
predictive (az.InferenceData) – Predictive samples
- Returns:
matplotlib Axes
- Return type:
Axes
- bayes_spec.plots.plot_traces(posterior: InferenceData, var_names: list[str]) → Iterable[Axes]
Helper function to generate trace plots of posterior samples
- Parameters:
posterior (az.InferenceData) – Posterior samples
var_names (list[str]) – Parameters to plot
- Returns:
matplotlib Axes
- Return type:
Axes
bayes_spec.spec_data
spec_data.py - SpecData structure definition
Copyright(C) 2024 by Trey V. Wenger; tvwenger@gmail.com This code is licensed under MIT license (see LICENSE for details)
- class bayes_spec.spec_data.SpecData(spectral: list[float], brightness: list[float], noise: float | list[float], xlabel: str = 'Spectral', ylabel: str = 'Brightness')
Bases: object
SpecData defines the data structure and utility functions.
- normalize_brightness(x: float) → float
Normalize brightness data
- Parameters:
x (float) – Brightness data to normalize
- Returns:
Normalized brightness data
- Return type:
float
- normalize_spectral(x: float) → float
Normalize spectral data
- Parameters:
x (float) – Spectral data to normalize
- Returns:
Normalized spectral data
- Return type:
float
- unnormalize_brightness(norm_x: float) → float
Un-normalize brightness data
- Parameters:
norm_x (float) – Normalized brightness data
- Returns:
Un-normalized brightness data
- Return type:
float
- unnormalize_spectral(norm_x: float) → float
Un-normalize spectral data
- Parameters:
norm_x (float) – Normalized spectral data
- Returns:
Un-normalized spectral data
- Return type:
float
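Each normalize_* method is paired with an unnormalize_* method that should invert it. This page does not say which transform SpecData uses, so the sketch below assumes a simple affine standardization (subtract the mean, divide by the standard deviation) purely to show the round-trip contract. AffineNormalizer is hypothetical and not part of bayes_spec.

```python
import numpy as np


class AffineNormalizer:
    """Hypothetical stand-in for SpecData's normalize/unnormalize pairs.

    ASSUMPTION: normalization is an affine map (subtract the mean, divide by
    the standard deviation); the actual bayes_spec transform is not documented here.
    """

    def __init__(self, values):
        values = np.asarray(values, dtype=float)
        self.offset = float(np.mean(values))
        self.scale = float(np.std(values))

    def normalize(self, x):
        return (np.asarray(x, dtype=float) - self.offset) / self.scale

    def unnormalize(self, norm_x):
        # Exact inverse of normalize()
        return np.asarray(norm_x, dtype=float) * self.scale + self.offset


spectral = np.linspace(-50.0, 50.0, 101)  # e.g. a velocity axis
norm = AffineNormalizer(spectral)
# The round trip recovers the original values up to floating-point error
print(np.allclose(norm.unnormalize(norm.normalize(spectral)), spectral))
```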
bayes_spec.utils
utils.py - Utility functions
Copyright(C) 2024 by Trey V. Wenger; tvwenger@gmail.com This code is licensed under MIT license (see LICENSE for details)
- bayes_spec.utils.gaussian(x: float, amp: float, center: float, fwhm: float) → float
Evaluate a Gaussian function
- Parameters:
x (float) – Position at which to evaluate
amp (float) – Gaussian amplitude
center (float) – Gaussian centroid
fwhm (float) – Gaussian full-width at half-maximum
- Returns:
Gaussian evaluated at x
- Return type:
float
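A Gaussian parameterized by its full-width at half-maximum uses FWHM = 2 sqrt(2 ln 2) sigma, so the exponent (x - center)^2 / (2 sigma^2) becomes 4 ln 2 (x - center)^2 / fwhm^2. The sketch below assumes amp is the peak value rather than the integrated area; gaussian_fwhm is a hypothetical scalar re-implementation, not the bayes_spec source.

```python
import math


def gaussian_fwhm(x: float, amp: float, center: float, fwhm: float) -> float:
    """Gaussian with peak amp at center and full-width at half-maximum fwhm.

    ASSUMPTION: amp is the peak height, not the integrated area.
    """
    # (x - c)^2 / (2 sigma^2) rewritten with sigma = fwhm / (2 sqrt(2 ln 2))
    return amp * math.exp(-4.0 * math.log(2.0) * (x - center) ** 2 / fwhm**2)


# At center +/- fwhm/2 the value drops to exactly half the peak
print(gaussian_fwhm(3.0, amp=2.0, center=3.0, fwhm=1.5), gaussian_fwhm(3.75, amp=2.0, center=3.0, fwhm=1.5))
```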
Module contents
- class bayes_spec.BaseModel(data: dict[str, SpecData], n_clouds: int, baseline_degree: int = 0, seed: int = 1234, verbose: bool = False)
Same as bayes_spec.base_model.BaseModel, documented above.
- class bayes_spec.Optimize(model_type: Type[BaseModel], *args, max_n_clouds: int = 5, verbose: bool = False, **kwargs)
Same as bayes_spec.optimize.Optimize, documented above.
- class bayes_spec.SpecData(spectral: list[float], brightness: list[float], noise: float | list[float], xlabel: str = 'Spectral', ylabel: str = 'Brightness')
Same as bayes_spec.spec_data.SpecData, documented above.