bayes_spec

Subpackages

Submodules

bayes_spec.base_model

base_model.py - BaseModel definition

Copyright(C) 2024 by Trey V. Wenger; tvwenger@gmail.com This code is licensed under MIT license (see LICENSE for details)

class bayes_spec.base_model.BaseModel(data: dict[str, SpecData], n_clouds: int, baseline_degree: int = 0, seed: int = 1234, verbose: bool = False)

Bases: ABC

BaseModel defines functions and attributes common to all model definitions.

add_baseline_priors(prior_baseline_coeffs: dict[str, list[float]] | None = None)

Add baseline priors to the model. The polynomial baseline is evaluated on the normalized data like: baseline_norm = sum_i(coeff[i]/(i+1)**i * spectral_norm**i)

Parameters:

prior_baseline_coeffs (Optional[dict[str, list[float]]], optional) – Width of normal prior distribution on the normalized baseline polynomial coefficients. Keys are dataset names and values are lists of length baseline_degree+1. If None, use [1.0]*(baseline_degree+1) for each dataset, defaults to None
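
The baseline formula above can be written out directly. The following is a minimal sketch, not the package's implementation; the helper name baseline_norm and the example coefficients are assumptions made for illustration.

```python
def baseline_norm(coeffs, spectral_norm):
    """Evaluate sum_i(coeffs[i] / (i + 1)**i * spectral_norm**i)."""
    # Hypothetical helper: coeffs has length baseline_degree + 1.
    return sum(
        coeff / (i + 1) ** i * spectral_norm**i
        for i, coeff in enumerate(coeffs)
    )


# A degree-2 baseline (three coefficients) evaluated at spectral_norm = 2.0.
value = baseline_norm([1.0, 2.0, 9.0], 2.0)
print(value)
```

The division by (i+1)**i shrinks the contribution of higher-order coefficients, so a unit-width prior on every coefficient corresponds to progressively smaller higher-order terms.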

abstract add_likelihood(*args, **kwargs)

Must be defined in the inheriting class.

abstract add_priors(*args, **kwargs)

Must be defined in the inheriting class.

property baseline_deterministics: Iterable[str]

Get the deterministic baseline parameter names.

Returns:

Deterministic baseline parameter names

Return type:

Iterable[str]

property baseline_freeRVs: Iterable[str]

Get the free baseline parameter names.

Returns:

Free baseline parameter names

Return type:

Iterable[str]

bic(chain: int | None = None, solution: int | None = None) → float

Calculate the Bayesian information criterion at the mean point estimate.

Parameters:
  • chain (Optional[int], optional) – Evaluate BIC for this chain using un-clustered posterior samples. If None, evaluate across all chains using clustered posterior samples, defaults to None

  • solution (Optional[int], optional) – Evaluate BIC for this solution. If None, use the unique solution if any. If chain is not None, this parameter has no effect, defaults to None

Returns:

Bayesian information criterion

Return type:

float
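
For reference, the textbook definition of the Bayesian information criterion is BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of data points, and L the likelihood at the point estimate. The sketch below uses that standard form; how the package counts parameters and data points is not stated here, so the function name and arguments are assumptions.

```python
import math


def bic_from_lnlike(lnlike, n_params, n_data):
    """Textbook BIC: n_params * ln(n_data) - 2 * lnlike."""
    # lnlike is assumed to be the log-likelihood at the point estimate.
    return n_params * math.log(n_data) - 2.0 * lnlike


# Same log-likelihood, different model sizes: the larger model is penalized.
small = bic_from_lnlike(-100.0, n_params=3, n_data=100)
large = bic_from_lnlike(-100.0, n_params=6, n_data=100)
print(small < large)
```

Lower BIC is preferred, so an extra cloud is only favored when it raises the log-likelihood by more than its parameter penalty.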

property cloud_deterministics: Iterable[str]

Get the deterministic cloud parameter names.

Returns:

Deterministic cloud parameter names

Return type:

Iterable[str]

property cloud_freeRVs: Iterable[str]

Get the free cloud parameter names.

Returns:

Free cloud parameter names

Return type:

Iterable[str]

fit(n: int = 1000000, draws: int = 1000, rel_tolerance: float = 0.01, abs_tolerance: float = 0.01, learning_rate: float = 0.001, obj_n_mc: int = 5, start: dict | None = None, **kwargs)

Approximate posterior distribution using Variational Inference (VI).

Parameters:
  • n (int, optional) – Number of VI iterations, defaults to 1_000_000

  • draws (int, optional) – Number of posterior samples to draw, defaults to 1_000

  • rel_tolerance (float, optional) – Relative parameter tolerance for VI convergence, defaults to 0.01

  • abs_tolerance (float, optional) – Absolute parameter tolerance for VI convergence, defaults to 0.01

  • learning_rate (float, optional) – VI learning rate, defaults to 1e-3

  • obj_n_mc (int, optional) – Number of Monte Carlo gradient samples, defaults to 5

  • start (Optional[dict], optional) – Starting point, defaults to None

  • **kwargs – Additional arguments passed to advi.fit()

graph() → Source

Generate visualization of the model graph. The output can be displayed in-line in a Jupyter notebook, or rendered with graph().render('filename').

Returns:

Graph visualization

Return type:

graphviz.sources.Source

property hyper_deterministics: Iterable[str]

Get the deterministic hyper parameter names.

Returns:

Deterministic hyper parameter names

Return type:

Iterable[str]

property hyper_freeRVs: Iterable[str]

Get the free hyper parameter names.

Returns:

Free hyper parameter names

Return type:

Iterable[str]

property labeller: MapLabeller

Get the arviz labeller.

Returns:

Arviz labeller

Return type:

azl.MapLabeller

mean_lnlike(chain: int | None = None, solution: int | None = None) → float

Evaluate mean log-likelihood over posterior samples.

Parameters:
  • chain (Optional[int], optional) – Evaluate mean log-likelihood for this chain using un-clustered posterior samples. If None, evaluate across all chains using clustered posterior samples, defaults to None

  • solution (Optional[int], optional) – Evaluate mean log-likelihood for this solution. If None, use the unique solution if any. If chain is not None, this parameter has no effect, defaults to None

Returns:

Mean log-likelihood over posterior samples

Return type:

float

null_bic() → float

Evaluate the Bayesian Information Criterion for the null hypothesis (baseline only, no clouds).

Returns:

Null hypothesis BIC

Return type:

float

predict_baseline(baseline_params: dict[str, list[float]] | None = None) → dict[str, list[float]]

Predict the un-normalized baseline model.

Parameters:

baseline_params (Optional[dict[str, list[float]]], optional) – Dictionary of baseline parameters with which to evaluate the baseline model. Keys are the same as in the model: “baseline_{key}_norm”, where {key} are the supplied datasets. The values are lists of length baseline_degree+1. If None, evaluate the baseline model using the current model state, defaults to None

Returns:

Un-normalized baseline models for each dataset. Keys are dataset names and values are the un-normalized baseline models.

Return type:

dict[str, list[float]]

reset_results()

Reset results and convergence checks.

sample(init: str = 'advi+adapt_diag', n_init: int = 1000000, chains: int = 4, init_kwargs: dict | None = None, nuts_kwargs: dict | None = None, **kwargs)

Sample posterior distribution using MCMC.

Parameters:
  • init (str, optional) – Initialization strategy, defaults to “advi+adapt_diag”

  • n_init (int, optional) – Number of initialization iterations, defaults to 1_000_000

  • chains (int, optional) – Number of independent Markov chains, defaults to 4

  • init_kwargs (Optional[dict], optional) – Keyword arguments passed to init_nuts(), defaults to None

  • nuts_kwargs (Optional[dict], optional) – Keyword arguments passed to pymc.NUTS(), defaults to None

  • **kwargs – Additional arguments passed to pymc.sample()

sample_posterior_predictive(solution: int | None = None, thin: int = 100) → InferenceData

Generate posterior predictive samples

Parameters:
  • solution (Optional[int], optional) – Draw posterior predictive samples from this solution index. If None, draw samples from the un-clustered posterior samples, defaults to None

  • thin (int, optional) – Thin posterior samples by keeping one out of every thin samples, defaults to 100

Raises:

ValueError – No posterior samples

Returns:

Posterior predictive samples

Return type:

az.InferenceData
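
Thinning by "one in thin" can be pictured as a strided slice over the draws. This is a small sketch under that assumption; the helper thin_samples is hypothetical and operates on a plain list rather than on an az.InferenceData object.

```python
def thin_samples(samples, thin=100):
    """Keep every thin-th sample, starting with the first."""
    # Assumption: thinning is a simple stride over the draw dimension.
    return samples[::thin]


draws = list(range(1000))  # stand-in for 1000 posterior draws
kept = thin_samples(draws, thin=100)
print(len(kept))
```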

sample_prior_predictive(samples: int = 50) → InferenceData

Generate prior predictive samples

Parameters:

samples (int, optional) – Number of prior predictive samples to draw, defaults to 50

Returns:

Prior predictive samples

Return type:

az.InferenceData

sample_smc(**kwargs)

Sample posterior distribution using Sequential Monte Carlo.

Parameters:

**kwargs – Additional arguments passed to pymc.sample_smc()

solve(**kwargs)

Identify unique solutions and break the labeling degeneracy. Adds new groups to the trace called solution_{idx} with the label-corrected posterior samples of each unique solution.

Parameters:

**kwargs – Keyword arguments passed to cluster_posterior()
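
To illustrate what breaking the labeling degeneracy means, the sketch below picks the most common component order among chains and relabels a chain that disagrees. It is an illustration only: the helper names, the tuple representation of a chain's component order, and the example values are assumptions, not the package's data structures.

```python
from collections import Counter


def most_common_order(chain_orders):
    """Return the component order shared by the largest number of chains."""
    counts = Counter(tuple(order) for order in chain_orders)
    order, _ = counts.most_common(1)[0]
    return order


def relabel(values, chain_order, reference_order):
    """Reorder one chain's per-component values to match reference_order.

    chain_order[k] is the cluster label assigned to the chain's k-th component.
    """
    by_label = dict(zip(chain_order, values))
    return [by_label[label] for label in reference_order]


# Four chains; the third one labels its components in a different order.
orders = [(0, 1, 2), (0, 1, 2), (2, 0, 1), (0, 1, 2)]
reference = most_common_order(orders)
fixed = relabel([5.0, -3.0, 1.0], orders[2], reference)
print(reference, fixed)
```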

property unique_solution: bool

Check if posterior samples suggest a unique solution.

Raises:

ValueError – No solutions

Returns:

True if there is a unique solution, False otherwise

Return type:

bool

bayes_spec.cluster_posterior

cluster_posterior.py - Utilities for clustering posterior samples with Gaussian Mixture Models.

bayes_spec.cluster_posterior.cluster_posterior(trace: InferenceData, n_clusters: int, cluster_features: Iterable[str], num_gmm_samples: int = 10000, max_iter: int = 1000, init_params: str = 'random', n_init: int = 10, kl_div_threshold: float = 0.1, seed: int = 1234) → list

Identify unique solutions and break the labeling degeneracy. The procedure has three steps:

(1) Fit a Gaussian Mixture Model (GMM) to the posterior samples of each chain individually.

(2) Calculate the Kullback–Leibler (KL) divergence (mean log-likelihood ratio) between chains. If the KL divergence is smaller than the given threshold, then both chains are part of the same solution. Otherwise, each chain belongs to a different solution. The KL divergence is calculated from samples drawn from the fitted GMMs following the Monte Carlo procedure of Hershey & Olsen (2007).

(3) Solve the labeling degeneracy by identifying the most common order of components among chains in each solution.

Parameters:
  • trace (az.InferenceData) – Posterior samples

  • n_clusters (int) – Number of GMM clusters

  • cluster_features (Iterable[str]) – Parameter names to use for clustering

  • num_gmm_samples (int, optional) – Number of samples to generate from Gaussian Mixture Model (GMM), defaults to 10_000

  • max_iter (int, optional) – Maximum number of GMM iterations, defaults to 1_000

  • init_params (str, optional) – GMM initialization strategy, defaults to “random”

  • n_init (int, optional) – Number of GMM initializations, defaults to 10

  • kl_div_threshold (float, optional) – Kullback–Leibler (KL) divergence threshold, defaults to 0.1

  • seed (int, optional) – Random seed, defaults to 1234

Returns:

Solutions, where each element is a dictionary containing posterior samples and other statistics

Return type:

list
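
The Monte Carlo KL divergence estimate described above (draw samples from one fitted mixture and average the log-likelihood ratio) can be sketched with the standard library alone. This is a one-dimensional illustration under stated assumptions: the mixtures are given as (weights, means, sigmas) tuples and all function names are hypothetical. The package itself fits GMMs to multi-dimensional posterior samples.

```python
import math
import random


def gmm_logpdf(x, weights, means, sigmas):
    """Log-density of a 1-D Gaussian mixture at x."""
    density = sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, sigmas)
    )
    return math.log(density)


def gmm_sample(rng, weights, means, sigmas):
    """Draw one sample: pick a component by weight, then sample from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], sigmas[k])


def mc_kl_divergence(p, q, num_samples=10_000, seed=1234):
    """Estimate KL(p || q) as the mean of log p(x) - log q(x) for x drawn from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = gmm_sample(rng, *p)
        total += gmm_logpdf(x, *p) - gmm_logpdf(x, *q)
    return total / num_samples


chain_a = ([0.5, 0.5], [-2.0, 2.0], [0.5, 0.5])
chain_b = ([0.5, 0.5], [-2.0, 2.0], [0.5, 0.5])  # same solution as chain_a
chain_c = ([0.5, 0.5], [-2.0, 3.0], [0.5, 0.5])  # one component shifted

kl_threshold = 0.1
print(mc_kl_divergence(chain_a, chain_b) < kl_threshold)
print(mc_kl_divergence(chain_a, chain_c) < kl_threshold)
```

Identical mixtures give a log-likelihood ratio of zero for every sample, so the two chains are grouped into one solution, while the shifted mixture exceeds the threshold and is treated as a separate solution.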

bayes_spec.nuts

nuts.py - customize pymc’s NUTS initialization

bayes_spec.nuts.init_nuts(model: Model, init: str = 'advi+adapt_diag', n_init: int = 100000, chains: int = 4, rel_tolerance: float = 0.001, abs_tolerance: float = 0.001, learning_rate: float = 0.001, obj_n_mc: int = 5, start: dict | None = None, nuts_kwargs: dict = None, seed: int = 1234, verbose: bool = False) → tuple[list, NUTS]

Custom NUTS initialization.

Parameters:
  • model (pm.Model) – Model to initialize

  • init (str, optional) – Initialization strategy, defaults to “advi+adapt_diag”

  • n_init (int, optional) – Number of initialization iterations, defaults to 100_000

  • chains (int, optional) – Number of independent Markov chains, defaults to 4

  • rel_tolerance (float, optional) – VI relative convergence threshold, defaults to 0.001

  • abs_tolerance (float, optional) – VI absolute convergence threshold, defaults to 0.001

  • learning_rate (float, optional) – VI learning rate, defaults to 1e-3

  • obj_n_mc (int, optional) – Number of Monte Carlo gradient samples, defaults to 5

  • start (Optional[dict], optional) – Starting point, defaults to None

  • nuts_kwargs (dict, optional) – Additional keyword arguments passed to pm.NUTS, defaults to None

  • seed (int, optional) – Random seed, defaults to 1234

  • verbose (bool, optional) – Verbose output, defaults to False

Returns:

Initial point and step method

Return type:

tuple[list, pm.NUTS]

bayes_spec.optimize

optimize.py - Fit spectra with MCMC and determine optimal number of spectral components.

class bayes_spec.optimize.Optimize(model_type: Type[BaseModel], *args, max_n_clouds: int = 5, verbose: bool = False, **kwargs)

Bases: object

Optimize class definition

add_likelihood(*args, **kwargs)

Add likelihood to the models

Parameters:
  • *args – Arguments passed to model.add_likelihood()

  • **kwargs – Keyword arguments passed to model.add_likelihood()

add_priors(*args, **kwargs)

Add priors to the models

Parameters:
  • *args – Arguments passed to model.add_priors()

  • **kwargs – Keyword arguments passed to model.add_priors()

property bics: dict[int, float]

Return the Bayesian Information Criteria for the best solution of each model.

Returns:

BIC for each model, indexed by the number of clouds

Return type:

dict[int, float]

fit_all(start_spread: dict[str, Iterable[float]] | None = None, **kwargs)

Fit all models using variational inference.

Parameters:
  • start_spread (Optional[dict[str, Iterable[float]]], optional) – Keys are parameter names and values are range, defaults to None

  • **kwargs – Keyword arguments passed to model.fit()

property null_bic: float

Evaluate the Bayesian Information Criterion for the null hypothesis (baseline only, no clouds)

Returns:

Null hypothesis BIC

Return type:

float

optimize(bic_threshold: float = 10.0, fit_kwargs: dict | None = None, sample_kwargs: dict | None = None, solve_kwargs: dict | None = None, start_spread: dict[str, Iterable[float]] | None = None, smc: bool = False, approx: bool = True)

Determine optimal number of clouds by minimizing the Bayesian Information Criterion using MCMC, Sequential Monte Carlo, or Variational Inference. Models are sampled in sequential order starting with n_clouds = 1 until the stopping criteria are met twice in succession. Then, if approx=True, sample the best model using MCMC or SMC and solve the labeling degeneracy. Stopping criteria are:

1. Model did not converge
2. Model has multiple solutions (excluding VI results, which only have one chain)
3. BIC did not improve by more than bic_threshold over previous model

Parameters:
  • bic_threshold (float, optional) – The best_model is the first with BIC within min(BIC)+bic_threshold, defaults to 10.0

  • fit_kwargs (Optional[dict], optional) – Keyword arguments passed to fit(), defaults to None

  • sample_kwargs (Optional[dict], optional) – Keyword arguments passed to sample(), defaults to None

  • solve_kwargs (Optional[dict], optional) – Keyword arguments passed to solve(), defaults to None

  • start_spread (Optional[dict[str, Iterable[float]]], optional) – Keys are parameter names and values are range, defaults to None

  • smc (bool, optional) – If True, sample all models using SMC, defaults to False

  • approx (bool, optional) – If True, approximate all models using VI, defaults to True
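
The sequential search and its stopping rule can be sketched on precomputed results. This is one possible reading of the description above, not the package's code: the function select_n_clouds, the (bic, converged, unique) tuples, the choice to treat the first model as an improvement, and the example BIC values are all assumptions.

```python
def select_n_clouds(results, bic_threshold=10.0):
    """Walk models in order of n_clouds; stop after two consecutive failures.

    results maps n_clouds -> (bic, converged, unique).
    Returns (best n_clouds or None, dict of BICs for usable models).
    """
    bics = {}
    previous_bic = float("inf")  # assumption: the first model always "improves"
    strikes = 0
    for n_clouds in sorted(results):
        bic, converged, unique = results[n_clouds]
        improved = previous_bic - bic > bic_threshold
        stop = (not converged) or (not unique) or (not improved)
        strikes = strikes + 1 if stop else 0
        if converged and unique:
            bics[n_clouds] = bic
        previous_bic = bic
        if strikes >= 2:
            break
    if not bics:
        return None, bics
    # Best model: the first one with BIC within min(BIC) + bic_threshold.
    best_bic = min(bics.values())
    for n_clouds in sorted(bics):
        if bics[n_clouds] <= best_bic + bic_threshold:
            return n_clouds, bics


results = {
    1: (500.0, True, True),
    2: (300.0, True, True),
    3: (295.0, True, True),  # improves by only 5: first strike
    4: (297.0, True, True),  # no improvement: second strike, stop
    5: (100.0, True, True),  # never evaluated
}
best, evaluated = select_n_clouds(results)
print(best, sorted(evaluated))
```

In this example the two-cloud model is selected because it is the simplest model whose BIC lies within bic_threshold of the minimum among the evaluated models.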

sample_all(start_spread: dict[str, Iterable[float]] | None = None, sample_kwargs: dict | None = None, solve_kwargs: dict | None = None)

Sample posterior distribution of all models using MCMC.

Parameters:
  • start_spread (Optional[dict[str, Iterable[float]]], optional) – Keys are parameter names and values are range, defaults to None

  • sample_kwargs (Optional[dict], optional) – Keyword arguments passed to sample(), defaults to None

  • solve_kwargs (Optional[dict], optional) – Keyword arguments passed to solve(), defaults to None

sample_smc_all(sample_kwargs: dict | None = None, solve_kwargs: dict | None = None)

Sample posterior distribution of all models using sequential Monte Carlo.

Parameters:
  • sample_kwargs (Optional[dict], optional) – Keyword arguments passed to sample_smc(), defaults to None

  • solve_kwargs (Optional[dict], optional) – Keyword arguments passed to solve(), defaults to None

bayes_spec.plots

plots.py - Plotting helper utilities.

bayes_spec.plots.plot_pair(trace: InferenceData, var_names: list[str], combine_dims: list[str] | None = None, labeller: MapLabeller | None = None, kind: str = 'scatter', reference_values: dict | None = None, kde_kwargs: dict | None = None, scatter_kwargs: dict | None = None, hexbin_kwargs: dict | None = None, reference_values_kwargs: dict | None = None) → Iterable[Axes]

Helper function to generate sample pair plots.

Parameters:
  • trace (az.InferenceData) – Samples

  • var_names (list[str]) – Parameter names to plot

  • combine_dims (Optional[list[str]], optional) – Dimensions to combine. None is treated as an empty list, defaults to None

  • labeller (Optional[azl.MapLabeller], optional) – arviz labeller, defaults to None

  • kind (str, optional) – plot kind, one of “scatter”, “hexbin”, or “kde”, defaults to “scatter”

  • reference_values (Optional[dict], optional) – highlight reference values, defaults to None

  • kde_kwargs (Optional[dict], optional) – keyword arguments for arviz.plot_kde(), defaults to None

  • scatter_kwargs (Optional[dict], optional) – keyword arguments for plt.scatter(), defaults to None

  • hexbin_kwargs (Optional[dict], optional) – keyword arguments for plt.hexbin(), defaults to None

  • reference_values_kwargs (Optional[dict], optional) – keyword arguments for plt.scatter(), defaults to None

Returns:

matplotlib Axes

Return type:

Axes

bayes_spec.plots.plot_predictive(data: dict[str, SpecData], predictive: InferenceData) → Iterable[Axes]

Helper function to generate posterior predictive check plots.

Parameters:
  • data (dict[str, SpecData]) – Data sets, where the key defines the name of the dataset.

  • predictive (az.InferenceData) – Predictive samples

Returns:

matplotlib Axes

Return type:

Axes

bayes_spec.plots.plot_traces(posterior: InferenceData, var_names: list[str]) → Iterable[Axes]

Helper function to generate trace plots of posterior samples

Parameters:
  • posterior (az.InferenceData) – Posterior samples

  • var_names (list[str]) – Parameters to plot

Returns:

matplotlib Axes

Return type:

Axes

bayes_spec.spec_data

spec_data.py - SpecData structure definition

class bayes_spec.spec_data.SpecData(spectral: list[float], brightness: list[float], noise: float | list[float], xlabel: str = 'Spectral', ylabel: str = 'Brightness')

Bases: object

SpecData defines the data structure and utility functions.

normalize_brightness(x: float) → float

Normalize brightness data

Parameters:

x (float) – Brightness data to normalize

Returns:

Normalized brightness data

Return type:

float

normalize_spectral(x: float) → float

Normalize spectral data

Parameters:

x (float) – Spectral data to normalize

Returns:

Normalized spectral data

Return type:

float

unnormalize_brightness(norm_x: float) → float

Un-normalize brightness data

Parameters:

norm_x (float) – Normalized brightness data

Returns:

Un-normalized brightness data

Return type:

float

unnormalize_spectral(norm_x: float) → float

Un-normalize spectral data

Parameters:

norm_x (float) – Normalized spectral data

Returns:

Un-normalized spectral data

Return type:

float

bayes_spec.utils

utils.py - Utility functions

bayes_spec.utils.gaussian(x: float, amp: float, center: float, fwhm: float) → float

Evaluate a Gaussian function

Parameters:
  • x (float) – Position at which to evaluate

  • amp (float) – Gaussian amplitude

  • center (float) – Gaussian centroid

  • fwhm (float) – Gaussian full-width at half-maximum

Returns:

Gaussian evaluated at x

Return type:

float
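
A Gaussian parameterized by its full-width at half-maximum is commonly written as amp * exp(-4 ln(2) (x - center)^2 / fwhm^2). The sketch below uses that form and assumes amp is the peak amplitude; the name gaussian_fwhm is a stand-in, and the library function may differ in normalization.

```python
import math


def gaussian_fwhm(x, amp, center, fwhm):
    """Gaussian with peak value amp and the given full-width at half-maximum."""
    return amp * math.exp(-4.0 * math.log(2.0) * (x - center) ** 2 / fwhm**2)


peak = gaussian_fwhm(0.0, amp=2.0, center=0.0, fwhm=1.0)
half = gaussian_fwhm(0.5, amp=2.0, center=0.0, fwhm=1.0)  # half a FWHM from center
print(peak, half)
```

Evaluating at half the FWHM away from the center returns half the peak value, which is a quick check that the width convention is the intended one.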

Module contents

class bayes_spec.BaseModel(data: dict[str, SpecData], n_clouds: int, baseline_degree: int = 0, seed: int = 1234, verbose: bool = False)

Bases: ABC

BaseModel defines functions and attributes common to all model definitions.

add_baseline_priors(prior_baseline_coeffs: dict[str, list[float]] | None = None)

Add baseline priors to the model. The polynomial baseline is evaluated on the normalized data like: baseline_norm = sum_i(coeff[i]/(i+1)**i * spectral_norm**i)

Parameters:

prior_baseline_coeffs (Optional[dict[str, list[float]]], optional) – Width of normal prior distribution on the normalized baseline polynomial coefficients. Keys are dataset names and values are lists of length baseline_degree+1. If None, use [1.0]*(baseline_degree+1) for each dataset, defaults to None

abstract add_likelihood(*args, **kwargs)

Must be defined in the inheriting class.

abstract add_priors(*args, **kwargs)

Must be defined in the inheriting class.

property baseline_deterministics: Iterable[str]

Get the deterministic baseline parameter names.

Returns:

Deterministic baseline parameter names

Return type:

Iterable[str]

property baseline_freeRVs: Iterable[str]

Get the free baseline parameter names.

Returns:

Free baseline parameter names

Return type:

Iterable[str]

bic(chain: int | None = None, solution: int | None = None) → float

Calculate the Bayesian information criterion at the mean point estimate.

Parameters:
  • chain (Optional[int], optional) – Evaluate BIC for this chain using un-clustered posterior samples. If None, evaluate across all chains using clustered posterior samples, defaults to None

  • solution (Optional[int], optional) – Evaluate BIC for this solution. If None, use the unique solution if any. If chain is not None, this parameter has no effect, defaults to None

Returns:

Bayesian information criterion

Return type:

float

property cloud_deterministics: Iterable[str]

Get the deterministic cloud parameter names.

Returns:

Deterministic cloud parameter names

Return type:

Iterable[str]

property cloud_freeRVs: Iterable[str]

Get the free cloud parameter names.

Returns:

Free cloud parameter names

Return type:

Iterable[str]

fit(n: int = 1000000, draws: int = 1000, rel_tolerance: float = 0.01, abs_tolerance: float = 0.01, learning_rate: float = 0.001, obj_n_mc: int = 5, start: dict | None = None, **kwargs)

Approximate posterior distribution using Variational Inference (VI).

Parameters:
  • n (int, optional) – Number of VI iterations, defaults to 1_000_000

  • draws (int, optional) – Number of posterior samples to draw, defaults to 1_000

  • rel_tolerance (float, optional) – Relative parameter tolerance for VI convergence, defaults to 0.01

  • abs_tolerance (float, optional) – Absolute parameter tolerance for VI convergence, defaults to 0.01

  • learning_rate (float, optional) – VI learning rate, defaults to 1e-3

  • obj_n_mc (int, optional) – Number of Monte Carlo gradient samples, defaults to 5

  • start (Optional[dict], optional) – Starting point, defaults to None

  • **kwargs – Additional arguments passed to advi.fit()

graph() → Source

Generate visualization of the model graph. The output can be displayed in-line in a Jupyter notebook, or rendered with graph().render('filename').

Returns:

Graph visualization

Return type:

graphviz.sources.Source

property hyper_deterministics: Iterable[str]

Get the deterministic hyper parameter names.

Returns:

Deterministic hyper parameter names

Return type:

Iterable[str]

property hyper_freeRVs: Iterable[str]

Get the free hyper parameter names.

Returns:

Free hyper parameter names

Return type:

Iterable[str]

property labeller: MapLabeller

Get the arviz labeller.

Returns:

Arviz labeller

Return type:

azl.MapLabeller

mean_lnlike(chain: int | None = None, solution: int | None = None) → float

Evaluate mean log-likelihood over posterior samples.

Parameters:
  • chain (Optional[int], optional) – Evaluate mean log-likelihood for this chain using un-clustered posterior samples. If None, evaluate across all chains using clustered posterior samples, defaults to None

  • solution (Optional[int], optional) – Evaluate mean log-likelihood for this solution. If None, use the unique solution if any. If chain is not None, this parameter has no effect, defaults to None

Returns:

Mean log-likelihood over posterior samples

Return type:

float

null_bic() → float

Evaluate the Bayesian Information Criterion for the null hypothesis (baseline only, no clouds).

Returns:

Null hypothesis BIC

Return type:

float

predict_baseline(baseline_params: dict[str, list[float]] | None = None) → dict[str, list[float]]

Predict the un-normalized baseline model.

Parameters:

baseline_params (Optional[dict[str, list[float]]], optional) – Dictionary of baseline parameters with which to evaluate the baseline model. Keys are the same as in the model: “baseline_{key}_norm”, where {key} are the supplied datasets. The values are lists of length baseline_degree+1. If None, evaluate the baseline model using the current model state, defaults to None

Returns:

Un-normalized baseline models for each dataset. Keys are dataset names and values are the un-normalized baseline models.

Return type:

dict[str, list[float]]

reset_results()

Reset results and convergence checks.

sample(init: str = 'advi+adapt_diag', n_init: int = 1000000, chains: int = 4, init_kwargs: dict | None = None, nuts_kwargs: dict | None = None, **kwargs)

Sample posterior distribution using MCMC.

Parameters:
  • init (str, optional) – Initialization strategy, defaults to “advi+adapt_diag”

  • n_init (int, optional) – Number of initialization iterations, defaults to 1_000_000

  • chains (int, optional) – Number of independent Markov chains, defaults to 4

  • init_kwargs (Optional[dict], optional) – Keyword arguments passed to init_nuts(), defaults to None

  • nuts_kwargs (Optional[dict], optional) – Keyword arguments passed to pymc.NUTS(), defaults to None

  • **kwargs – Additional arguments passed to pymc.sample()

sample_posterior_predictive(solution: int | None = None, thin: int = 100) → InferenceData

Generate posterior predictive samples

Parameters:
  • solution (Optional[int], optional) – Draw posterior predictive samples from this solution index. If None, draw samples from the un-clustered posterior samples, defaults to None

  • thin (int, optional) – Thin posterior samples by keeping one out of every thin samples, defaults to 100

Raises:

ValueError – No posterior samples

Returns:

Posterior predictive samples

Return type:

az.InferenceData

sample_prior_predictive(samples: int = 50) → InferenceData

Generate prior predictive samples

Parameters:

samples (int, optional) – Number of prior predictive samples to draw, defaults to 50

Returns:

Prior predictive samples

Return type:

az.InferenceData

sample_smc(**kwargs)

Sample posterior distribution using Sequential Monte Carlo.

Parameters:

**kwargs – Additional arguments passed to pymc.sample_smc()

solve(**kwargs)

Identify unique solutions and break the labeling degeneracy. Adds new groups to the trace called solution_{idx} with the label-corrected posterior samples of each unique solution.

Parameters:

**kwargs – Keyword arguments passed to cluster_posterior()

property unique_solution: bool

Check if posterior samples suggest a unique solution.

Raises:

ValueError – No solutions

Returns:

True if there is a unique solution, False otherwise

Return type:

bool

class bayes_spec.Optimize(model_type: Type[BaseModel], *args, max_n_clouds: int = 5, verbose: bool = False, **kwargs)

Bases: object

Optimize class definition

add_likelihood(*args, **kwargs)

Add likelihood to the models

Parameters:
  • *args – Arguments passed to model.add_likelihood()

  • **kwargs – Keyword arguments passed to model.add_likelihood()

add_priors(*args, **kwargs)

Add priors to the models

Parameters:
  • *args – Arguments passed to model.add_priors()

  • **kwargs – Keyword arguments passed to model.add_priors()

property bics: dict[int, float]

Return the Bayesian Information Criteria for the best solution of each model.

Returns:

BIC for each model, indexed by the number of clouds

Return type:

dict[int, float]

fit_all(start_spread: dict[str, Iterable[float]] | None = None, **kwargs)

Fit all models using variational inference.

Parameters:
  • start_spread (Optional[dict[str, Iterable[float]]], optional) – Keys are parameter names and values are range, defaults to None

  • **kwargs – Keyword arguments passed to model.fit()

property null_bic: float

Evaluate the Bayesian Information Criterion for the null hypothesis (baseline only, no clouds)

Returns:

Null hypothesis BIC

Return type:

float

optimize(bic_threshold: float = 10.0, fit_kwargs: dict | None = None, sample_kwargs: dict | None = None, solve_kwargs: dict | None = None, start_spread: dict[str, Iterable[float]] | None = None, smc: bool = False, approx: bool = True)

Determine optimal number of clouds by minimizing the Bayesian Information Criterion using MCMC, Sequential Monte Carlo, or Variational Inference. Models are sampled in sequential order starting with n_clouds = 1 until the stopping criteria are met twice in succession. Then, if approx=True, sample the best model using MCMC or SMC and solve the labeling degeneracy. Stopping criteria are:

1. Model did not converge
2. Model has multiple solutions (excluding VI results, which only have one chain)
3. BIC did not improve by more than bic_threshold over previous model

Parameters:
  • bic_threshold (float, optional) – The best_model is the first with BIC within min(BIC)+bic_threshold, defaults to 10.0

  • fit_kwargs (Optional[dict], optional) – Keyword arguments passed to fit(), defaults to None

  • sample_kwargs (Optional[dict], optional) – Keyword arguments passed to sample(), defaults to None

  • solve_kwargs (Optional[dict], optional) – Keyword arguments passed to solve(), defaults to None

  • start_spread (Optional[dict[str, Iterable[float]]], optional) – Keys are parameter names and values are range, defaults to None

  • smc (bool, optional) – If True, sample all models using SMC, defaults to False

  • approx (bool, optional) – If True, approximate all models using VI, defaults to True

sample_all(start_spread: dict[str, Iterable[float]] | None = None, sample_kwargs: dict | None = None, solve_kwargs: dict | None = None)

Sample posterior distribution of all models using MCMC.

Parameters:
  • start_spread (Optional[dict[str, Iterable[float]]], optional) – Keys are parameter names and values are range, defaults to None

  • sample_kwargs (Optional[dict], optional) – Keyword arguments passed to sample(), defaults to None

  • solve_kwargs (Optional[dict], optional) – Keyword arguments passed to solve(), defaults to None

sample_smc_all(sample_kwargs: dict | None = None, solve_kwargs: dict | None = None)

Sample posterior distribution of all models using sequential Monte Carlo.

Parameters:
  • sample_kwargs (Optional[dict], optional) – Keyword arguments passed to sample_smc(), defaults to None

  • solve_kwargs (Optional[dict], optional) – Keyword arguments passed to solve(), defaults to None

class bayes_spec.SpecData(spectral: list[float], brightness: list[float], noise: float | list[float], xlabel: str = 'Spectral', ylabel: str = 'Brightness')

Bases: object

SpecData defines the data structure and utility functions.

normalize_brightness(x: float) → float

Normalize brightness data

Parameters:

x (float) – Brightness data to normalize

Returns:

Normalized brightness data

Return type:

float

normalize_spectral(x: float) → float

Normalize spectral data

Parameters:

x (float) – Spectral data to normalize

Returns:

Normalized spectral data

Return type:

float

unnormalize_brightness(norm_x: float) → float

Un-normalize brightness data

Parameters:

norm_x (float) – Normalized brightness data

Returns:

Un-normalized brightness data

Return type:

float

unnormalize_spectral(norm_x: float) → float

Un-normalize spectral data

Parameters:

norm_x (float) – Normalized spectral data

Returns:

Un-normalized spectral data

Return type:

float