tensiometer.gaussian_tension¶

This file contains the functions and utilities to compute agreement and disagreement between two different chains using a Gaussian approximation for the posterior.

For more details on the method implemented see arxiv 1806.04649 and arxiv 1912.04880.

tensiometer.gaussian_tension.KL_PCA(chain_1, chain_12, param_names=None, conditional_params=[], param_map=None, normparam=None, num_modes=None, localize=True, dimensional_reduce=True, dimensional_threshold=0.1, verbose=True, **kwargs)[source]¶

Perform the KL analysis of two chains. Directions that chain_2 improves over chain_1.

Parameters:

chain_1 – MCSamples the first input chain.
chain_12 – MCSamples the second input chain.
param_names – list of names of the parameters to use
param_names – (optional) parameter names to restrict the calculation. If none is given the default assumes that all running parameters
conditional_params – (optional) list of parameters to treat as fixed, i.e. for KL_PCA conditional on fixed values of these parameters
param_map – (optional) a transformation to apply to parameter values; A list or string containing either N (no transformation) or L (for log transform) or M (for minus log transform of negative parameters) for each parameter. By default uses log if no parameter values cross zero. The transformed parameters are added to the joint chain.
normparam – (optional) name of parameter to normalize result (i.e. this parameter will have unit power) By default scales to the parameter that has the largest impactr on the KL mode variance.
num_modes – (optional) only return the num_modes best modes.
localize – (optional) localize the first covariance with the second, useful when chain_1 spans a much larger region with respect to chain_12.
dimensional_reduce – (optional) perform dimensional reduction of the KL modes considered keeping only parameters with a large impact on KL mode variances. Default is True.
dimensional_threshold – (optional) threshold for dimensional reducetion. Default is 10% so that parameters with a contribution less than 10% of KL mode variance are neglected from a specific KL mode.
verbose – (optional) chatty output. Default True.

tensiometer.gaussian_tension.Q_DM(chain_1, chain_2, prior_chain=None, param_names=None, cutoff=0.05, prior_factor=1.0)[source]¶

Compute the value and degrees of freedom of the quadratic form giving the probability of a difference between the means of the two input chains, in the Gaussian approximation.

This is defined as in (Raveri and Hu 18) to be:

\[Q_{\rm DM} \equiv (\theta_1-\theta_2) (\mathcal{C}_1+\mathcal{C}_2 -\mathcal{C}_1\mathcal{C}_\Pi^{-1}\mathcal{C}_2 -\mathcal{C}_2\mathcal{C}_\Pi^{-1}\mathcal{C}_1)^{-1} (\theta_1-\theta_2)^T\]

where \(\theta_i\) is the parameter mean of the i-th posterior, \(\mathcal{C}\) the posterior covariance and \(\mathcal{C}_\Pi\) the prior covariance. \(Q_{\rm DM}\) is \(\chi^2\) distributed with number of degrees of freedom equal to the rank of the shift covariance.

Parameters:

chain_1 – MCSamples the first input chain.
chain_2 – MCSamples the second input chain.
prior_chain – (optional) the prior only chain. If the prior is not well approximated by a ranged prior and is informative it is better to explicitly use a prior only chain. If this is not given the algorithm will assume ranged priors with the ranges computed from the input chains.
param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
cutoff – (optional) the algorithms needs to detect prior constrained directions (that do not contribute to the test) from data constrained directions. This is achieved through a Karhunen–Loeve decomposition to avoid issues with physical dimensions of parameters and cutoff sets the minimum improvement with respect to the prior that is used. Default is five percent.
prior_factor – (optional) factor to scale the prior covariance. In case of strongly non-Gaussian posteriors it might be useful to artificially tighten the prior to have less noise in telling apart parameter space directions that are constrained by data and prior. Default is no scaling, prior_factor=1.

Returns:

\(Q_{\rm DM}\) value and number of degrees of freedom. Since \(Q_{\rm DM}\) is \(\chi^2\) distributed the probability to exceed the test can be computed using the cdf method of scipy.stats.chi2 or tensiometer.utilities.from_chi2_to_sigma().

tensiometer.gaussian_tension.Q_DMAP(chain_1, chain_2, chain_12, prior_chain=None, param_names=None, prior_factor=1.0, feedback=True)[source]¶

Compute the value and degrees of freedom of the quadratic form giving the goodness of fit loss measure, in Gaussian approximation.

This is defined as in (Raveri and Hu 18) to be:

\[Q_{\rm DMAP} \equiv Q_{\rm MAP}^{12} -Q_{\rm MAP}^{1} -Q_{\rm MAP}^{2}\]

where \(Q_{\rm MAP}^{12}\) is the joint likelihood at maximum posterior (MAP) and \(Q_{\rm MAP}^{i}\) is the likelihood at MAP for the two single data sets. In Gaussian approximation this is distributed as:

\[Q_{\rm DMAP} \sim \chi^2(N_{\rm eff}^1 + N_{\rm eff}^2 - N_{\rm eff}^{12})\]

where \(N_{\rm eff}\) is the number of effective parameters, as computed by the function tensiometer.gaussian_tension.get_Neff() for the joint and the two single data sets.

Parameters:

chain_1 – MCSamples the first input chain.
chain_2 – MCSamples the second input chain.
chain_12 – MCSamples the joint input chain.
prior_chain – (optional) the prior chain. If the prior is not well approximated by a ranged prior and is informative it is better to explicitly use a prior only chain. If this is not given the algorithm will assume ranged priors with the ranges computed from the input chain.
param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
prior_factor – (optional) factor to scale the prior covariance. In case of strongly non-Gaussian posteriors it might be useful to artificially tighten the prior to have less noise in telling apart parameter space directions that are constrained by data and prior. Default is no scaling, prior_factor=1.
feedback – logical flag to set whether the function should print a warning every time the explicit MAP file is not found. By default this is true.

Returns:

\(Q_{\rm DMAP}\) value and number of degrees of freedom. Since \(Q_{\rm DMAP}\) is \(\chi^2\) distributed the probability to exceed the test can be computed using the cdf method of scipy.stats.chi2 or tensiometer.utilities.from_chi2_to_sigma().

tensiometer.gaussian_tension.Q_MAP(chain, num_data, prior_chain=None, normalization_factor=0.0, prior_factor=1.0, feedback=True)[source]¶

Compute the value and degrees of freedom of the quadratic form giving the goodness of fit measure at maximum posterior (MAP), in Gaussian approximation.

This is defined as in (Raveri and Hu 18) to be:

\[Q_{\rm MAP} \equiv -2\ln \mathcal{L}(\theta_{\rm MAP})\]

where \(\mathcal{L}(\theta_{\rm MAP})\) is the data likelihood evaluated at MAP. In Gaussian approximation this is distributed as:

\[Q_{\rm MAP} \sim \chi^2(d-N_{\rm eff})\]

where \(d\) is the number of data points and \(N_{\rm eff}\) is the number of effective parameters, as computed by the function tensiometer.gaussian_tension.get_Neff().

Parameters:

chain – MCSamples the input chain.
num_data – number of data points.
prior_chain – (optional) the prior chain. If the prior is not well approximated by a ranged prior and is informative it is better to explicitly use a prior only chain. If this is not given the algorithm will assume ranged priors with the ranges computed from the input chain.
normalization_factor – (optional) likelihood normalization factor. This should make the likelihood a chi square.
prior_factor – (optional) factor to scale the prior covariance. In case of strongly non-Gaussian posteriors it might be useful to artificially tighten the prior to have less noise in telling apart parameter space directions that are constrained by data and prior. Default is no scaling, prior_factor=1.
feedback – logical flag to set whether the function should print a warning every time the explicit MAP file is not found. By default this is true.

Returns:

\(Q_{\rm MAP}\) value and number of degrees of freedom. Since \(Q_{\rm MAP}\) is \(\chi^2\) distributed the probability to exceed the test can be computed using the cdf method of scipy.stats.chi2 or tensiometer.utilities.from_chi2_to_sigma().

tensiometer.gaussian_tension.Q_UDM(chain_1, chain_12, lower_cutoff=1.05, upper_cutoff=100.0, param_names=None)[source]¶

Compute the value and degrees of freedom of the quadratic form giving the probability of a difference between the means of the two input chains, in update form with the Gaussian approximation.

This is defined as in (Raveri and Hu 18) to be:

\[Q_{\rm UDM} \equiv (\theta_1-\theta_{12}) (\mathcal{C}_1-\mathcal{C}_{12})^{-1} (\theta_1-\theta_{12})^T\]

where \(\theta_1\) is the parameter mean of the first posterior, \(\theta_{12}\) is the parameter mean of the joint posterior, \(\mathcal{C}\) the posterior covariance and \(\mathcal{C}_\Pi\) the prior covariance. \(Q_{\rm UDM}\) is \(\chi^2\) distributed with number of degrees of freedom equal to the rank of the shift covariance.

In case of uninformative priors the statistical significance of \(Q_{\rm UDM}\) is the same as the one reported by \(Q_{\rm DM}\) but offers likely mitigation against non-Gaussianities of the posterior distribution. In the case where both chains are Gaussian \(Q_{\rm UDM}\) is symmetric if the first input chain is swapped \(1\leftrightarrow 2\). If the input distributions are not Gaussian it is better to use the most constraining chain as the base for the parameter update.

Parameters:

chain_1 – MCSamples the first input chain.
chain_12 – MCSamples the joint input chain.
lower_cutoff – (optional) the algorithms needs to detect prior constrained directions (that do not contribute to the test) from data constrained directions. This is achieved through a Karhunen–Loeve decomposition to avoid issues with physical dimensions of parameters and cutoff sets the minimum improvement with respect to the prior that is used. Default is five percent.
upper_cutoff – (optional) upper cutoff for the selection of KL modes.
param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.

Returns:

\(Q_{\rm UDM}\) value and number of degrees of freedom. Since \(Q_{\rm UDM}\) is \(\chi^2\) distributed the probability to exceed the test can be computed using the cdf method of scipy.stats.chi2 or tensiometer.utilities.from_chi2_to_sigma().

tensiometer.gaussian_tension.Q_UDM_KL_components(chain_1, chain_12, param_names=None)[source]¶

Function that computes the Karhunen–Loeve (KL) decomposition of the covariance of a chain with the covariance of that chain joint with another one. This function is used for the parameter shift algorithm in update form.

Parameters:	chain_1 – `MCSamples` the first input chain. chain_12 – `MCSamples` the joint input chain. param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
Returns:	the KL eigenvalues, the KL eigenvectors and the parameter names that are used, sorted in decreasing order.

tensiometer.gaussian_tension.Q_UDM_covariance_components(chain_1, chain_12, param_names=None, which='1')[source]¶

Compute the decomposition of the covariance matrix in terms of KL modes.

Parameters:

chain_1 – MCSamples the first input chain.
chain_12 – MCSamples the joint input chain.
param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
which – (optional) which decomposition to report. Possibilities are ‘1’ for the chain 1 covariance matrix, ‘2’ for the chain 2 covariance matrix and ‘12’ for the joint covariance matrix.

Returns:

parameter names used in the calculation, values of improvement, fractional covariance matrix and covariance matrix (inverse covariance).

tensiometer.gaussian_tension.Q_UDM_fisher_components(chain_1, chain_12, param_names=None, which='1')[source]¶

Compute the decomposition of the Fisher matrix in terms of KL modes.

Parameters:

chain_1 – MCSamples the first input chain.
chain_12 – MCSamples the joint input chain.
param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
which – (optional) which decomposition to report. Possibilities are ‘1’ for the chain 1 Fisher matrix, ‘2’ for the chain 2 Fisher matrix and ‘12’ for the joint Fisher matrix.

Returns:

parameter names used in the calculation, values of improvement and fractional Fisher matrix.

tensiometer.gaussian_tension.Q_UDM_get_cutoff(chain_1, chain_2, chain_12, prior_chain=None, param_names=None, prior_factor=1.0)[source]¶

Function to estimate the cutoff for the spectrum of parameter differences in update form to match Delta Neff.

Parameters:

chain_1 – MCSamples the first input chain.
chain_2 – MCSamples the second chain that joined with the first one (modulo the prior) should give the joint chain.
chain_12 – MCSamples the joint input chain.
prior_chain – MCSamples (optional) If the prior is not well approximated by a ranged prior and is informative it is better to explicitly use a prior only chain. If this is not given the algorithm will assume ranged priors with the ranges computed from the input chain.
param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
prior_factor – (optional) factor to scale the prior covariance. In case of strongly non-Gaussian posteriors it might be useful to artificially tighten the prior to have less noise in telling apart parameter space directions that are constrained by data and prior. Default is no scaling, prior_factor=1.

Returns:

the optimal KL cutoff, KL eigenvalues, KL eigenvectors and the parameter names that are used.

tensiometer.gaussian_tension.gaussian_approximation(chain, param_names=None)[source]¶

Function that computes the Gaussian approximation of a given chain.

Parameters:	chain – `MCSamples` the input chain. param_names – (optional) parameter names to restrict the Gaussian approximation. If none is given the default assumes that all parameters should be used.
Returns:	`GaussianND` object with the Gaussian approximation of the chain.

tensiometer.gaussian_tension.get_MAP_loglike(chain, feedback=True)[source]¶

Utility function to obtain the data part of the maximum posterior for a given chain. The best possibility is that a separate file with the posterior explicit MAP is given. If this is not the case then the function will try to get the likelihood at MAP from the samples. This possibility is far more noisy in general.

Parameters:	chain – `MCSamples` the input chain. feedback – logical flag to set whether the function should print a warning every time the explicit MAP file is not found. By default this is true.
Returns:	the data log likelihood at maximum posterior.

tensiometer.gaussian_tension.get_Neff(chain, prior_chain=None, param_names=None, prior_factor=1.0, localize=False, **kwargs)[source]¶

Function to compute the number of effective parameters constrained by a chain over the prior. The number of effective parameters is defined as in Eq. (29) of (Raveri and Hu 18) as:

\[N_{\rm eff} \equiv N -{\rm tr}[ \mathcal{C}_\Pi^{-1}\mathcal{C}_p ]\]

where \(N\) is the total number of nominal parameters of the chain, \(\mathcal{C}_\Pi\) is the covariance of the prior and \(\mathcal{C}_p\) is the posterior covariance.

Parameters:

chain – MCSamples the input chain.
prior_chain – (optional) the prior chain. If the prior is not well approximated by a ranged prior and is informative it is better to explicitly use a prior only chain. If this is not given the algorithm will assume ranged priors with the ranges computed from the input chain.
param_names – (optional) parameter names to restrict the calculation of \(N_{\rm eff}\). If none is given the default assumes that all running parameters should be used.
prior_factor – (optional) factor to scale the prior covariance. In case of strongly non-Gaussian posteriors it might be useful to artificially tighten the prior to have less noise in telling apart parameter space directions that are constrained by data and prior. Default is no scaling, prior_factor=1.

Returns:

the number of effective parameters.

tensiometer.gaussian_tension.get_localized_covariance(chain_1, chain_2, param_names, localize_params=None, scale=10.0)[source]¶: Get localized covariance of chain_1 localized with chain_2

tensiometer.gaussian_tension.get_prior_covariance(chain, param_names=None)[source]¶

Utility to estimate the prior covariance from the ranges of a chain. The flat range prior covariance (link) is given by:

\[C_{ij} = \delta_{ij} \frac{( max(p_i) - min(p_i) )^2}{12}\]

Parameters:	chain – `MCSamples` the input chain. param_names – optional choice of parameter names to restrict the calculation.
Returns:	the estimated covariance of the prior.