tensiometer.mcmc_tension

This file contains the functions and utilities to compute agreement and disagreement between two different chains with Monte Carlo methods.

For more details on the method implemented see arxiv 1806.04649 and arxiv 1912.04880.

tensiometer.mcmc_tension.OptimizeBandwidth_1D(diff_chain, param_names=None, num_bins=1000)[source]

Compute an estimate of an optimal bandwidth for covariance scaling as in GetDist. This is performed on whitened samples (with identity covariance), in 1D, and then scaled up with a dimensionality correction.

Parameters:
  • diff_chain
  • param_names
  • num_bins – number of bins used for the 1D estimate
Returns:

scaling vector for the whitened parameters

tensiometer.mcmc_tension.Scotts_RT(num_params, num_samples)[source]

Compute Scott’s rule of thumb bandwidth covariance scaling. This is the default scaling that is used to compute the KDE estimate of parameter shifts.

Parameters:
  • num_params – the number of parameters in the chain.
  • num_samples – the number of samples in the chain.
Returns:

Scott’s scaling.

tensiometer.mcmc_tension.Silvermans_RT(num_params, num_samples)[source]

Compute Silverman’s rule of thumb bandwidth covariance scaling. This is the default scaling that is used to compute the KDE estimate of parameter shifts.

Parameters:
  • num_params – the number of parameters in the chain.
  • num_samples – the number of samples in the chain.
Returns:

Silverman’s scaling.

tensiometer.mcmc_tension.exact_parameter_shift(diff_chain, param_names=None, scale=None, method='brute_force', feedback=1, **kwargs)[source]

Compute the MCMC estimate of the probability of a parameter shift given an input parameter difference chain. This function uses a Kernel Density Estimate (KDE) algorithm discussed in (Raveri, Zacharegkas and Hu 19). If the difference chain contains \(n_{\rm samples}\) this algorithm scales as \(O(n_{\rm samples}^2)\) and might require long run times. For this reason the algorithm is parallelized with the joblib library. To compute the KDE density estimate several methods are implemented.

In the 2D case this defaults to exact_parameter_shift_2D_fft() unless the kwarg use_fft is False.

Parameters:
  • diff_chainMCSamples input parameter difference chain
  • param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
  • scale – (optional) scale for the KDE smoothing. If none is provided the algorithm uses Silverman’s rule of thumb scaling.
  • method

    (optional) a string containing the indication for the method to use in the KDE calculation. This can be very intensive so different techniques are provided.

    • method = brute_force is a parallelized brute force method. This method scales as \(O(n_{\rm samples}^2)\) and can be afforded only for small tensions. When suspecting a difference that is larger than 95% other methods are better.
    • method = nearest_elimination is a KD Tree based elimination method. For large tensions this scales as \(O(n_{\rm samples}\log(n_{\rm samples}))\) and in worse case scenarions, with small tensions, this can scale as \(O(n_{\rm samples}^2)\) but with significant overheads with respect to the brute force method. When expecting a statistically significant difference in parameters this is the recomended algorithm.

    Suggestion is to go with brute force for small problems, nearest elimination for big problems with signifcant tensions.

  • feedback – (optional) print to screen the time taken for the calculation.
  • kwargs

    extra options to pass to the KDE algorithm. The nearest_elimination algorithm accepts the following optional arguments:

    • stable_cycle: (default 2) number of elimination cycles that show no improvement in the result.
    • chunk_size: (default 40) chunk size for elimination cycles. For best perfornamces this parameter should be tuned to result in the greatest elimination rates.
    • smallest_improvement: (default 1.e-4) minimum percentage improvement rate before switching to brute force.
    • use_fft: whether to use fft when possible.
Returns:

probability value and error estimate.

tensiometer.mcmc_tension.exact_parameter_shift_2D_fft(diff_chain, param_names=None, scale=None, nbins=1024, feedback=1, boundary_correction_order=1, mult_bias_correction_order=1, **kwarks)[source]

Compute the MCMC estimate of the probability of a parameter shift given an input parameter difference chain in 2 dimensions and by using FFT. This function uses GetDist 2D fft and optimal bandwidth estimates to perform the MCMC parameter shift integral discussed in (Raveri, Zacharegkas and Hu 19).

Parameters:
  • diff_chainMCSamples input parameter difference chain
  • param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
  • scale – (optional) scale for the KDE smoothing. If none is provided the algorithm uses GetDist optimized bandwidth.
  • nbins – (optional) number of 2D bins for the fft. Default is 1024.
  • mult_bias_correction_order – (optional) multiplicative bias correction passed to GetDist. See get2DDensity().
  • boundary_correction_order – (optional) boundary correction passed to GetDist. See get2DDensity().
  • feedback – (optional) print to screen the time taken for the calculation.
Returns:

probability value and error estimate.

tensiometer.mcmc_tension.parameter_diff_chain(chain_1, chain_2, boost=1)[source]

Compute the chain of the parameter differences between the two input chains. The parameters of the difference chain are related to the parameters of the input chains, \(\theta_1\) and \(\theta_2\) by:

\[\Delta \theta \equiv \theta_1 - \theta_2\]

This function only returns the differences for the parameters that are common to both chains. This function preserves the chain separation (if any) so that the convergence of the difference chain can be tested. This function does not assume Gaussianity of the chain. This functions does assume that the parameter determinations from the two chains (i.e. the underlying data sets) are uncorrelated. Do not use this function for chains that are correlated.

Parameters:
  • chain_1MCSamples first input chain with \(n_1\) samples
  • chain_2MCSamples second input chain with \(n_2\) samples
  • boost – (optional) boost the number of samples in the difference chain. By default the length of the difference chain will be the length of the longest chain. Given two chains the full difference chain can contain \(n_1\times n_2\) samples but this is usually prohibitive for realistic chains. The boost parameters wil increase the number of samples to be \({\rm boost}\times {\rm max}(n_1,n_2)\). Default boost parameter is one. If boost is None the full difference chain is going to be computed (and will likely require a lot of memory and time).
Returns:

MCSamples the instance with the parameter difference chain.

tensiometer.mcmc_tension.parameter_diff_weighted_samples(samples_1, samples_2, boost=1, indexes_1=None, indexes_2=None)[source]

Compute the parameter differences of two input weighted samples. The parameters of the difference samples are related to the parameters of the input samples, \(\theta_1\) and \(\theta_2\) by:

\[\Delta \theta \equiv \theta_1 - \theta_2\]

This function does not assume Gaussianity of the chain. This functions does assume that the parameter determinations from the two chains (i.e. the underlying data sets) are uncorrelated. Do not use this function for chains that are correlated.

Parameters:
  • samples_1WeightedSamples first input weighted samples with \(n_1\) samples.
  • samples_2WeightedSamples second input weighted samples with \(n_2\) samples.
  • boost – (optional) boost the number of samples in the difference. By default the length of the difference samples will be the length of the longest one. Given two samples the full difference samples can contain \(n_1\times n_2\) samples but this is usually prohibitive for realistic chains. The boost parameters wil increase the number of samples to be \({\rm boost}\times {\rm max}(n_1,n_2)\). Default boost parameter is one. If boost is None the full difference chain is going to be computed (and will likely require a lot of memory and time).
  • indexes_1 – (optional) array with the indexes of the parameters to use for the first samples. By default this tries to use all parameters.
  • indexes_2 – (optional) array with the indexes of the parameters to use for the second samples. By default this tries to use all parameters.
Returns:

WeightedSamples the instance with the parameter difference samples.