tensiometer.mcmc_tension

This module contains the functions and utilities to compute non-Gaussian Monte Carlo tension estimators.

The submodule param_diff contains the functions and utilities to compute the distribution of parameter differences from the parameter posterior of two experiments.

The submodule kde contains the functions to compute the statistical significance of a difference in parameters with KDE methods.

The submodule flow contains the functions and utilities to compute the statistical significance of a difference in parameters with normalizing flow methods.

For more details on the methods implemented see arXiv:1806.04649 and arXiv:1912.04880.

tensiometer.mcmc_tension.param_diff.parameter_diff_chain(chain_1, chain_2, boost=1)[source]

Compute the chain of the parameter differences between the two input chains. The parameters of the difference chain are related to the parameters of the input chains, \(\theta_1\) and \(\theta_2\) by:

\[\Delta \theta \equiv \theta_1 - \theta_2\]

This function only returns the differences for the parameters that are common to both chains. It preserves the chain separation (if any) so that the convergence of the difference chain can be tested. This function does not assume Gaussianity of the chains. It does assume that the parameter determinations from the two chains (i.e. the underlying data sets) are uncorrelated. Do not use this function for chains that are correlated.

Parameters:
  • chain_1 – MCSamples first input chain with \(n_1\) samples.
  • chain_2 – MCSamples second input chain with \(n_2\) samples.
  • boost – (optional) boost the number of samples in the difference chain. By default the length of the difference chain will be the length of the longest chain. Given two chains the full difference chain can contain \(n_1\times n_2\) samples but this is usually prohibitive for realistic chains. The boost parameter will increase the number of samples to \({\rm boost}\times {\rm max}(n_1,n_2)\). Default boost parameter is one. If boost is None the full difference chain is computed (and will likely require a lot of memory and time).
Returns:

MCSamples the instance with the parameter difference chain.
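
A usage sketch, assuming chain_1 and chain_2 are existing getdist MCSamples posteriors from two uncorrelated experiments:

# minimal sketch; chain_1 and chain_2 are assumed getdist.MCSamples objects
from tensiometer.mcmc_tension.param_diff import parameter_diff_chain

# oversample the difference chain to 4 * max(n_1, n_2) samples
diff_chain = parameter_diff_chain(chain_1, chain_2, boost=4)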

tensiometer.mcmc_tension.param_diff.parameter_diff_weighted_samples(samples_1, samples_2, boost=1, indexes_1=None, indexes_2=None)[source]

Compute the parameter differences of two input weighted samples. The parameters of the difference samples are related to the parameters of the input samples, \(\theta_1\) and \(\theta_2\) by:

\[\Delta \theta \equiv \theta_1 - \theta_2\]

This function does not assume Gaussianity of the samples. It does assume that the parameter determinations from the two samples (i.e. the underlying data sets) are uncorrelated. Do not use this function for samples that are correlated.

Parameters:
  • samples_1 – WeightedSamples first input weighted samples with \(n_1\) samples.
  • samples_2 – WeightedSamples second input weighted samples with \(n_2\) samples.
  • boost – (optional) boost the number of samples in the difference. By default the length of the difference samples will be the length of the longest input. Given two sets of samples the full difference can contain \(n_1\times n_2\) samples but this is usually prohibitive for realistic chains. The boost parameter will increase the number of samples to \({\rm boost}\times {\rm max}(n_1,n_2)\). Default boost parameter is one. If boost is None the full difference is computed (and will likely require a lot of memory and time).
  • indexes_1 – (optional) array with the indexes of the parameters to use for the first samples. By default this tries to use all parameters.
  • indexes_2 – (optional) array with the indexes of the parameters to use for the second samples. By default this tries to use all parameters.
Returns:

WeightedSamples the instance with the parameter difference samples.
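
A usage sketch, assuming samples_1 and samples_2 are existing getdist WeightedSamples objects; the column indexes below are purely illustrative:

# minimal sketch; samples_1 and samples_2 are assumed getdist WeightedSamples
from tensiometer.mcmc_tension.param_diff import parameter_diff_weighted_samples

# difference only the first two (shared) parameter columns
diff_samples = parameter_diff_weighted_samples(samples_1, samples_2, boost=2,
                                               indexes_1=[0, 1], indexes_2=[0, 1])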

tensiometer.mcmc_tension.kde.AMISE_bandwidth(num_params, num_samples)[source]

Compute Silverman's rule-of-thumb bandwidth covariance scaling, i.e. the normal-scale AMISE estimate. This is the default scaling used to compute the KDE estimate of parameter shifts.

Parameters:
  • num_params – the number of parameters in the chain.
  • num_samples – the number of samples in the chain.
Returns:

AMISE bandwidth matrix.

Reference:

Chacón, J. E., Duong, T. (2018).  Multivariate Kernel Smoothing and Its Applications.  United States: CRC Press.
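
For reference, a numpy sketch of the standard normal-scale (Silverman) covariance scaling, \((4/(d+2))^{2/(d+4)} \, n^{-2/(d+4)} \, I\), from Chacón and Duong (2018); the exact normalization used by the library may differ in detail:

# sketch of the normal-scale rule; not necessarily the library's exact code
import numpy as np

def amise_bandwidth_sketch(num_params, num_samples):
    d, n = num_params, num_samples
    coeff = (4.0 / (d + 2.0))**(2.0 / (d + 4.0)) * n**(-2.0 / (d + 4.0))
    return coeff * np.identity(d)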

tensiometer.mcmc_tension.kde.MAX_bandwidth(num_params, num_samples)[source]

Compute the maximum bandwidth matrix. This bandwidth generally oversmooths the distribution.

Parameters:
  • num_params – the number of parameters in the chain.
  • num_samples – the number of samples in the chain.
Returns:

MAX bandwidth matrix.

Reference:

Chacón, J. E., Duong, T. (2018).  Multivariate Kernel Smoothing and Its Applications.  United States: CRC Press.

tensiometer.mcmc_tension.kde.MISE_bandwidth(num_params, num_samples, feedback=0, **kwargs)[source]

Compute the MISE bandwidth matrix by numerically minimizing the MISE over the space of positive definite symmetric matrices.

Parameters:
  • num_params – the number of parameters in the chain.
  • num_samples – the number of samples in the chain.
  • feedback – feedback level. If > 2 prints a lot of information.
  • kwargs – optional arguments to be passed to the optimizer algorithm.
Returns:

MISE bandwidth matrix.

Reference:

Chacón, J. E., Duong, T. (2018).  Multivariate Kernel Smoothing and Its Applications.  United States: CRC Press.
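
A usage sketch comparing the closed-form AMISE scaling with the numerically minimized MISE scaling (the argument values are illustrative):

# minimal sketch: bandwidths for a 5-parameter, 10000-sample chain
from tensiometer.mcmc_tension import kde

H_amise = kde.AMISE_bandwidth(5, 10000)
H_mise = kde.MISE_bandwidth(5, 10000, feedback=1)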

tensiometer.mcmc_tension.kde.MISE_bandwidth_1d(num_params, num_samples, **kwargs)[source]

Compute the one-dimensional MISE bandwidth matrix. All coordinates are treated identically, so the MISE estimate simply rescales the identity matrix.

Parameters:
  • num_params – the number of parameters in the chain.
  • num_samples – the number of samples in the chain.
  • kwargs – optional arguments to be passed to the optimizer algorithm.
Returns:

MISE 1d bandwidth matrix.

Reference:

Chacón, J. E., Duong, T. (2018).  Multivariate Kernel Smoothing and Its Applications.  United States: CRC Press.

tensiometer.mcmc_tension.kde.OptimizeBandwidth_1D(diff_chain, param_names=None, num_bins=1000)[source]

Compute an estimate of an optimal bandwidth for covariance scaling as in GetDist. This is performed on whitened samples (with identity covariance), in 1D, and then scaled up with a dimensionality correction.

Parameters:
  • diff_chainMCSamples input parameter difference chain
  • param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
  • num_bins – number of bins used for the 1D estimate
Returns:

scaling vector for the whitened parameters

tensiometer.mcmc_tension.kde.Scotts_bandwidth(num_params, num_samples)[source]

Compute Scott’s rule of thumb bandwidth covariance scaling. This should be a fast approximation of the 1d MISE estimate.

Parameters:
  • num_params – the number of parameters in the chain.
  • num_samples – the number of samples in the chain.
Returns:

Scott’s scaling matrix.

Reference:

Chacón, J. E., Duong, T. (2018).  Multivariate Kernel Smoothing and Its Applications.  United States: CRC Press.

tensiometer.mcmc_tension.kde.UCV_SP_bandwidth(white_samples, weights, feedback=0, near=1, near_max=20)[source]

Compute the optimal unbiased cross validation (UCV) bandwidth scaling for the BALL sampling-point KDE estimator.

Parameters:
  • white_samples – pre-whitened samples (identity covariance).
  • weights – input sample weights.
  • feedback – (optional) verbosity of the algorithm. Default is zero.
  • near – (optional) n-th nearest neighbour to use. Default is 1.
  • near_max – (optional) number of nearest neighbours to use for the UCV calculation. Default is 20.
tensiometer.mcmc_tension.kde.UCV_bandwidth(weights, white_samples, alpha0=None, feedback=0, mode='full', **kwargs)[source]

Compute the optimal unbiased cross validation (UCV) bandwidth for the input samples by numerical minimization.

Parameters:
  • weights – input sample weights.
  • white_samples – pre-whitened samples (identity covariance)
  • alpha0 – (optional) initial guess for the bandwidth. If none is given the AMISE bandwidth is used as the starting point for the minimization.
  • feedback – (optional) verbosity of the algorithm. Default is zero.
  • mode – (optional) selects the space for the minimization. Default is over the full space of SPD matrices. Other options are diag, to perform the minimization over diagonal matrices, and 1d, to perform the minimization over matrices that are proportional to the identity.
  • kwargs – other arguments passed to scipy.optimize.minimize()
Returns:

UCV bandwidth matrix.

Reference:

Chacón, J. E., Duong, T. (2018).  Multivariate Kernel Smoothing and Its Applications.  United States: CRC Press.
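
A usage sketch of pre-whitening weighted samples before calling UCV_bandwidth; samples (an \(n\times d\) array) and weights (length \(n\)) are assumed to exist:

# minimal sketch; `samples` and `weights` are assumed to exist
import numpy as np
from tensiometer.mcmc_tension import kde

mean = np.average(samples, axis=0, weights=weights)
cov = np.cov(samples.T, aweights=weights)
# map to identity covariance with the inverse Cholesky factor
white_samples = np.linalg.solve(np.linalg.cholesky(cov), (samples - mean).T).T
H_ucv = kde.UCV_bandwidth(weights, white_samples, mode='1d')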

tensiometer.mcmc_tension.kde.kde_parameter_shift(diff_chain, param_names=None, scale=None, method='neighbor_elimination', feedback=1, **kwargs)[source]

Compute the KDE estimate of the probability of a parameter shift given an input parameter difference chain. This function uses a Kernel Density Estimate (KDE) algorithm discussed in (Raveri, Zacharegkas and Hu 19). If the difference chain contains \(n_{\rm samples}\) this algorithm scales as \(O(n_{\rm samples}^2)\) and might require long run times. For this reason the algorithm is parallelized with the joblib library. If the problem is 1d or 2d use the fft algorithm in kde_parameter_shift_1D_fft() and kde_parameter_shift_2D_fft().

Parameters:
  • diff_chainMCSamples input parameter difference chain
  • param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
  • scale

    (optional) scale for the KDE smoothing. The scale always refers to whitened samples with unit covariance. If none is provided the algorithm uses the MISE estimate. Options are:

    1. a scalar for fixed scaling over all dimensions;
    2. a matrix for anisotropic smoothing;
    3. MISE, AMISE or MAX for the corresponding smoothing scale;
    4. BALL or ELL for variable adaptive smoothing with nearest neighbours.
  • method

    (optional) a string containing the indication for the method to use in the KDE calculation. This can be very intensive so different techniques are provided.

    1. method = brute_force is a parallelized brute force method. This method scales as \(O(n_{\rm samples}^2)\) and can be afforded only for small tensions. When suspecting a difference that is larger than the 95% level, other methods are better.
    2. method = neighbor_elimination is a KD Tree based elimination method. For large tensions this scales as \(O(n_{\rm samples}\log(n_{\rm samples}))\); in worst-case scenarios, with small tensions, it can scale as \(O(n_{\rm samples}^2)\), but with significant overheads with respect to the brute force method. When expecting a statistically significant difference in parameters this is the recommended algorithm.

    The suggestion is to go with brute force for small problems and neighbor elimination for big problems with significant tensions. Default is neighbor_elimination.

  • feedback – (optional) print to screen the time taken for the calculation.
  • kwargs

    extra options to pass to the KDE algorithm. The neighbor_elimination algorithm accepts the following optional arguments:

    1. stable_cycle: (default 2) number of elimination cycles that show no improvement in the result.
    2. chunk_size: (default 40) chunk size for elimination cycles. For best performance this parameter should be tuned to achieve the greatest elimination rates.
    3. smallest_improvement: (default 1.e-4) minimum percentage improvement rate before switching to brute force.
    4. near: (default 1) n-nearest neighbour to use for variable bandwidth KDE estimators.
    5. near_alpha: (default 1.0) scaling for nearest neighbour distance.
Returns:

probability value and error estimate from the binomial distribution.

Reference:

Raveri, Zacharegkas and Hu 19
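
A usage sketch, running the estimate on the output of parameter_diff_chain() and converting the probability to an effective number of Gaussian sigmas; the two-value unpacking follows the Returns note above:

# minimal sketch; diff_chain is assumed to come from parameter_diff_chain()
import numpy as np
from scipy import special
from tensiometer.mcmc_tension import kde

P, P_error = kde.kde_parameter_shift(diff_chain, method='neighbor_elimination')
nsigma = np.sqrt(2.0) * special.erfinv(P)  # two-tailed Gaussian equivalent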

tensiometer.mcmc_tension.kde.kde_parameter_shift_1D_fft(diff_chain, param_names=None, scale=None, nbins=1024, feedback=1, boundary_correction_order=1, mult_bias_correction_order=1, **kwargs)[source]

Compute the MCMC estimate of the probability of a parameter shift, given an input parameter difference chain, in one dimension using FFT. This function uses the GetDist 1D FFT and optimal bandwidth estimates to perform the MCMC parameter shift integral discussed in (Raveri, Zacharegkas and Hu 19).

Parameters:
  • diff_chainMCSamples input parameter difference chain
  • param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
  • scale – (optional) scale for the KDE smoothing. If none is provided the algorithm uses GetDist optimized bandwidth.
  • nbins – (optional) number of 1D bins for the fft. Powers of 2 work best. Default is 1024.
  • mult_bias_correction_order – (optional) multiplicative bias correction passed to GetDist. See get1DDensity().
  • boundary_correction_order – (optional) boundary correction passed to GetDist. See get1DDensity().
  • feedback – (optional) print to screen the time taken for the calculation.
Returns:

probability value and error estimate.

Reference:

Raveri, Zacharegkas and Hu 19
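
A usage sketch for a one-parameter difference chain (the parameter name here is purely illustrative):

# minimal sketch; 'delta_param' stands for a parameter of diff_chain
from tensiometer.mcmc_tension import kde

P, P_error = kde.kde_parameter_shift_1D_fft(diff_chain,
                                            param_names=['delta_param'],
                                            nbins=2048)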

tensiometer.mcmc_tension.kde.kde_parameter_shift_2D_fft(diff_chain, param_names=None, scale=None, nbins=1024, feedback=1, boundary_correction_order=1, mult_bias_correction_order=1, **kwargs)[source]

Compute the MCMC estimate of the probability of a parameter shift, given an input parameter difference chain, in two dimensions using FFT. This function uses the GetDist 2D FFT and optimal bandwidth estimates to perform the MCMC parameter shift integral discussed in (Raveri, Zacharegkas and Hu 19).

Parameters:
  • diff_chainMCSamples input parameter difference chain
  • param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
  • scale – (optional) scale for the KDE smoothing. If none is provided the algorithm uses GetDist optimized bandwidth.
  • nbins – (optional) number of 2D bins for the fft. Powers of 2 work best. Default is 1024.
  • mult_bias_correction_order – (optional) multiplicative bias correction passed to GetDist. See get2DDensity().
  • boundary_correction_order – (optional) boundary correction passed to GetDist. See get2DDensity().
  • feedback – (optional) print to screen the time taken for the calculation.
Returns:

probability value and error estimate.

Reference:

Raveri, Zacharegkas and Hu 19

class tensiometer.mcmc_tension.flow.DiffFlowCallback(diff_chain, param_names=None, Z2Y_bijector='MAF', pregauss_bijector=None, learning_rate=0.001, feedback=1, validation_split=0.1, early_stop_nsigma=0.0, early_stop_patience=10, **kwargs)[source]

A class to compute the normalizing flow estimate of the probability of a parameter shift given an input parameter difference chain.

A normalizing flow is trained to approximate the difference distribution and is then used to numerically evaluate the probability of a parameter shift (see REF). To do so, it defines a bijective mapping that is optimized to gaussianize the difference chain samples. This mapping is performed in two steps, using the gaussian approximation as pre-whitening. The notations used in the code are:

  • X designates samples in the original parameter difference space;
  • Y designates samples in the gaussian approximation space, Y is obtained by shifting and scaling X by its mean and covariance (like a PCA);
  • Z designates samples in the gaussianized space, connected to Y with a normalizing flow denoted Z2Y_bijector.

The user may provide the Z2Y_bijector as a Bijector object from Tensorflow Probability or make use of the utility class SimpleMAF to instantiate a Masked Autoregressive Flow (with Z2Y_bijector=’MAF’).

This class derives from the Keras Callback class, which allows for visualization during training. The normalizing flows (X->Y->Z) are implemented as Bijector objects and encapsulated in a Keras Model.

Here is an example:

# Initialize the flow and model
diff_flow_callback = DiffFlowCallback(diff_chain, Z2Y_bijector='MAF')
# Train the model
diff_flow_callback.train()
# Compute the shift probability and its error estimate
p, p_error = diff_flow_callback.estimate_shift()
Parameters:
  • diff_chain (MCSamples) – input parameter difference chain.
  • param_names (list, optional) – parameter names of the parameters to be used in the calculation. By default all running parameters.
  • Z2Y_bijector (optional) – either a Bijector object representing the mapping from Z to Y, or ‘MAF’ to instantiate a SimpleMAF, defaults to ‘MAF’.
  • pregauss_bijector (optional) – not implemented yet, defaults to None.
  • learning_rate (float, optional) – initial learning rate, defaults to 1e-3.
  • feedback (int, optional) – feedback level, defaults to 1.
  • validation_split (float, optional) – fraction of samples to use for the validation sample, defaults to 0.1
  • early_stop_nsigma (float, optional) – absolute error on the tension at the zero-shift point to be used as an approximate convergence criterion for early stopping, defaults to 0.
  • early_stop_patience (int, optional) – minimum number of epochs to use when early_stop_nsigma is non-zero, defaults to 10.
Raises:

NotImplementedError – if pregauss_bijector is not None.

Reference:

George Papamakarios, Theo Pavlakou, Iain Murray (2017). Masked Autoregressive Flow for Density Estimation. arXiv:1705.07057

estimate_shift(tol=0.05, max_iter=1000, step=100000)[source]

Compute the normalizing flow estimate of the probability of a parameter shift given the input parameter difference chain. This is done with a Monte Carlo estimate by comparing the probability density at the zero-shift point to that at samples drawn from the normalizing flow approximation of the distribution.

Parameters:
  • tol (float, optional) – absolute tolerance on the shift significance, defaults to 0.05.
  • max_iter (int, optional) – maximum number of sampling steps, defaults to 1000.
  • step (int, optional) – number of samples per step, defaults to 100000.
Returns:

probability value and error estimate.

on_epoch_end(epoch, logs={})[source]

This method is used by Keras to show progress during training if feedback is True.

train(epochs=100, batch_size=None, steps_per_epoch=None, callbacks=[], verbose=1, **kwargs)[source]

Train the normalizing flow model. Internally, this runs the fit method of the Keras Model, to which **kwargs are passed.

Parameters:
  • epochs (int, optional) – number of training epochs, defaults to 100.
  • batch_size (int, optional) – number of samples per batch, defaults to None. If None, the training sample is divided into steps_per_epoch batches.
  • steps_per_epoch (int, optional) – number of steps per epoch, defaults to None. If None and batch_size is also None, then steps_per_epoch is set to 100.
  • callbacks (list, optional) – a list of additional Keras callbacks, such as ReduceLROnPlateau, defaults to [].
  • verbose (int, optional) – verbosity level, defaults to 1.
Returns:

A History object. Its history attribute is a dictionary of training and validation loss and metric values at successive epochs: "shift0_chi2" is the squared norm of the zero-shift point in the gaussianized space, with the corresponding probability-to-exceed and tension in "shift0_pval" and "shift0_nsigma"; "chi2Z_ks" and "chi2Z_ks_p" contain the \(D_n\) statistic and probability-to-exceed of the Kolmogorov-Smirnov test that the squared norms of the transformed samples Z are \(\chi^2\) distributed (with a number of degrees of freedom equal to the number of parameters).
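
A usage sketch, inspecting the diagnostics described above after training (key names follow the Returns note; diff_flow_callback is assumed from the class example above):

# minimal sketch; train and read the convergence diagnostics
history = diff_flow_callback.train(epochs=200, verbose=0)
print(history.history['shift0_nsigma'][-1])  # tension estimate at last epoch
print(history.history['chi2Z_ks_p'][-1])     # KS p-value for gaussianization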

class tensiometer.mcmc_tension.flow.SimpleMAF(num_params, n_maf=None, hidden_units=None, permutations=True, activation=<function asinh>, kernel_initializer='glorot_uniform', feedback=0, **kwargs)[source]

A class to implement a simple Masked AutoRegressive Flow (MAF) using the implementation tfp.bijectors.AutoregressiveNetwork from Tensorflow Probability. Additionally, this class provides utilities to load/save models, including random permutations.

Parameters:
  • num_params (int) – number of parameters, i.e. the dimension of the space on which the bijector is defined.
  • n_maf (int, optional) – number of MAFs to stack. Defaults to None, in which case it is set to 2*num_params.
  • hidden_units (list, optional) – a list of the number of nodes per hidden layers. Defaults to None, in which case it is set to [num_params*2]*2.
  • permutations (bool, optional) – whether to shuffle dimensions between stacked MAFs, defaults to True.
  • activation (optional) – activation function to use in all layers, defaults to tf.math.asinh().
  • kernel_initializer (str, optional) – kernel initializer, defaults to ‘glorot_uniform’.
  • feedback (int, optional) – print the model architecture, defaults to 0.
Reference:

George Papamakarios, Theo Pavlakou, Iain Murray (2017). Masked Autoregressive Flow for Density Estimation. arXiv:1705.07057

classmethod load(num_params, path, **kwargs)[source]

Load a saved SimpleMAF object. The number of parameters and all other keyword arguments (except for permutations) must be included, as the MAF is first created with random weights and these weights are then restored.

Parameters:
  • num_params (int) – number of parameters, i.e. the dimension of the space on which the bijector is defined.
  • path (str) – path of the directory from which to load.
Returns:

a SimpleMAF.

save(path)[source]

Save a SimpleMAF object.

Parameters:
  • path (str) – path of the directory where to save.
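
A usage sketch of the save/load round trip; constructor keyword arguments (other than permutations) must match at load time, and the path and argument values here are illustrative:

# minimal sketch; path and keyword values are illustrative
from tensiometer.mcmc_tension.flow import SimpleMAF

maf = SimpleMAF(num_params=3, n_maf=6, feedback=1)
maf.save('./saved_maf')
maf_restored = SimpleMAF.load(3, './saved_maf', n_maf=6)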