kde
This module contains functions to estimate the probability of a parameter shift with KDE methods.
For further details we refer to arXiv:2105.03324.
- tensiometer.mcmc_tension.kde.AMISE_bandwidth(num_params, num_samples)[source]
Compute Silverman’s rule of thumb bandwidth covariance scaling AMISE. This is the default scaling that is used to compute the KDE estimate of parameter shifts.
- Parameters:
num_params – the number of parameters in the chain.
num_samples – the number of samples in the chain.
- Returns:
AMISE bandwidth matrix.
- Reference:
Chacón, J. E., Duong, T. (2018). Multivariate Kernel Smoothing and Its Applications. United States: CRC Press.
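As a rough illustration of this rule of thumb, the scaling for whitened samples can be written in closed form. The sketch below assumes the textbook normal-reference expression H = (4 / (n (d + 2)))^(2 / (d + 4)) I from the reference above; the helper name `silverman_scaling` is hypothetical and is not part of tensiometer.

```python
def silverman_scaling(num_params, num_samples):
    """Return a d x d bandwidth matrix (nested lists) for whitened samples.

    Assumption: normal-reference rule H = c * I with
    c = (4 / (n * (d + 2))) ** (2 / (d + 4)).
    """
    d, n = int(num_params), int(num_samples)
    coeff = (4.0 / (n * (d + 2.0))) ** (2.0 / (d + 4.0))
    # Diagonal matrix: the same scaling is applied to every whitened direction.
    return [[coeff if i == j else 0.0 for j in range(d)] for i in range(d)]


H = silverman_scaling(num_params=2, num_samples=10000)
print(H[0][0])
```

The matrix is proportional to the identity because the samples are assumed to have unit covariance already; the scaling shrinks as the number of samples grows.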
- tensiometer.mcmc_tension.kde.MAX_bandwidth(num_params, num_samples)[source]
Compute the maximum bandwidth matrix. This bandwidth is generally oversmoothing.
- Parameters:
num_params – the number of parameters in the chain.
num_samples – the number of samples in the chain.
- Returns:
MAX bandwidth matrix.
- Reference:
Chacón, J. E., Duong, T. (2018). Multivariate Kernel Smoothing and Its Applications. United States: CRC Press.
- tensiometer.mcmc_tension.kde.MISE_bandwidth(num_params, num_samples, feedback=0, **kwargs)[source]
Computes the MISE bandwidth matrix by numerically minimizing the MISE over the space of positive definite symmetric matrices.
- Parameters:
num_params – the number of parameters in the chain.
num_samples – the number of samples in the chain.
feedback – feedback level. If > 2 prints a lot of information.
kwargs – optional arguments to be passed to the optimizer algorithm.
- Returns:
MISE bandwidth matrix.
- Reference:
Chacón, J. E., Duong, T. (2018). Multivariate Kernel Smoothing and Its Applications. United States: CRC Press.
- tensiometer.mcmc_tension.kde.MISE_bandwidth_1d(num_params, num_samples, **kwargs)[source]
Computes the MISE bandwidth matrix. All coordinates are considered the same so the MISE estimate just rescales the identity matrix.
- Parameters:
num_params – the number of parameters in the chain.
num_samples – the number of samples in the chain.
kwargs – optional arguments to be passed to the optimizer algorithm.
- Returns:
MISE 1d bandwidth matrix.
- Reference:
Chacón, J. E., Duong, T. (2018). Multivariate Kernel Smoothing and Its Applications. United States: CRC Press.
- tensiometer.mcmc_tension.kde.OptimizeBandwidth_1D(diff_chain, param_names=None, num_bins=1000)[source]
Compute an estimate of an optimal bandwidth for covariance scaling as in GetDist. This is performed on whitened samples (with identity covariance), in 1D, and then scaled up with a dimensionality correction.
- Parameters:
diff_chain – MCSamples input parameter difference chain.
param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
num_bins – number of bins used for the 1D estimate
- Returns:
scaling vector for the whitened parameters
- tensiometer.mcmc_tension.kde.Scotts_bandwidth(num_params, num_samples)[source]
Compute Scott’s rule of thumb bandwidth covariance scaling. This should be a fast approximation of the 1d MISE estimate.
- Parameters:
num_params – the number of parameters in the chain.
num_samples – the number of samples in the chain.
- Returns:
Scott’s scaling matrix.
- Reference:
Chacón, J. E., Duong, T. (2018). Multivariate Kernel Smoothing and Its Applications. United States: CRC Press.
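For comparison, Scott's rule depends only on the number of samples and the number of dimensions. The sketch below assumes the standard form n^(-2 / (d + 4)) for the coefficient multiplying the identity matrix; `scott_scaling` is a hypothetical helper, not a tensiometer function.

```python
def scott_scaling(num_params, num_samples):
    """Coefficient multiplying the identity for whitened samples.

    Assumption: standard Scott's rule, n ** (-2 / (d + 4)).
    """
    d, n = int(num_params), int(num_samples)
    return n ** (-2.0 / (d + 4.0))


# The scaling grows with dimension at fixed sample size:
# sparser sampling calls for more smoothing.
for d in (1, 2, 6):
    print(d, scott_scaling(d, 10000))
```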
- tensiometer.mcmc_tension.kde.UCV_SP_bandwidth(white_samples, weights, feedback=0, near=1, near_max=20)[source]
Computes the optimal unbiased cross validation bandwidth scaling for the BALL sampling point KDE estimator.
- Parameters:
white_samples – pre-whitened samples (identity covariance).
weights – input sample weights.
feedback – (optional) how verbose is the algorithm. Default is zero.
near – (optional) number of nearest neighbours to use. Default is 1.
near_max – (optional) number of nearest neighbours to use for the UCV calculation. Default is 20.
- tensiometer.mcmc_tension.kde.UCV_bandwidth(weights, white_samples, alpha0=None, feedback=0, mode='full', **kwargs)[source]
Computes the optimal unbiased cross validation bandwidth for the input samples by numerical minimization.
- Parameters:
weights – input sample weights.
white_samples – pre-whitened samples (identity covariance)
alpha0 – (optional) initial guess for the bandwidth. If none is given then the AMISE bandwidth is used as the starting point for minimization.
feedback – (optional) how verbose is the algorithm. Default is zero.
mode – (optional) selects the space for minimization. Default is over the full space of SPD matrices. Other options are diag to perform minimization over diagonal matrices and 1d to perform minimization over matrices that are proportional to the identity.
kwargs – other arguments passed to scipy.optimize.minimize().
- Returns:
UCV bandwidth matrix.
- Reference:
Chacón, J. E., Duong, T. (2018). Multivariate Kernel Smoothing and Its Applications. United States: CRC Press.
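To make the quantity being minimized concrete, here is a minimal sketch of the unbiased cross validation objective. It is a simplification under several assumptions: one dimension, uniform weights, a Gaussian kernel, and a plain grid search standing in for scipy.optimize.minimize(). This loosely corresponds to the 1d mode, where a single number is optimized. The names `gauss` and `ucv_score` are hypothetical.

```python
import math
import random


def gauss(u, h):
    """1D Gaussian kernel of width h evaluated at u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def ucv_score(samples, h):
    """UCV objective: integral of fhat^2 minus twice the leave-one-out mean."""
    n = len(samples)
    term_sq, term_loo = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            diff = samples[i] - samples[j]
            # Two Gaussians of width h convolve to one of width sqrt(2) h.
            term_sq += gauss(diff, math.sqrt(2.0) * h)
            if i != j:
                term_loo += gauss(diff, h)  # leave-one-out density sum
    return term_sq / n**2 - 2.0 * term_loo / (n * (n - 1))


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(150)]
grid = [0.05 * k for k in range(1, 41)]  # candidate bandwidths 0.05 .. 2.0
best_h = min(grid, key=lambda h: ucv_score(samples, h))
print(best_h)
```

The double loop makes the quadratic cost in the number of samples explicit, which is why a good starting point for the minimization matters in practice.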
- tensiometer.mcmc_tension.kde.kde_parameter_shift(diff_chain, param_names=None, scale=None, method='neighbor_elimination', feedback=1, **kwargs)[source]
Compute the KDE estimate of the probability of a parameter shift given an input parameter difference chain. This function uses a Kernel Density Estimate (KDE) algorithm discussed in (Raveri, Zacharegkas and Hu 19). If the difference chain contains \(n_{\rm samples}\) this algorithm scales as \(O(n_{\rm samples}^2)\) and might require long run times. For this reason the algorithm is parallelized with the joblib library. If the problem is 1d or 2d use the fft algorithm in kde_parameter_shift_1D_fft() and kde_parameter_shift_2D_fft().
- Parameters:
diff_chain – MCSamples input parameter difference chain.
param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
scale –
(optional) scale for the KDE smoothing. The scale always refers to white samples with unit covariance. If none is provided the algorithm uses the MISE estimate. Options are:
a scalar for fixed scaling over all dimensions;
a matrix for anisotropic smoothing;
MISE, AMISE, MAX for the corresponding smoothing scale;
BALL or ELL for variable adaptive smoothing with nearest neighbour.
method –
(optional) a string indicating the method to use in the KDE calculation. The calculation can be very intensive so different techniques are provided.
method = brute_force is a parallelized brute force method. This method scales as \(O(n_{\rm samples}^2)\) and can be afforded only for small tensions. When suspecting a difference that is larger than 95% other methods are better.
method = neighbor_elimination is a KD Tree based elimination method. For large tensions this scales as \(O(n_{\rm samples}\log(n_{\rm samples}))\). In worst case scenarios, with small tensions, it can scale as \(O(n_{\rm samples}^2)\) but with significant overheads with respect to the brute force method. When expecting a statistically significant difference in parameters this is the recommended algorithm.
The suggestion is to use brute force for small problems and neighbor elimination for big problems with significant tensions. Default is neighbor_elimination.
feedback – (optional) print to screen the time taken for the calculation.
kwargs –
extra options to pass to the KDE algorithm. The neighbor_elimination algorithm accepts the following optional arguments:
stable_cycle: (default 2) number of elimination cycles that show no improvement in the result.
chunk_size: (default 40) chunk size for elimination cycles. For best performance this parameter should be tuned to result in the greatest elimination rates.
smallest_improvement: (default 1.e-4) minimum percentage improvement rate before switching to brute force.
near: (default 1) n-nearest neighbour to use for variable bandwidth KDE estimators.
near_alpha: (default 1.0) scaling for nearest neighbour distance.
- Returns:
probability value and binomial error estimate.
- Reference:
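The brute force strategy described above can be sketched in a few lines. This is only an illustration under stated assumptions: uniform sample weights, a fixed isotropic Gaussian kernel on already whitened samples, no parallelization, and a simple normal-approximation binomial error (the library's own error estimate may differ). The names `kde_density` and `brute_force_shift` are hypothetical.

```python
import math
import random


def kde_density(point, samples, h):
    """Unnormalized isotropic Gaussian KDE at `point` (normalization cancels)."""
    total = 0.0
    for s in samples:
        dist2 = sum((p - q) ** 2 for p, q in zip(point, s))
        total += math.exp(-0.5 * dist2 / h**2)
    return total


def brute_force_shift(samples, h):
    """Fraction of samples whose KDE density exceeds the density at zero shift."""
    n = len(samples)
    dim = len(samples[0])
    density_at_zero = kde_density([0.0] * dim, samples, h)
    # One KDE evaluation per sample, each summing over all samples: O(n^2).
    above = sum(1 for s in samples if kde_density(s, samples, h) > density_at_zero)
    prob = above / n
    err = math.sqrt(prob * (1.0 - prob) / n)  # assumed binomial error form
    return prob, err


random.seed(1)
# Toy whitened difference chain: unit covariance, 2 sigma offset in one coordinate.
chain = [(random.gauss(2.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(500)]
n, d = len(chain), 2
h = (4.0 / (n * (d + 2.0))) ** (1.0 / (d + 4.0))  # Silverman-type width (assumption)
prob, err = brute_force_shift(chain, h)
print(prob, err)
```

Every sample requires a sum over all other samples, which is the quadratic cost that the neighbor elimination method tries to avoid when most samples can be classified early.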
- tensiometer.mcmc_tension.kde.kde_parameter_shift_1D_fft(diff_chain, prior_diff_chain=None, param_names=None, scale=None, nbins=1024, feedback=1, boundary_correction_order=1, mult_bias_correction_order=1, **kwarks)[source]
Compute the MCMC estimate of the probability of a parameter shift given an input parameter difference chain in 1 dimension and by using FFT. This function uses GetDist 1D fft and optimal bandwidth estimates to perform the MCMC parameter shift integral discussed in (Raveri, Zacharegkas and Hu 19).
- Parameters:
diff_chain – MCSamples input parameter difference chain.
prior_diff_chain – MCSamples (optional) prior parameter difference chain. If present the code will use likelihood thresholded tension calculation, giving a result that is parameter invariant.
param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
scale – (optional) scale for the KDE smoothing. If none is provided the algorithm uses GetDist optimized bandwidth.
nbins – (optional) number of 1D bins for the fft. Powers of 2 work best. Default is 1024.
mult_bias_correction_order – (optional) multiplicative bias correction passed to GetDist. See get2DDensity().
boundary_correction_order – (optional) boundary correction passed to GetDist. See get2DDensity().
feedback – (optional) print to screen the time taken for the calculation.
- Returns:
probability value and error estimate.
- Reference:
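The idea behind the gridded 1D calculation can be sketched as follows: bin the samples, smooth the histogram with a Gaussian kernel, and sum the probability mass in bins whose density exceeds the density at zero shift. This sketch makes several simplifying assumptions: it uses a direct convolution in place of the FFT, a fixed user-supplied bandwidth instead of the GetDist optimized one, uniform weights, and no boundary or multiplicative bias corrections. The function name `binned_shift_1d` is hypothetical.

```python
import math
import random


def binned_shift_1d(samples, h, nbins=256):
    """Grid-based estimate of the mass with density above the density at zero."""
    lo = min(min(samples), 0.0) - 3.0 * h
    hi = max(max(samples), 0.0) + 3.0 * h
    width = (hi - lo) / nbins
    counts = [0.0] * nbins
    for x in samples:
        counts[min(int((x - lo) / width), nbins - 1)] += 1.0
    # Gaussian kernel sampled on the grid, truncated at 4 h.
    reach = int(4.0 * h / width) + 1
    kernel = [math.exp(-0.5 * (k * width / h) ** 2) for k in range(-reach, reach + 1)]
    density = [0.0] * nbins
    for i, c in enumerate(counts):
        if c == 0.0:
            continue
        for k, w in enumerate(kernel):
            j = i + k - reach
            if 0 <= j < nbins:
                density[j] += c * w  # direct convolution (an FFT would do this faster)
    zero_bin = min(int((0.0 - lo) / width), nbins - 1)
    threshold = density[zero_bin]
    # Mass of all bins more probable than the zero-shift bin.
    return sum(v for v in density if v > threshold) / sum(density)


random.seed(2)
samples = [random.gauss(2.0, 1.0) for _ in range(5000)]
h = (4.0 / (3.0 * len(samples))) ** 0.2  # 1D Silverman-type width (assumption)
prob = binned_shift_1d(samples, h)
print(prob)
```

Working on a grid replaces the quadratic sum over sample pairs with a convolution over bins, which is what makes the FFT variants attractive in low dimensions.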
- tensiometer.mcmc_tension.kde.kde_parameter_shift_2D_fft(diff_chain, prior_diff_chain=None, param_names=None, scale=None, nbins=1024, feedback=1, boundary_correction_order=1, mult_bias_correction_order=1, **kwarks)[source]
Compute the MCMC estimate of the probability of a parameter shift given an input parameter difference chain in 2 dimensions and by using FFT. This function uses GetDist 2D fft and optimal bandwidth estimates to perform the MCMC parameter shift integral discussed in (Raveri, Zacharegkas and Hu 19).
- Parameters:
diff_chain – MCSamples input parameter difference chain.
prior_diff_chain – MCSamples (optional) prior parameter difference chain. If present the code will use likelihood thresholded tension calculation, giving a result that is parameter invariant.
param_names – (optional) parameter names of the parameters to be used in the calculation. By default all running parameters.
scale – (optional) scale for the KDE smoothing. If none is provided the algorithm uses GetDist optimized bandwidth.
nbins – (optional) number of 2D bins for the fft. Powers of 2 work best. Default is 1024.
mult_bias_correction_order – (optional) multiplicative bias correction passed to GetDist. See get2DDensity().
boundary_correction_order – (optional) boundary correction passed to GetDist. See get2DDensity().
feedback – (optional) print to screen the time taken for the calculation.
- Returns:
probability value and error estimate.
- Reference: