Cyrille Doux (cdoux@sas.upenn.edu), Marco Raveri (mraveri@sas.upenn.edu)
In this notebook we show how to compute the statistical significance of a parameter shift between two experiments, DES Y1 and Planck 18, with the two techniques discussed in Raveri and Doux (2021), arXiv:2105.03324.
# Show plots inline, and load main getdist plot module and samples class
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 1
# import libraries:
import sys, os
sys.path.insert(0,os.path.realpath(os.path.join(os.getcwd(),'../..')))
from getdist import plots, MCSamples
from getdist.gaussian_mixtures import GaussianND
import getdist
getdist.chains.print_load_details = False
import scipy
import matplotlib.pyplot as plt
import IPython
from IPython.display import Markdown
import numpy as np
import seaborn as sns
# import the tensiometer tools that we need:
import tensiometer
from tensiometer import utilities
from tensiometer import gaussian_tension
from tensiometer import mcmc_tension
We start importing the chains and computing the parameter difference chain. We use the boost parameter to ensure that we enough samples for later.
# load the samples (remove no burn in since the example chains have already been cleaned):
chains_dir = './../../test_chains/'
# the Planck 2018 TTTEEE chain:
chain_1 = getdist.mcsamples.loadMCSamples(file_root=chains_dir+'Planck18TTTEEE', no_cache=True)
# the DES Y1 3x2 chain:
chain_2 = getdist.mcsamples.loadMCSamples(file_root=chains_dir+'DES', no_cache=True)
# the difference chain:
diff_chain = mcmc_tension.parameter_diff_chain(chain_1, chain_2, boost=4)
Now we do a sanity check plot:
param_names = diff_chain.getParamNames().getRunningNames()
g = plots.get_subplot_plotter()
g.triangle_plot([diff_chain], params=param_names, filled=True, markers={_p:0 for _p in param_names})
Looks non-Gaussian uh? Let's see below how to cope with this!
The code provides a helper function, tensiometer.mcmc_tension.flow_parameter_shift(diff_chain)
, to create the model, train it and compute the shift significance. However, we will do these three steps manually to do some fine tuning and demonstrate some functionalities.
We first create the MAF model, with default architecture, but an initial learning rate of 0.01.
diff_flow_callback = tensiometer.mcmc_tension.DiffFlowCallback(diff_chain, feedback=1, learning_rate=0.01)
We now train the model. To improve convergence, we started with a high learning rate and will use a Keras callback to decrease it when the validation loss is on a plateau.
from tensorflow.keras.callbacks import ReduceLROnPlateau
callbacks = [ReduceLROnPlateau()]
batch_size = 8192
epochs = 100
steps_per_epoch = 128
diff_flow_callback.train(batch_size=batch_size, epochs=epochs, steps_per_epoch=steps_per_epoch, callbacks=callbacks)
exact_shift_P_1, exact_shift_low_1, exact_shift_hi_1 = diff_flow_callback.estimate_shift()
print(f'Shift probability considering all parameters = {exact_shift_P_1:.5f} +{exact_shift_hi_1-exact_shift_P_1:.5f} -{exact_shift_P_1-exact_shift_low_1:.5f}')
# turn the result to effective number of sigmas:
print(f' n_sigma = {utilities.from_confidence_to_sigma(exact_shift_P_1):.3f}')
Note that there is some variance in the result due to initialization. You can repeat the above calculation some times to evaluate the variance.
We can now plot the learned distribution. To do so we draw some samples from the learned distribution and then feed them to getdist plotting:
N = 10000
X_sample = np.array(diff_flow_callback.dist_learned.sample(N))
diff_flow = MCSamples(samples=X_sample, names=param_names, label='Learned distribution')
colors=['orange', 'dodgerblue', 'k']
g = plots.get_subplot_plotter()
g.settings.num_plot_contours = 2
gaussian_approx = tensiometer.gaussian_tension.gaussian_approximation(diff_chain)
g.triangle_plot([diff_chain, diff_flow, gaussian_approx], params=param_names,
filled=False, markers={_p:0 for _p in param_names},
colors=colors, diag1d_kwargs={'colors':colors})
As we can see the two distributions match astonishingly well, capturing all non-Gaussian feature that we can identify in this plot.
We first run the KDE algorithm with default settings and high feedback to have a sense of its inner workings:
shift_P, shift_low, shift_hi = mcmc_tension.kde_parameter_shift(diff_chain, feedback=10)
print(f'Shift probability considering all parameters = {shift_P:.5f} +{shift_hi-shift_P:.5f} -{shift_P-shift_low:.5f}')
# turn the result to effective number of sigmas:
print(f' n_sigma = {utilities.from_confidence_to_sigma(shift_P):.3f}')
As we can see the algorithm proceeds in two steps, elimination of points that are clearly above the probability of zero shift based on the probability estimate containing just a few nearest points and then brute force polishing of the leftover points. This gets the right answer and takes a fairly reasonable amount of time.
You could try to run this cell with method='brute_force' to see the type of performance improvement that this algorithm achieves.
One key parameter in the estimate of the KDE shift is the smoothing scale for the pdf. The default choice is the one that minimizes the mean integrated square error (MISE) under assumptions of Gaussianity.
shift_P, shift_low, shift_hi = mcmc_tension.kde_parameter_shift(diff_chain, scale='MISE', feedback=0)
print(f'Shift probability considering all parameters = {shift_P:.5f} +{shift_hi-shift_P:.5f} -{shift_P-shift_low:.5f}')
# turn the result to effective number of sigmas:
print(f' n_sigma = {utilities.from_confidence_to_sigma(shift_P):.3f}')
But we can try different choices. In particular we could try the asyntotic MISE estimator (AMISE):
shift_P, shift_low, shift_hi = mcmc_tension.kde_parameter_shift(diff_chain, scale='AMISE', feedback=0)
print(f'Shift probability considering all parameters = {shift_P:.5f} +{shift_hi-shift_P:.5f} -{shift_P-shift_low:.5f}')
# turn the result to effective number of sigmas:
print(f' n_sigma = {utilities.from_confidence_to_sigma(shift_P):.3f}')
which is slightly undersmoothing and hence resulting in a slightly higher tension.
Or we could try the maximum bandwidth that is (by design) oversmoothing:
shift_P, shift_low, shift_hi = mcmc_tension.kde_parameter_shift(diff_chain, scale='MAX', feedback=0)
print(f'Shift probability considering all parameters = {shift_P:.5f} +{shift_hi-shift_P:.5f} -{shift_P-shift_low:.5f}')
# turn the result to effective number of sigmas:
print(f' n_sigma = {utilities.from_confidence_to_sigma(shift_P):.3f}')
As we can see the MISE smoothing scale achieves a balance between these two. Overestimating the smoothing scale usually results in slightly smaller tensions.
We can now try the adaptive bandwidth:
shift_P, shift_low, shift_hi = mcmc_tension.kde_parameter_shift(diff_chain, scale='BALL', feedback=0)
print(f'Shift probability considering all parameters = {shift_P:.5f} +{shift_hi-shift_P:.5f} -{shift_P-shift_low:.5f}')
# turn the result to effective number of sigmas:
print(f' n_sigma = {utilities.from_confidence_to_sigma(shift_P):.3f}')
As we can see all results are mostly in agreement with each other and this provides an invaluable cross check of the different calculation techniques.
The number of samples used in the calculation can be changed to make sure that everything is converged and makes sense. Since all algorithms work in a fairly straightforward way on a laptop we encourage to always try and compare different outputs to make sure the result is sensible.
We now verify that the calculation makes physical sense. We know that a difference between these two results mostly lives in a 2 dimensional parameter space so we can do the calculation there and plot the result.
In this case, since we are in two dimensions we can use the fft algorithm that is sensibly faster.
param_names = ['delta_omegam', 'delta_sigma8']
shift_P, shift_low, shift_hi = mcmc_tension.kde_parameter_shift_2D_fft(diff_chain, param_names=param_names, feedback=0)
print(f'Shift probability considering two parameters = {shift_P:.5f} +{shift_hi-shift_P:.5f} -{shift_P-shift_low:.5f}')
# turn the result to effective number of sigmas:
print(f' n_sigma = {utilities.from_confidence_to_sigma(shift_P):.3f}')
g = plots.get_single_plotter()
diff_chain.updateSettings({'contours': [0.68, 0.95, shift_P]})
g.settings.num_plot_contours = 3
g.triangle_plot(diff_chain, param_names, filled=True, markers={name:0. for name in param_names});