What is an experiment measuring?¶
Marco Raveri (marco.raveri@unige.it)
In this notebook we show a worked example of how to go from the posterior of a given experiment to understanding the physics that controls its best constrained properties.
If you are interested in the details of this treatment we refer you to Dacunha, Raveri, Park, Doux, Jain (2021), arXiv:2112.05737.
We work with the results of the Dark Energy Survey (DES) first year of data (https://arxiv.org/abs/1708.01530). Specifically we consider the full weak lensing and galaxy clustering (3x2) analysis.
# Show plots inline, and load main getdist plot module and samples class
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 1
# import libraries:
import sys, os
os.environ['TF_USE_LEGACY_KERAS'] = '1' # needed for tensorflow KERAS compatibility
os.environ['DISPLAY'] = 'inline' # hack to get getdist working
sys.path.insert(0,os.path.realpath(os.path.join(os.getcwd(),'../..')))
from getdist import plots, MCSamples
from getdist.gaussian_mixtures import GaussianND
import getdist
getdist.chains.print_load_details = False
import scipy
import matplotlib.pyplot as plt
import IPython
from IPython.display import Markdown
import numpy as np
import seaborn as sns
# import the tensiometer tools that we need:
from tensiometer.utilities import stats_utilities as utilities
from tensiometer import gaussian_tension
We now import the chains with the parameter posteriors.
Notice that we have removed many parameters from the chains since they were irrelevant to the example. The chains are already fully polished. Burn in has been removed and the samples have been thinned (to save space on github).
We have run a prior only chain to ensure that the modeling of the prior is as faithful as possible. In particular, in standard cosmological analyses, we have priors on derived parameters that would give non-trivial shapes to the parameters that are being sampled (see for example Appendix F in https://arxiv.org/abs/1806.04649)
# load the chains (remove no burn in since the example chains have already been cleaned):
chains_dir = os.path.realpath(os.path.join(os.getcwd(), '../..', 'test_chains'))
# the DES Y1 3x2 chain:
settings = {'ignore_rows':0, 'smooth_scale_1D':0.3, 'smooth_scale_2D':0.3}
chain = getdist.mcsamples.loadMCSamples(file_root=os.path.join(chains_dir, 'DES'), no_cache=True, settings=settings)
# the prior chain:
prior_chain = getdist.mcsamples.loadMCSamples(file_root=os.path.join(chains_dir, 'prior'), no_cache=True, settings=settings)
# we add a couple of derived parameters that have been removed from the chains (to save a little space on github):
for ch in [chain, prior_chain]:
p = ch.getParams()
ch.addDerived(p.omegach2/(p.H0/100.)**2, name='omegac', label='\\Omega_c')
ch.addDerived(p.omegabh2/(p.H0/100.)**2, name='omegab', label='\\Omega_b')
ch.addDerived(p.sigma8*np.sqrt(p.omegam/0.3), name='S8', label='S_8')
Now we look at the parameters that were used to run the chain and we plot them against the prior:
param_names = chain.getParamNames().getRunningNames()
g = plots.get_subplot_plotter()
g.triangle_plot([prior_chain, chain], params=param_names, filled=True)
plt.show()
This plot is not particularly illuminating. We might guess that 4 parameteres are constrained but it is hard to be sure about the number because of degeneracies.
How many parameters are constrained?¶
To quantify how many parameters DES is measuring over the prior we compute the number of constrained parameters, as discussed in (https://arxiv.org/pdf/1806.04649.pdf).
The posterior covariance, $C_p$, combines the prior, $C_\Pi$, and data, $C$, covariances as: \begin{align} C_p^{-1} = C_\Pi^{-1} + C^{-1} \end{align} A parameter is measured when the posterior is closest to the purely data covariance $C^{-1}C_p \sim I$ (i.e. the prior is irrelevant).
With a little algebra the number of parameters that are constrained by the data over the prior is then: \begin{align} N_{\rm eff} \equiv N -{\rm Tr}(C_\Pi^{-1} C_p) \end{align} where $N$ is the nominal number of parameters.
This tells us how many parameters the data set is measuring. Notice that one important property is that this quantity is reparametrization invariant so it does not really depend on the specific parameters that we are using.
gaussian_tension.get_Neff(chain, param_names=param_names, prior_chain=prior_chain)
3.314772623683254
This tells us that DES is measuging three cosmological parameters that we now try to nail down.
Since the number of constrained parameters is reparametrization invariant we first use some of our physical intuition to change to a more suitable parameter basis and neglect $\tau$.
param_names = ['omegam', 'omegab', 'sigma8', 'ns', 'H0']
We can plot these parameters:
param_limits = {'omegam':[0.,.5], 'omegab':[0.,.5]}
g = plots.get_subplot_plotter()
g.triangle_plot([prior_chain,chain], params=param_names, filled=True, param_limits=param_limits)
plt.show()
This gives us a picture that is closer to our expectation of three+ constrained parameters and we can verify that by recomputing the number of constrained parameters in this parameter basis:
gaussian_tension.get_Neff(chain, param_names=param_names, prior_chain=prior_chain)
3.5927990946746267
As we can see we have all the parameters that we need. This number is sliglthy larger than the previous result because of clear non-Gaussianities.
Constrained parameters combinations:¶
The parameters that we have here are degenerate and we want to disentangle them to better understand their physical meaning.
Since our measurements are mostly multiplicative when building the power spectra we consider the logarithm of these parameters to perform a power law decomposition:
for ch in [chain, prior_chain]:
p = ch.getParams()
ch.addDerived(np.log(p.omegam), name='log_omegam', label='\\log \\Omega_m')
ch.addDerived(np.log(p.omegab), name='log_omegab', label='\\log \\Omega_b')
ch.addDerived(np.log(p.sigma8), name='log_sigma8', label='\\log \\sigma_8')
ch.addDerived(np.log(p.ns), name='log_ns', label='\\log n_s')
ch.addDerived(np.log(p.H0), name='log_H0', label='\\log H_0')
We now want to compute the parameter space combinations for which the posterior is an improvement of the prior, i.e. $F_p - F_\Pi > 0$ in terms of empirical Fisher matrices $F\equiv C^{-1}$.
To do this we compute the KL decomposition of the posterior covariance with respect to the prior covariance, as discussed in (https://arxiv.org/pdf/2112.05737.pdf).
# do the KL decomposition on the log parameters:
KL_param_names = ['log_'+name for name in param_names]
# compute the KL modes:
KL_eig, KL_eigv, KL_param_names = gaussian_tension.Q_UDM_KL_components(prior_chain, chain, param_names=KL_param_names)
# print:
with np.printoptions(precision=2, suppress=True):
print('Improvement factor over the prior:', KL_eig)
print('Improvement in error units:', np.sqrt(KL_eig-1))
Improvement factor over the prior: [1419.15 51.34 31.38 1.34 1.11] Improvement in error units: [37.66 7.1 5.51 0.58 0.34]
As we can see we have three parameters whose error bar is improved over the prior by more than a factor of two. Other modes show an improvement in errorbar that is smaller than 50%.
We now want to plot the projection of these modes over parameters to see what parameter is contributing the most (in units of its variance). To do so we decompose the parameter Fisher matrix over KL components. In the following figure rows sum to one and tell us how much a given KL mode contributes to the error bar (not marginalized) of the given parameter.
# First we compute the fractional Fisher matrix:
KL_param_names, KL_eig, fractional_fisher, _ = gaussian_tension.Q_UDM_fisher_components(prior_chain, chain, KL_param_names, which='1')
# plot (showing values and names):
im1 = plt.imshow( fractional_fisher, cmap='viridis')
num_params = len(fractional_fisher)
for i in range(num_params):
for j in range(num_params):
if fractional_fisher[j,i]>0.5:
col = 'k'
else:
col = 'w'
plt.text(i, j, np.round(fractional_fisher[j,i],2), va='center', ha='center', color=col)
plt.xlabel('KL mode (error improvement)');
plt.ylabel('Parameters');
ticks = np.arange(num_params)
labels = [ str(t+1)+'\n ('+str(l)+')' for t,l in zip(ticks,np.round(np.sqrt(KL_eig-1.),2))]
plt.xticks(ticks, labels, horizontalalignment='center');
labels = [ '$'+chain.getParamNames().parWithName(name).label+'$' for name in KL_param_names ]
plt.yticks(ticks, labels, horizontalalignment='right');
plt.show()
As we can see the first two, best constrained, modes almost entirely inform $\Omega_m$ and $\sigma_8$. The third mode is almost entirely projecting on $\Omega_b$. The last two weak modes are projecting on $n_s$ and $H_0$.
We can now visualize the modes in parameter space:
# compute the parameter space combinations corresponding to KL modes:
param_directions = np.linalg.inv(KL_eigv.T)
g = plots.get_subplot_plotter()
g.triangle_plot([chain], params=param_names, filled=True)
# add the modes:
for i in range(len(param_names)-1):
for j in range(i+1,len(param_names)):
ax = g.subplots[j,i]
# get mean:
m1, m2 = chain.getMeans(pars=[chain.index[name]
for name in [KL_param_names[i], KL_param_names[j]]])
ax.scatter(np.exp(m1), np.exp(m2), color='k')
alpha = 3.*np.linspace(-1.,1.,100)
for k in range(3):
ax.plot(np.exp(m1+alpha*param_directions[:,k][i]), np.exp(m2+alpha*param_directions[:,k][j]), color=sns.hls_palette(6)[k], label='Mode '+str(k+1))
g.fig.legend(*ax.get_legend_handles_labels());
plt.show()
We can project the posterior on the KL modes. In this space the posterior parameters are going to be uncorrelated and with unit variance.
idx = [chain.index[name] for name in KL_param_names]
prior_mean = prior_chain.getMeans(pars=[prior_chain.index[name] for name in KL_param_names])
temp_names = ['mode'+str(i+1) for i in range(len(KL_param_names))]
KL_filtered_samples = np.dot(chain.samples[:,idx]-prior_mean, KL_eigv)
KL_chain = MCSamples(samples=KL_filtered_samples,
loglikes=chain.loglikes,
weights=chain.weights,
names=temp_names,
label='KL modes DES')
idx = [prior_chain.index[name] for name in KL_param_names]
KL_filtered_prior_samples = np.dot(prior_chain.samples[:,idx]-prior_mean, KL_eigv)
KL_prior_chain = MCSamples(samples=KL_filtered_prior_samples,
loglikes=prior_chain.loglikes,
weights=prior_chain.weights,
names=temp_names,
label='KL modes prior')
g = plots.get_subplot_plotter()
g.triangle_plot([KL_chain,KL_prior_chain], params=temp_names, filled=True)
plt.show()
Note that the first few modes are well constrained and their posterior is fairly Gaussian, as data constraint over prior degrades non-Gaussianities arise.
Since we are considering the logarithm of parameters our KL decomposition corresponds to a power law decomposition. We now reconstruct the representation of the best constrained mode in the original parameter space.
All this analysis in packaged in a series of utility functions:
CPCA_results = gaussian_tension.linear_CPCA_chains(prior_chain, chain, KL_param_names)
print(gaussian_tension.print_CPCA_results(CPCA_results))
CPCA for 5 parameters:
1 : \log \Omega_m
2 : \log \Omega_b
3 : \log \sigma_8
4 : \log n_s
5 : \log H_0
Neff = 3.30
Neff mode = [ 1.00 0.98 0.97 0.25 0.10]
<KL> divergence = 9.69
<KL> mode = [ 4.52 2.15 1.81 0.57 0.65]
CPC amplitudes - 1 (variance improvement per mode):
1 : 1418.1468 ( 3765.8 %)
2 : 50.3426 ( 709.5 %)
3 : 30.3798 ( 551.2 %)
4 : 0.3414 ( 58.4 %)
5 : 0.1137 ( 33.7 %)
CPC modes:
1 : 0.867 0.170 0.438 0.004 -0.100
2 : 0.612 0.383 -0.437 0.018 -0.255
3 : -0.460 -1.024 0.269 0.017 0.318
4 : 0.062 -0.178 -0.045 -0.107 -0.044
5 : 0.023 0.238 -0.015 -0.038 0.148
Parameter contribution to CPC-mode variance:
mode number : 1 2 3 4 5
log_omegam : 0.439 0.415 0.034 0.104 0.008
log_omegab : 0.001 0.000 0.456 0.220 0.323
log_sigma8 : 0.550 0.393 0.003 0.049 0.006
log_ns : 0.006 0.189 0.066 0.623 0.116
log_H0 : 0.004 0.004 0.441 0.004 0.547
CPC parameter combinations:
1 : 1418.147 ( 3765.8 %)
+0.79*(\log \Omega_m +1.4) +1.00*(\log \sigma_8 +0.13) = 0 +- 0.029 (post) / 1.1 (prior)
2 : 50.343 ( 709.5 %)
+1.00*(\log \Omega_m +1.4) -1.57*(\log \sigma_8 +0.13) +1.71*(\log n_s +0.0) = 0 +- 0.12 (post) / 0.88 (prior)
3 : 30.380 ( 551.2 %)
+1.00*(\log \Omega_b +2.88) -1.85*(\log H_0 -4.42) = 0 +- 0.19 (post) / 1.1 (prior)
4 : 0.341 ( 58.4 %)
-0.00*(\log \Omega_m +1.4) +0.07*(\log \Omega_b +2.88) +1.00*(\log n_s +0.0) = 0 +- 0.11 (post) / 0.13 (prior)
5 : 0.114 ( 33.7 %)
+0.17*(\log \Omega_b +2.88) -0.50*(\log n_s +0.0) +1.00*(\log H_0 -4.42) = 0 +- 0.2 (post) / 0.21 (prior)
CPC modes parameters correlations:
mode number : 1 2 3 4 5
omegabh2 : 0.120 0.118 -0.088 -0.237 0.863
omegach2 : 0.118 -0.117 0.242 0.363 0.916
theta : -0.246 -0.674 0.707 0.514 0.636
tau : -0.028 0.009 -0.026 -0.141 -0.026
logA : -0.167 -0.180 -0.070 -0.611 -0.580
ns : -0.139 0.191 -0.255 -0.977 -0.273
H0 : -0.162 -0.397 0.421 -0.020 0.947
omegam : 0.606 0.782 -0.659 0.333 -0.060
sigma8 : -0.329 -0.764 0.602 -0.372 0.035
chi2_DES : -0.024 0.224 -0.113 0.109 -0.113
chi2_prior : 0.047 -0.031 0.016 0.000 0.007
theta_BAO_DES : -0.566 -0.910 0.903 0.202 -0.006
omegac : 0.488 0.580 -0.430 0.546 -0.361
omegab : 0.341 0.532 -0.563 -0.355 0.600
S8 : 0.348 -0.470 0.272 -0.327 0.036
log_omegam : 0.613 0.786 -0.665 0.335 -0.042
log_omegab : 0.317 0.496 -0.596 -0.368 0.614
log_sigma8 : -0.334 -0.767 0.603 -0.374 0.045
log_ns : -0.139 0.192 -0.257 -0.978 -0.275
log_H0 : -0.170 -0.413 0.426 -0.014 0.953
The Covariant Principal Components tells us a reasonable approximation for the best constrained parameter modes.
This parameter basis should single out the most constrained parameter space directions over the priors and as such should be less sensitive to volume effects in $\Lambda$CDM.
temp_chain = chain.copy()
p = temp_chain.getParams()
temp_chain.addDerived(p.sigma8*p.omegam**0.75, name='p1', label='p_1 = \\sigma_8 \\Omega_m^{0.75}');
temp_chain.addDerived(p.omegam/p.sigma8, name='p2', label='p_2 = \\Omega_m / \\sigma_8');
temp_chain.addDerived(p.omegab/(p.H0/100)**2, name='p3', label='p_3 = \\Omega_b / h^2');
g = plots.get_subplot_plotter()
g.triangle_plot([temp_chain], params=['p1', 'p2', 'p3'], filled=True)
plt.show()
We can note a couple of interesting things from this plot. These are the modes that are significantly improved over the prior but these are not necessarily detected to be different from zero. Especially the second and third modes are not detected to super-high statistical significance (as mode 1) and hence are closer to log-normal distributed. We have also truncated the decomposition to have better looking parameters and that is responsible for some of their correlation.
Physics of constrained parameters combinations:¶
We next ask what is the physical meaning of these parameters. To do so we perform variations along the KL components and look at the matter power spectrum and other physical observables:
# first import camb
import camb
from camb import model
from camb.sources import GaussianSourceWindow
# then define helper to go from chains to camb parameters:
def camb_helper(params):
# parameters that we use:
omegam, omegab, sigma8, ns, H0 = params
# parse:
h = H0/100.
reference_As = 2.60005144e-09
kwargs = {'ns': ns,
'H0': H0,
'ombh2': omegab*h**2,
'omch2': (omegam-omegab)*h**2,
'mnu': 0.06,
'tau': 0.079,
'redshifts': [0.],
'kmax': 10.0,
'lmax': 2000,
'lens_potential_accuracy': 1}
# run CAMB once to get reference sigma 8:
pars = camb.set_params(As = reference_As, **kwargs)
results = camb.get_results(pars)
reference_sigma_8 = results.get_sigma8_0()
# get the reference parameters:
correct_As = reference_As/reference_sigma_8**2*sigma8**2
pars = camb.set_params(As = correct_As, **kwargs)
# other settings:
#pars.NonLinear = model.NonLinear_none
pars.NonLinear = model.NonLinear_both
#
return pars
# compute mean cosmology:
mean_params = chain.getMeans(pars=[chain.index[name] for name in KL_param_names])
mean_camb_pars = camb_helper(np.exp(mean_params))
mean_results = camb.get_results(mean_camb_pars)
kh, z, pk_ref = mean_results.get_matter_power_spectrum(minkh=1e-4, maxkh=10., npoints = 1000)
# now do parameter variations along the KL modes of the posterior with respect to the prior:
for ind in range(3):
# do a 2 sigma variation to see effects in the plots:
camb_pars_plus = camb_helper(np.exp(mean_params +1.*param_directions[:,ind]))
camb_pars_minus = camb_helper(np.exp(mean_params -1.*param_directions[:,ind]))
# compute camb power spectra:
results_plus = camb.get_results(camb_pars_plus)
results_minus = camb.get_results(camb_pars_minus)
_, _, pk_plus = results_plus.get_matter_power_spectrum(minkh=1e-4, maxkh=10., npoints = 1000)
_, _, pk_minus = results_minus.get_matter_power_spectrum(minkh=1e-4, maxkh=10., npoints = 1000)
# plot:
fig, ax = plt.subplots(1,2, figsize = (12,4))
fig.suptitle('Mode '+str(ind+1)+', improvement over prior '+str(round(np.sqrt(KL_eig[ind]-1),1)))
ax[0].plot(kh, pk_ref[0], color='k')
ax[0].plot(kh, pk_plus[0], color=sns.hls_palette(6)[0])
ax[0].plot(kh, pk_minus[0], color=sns.hls_palette(6)[2])
ax[0].set_yscale('log')
ax[0].set_ylabel('$P(k)$')
ax[1].plot(kh, (pk_plus[0]-pk_ref[0])/pk_ref[0], color=sns.hls_palette(6)[0])
ax[1].plot(kh, (pk_minus[0]-pk_ref[0])/pk_ref[0], color=sns.hls_palette(6)[2])
ax[1].axhline(0., ls='--', color='k')
ax[1].set_ylabel('$\\Delta P(k)/ P(k)$')
for _ax in ax:
_ax.set_xlabel('$k/h Mpc$')
_ax.set_xscale('log')
_ax.set_xlim([np.amin(kh), np.amax(kh)])
plt.show()
plt.close(fig)
Prior removal:¶
We know parameters have projection effects due to prior constrained volume being marginalized over. Here we (linearly) undo that to get rid of prior effects.
There are two ways of getting rid of the prior. The first one is fix totally prior constrained directions. This results in the best-fitting subspaces described in (https://arxiv.org/pdf/1902.01366.pdf).
To do so we take each sample, remove the prior mean, transform it to KL space, then go back to parameter space truncating modes, leaving out the least constrained ones and the add back the prior mean.
idx = [chain.index[name] for name in KL_param_names]
prior_mean = prior_chain.getMeans(pars=[prior_chain.index[name] for name in KL_param_names])
inv_KL_eigv = np.linalg.inv(KL_eigv)
inv_KL_eigv[3:,:]=0
KL_filtered_samples = np.dot(np.dot(chain.samples[:,idx]-prior_mean, KL_eigv), inv_KL_eigv)+prior_mean
KL_filtered_samples = np.exp(KL_filtered_samples)
KL_chain = MCSamples(samples=KL_filtered_samples,
loglikes=chain.loglikes,
weights=chain.weights,
names=param_names,
labels=[ name.label for name in chain.getParamNames().parsWithNames(param_names)],
label='KL filtered DES',
settings=settings)
g = plots.get_subplot_plotter()
g.triangle_plot([chain, KL_chain], params=param_names, filled=True)
bestfit_chain = [par.bestfit_sample for par in chain.getMargeStats().parsWithNames(param_names)]
bestfit_chain_kl = [par.bestfit_sample for par in KL_chain.getMargeStats().parsWithNames(param_names)]
for i in range(len(param_names)):
_ax = g.subplots[i,i]
_ax.axvline(bestfit_chain[i], ls='--', lw=1., color='red')
_ax.axvline(bestfit_chain_kl[i], ls='--', lw=1., color='blue')
for i in range(len(param_names)):
for j in range(len(param_names)):
if i>j:
_ax = g.subplots[i,j]
_ax.axhline(bestfit_chain[i], ls='--', lw=1., color='red')
_ax.axhline(bestfit_chain_kl[i], ls='--', lw=1., color='blue')
_ax.axvline(bestfit_chain[j], ls='--', lw=1., color='red')
_ax.axvline(bestfit_chain_kl[j], ls='--', lw=1., color='blue')
plt.show()
As we can see constraints are tighter since we have fixed prior constrained directions.
The best fit is closer to the peak since the postrior looks increasingly Gaussian as we remove parameters.
Some contours do not change, which is good, some others do because they were influenced by the prior.
This is a-posteriori dimensional reduction, the difference with a-priori dimensional reduction is that this happens around the mean of the posterior and not the prior. Non linearities between the two points (rotations) can alter the local structure of the posterior. Hence a-priori dimensional reduction can introduce biases while doing that a-posteriori does not.
The second way of getting rid of the prior is to send it to infinity. This is far more challenging because it is more subject to non-Gaussianities. To mitigate this we do not undo the prior on data constrained directions but only on prior constrained ones.
idx = [chain.index[name] for name in KL_param_names]
prior_mean = prior_chain.getMeans(pars=[prior_chain.index[name] for name in KL_param_names])
inv_KL_eigv = np.linalg.inv(KL_eigv)
KL_eig_inv = 1./KL_eig
KL_eig_inv[:3] = 0
KL_filtered_samples = np.dot(np.dot(np.diag(1./(1.-KL_eig_inv)), np.dot(KL_eigv.T, (chain.samples[:,idx]).T)).T-np.dot(np.diag(1./(1.-KL_eig_inv)*KL_eig_inv), np.dot(KL_eigv.T, prior_mean)), inv_KL_eigv)
KL_filtered_samples = np.exp(KL_filtered_samples)
KL_chain = MCSamples(samples=KL_filtered_samples,
loglikes=chain.loglikes,
weights=chain.weights,
names=param_names,
labels=[ name.label for name in chain.getParamNames().parsWithNames(param_names)],
label='KL filtered DES',
settings=settings)
g = plots.get_subplot_plotter()
param_limits = {'omegam':[0.,.5], 'omegab':[0.,.5]}
g.triangle_plot([KL_chain, chain], params=param_names, filled=True, param_limits=param_limits)
bestfit_chain = [par.bestfit_sample for par in chain.getMargeStats().parsWithNames(param_names)]
bestfit_chain_kl = [par.bestfit_sample for par in KL_chain.getMargeStats().parsWithNames(param_names)]
for i in range(len(param_names)):
_ax = g.subplots[i,i]
_ax.axvline(bestfit_chain[i], ls='--', lw=1., color='blue')
_ax.axvline(bestfit_chain_kl[i], ls='--', lw=1., color='red')
for i in range(len(param_names)):
for j in range(len(param_names)):
if i>j:
_ax = g.subplots[i,j]
_ax.axhline(bestfit_chain[i], ls='--', lw=1., color='blue')
_ax.axhline(bestfit_chain_kl[i], ls='--', lw=1., color='red')
_ax.axvline(bestfit_chain[j], ls='--', lw=1., color='blue')
_ax.axvline(bestfit_chain_kl[j], ls='--', lw=1., color='red')
plt.show()