Line: 1 to 1  


Line: 1 to 1  

Added:  
> > 
< WE DEFINE HERE THE PATH TO THE GITHUB, USED FOR LINKS LATER
Higgs Combination and Properties: Long Exercise  CMSDAS 2014
IntroductionOverviewThe purpose of this long exercise is to understand how some of the key properties of the Higgs boson are measured at CMS: how the couplings are determined from the combination of the results of the searches in different final states (decay modes, production topologies), and how exotic J^{CP} hypotheses are tested and excluded from the kinematics of the Higgs boson decay products. For this exercise, we will use the data from the real higgs searches in CMS, and analyze them with the standard CMS statistical toolkit that has been developed for this purpose. In order to keep the amount of computing time down to a manageable level, we will use only the data from the most sensitive channels from each analysis (but including also new channels that were not present at the time of the latest CMS combination), and a version of the statistical tools containing the newest optimized code that is not yet in the production release.
Prerequisites
Course Structure Outline
Miscellaneous NotesThe color scheme of the Exercise is as follows:
Intro I: The Higgs at CMSMeasurements of the Higgs properties in CMS exploit the different production and decay modes expected for the SMH boson.
The SMH boson is produced through gluonfusion (ggH, red), vectorboson fusion (VBF, blue), associated vectorboson production (VH, purple  where the Higgs is radiated off an energetic V=W,Z boson), and in associated production with a topquark pair (ttH, green): As can be seen, the latter three modes have very clear signatures that can be targeted experimentally by, e.g., requiring vector boson candidates in the event or top quark candidates. One note should be made about the gluonfusion process: since the SMH boson does not couple to massless particles (like the gluon), the process is loopinduced, with the particles running in the loop being those which couple to the SMH boson and to the gluon. Since the gluon knows about colour charge, the SM particles that can run in the loop are quarks. Since the SMH coupling to quarks is proportional to their mass, the most important contributions to the gluonfusion crosssection come from the toploop and bottomloops, with the contributions form other quarks being negligible. In this way one can immediately see that the gluonfusion process is sensitive to the top and bottom couplings to the SMH, but could also be affected by BSM particles. At the LHC, given the very large gluon luminosities, the SMH boson is mainly produced through gluonfusion and, at much lower rates, the more distinctive processes:
At a mass of m_{H} ~ 125 GeV, the SMH boson decays into almost every possibly mode except into a pair of top quarks:
Here, the decay into a pair of photons is also loopmediated as in the case of gluonfusion production, except that in this case, the photon is sensitive not only to the colourcharge but also to the electriccharge:
While the top and W contributions to the diphoton decay, contributions have also been calculated for other quarks and even the tau lepton. In this way, the diphoton decay branching fraction is sensitive no only to the SMH coupling to those SM particles but as well to BSM particles. In terms of decay rates, although they seem very small for some decays, in practice and in order to observe the ZZ or WW decays one needs to take into account the branching fractions of the Z and W. After that, the picture is a little different:
Finally, while it is somewhat straightforward to identify the decay signatures and experimentally select pure samples of H→γγ or H→ττ, it is much more complicated to distinguish between production modes. In an nutshell, only leptonic decays of Vs in VH production are extremely pure with the present amount of data. Both VBF and tt¨¬„H selections inevitably include some gluonfusion events where there was one or more jets radiated from the gluon or quark lines. With more data, stricter cuts can be applied and purer production tags developed. But for the reasons above we usually refer to "dijet tag" and not "VBF production" when discussing the experimental selections, since such selections will contains some events other than VBF. In CMS, a large amount of production and decay modes are studied exploring different experimental selections targeting the different production modes. The following table summarises the existing and planned analyses:
There are several notes to this table:
But not all ends with the measurements of σÃ¡ªBR. In fact there is a much richer and complex structure in terms of the tensor couplings. This is a more difficult arena where the spin (J) and chargeparity (CP) properties of the boson are involved. The only channel where all information about the boson is available in the detector is the H→ZZ→4e¡° where the leptons are electrons or muons (and not neutrinos). This does not mean that other channels cannot contribute to this picture (like H→WW→e¡°νe¡°ν or H→γγ), but the fact that all information is readily available renders the analysis much more straightforward in the H→ZZ→4e¡° channels. The following is an illustration from http://arxiv.org/pdf/1208.4018.pdf :
Since the spinparity tensor structure is much more complicated than the scalar structure of the couplings, much more data will be needed to perform precise measurements. For the moment the strategy is to perform hypotheses tests between the 0^{+} (SMHlike) hypothesis and other, alternative, hypotheses such as pseudoscalar (0^{}) bosons (or admixtures), spin2 bosons, like the graviton, (2^{+}), or exotic spin1 bosons. As an example, the following figure illustrates the differences in one observable of the ZZ (left), WW (middle), and γγ (right) decays for different J^{CP} hypotheses (0^{+} red circles, 0^{} magenta squares, and two types of spin2 particle in green and blue):
Intro 2: Datacards
Setting up your computing environment
The Higgs statistical codeThe statistical toolkit for the Higgs analyses is provided as a package that is compiled within CMSSW, even if it does not depend on the rest of the CMSSW packages. Tthe code relies critically on the RooFit and RooStats packages, and at the moment it is compatible with the versions packaged in the 6.1.1 and 6.1.2 releases (older releases have bugs, newer releases are backwardsincompatible, not tested and not supported) The code can be retrieved from the github code hosting side. The code for the development version that we'll be using for this CMSDAS school is visible at https://github.com/cmsanalysis/HiggsAnalysisCombinedLimit/tree/CMSDAS2014CERN Log into a SLC5 machine (lxplus5.cern.ch) and do scramv1 project CMSSW CMSSW_6_1_2 cd CMSSW_6_1_2/src cmsenv git clone https://github.com/cmsanalysis/HiggsAnalysisCombinedLimit.git HiggsAnalysis/CombinedLimit/ cd HiggsAnalysis/CombinedLimit/ git checkout CMSDAS2014CERN scramv1 b j 5 after the first installation you can instead just do cd CMSSW_6_1_2/src/HiggsAnalysis/CombinedLimit cmsenv
The datacardsThe datacards contain CMS data, so they cannot be put on github. Normally, they're kept in a protected svn repository at CERN, but for convenience you will be able to get them instead from a tarball on AFS.
cd CMSSW_6_1_2/src/HiggsAnalysis/CombinedLimit mkdir datacards cd datacards tar xvf /afs/cern.ch/user/g/gpetrucc/public/CMSDAS2014CERNcards.tar.gz
A counting experiment: search for invisible higgs decays in the VBF modeA counting experiment is a search where we just count the number of events that pass a selection, and compare that number with the event yields expected from the signal and background.
For this kind of analysis, all the information needed for the statistical interpretation of the results is encoded in a simple text file An example of search that is done this way is the one for a Higgs boson produced in the VBF mode and decaying to invisible particles (something that does not happen in the standard model but can happen in other scenarios like supersymmetry) You can look at the datacard Here is a linebyline commentary of it # Invisible Higgs analysis for mH=125 GeV This line is just a comment, describing the datacard. All lines with "#" are comments, and ignored in the processing. Similarly, lines all of minus characters are ignored
imax 1 number of channels jmax 6 number of backgrounds kmax 9 number of nuisance parameters (sources of systematical uncertainties)
These lines describe the basics of the datacard: the number of channels, physical processes, and systematical uncertainties Only the first two words (e.g.
 # we have just one channel, in which we observe 0 events bin vbf observation 390
These lines describe the number of observed events in each final state. In this case, there's a final state, with label
 bin vbf vbf vbf vbf vbf vbf vbf process qqH zvv wmu wel wtau qcd others process 0 1 2 3 4 5 6 rate 207.589 101.879 67.2274 68.2064 54.4137 36.8432 10.3709 These lines describe the number of expected events from each physical process in each final state
The first two rows of each column identify the final state (always "vbf" in this datacard, since it's the only channel considered) and the physics process. Higgs boson signals in the different production modes have special names that combine can recognise: The third line tells the code if a given process is by default to be interpreted as a signal (process number 0 or negative) or as a background (positive process number). For compatibility with other tools that use the same datacard format, in each final state the process numbers should be different for the different process, and increasing from left to right along the columns. The fourth line contains the expected event yield. In this case, about 200 signal events would be expected (leftmost column) over a background of about 340 events (sum of all the others).
 lumi_8TeV lnN 1.04      1.04 CMS_vbfhinv_qqh_norm lnN 1.12981       CMS_vbfhinv_zvv_stat gmN 12  8.4899      CMS_vbfhinv_zvv_norm lnN  1.25331      CMS_vbfhinv_wmu_norm lnN   1.23748     CMS_vbfhinv_wel_norm lnN    1.29736    CMS_vbfhinv_wtau_norm lnN     1.44471   CMS_vbfhinv_qcd_norm lnN      1.84373  CMS_vbfhinv_mc_norm lnN       1.30341 The rest of the datacard describes the systematical uncertainties, and will be explained more in detail below
Systematical uncertaintiesThe way systematical uncertainties are implemented in the Higgs statistics package is by identifying each independent source of uncertainty, and describing the effect it has on the event yields of the different processes. Each source is identified by a name, and in the statistical model it will be associated with a nuisance parameter. An indidual source of uncertainty can have an effect on multiple processes, also across different channels, and all these effects will be correlated (e.g. the uncertainty on the production cross section for Higgs from gluon fusion will affect the expected event yield for this process in all datacards). The size of the effect doesn't have to be the same, e.g. a 1% uncertainty on the muon identification efficiency will have a 2% effect on H→WW→2μ2ν but a 4% effect on H→ZZ→4l, and anticorrelations are also possible (e.g. an increase in btagging efficiency will simultaneously and coherently increase the signal yield in final states that require tagged bjets and decrease the signal yield in final states that require no tagged bjets). The use of names for each source of uncertainty allow the code to be able to combine multiple datacards recognizing which uncertainties are in common and which are instead separate.
The most common model used for systematical uncertainties is the lognormal, which is identified by the For example, the first two uncertaintes of the VBF H→inv datacard are lognormals bin vbf vbf vbf vbf vbf vbf vbf process qqH zvv wmu wel wtau qcd others process 0 1 2 3 4 5 6 rate 207.589 101.879 67.2274 68.2064 54.4137 36.8432 10.3709  lumi_8TeV lnN 1.04      1.04 CMS_vbfhinv_qqh_norm lnN 1.12981      
The first one, associated to the normalization of the luminosity in the 8 TeV run (
The second one,
You can note that the name of the second uncertainty has a prefix
Another class of systematical uncertainties on the predictions in the signal region are those associated to statistical uncertainties in the control regions used to estimate the backgrounds, or with the limited size of the MC samples used. If the uncertainties are small, Poisson fluctuations become approximately Gaussian, and they be conveniently approximated by lognormal uncertainties with bin vbf vbf vbf vbf vbf vbf vbf process qqH zvv wmu wel wtau qcd others process 0 1 2 3 4 5 6 rate 207.589 101.879 67.2274 68.2064 54.4137 36.8432 10.3709  [...] CMS_vbfhinv_zvv_stat gmN 12  8.4899     
The gamma distribution, in the form we use it, is characterized by two parameters: N, the number of events in the control region (or in MC), and α the weight by which each event is multiplied to determine the expected yield in the signal region (if the weights are not all the same, one can take α to be the average weight). By definition, the prediciton in the signal region is
In the datacards, the uncertainty is encoded writing
Exercise for the reader
What is the total expected background yield, with its statistical and systematical uncertainties? </twistyPlugin twikiMakeVisibleInline> The total yield for the background is 101.879 + 67.2274 + 68.2064 + 54.4137 + 36.8432 + 10.3709 = 338.94 events
The statistical uncertainty is the usual Poisson one, so sqrt(338.94) = 18.4 For the systematical uncertainties, each row is independent, so we can compute the total background uncertainty by adding up the individual contributions in quadrature:
So, keeping only the relevant decimals, the total expected background is 340 Â¡À 18(stat) Â¡À 61 (syst). </twistyPlugin>
A shape analysis using templates: H→ττA shape analysis relies not only on the expected event yields for the signals and backgrounds, but also on the distribution of those events in some discriminating variable. This approach is often convenient with respect to a counting experiment, especially when the background cannot be predicted reliably apriori, since the information from the shape allows a better discrimination between a signallike and a backgroundlike excess, and provides an insitu normalization of the background. The simpler case of shape analysis is the binned one: the observed distribution of the events in data and the expectations from the signal and all backgrounds are provided as histograms, all with the same binning. Mathematically, this is equivalent to just making N counting experiments, one for each bin of the histogram; however, in practice it's much more convenient to provide the the predictions as histograms directly.
For this purpose we will look at the H→ττ search in the τ_{μ}τ_{h} final state with a VBF dijet topology. The datacard is max 1 number of categories jmax 9 number of samples minus one kmax * number of nuisance parameters  shapes * * htt_mt.input_8TeV.root $CHANNEL/$PROCESS $CHANNEL/$PROCESS_$SYSTEMATIC shapes ggH * htt_mt.input_8TeV.root $CHANNEL/$PROCESS$MASS $CHANNEL/$PROCESS$MASS_$SYSTEMATIC shapes qqH * htt_mt.input_8TeV.root $CHANNEL/$PROCESS$MASS $CHANNEL/$PROCESS$MASS_$SYSTEMATIC shapes VH * htt_mt.input_8TeV.root $CHANNEL/$PROCESS$MASS $CHANNEL/$PROCESS$MASS_$SYSTEMATIC  bin muTau_vbf observation 199  bin muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf process 2 1 0 1 2 3 4 5 6 7 process ggH qqH VH ZTT QCD W ZL ZJ TT VV rate 2.36529 7.04351 0.028046 94.286 39.2563 46.6949 0.828452 3.72816 4.29862 1.48636  lumi_8TeV lnN 1.042 1.042 1.042        pdf_qqbar lnN  1.03 1.02        pdf_gg lnN 1.08          UEPS lnN 1.04 1.04 1.04        QCDscale_ggH2in lnN 1.3          QCDscale_qqH lnN  1.04         QCDscale_VH lnN   1.04        CMS_eff_m lnN 1.02 1.02 1.02 1.020   1.020 1.020 1.020 1.020 CMS_eff_t_mutau_8TeV lnN 1.08 1.08 1.08 1.080     1.080 1.080 CMS_eff_t_mutau_high_8TeV lnN 1.015 1.015 1.015 1.015     1.015 1.015 CMS_scale_t_mutau_8TeV shape 1 1 1 1.000       CMS_scale_j_8TeV lnN 1.1 1.05 1.15      1.050 1.100 CMS_htt_scale_met_8TeV lnN 1.05 1.05 1.05    1.050 1.050 1.070 1.070 CMS_htt_zttNorm_8TeV lnN    1.030   1.030 1.030   CMS_htt_ttbarNorm_8TeV lnN         1.080  CMS_htt_ttbarNorm_mutau_vbf_8TeV lnN         1.330  CMS_htt_DiBosonNorm_8TeV lnN          1.150 CMS_htt_DiBosonNorm_mutau_vbf_8TeV lnN          1.280 CMS_htt_WNorm_mutau_vbf_8TeV lnN      1.124     CMS_htt_extrap_ztt_mutau_vbf_8TeV lnN    1.100       CMS_htt_QCDSyst_mutau_vbf_8TeV lnN     1.190      CMS_htt_ZLeptonFakeTau_mutau_8TeV lnN       1.300    CMS_htt_ZJetFakeTau_mutau_vbf_8TeV lnN        1.400   Compared to the H→inv datacard, you can see that the format is pretty similar, except for two aspects:
shapes lines are used to associate to each channel and physical process the histogram corresponding to the expected distribution of the events, and similarly to associate to the observed data the observed distribution. The format is:
*shapes* _process_ _channel_ _file_ _histogramname_ _histogramnameforsystematics_
The first two columns after
The third and fourth column are used to point to the shape. The patterns
Let's now open the rootfile associated with this datacard in root, and look at the histogram associated to the QCD background. There are no specific shapes * * htt_mt.input_8TeV.root $CHANNEL/$PROCESS $CHANNEL/$PROCESS_$SYSTEMATIC
and for us %CODE{"cpp"}% /// root l root l htt_mt.input_8TeV.root /// Attaching file htt_mt.input_8TeV.root as _file0... _file0>cd("muTau_vbf"); QCD>Draw("HIST"); %ENDCODE% Note that we're not drawing the error bars on the histograms, since combine is not using them so it would be misleading to show them. We can also check the normalization of the histogram, to see that it matches the expected yield in the datacard as it should. %CODE{"cpp"}% cout << QCD>Integral() << endl; /// > 39.2563 %ENDCODE% Note that underflow and overflow bins are not used by combine.
Similarly we can look at how the 199 events observed in data are distributed: the first rule again will give %CODE{"cpp"}% cout << data_obs>Integral() << endl; /// > 199 data_obs>Draw(); %ENDCODE%
We can also at how the signal looks like for different Higgs boson mass hypotheses; we will use the useful %CODE{"cpp"}% qqH110>Draw("HIST"); for (int m = 110; m <= 140; m += 10) { TH1F h = (TH1F) gDirectory>Get(Form("qqH%d",m)); h>SetLineColor(206+4*(m110)/10); h>SetLineWidth(2); h>Draw("HIST SAME"); } %ENDCODE% and in a similar way we can see how do the mean and rms of the signal vary with the mass %CODE{"cpp"}% for (int m = 110; m <= 140; m += 5) { TH1F h = (TH1F) gDirectory>Get(Form("qqH%d",m)); printf("for mH=%3d, signal mean = %5.1f, rms = %5.1f\n", m, h>GetMean(), h>GetRMS()); } %ENDCODE% for mH=110, signal mean = 100.5, rms = 15.1 for mH=115, signal mean = 104.6, rms = 16.8 for mH=120, signal mean = 108.8, rms = 17.5 for mH=125, signal mean = 113.2, rms = 17.6 for mH=130, signal mean = 117.3, rms = 18.0 for mH=135, signal mean = 121.3, rms = 19.1 for mH=140, signal mean = 125.5, rms = 21.1
Shape uncertainties using templates
Still using the shapes * * htt_mt.input_8TeV.root $CHANNEL/$PROCESS $CHANNEL/$PROCESS_$SYSTEMATIC [...]  bin muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf muTau_vbf process 2 1 0 1 2 3 4 5 6 7 process ggH qqH VH ZTT QCD W ZL ZJ TT VV rate 2.36529 7.04351 0.028046 94.286 39.2563 46.6949 0.828452 3.72816 4.29862 1.48636  [...] CMS_scale_t_mutau_8TeV shape 1 1 1 1.000      
This tells us that there is a shape uncertainty called For shape analyses using templates, the effect of a shapeaffecting systematical uncertainty is encoded in the datacard by providing two additional templates corresponding to Â¡ÀNσ shifts of the corresponding parameter. For each process affected by this uncertainty, the column in the datacard will contain a value 1/N. In this case, as often, Â¡À1σ shapes have been used, and so the columns of the datacard contain a value of 1.
The associated shapes are determined using the last column of the %CODE{"cpp"}% /// root l htt_mt.input_8TeV.root /// Attaching file htt_mt.input_8TeV.root as _file0... _file0>cd("muTau_vbf"); TH1F nominal = (TH1F) gDirectory>Get("ZTT"); TH1F tau_up = (TH1F) gDirectory>Get("ZTT_CMS_scale_t_mutau_8TeVUp"); TH1F tau_dn = (TH1F) gDirectory>Get("ZTT_CMS_scale_t_mutau_8TeVDown"); printf("Nominal shape: mean = %6.2f (norm: %7.3f events)\n", nominal>GetMean(), nominal>Integral()); // > Nominal shape: mean = 88.03 (norm: 94.286 events) printf("Up shape: mean = %6.2f (norm: %7.3f events)\n", tau_up>GetMean(), tau_up>Integral()); // > Up shape: mean = 88.86 (norm: 97.172 events) printf("Down shape: mean = %6.2f (norm: %7.3f events)\n", tau_dn>GetMean(), tau_dn>Integral()); // > Down shape: mean = 87.20 (norm: 91.995 events) nominal>SetLineWidth(2); nominal>SetLineColor(kBlack); tau_up>SetLineWidth(2); tau_up>SetLineColor(kBlue); tau_dn>SetLineWidth(2); tau_dn>SetLineColor(kRed); nominal>Draw("HIST"); tau_up>Draw("HIST SAME"); tau_dn>Draw("HIST SAME"); nominal>GetYaxis()>SetRangeUser(0,45); %ENDCODE%
A parametric shape analysis: H→γγIn some cases, it can be convenient to describe the expected signal and background shapes in terms of analytical functions rather than templates; a typical example are the searches where the signal is apparent as a narrow peak over a smooth continuum background. In this context, uncertainties affecting the shapes of the signal and backgrounds can be implemented naturally as uncertainties on the parameters of those analytical functions. It is also possible to adapt an agnostic approach in which the parameters of the background model are left freely floating in the fit to the data, i.e. only requiring the background to be well described by a smooth function. Technically, this is implemented by means of the RooFit package, that allows writing generic probability density functions, and saving them into ROOT files. The pdfs can be either taken from RooFit's standard library of functions (e.g. Gaussians, polynomials, ...) or handcoded in C++, and combined together to form even more complex shapes. A prototypical case for this kind of analysis is H→γγ analysis. For this excercise, we will use a datacard that contains only the 8 TeV data for four event categories: cat0 and cat1 are categories of untagged events containing good quality diphotons, the former of the two with a higer purity but lower event yield obtained preferentially selecting high p_{T} diphotons; cat4 and cat5 are categories of dijet events with different levels of tightness (and so also different level of signal contamination from gluon fusion). The datacard is the following: imax 4 number of bins jmax 5 number of processes minus 1 kmax * number of nuisance parameters  shapes WH cat0 hgg.inputsig_8TeV_MVA.root wsig_8TeV:hggpdfrel_wh_cat0 shapes ZH cat0 hgg.inputsig_8TeV_MVA.root wsig_8TeV:hggpdfrel_zh_cat0 shapes bkg_mass cat0 hgg.inputbkgdata_8TeV_MVA.root cms_hgg_workspace:pdf_data_pol_model_8TeV_cat0 shapes data_obs cat0 hgg.inputbkgdata_8TeV_MVA.root cms_hgg_workspace:roohist_data_mass_cat0 shapes ggH cat0 hgg.inputsig_8TeV_MVA.root wsig_8TeV:hggpdfrel_ggh_cat0 shapes qqH cat0 hgg.inputsig_8TeV_MVA.root wsig_8TeV:hggpdfrel_vbf_cat0 shapes ttH cat0 hgg.inputsig_8TeV_MVA.root wsig_8TeV:hggpdfrel_tth_cat0 [... same as above for cat1, cat4, cat5 ...]  bin cat0 cat1 cat4 cat5 observation 1.0 1.0 1.0 1.0  bin cat0 cat0 cat0 cat0 cat0 cat0 cat1 cat1 cat1 cat1 cat1 cat1 cat4 cat4 cat4 cat4 cat4 cat4 cat5 cat5 cat5 cat5 cat5 cat5 process ZH qqH WH ttH ggH bkg_mass ZH qqH WH ttH ggH bkg_mass ZH qqH WH ttH ggH bkg_mass ZH qqH WH ttH ggH bkg_mass process 4 3 2 1 0 1 4 3 2 1 0 1 4 3 2 1 0 1 4 3 2 1 0 1 rate 6867.0000 19620.0000 12753.0000 19620.0000 19620.0000 1.0000 7259.4000 19620.0000 12360.6000 19620.0000 19620.0000 1.0000 7063.2000 19620.0000 12556.8000 19620.0000 19620.0000 1.0000 3924.0000 19620.0000 15696.0000 19620.0000 19620.0000 1.0000  CMS_eff_j lnN 0.999125 0.964688 0.999125 0.998262 0.996483  0.999616 0.980982 0.999616 0.99934 0.999012  1.02 1.02 1.02 1.02 1.02  1.02 1.02 1.02 1.02 1.02  CMS_hgg_JECmigration lnN             0.853986 0.995971 0.853986 0.846677 0.927283  1.025 1.005 1.025 1.025 1.025  CMS_hgg_UEPSmigration lnN             0.737174 0.991941 0.737174 0.724019 0.86911  1.045 1.01 1.045 1.045 1.045  CMS_hgg_eff_MET lnN                         CMS_hgg_eff_e lnN                         CMS_hgg_eff_m lnN                         CMS_hgg_eff_trig lnN 1.01 1.01 1.01 1.01 1.01  1.01 1.01 1.01 1.01 1.01  1.01 1.01 1.01 1.01 1.01  1.01 1.01 1.01 1.01 1.01  CMS_hgg_n_id lnN 1.034/0.958 1.039/0.949 1.034/0.958 1.053/0.915 1.035/0.958  1.042/0.936 1.038/0.948 1.042/0.936 1.053/0.909 1.038/0.954  1.016/0.972 1.022/0.963 1.016/0.972 1.017/0.967 1.019/0.964  1.024/0.959 1.030/0.948 1.024/0.959 1.014/0.975 1.023/0.962  CMS_hgg_n_pdf_1 lnN  0.998/0.996   1.002/0.998   0.992/0.999   1.001/1.000   0.999/0.998   1.003/0.999   0.996/1.000   1.002/0.999  CMS_hgg_n_pdf_10 lnN  1.028/1.004   0.998/0.998   1.051/0.997   1.004/0.999   1.020/1.000   1.000/0.998   1.010/0.999   0.999/1.000  [... and more rows like this ...] CMS_hgg_n_sc_gf lnN     0.842/1.123      0.976/1.031      0.858/1.095      0.880/1.083  CMS_hgg_n_sc_vbf lnN  0.993/1.010      0.994/0.994      0.997/1.001      0.999/0.996     CMS_hgg_n_sigmae lnN 0.950/1.099 0.943/1.114 0.950/1.099 0.956/1.089 0.944/1.112  0.954/1.093 0.948/1.104 0.954/1.093 0.971/1.057 0.919/1.165  0.996/1.006 0.992/1.016 0.996/1.006 0.995/1.010 0.994/1.012  0.991/1.019 0.985/1.031 0.991/1.019 0.995/1.010 0.988/1.026  CMS_id_eff_eb lnN 1.01999 1.020047 1.01999 1.02002 1.020022  1.018535 1.019332 1.018535 1.018642 1.019654  1.017762 1.017952 1.017762 1.018824 1.01812  1.016723 1.016872 1.016723 1.01884 1.017172  CMS_id_eff_ee lnN 1.000284 1.000137 1.000284 1.000206 1.0002  1.004035 1.001979 1.004035 1.003759 1.001148  1.006058 1.005556 1.006058 1.003308 1.005127  1.008752 1.008351 1.008752 1.003254 1.007586  JEC lnN 0.995187 0.938204 0.995187 0.990442 0.980656  0.997891 0.966719 0.997891 0.996368 0.994567  1.11 1.035 1.11 1.11 1.11  1.11 1.035 1.11 1.11 1.11  QCDscale_VH lnN 0.982/1.021  0.982/1.021    0.982/1.021  0.982/1.021    0.982/1.021  0.982/1.021    0.982/1.021  0.982/1.021    QCDscale_ggH lnN     0.918/1.076      0.918/1.076      0.918/1.076      0.918/1.076  QCDscale_qqH lnN  0.992/1.003      0.992/1.003      0.992/1.003      0.992/1.003     QCDscale_ttH lnN    0.906/1.041      0.906/1.041      0.906/1.041      0.906/1.041   UEPS lnN 0.988624 0.858751 0.988624 0.977408 0.954278  0.995014 0.923929 0.995014 0.991416 0.987158  1.26 1.08 1.26 1.26 1.26  1.26 1.08 1.26 1.26 1.26  lumi_8TeV lnN 1.044 1.044 1.044 1.044 1.044  1.044 1.044 1.044 1.044 1.044  1.044 1.044 1.044 1.044 1.044  1.044 1.044 1.044 1.044 1.044  pdf_gg lnN    0.920/1.080 0.930/1.076     0.920/1.080 0.930/1.076     0.920/1.080 0.930/1.076     0.920/1.080 0.930/1.076  pdf_qqbar lnN 0.958/1.042 0.972/1.026 0.958/1.042    0.958/1.042 0.972/1.026 0.958/1.042    0.958/1.042 0.972/1.026 0.958/1.042    0.958/1.042 0.972/1.026 0.958/1.042    vtxEff lnN 0.991/1.025 0.989/1.030 0.991/1.025 0.993/1.020 0.989/1.030  0.990/1.026 0.990/1.027 0.990/1.026 0.994/1.016 0.984/1.042  1.000/0.999 0.997/1.007 1.000/0.999 1.000/1.003 0.998/1.006  0.999/1.004 0.998/1.006 0.999/1.004 1.000/1.000 0.997/1.007  CMS_hgg_nuissancedeltamcat4 param 0.0 0.001458 CMS_hgg_nuissancedeltafracright_8TeV param 1.0 0.002000 CMS_hgg_nuissancedeltamcat1 param 0.0 0.001470 CMS_hgg_nuissancedeltamcat0 param 0.0 0.001530 CMS_hgg_nuissancedeltasmearcat4 param 0.0 0.001122 CMS_hgg_nuissancedeltasmearcat1 param 0.0 0.001167 CMS_hgg_nuissancedeltasmearcat0 param 0.0 0.001230 CMS_hgg_globalscale param 0.0 0.004717
The first difference compared to the template datacard is in the shapes ggH cat0 hgg.inputsig_8TeV_MVA.root wsig_8TeV:hggpdfrel_ggh_cat0 shapes data_obs cat0 hgg.inputbkgdata_8TeV_MVA.root cms_hgg_workspace:roohist_data_mass_cat0
In the datacard using templates, the column after the file name would have been the name of the histogram. Here, instead, we found two names, separated by a colon ( Let's inspect this workspace, starting with the data %CODE{"cpp"}% TFile *fDat = TFile::Open("hgg.inputbkgdata_8TeV_MVA.root"); RooAbsData *data = cms_hgg_workspace>data("roohist_data_mass_cat0"); data>Print("") // > RooDataHist::roohist_data_mass_cat0[CMS_hgg_mass] = 160 bins (1449 weights) // so, we have a binned dataset, whose variable is called CMS_hgg_mass: RooRealVar *mass = cms_hgg_workspace>var("CMS_hgg_mass"); mass>Print(""); // RooRealVar::CMS_hgg_mass = 140 L(100  180) // we can make a plot of the dataset with the following RooPlot *plot = mass>frame(); data>plotOn(plot); plot>Draw(); %ENDCODE%
Now let's look also at the signal %CODE{"cpp"}% TFile *sig = TFile::Open("hgg.inputsig_8TeV_MVA.root"); RooAbsPdf *ggH = wsig_8TeV>pdf("hggpdfrel_ggh_cat0"); ggH>Print(""); // > RooAddPdf::hggpdfrel_ggh_cat0[ hist_func_frac_g0_ggh_cat0 * hgg_gaus_g0_ggh_cat0 + hist_func_frac_g1_ggh_cat0 * hgg_gaus_g1_ggh_cat0 + const_func_frac_g2_ggh_cat0 * hgg_gaus_g2_ggh_cat0 ] = 0.604097 // this appears to be a linear combination of multiple gaussian pdfs (hgg_gaus_g0_ggh_cat0, hgg_gaus_g1_ggh_cat0, ...) with coefficents hist_func_frac_g0_ggh_cat0, hist_func_frac_g1_ggh_cat0 // let's get the list of the parameters that describe it RooArgSet *params = ggH>getParameters(*data); params>Print(""); // > RooArgSet::parameters = (CMS_hgg_globalscale,CMS_hgg_nuissancedeltamcat0,CMS_hgg_nuissancedeltasmearcat0,MH) // MH is a special parameter, which combine and text2workspace set to the Higgs mass hypothesis. // now since we're just looking into the input workspace, we can set it by hand wsig_8TeV>var("MH")>setVal(125.7); // Now we can make a plot of the pdf, and show also the contributions from the different gaussians from which it is composed RooPlot *plot = wsig_8TeV>var("CMS_hgg_mass")>frame(); ggH>plotOn(plot); ggH>plotOn(plot, RooFit::Components("hgg_gaus_g0_ggh_cat0"), RooFit::LineColor(kRed)); ggH>plotOn(plot, RooFit::Components("hgg_gaus_g1_ggh_cat0"), RooFit::LineColor(209)); ggH>plotOn(plot, RooFit::Components("hgg_gaus_g2_ggh_cat0"), RooFit::LineColor(222)); plot>Draw(); %ENDCODE%
Parametric signal normalization
There is also another feature of the H→γγ datacard that should catch your eye quicky: the event yields are remarkably strange: the the signal yield is
Let's start with the simple case: an event yield of
Now, the signal and background yields: 19k signal events from each production mode with just one background event would be really nice to have, but of course it can't be true. The way the H→γγ works is by relying on an additional feature of text2workspace: in addition to providing generic shapes for the signals and backgrounds, it is also possible to provide generic functions that describe the expected signal and background yields. This additional functions are multiplied by the number in the
At present, this feature is not indicated by any special line the datacard: simply, whenever a RooAbsPdf is loaded, text2workspace will search also for a RooAbsReal object with the same name but a Armed with this new piece of knowledge, we can now determine what is the expected signal yield for ggH in category 0: %CODE{"cpp"}% TFile *sig = TFile::Open("hgg.inputsig_8TeV_MVA.root"); wsig_8TeV>var("MH")>setVal(125.7); RooAbsPdf *ggH = wsig_8TeV>pdf("hggpdfrel_ggh_cat0"); RooAbsReal *ggH_norm = wsig_8TeV>function("hggpdfrel_ggh_cat0_norm"); cout << ggH_norm>getVal()*19620.0000 << endl; // > 12.3902 %ENDCODE%
This approach can also be used to make a background freely floating, by associating to them as normalization term a floating RooRealVar. However, this can be achieved in a more transparent way by putting instead a normalization uncertainty on that background using a flat pdf: e.g. to leave the background floating between 50% and 200% of its input prediction, a
Shape uncertainties using parametersThe part of the H→γγ datacard related to the systematics starts with many lines of lognormals that should already be familiar, except possibly for the notation with two numbers separated by a slash (e.g. 0.950/1.099). This notation is used for asymmetrical uncertainties: a lognormal with 0.950/1.099 means that at 1σ the yield is scaled down by a factor 0.95, while at +1σ the yield is scaled up by a factor 1.099. The last part of the datacard contains some lines that use a different syntax, e.g. CMS_hgg_globalscale param 0.0 0.004717
These lines encode uncertainties on the parameters of the signal and background pdfs. The example line quoted here informs text2workspace that the parameter The effect can be visualized from RooFit, e.g. %CODE{"cpp"}% TFile *sig = TFile::Open("hgg.inputsig_8TeV_MVA.root"); wsig_8TeV>var("MH")>setVal(125.7); RooAbsPdf *ggH = wsig_8TeV>pdf("hggpdfrel_ggh_cat0"); // prepare the canvas RooPlot *plot = wsig_8TeV>var("CMS_hgg_mass")>frame(); // plot nominal pdf ggH>plotOn(plot, RooFit::LineColor(kBlack)); // plot minus 3 sigma pdf wsig_8TeV>var("CMS_hgg_globalscale")>setVal(3*0.004717); ggH>plotOn(plot, RooFit::LineColor(kBlue)); // plot plus 3 sigma pdf wsig_8TeV>var("CMS_hgg_globalscale")>setVal(+3*0.004717); ggH>plotOn(plot, RooFit::LineColor(kRed)); plot>Draw(); %ENDCODE%
Exercises for the reader
</twistyPlugin twikiMakeVisibleInline> %CODE{"cpp"}% TFile *fDat = TFile::Open("hgg.inputbkgdata_8TeV_MVA.root"); cout << cms_hgg_workspace>data("roohist_data_mass_cat0")>sumEntries() << endl; // .... and so on %ENDCODE% The result is:
</twistyPlugin>
</twistyPlugin twikiMakeVisibleInline> %CODE{"cpp"}% TFile *fSig = TFile::Open("hgg.inputsig_8TeV_MVA.root"); wsig_8TeV>var("MH")>setVal(125.7); cout << wsig_8TeV>function("hggpdfrel_ggh_cat0_norm")>getVal()*19620.0000 << endl; cout << wsig_8TeV>function("hggpdfrel_vbf_cat0_norm")>getVal()*19620.0000 + wsig_8TeV>function("hggpdfrel_wh_cat0_norm")>getVal()*12753.0000 + wsig_8TeV>function("hggpdfrel_zh_cat0_norm")>getVal()*6867.0000 << endl; // .... and so on %ENDCODE%
So, the category with highest purity of VBF+VH signal is cat4 (however, this purity is achieved at the cost of a lower overall signal efficiency)
</twistyPlugin>
Unbinned, multidimensional analysis: H→ZZ
The datacard is identical in form to that of H→γγ, i.e. it provides shapes as RooAbsPdf from a RooWorkspace, and implements shape uncertainties through parameters (for some a fixed range is given to the parameter, e.g. imax 1 jmax 7 kmax *  shapes * * hzz4l_4muS_8TeV_0.input.root w:$PROCESS  bin a1_0 observation 13  ## mass window [105.7,140.7] bin a1_0 a1_0 a1_0 a1_0 a1_0 a1_0 a1_0 a1_0 process ggH qqH WH ZH ttH bkg2d_qqzz bkg2d_ggzz bkg2d_zjets process 4 3 2 1 0 1 2 3 rate 1.0000 1.0000 1.0000 1.0000 1.0000 7.2035 0.1360 0.7119  lumi_8TeV lnN 1.044 1.044 1.044 1.044 1.044 1.044 1.044  pdf_gg lnN 1.0720    1.0780  1.0710  pdf_qqbar lnN  1.0270 1.0350 1.0349  1.0342   pdf_hzz4l_accept lnN 1.02 1.02 1.02 1.02 1.02    QCDscale_ggH lnN 1.0750        QCDscale_qqH lnN  1.0020       QCDscale_VH lnN   1.0040 1.0155     QCDscale_ttH lnN     1.0655    QCDscale_ggVV lnN       1.2435  QCDscale_VV lnN      1.0285   BRhiggs_hzz4l lnN 1.02 1.02 1.02 1.02 1.02    CMS_eff_m lnN 1.043 1.043 1.043 1.043 1.043 1.043 1.043  CMS_hzz4mu_Zjets lnN        0.6/1.4 CMS_zz4l_bkgMELA param 0 1 [3,3] QCDscale_ggH2in lnN         QCDscale_qqH lnN         CMS_zz4l_ggH_PToverM_sys param 0 1 [3,3] CMS_zz4l_qqH_PToverM_sys param 0 1 [3,3] CMS_zz4l_ttH_PToverM_sys param 0 1 [3,3] CMS_zz4l_WH_PToverM_sys param 0 1 [3,3] CMS_zz4l_ZH_PToverM_sys param 0 1 [3,3] CMS_zz4l_qqZZ_PToverM_sys param 0 1 [3,3] CMS_zz4l_ggZZ_PToverM_sys param 0 1 [3,3] CMS_zz4l_ZX_PToverM_sys param 0 1 [3,3] CMS_zz4l_mean_m_sig param 0.0 1.0 ## CMS_zz4l_mean_m_sig = 0.001 CMS_zz4l_sigma_m_sig param 0.0 0.2 CMS_zz4l_n_sig_1_8 param 0.0 0.01 interf_ggH param 0 1 [1,1] The interesting difference however is the observed dataset: %CODE{"cpp"}% gSystem>Load("libHiggsAnalysisCombinedLimit.so") gFile = TFile::Open("hzz4l_4muS_8TeV_0.input.root"); RooAbsData *data = w>data("data_obs"); data>Print("") // > RooDataSet::data_obs[CMS_zz4l_mass,melaLD,CMS_zz4l_PToverM] = 13 entries %ENDCODE% In the Hγγ the dataset was a RooDataHist with one entry per bin, each entry having a weight equal to the number of events in that bin. Here instead it's a RooDataSet, with one entry per event. Let's explore its contents, both building some binned 1D projections and by making a 2D scatter plot. %CODE{"cpp"}% w>var("CMS_zz4l_mass")>Print(""); // > RooRealVar::CMS_zz4l_mass = 114.157 L(104  142) B(200) RooPlot *mass = w>var("CMS_zz4l_mass")>frame(RooFit::Bins(19)); data>plotOn(mass); mass>Draw(); RooPlot *mela = w>var("melaLD")>frame(RooFit::Bins(10)); data>plotOn(mass); mass>Draw(); TGraph points(data>numEntries()); for (int i = 0; i < data>numEntries(); ++i) { const RooArgSet *ev = data>get(i); points>SetPoint(i, ev>getRealValue("CMS_zz4l_mass"), ev>getRealValue("melaLD")); } points>SetMarkerStyle(20); points>GetXaxis()>SetTitle("m_{4l} (GeV)") points>GetYaxis()>SetTitle("K_{D}") .x tdrstyle.cc points>Draw("AP"); %ENDCODE%
Combining datacardsIn order to analyze the outcomes of multiple searches together, it is necessary to combine their datacards into a single one.
This is done by the combineCards.py Name1=card1.txt Name2=card2.txt .... > card.txt If the input datacards had just one bin each, then the output channels will be called Name1, Name2, and so on. Otherwise, a prefix Name1_ ... Name2_ will be added to the bin labels in each datacard. The supplied bin names Name1, Name2, etc. must themselves conform to valid C++/python identifier syntax.
The package containing the datacards also has a simple script
For example, for the H→WW search, we will combine the 0jets oppositeflavour channel and the dijet oppositeflavour channels, which have respectively the best sensitivity to gluon fusion and vbf higgs production. The combineCards.py hww0j_8TeV=hwwof_0j_shape_8TeV.txt hww2j_8TeV=hwwof_2j_shape_8TeV.txt > comb_hww.txt The combined datacard looks like the following Combination of hww0j_8TeV=hwwof_0j_shape_8TeV.txt hww2j_8TeV=hwwof_2j_shape_8TeV.txt imax 2 number of bins jmax 23 number of processes minus 1 kmax 89 number of nuisance parameters  shapes * hww0j_8TeV hwwof_0j.input_8TeV.root histo_$PROCESS histo_$PROCESS_$SYSTEMATIC shapes data_obs hww0j_8TeV hwwof_0j.input_8TeV.root histo_Data shapes * hww2j_8TeV hww19.47fb.mH125.of_2j_shape_mll.root histo_$PROCESS histo_$PROCESS_$SYSTEMATIC shapes data_obs hww2j_8TeV hww19.47fb.mH125.of_2j_shape_mll.root histo_Data  bin hww0j_8TeV hww2j_8TeV observation 5729.0 24.0  bin hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww0j_8TeV hww2j_8TeV hww2j_8TeV hww2j_8TeV hww2j_8TeV hww2j_8TeV hww2j_8TeV hww2j_8TeV h process ZH qqH WH ggH qqWW WjetsE Wgamma Top WH_SM WjetsM ggWW ZH_SM VV Wg3l qqH_SM Ztt Zjets ggH_SM qqH ggH WW WWewk Vg WJet Top g process 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 2 0 15 16 17 18 4 7 rate 1.7481 3.0668 5.8955 237.1930 3969.6300 282.7930 115.5780 498.7010 0.0000 331.7970 210.6270 0.0000 132.6090 167.8020 0.0000 45.9960 0.0000 0.0000 5.0290 1.4400 4.0230 1.9080 0.0389 2.4540 14.0300 0  [...] CMS_8TeV_hww_Top_2j_of_2j_extr lnN                         1.188  [...] CMS_hwwof_0j_MVAWHStatBounding_8TeV shape   1.0                        [....] lumi_8TeV lnN 1.044 1.044 1.044 1.044 1.0  1.044  1.044  1.0 1.044 1.044 1.044 1.044 1.044  1.044 1.044 1.044 1.044 1.044 1.044   1
You can see that the
In the last command you see that an option
Higgs CouplingsThe basics
From Datacards to LikelihoodsThe datacards from each search contain the information about the observed events and the expected yields for the various physical processes, backgrounds and signals. For a SM Higgs of fixed mass, there is no free parameter in the theory, and so nothing to fit. To search for deviations from the SM Higgs predictions, we define a model that can predict different expected yields for the signal, and fit the parameters of this model to the data. We will start with the simplest possible model, with only one parameter μ: the predictions for the signal in a given final state in this model will be μ times the yield expected for a SM Higgs (this parameter can be interpreted as a scale factor for all the production cross sections of the signal, so we sometimes write μ = σ/σ_{SM})
In the case of a simple counting experiment with no systematical uncertainties, we can write the likelihood function of the observed data as function of the parameter of interest in terms of the observed events ( N), expected signal events for a SM Higgs ( S), and expected background events ( B)
If there are systematical uncertainties, they are modelled as additional nuisance parameters θ that affect the expected signal and background, and the likelihood contains some additional constraint term for θ that models our apriori knowledge of the parameter (e.g. in the case of the normalization of the integrated luminosity for the VBF H→inv datacard given before, the signal and background yields will be proportional to the luminosity and the constranint will encode the fact we knew the luminosity with a relative 4% uncertainty). The likelihood can be written now as
When we include multiple channels, or when we do a binned shape analysis, the likelihood becomes the product over the channels or bins
For unbinned analyses, there is an additional term that includes a product on all the events of the channel involving the expected probability density functions for the distribution of the signal and background events (f_{S} and f_{B}) weighted with the expected events for each
Likelihoods in combine and RooStatsThe first step of the statistical analysis is to convert the text datacard into a likelihood function, in the format used by RooStats. For some simple cases, combine does this automatically when given a text file, but it is possible to do it also manually beforehand  and it's necessary to do it manually whenever we want to use a different model than the one with just μ.
The conversion from datacard to likelihood is done with the text2workspace.py datacard.txt m mass [ P physicsModel [ PO physicsModelOptions .... ] ] [other options] o output.root The physicsModel is the reference to the python class that describes what are the parameters of the model (e.g. μ) and how those parameters affect the yields of the signal and background (e.g. that the signal is multiplied by μ while the background not)
The most basic Higgs model, defining a single parameter of interest μ is below. As python and roofit don't like greek letters, μ is called
def doParametersOfInterest(self): """Create POI and other parameters, and define the POI set.""" #  Signal Strength as only POI  self.modelBuilder.doVar("r[1,0,20]"); self.modelBuilder.doSet("POI","r") #  Higgs Mass as other parameter  if self.options.mass = 0: if self.modelBuilder.out.var("MH"): self.modelBuilder.out.var("MH").removeRange() self.modelBuilder.out.var("MH").setVal(self.options.mass) else: self.modelBuilder.doVar("MH[%g]" % self.options.mass); def getHiggsSignalYieldScale(self,production,decay, energy): return "r" %ENDCODE%
This class Let's use this model to convert the H→WW + 2 jets datacard into a likelihood text2workspace.py hwwof_2j_shape_8TeV.txt m 125.7 P HiggsAnalysis.CombinedLimit.PhysicsModel:strictSMLikeHiggs o hww2j.root the beginning of the datacard was  bin of_2j observation 24 shapes * * hww19.47fb.mH125.of_2j_shape_mll.root histo_$PROCESS histo_$PROCESS_$SYSTEMATIC shapes data_obs * hww19.47fb.mH125.of_2j_shape_mll.root histo_Data  bin of_2j of_2j of_2j of_2j of_2j of_2j of_2j of_2j of_2j of_2j of_2j process ggH qqH ggWW VgS Vg WJet Top WW WWewk VV DYTT process 0 1 1 2 3 4 5 6 7 8 9 rate 1.44 5.029 0.4295 0.0195 0.0389 2.454 14.03 4.023 1.908 0.4833 1.879 
Now we can open the %CODE{"cpp"}% // Load the libraries needed to interpret the likelihood gSystem>Load("libHiggsAnalysisCombinedLimit.so"); // Open the file gFile = TFile::Open("hww2j.root"); // Retrieve the workspace from the file RooWorkspace *w = (RooWorkspace *) gFile>Get("w"); // Retrieve the ModelConfig from the workspace RooStats::ModelConfig *mc = (RooStats::ModelConfig *) w>genobj("ModelConfig"); // Now we get the list of parameters of the workspace mc>GetParametersOfInterest()>Print("V"); // the output will be // 1) RooRealVar:: r = 1 // Now we get the expected signal yield for ggH and DYTT cout << w>function("n_exp_final_binof_2j_proc_ggH")>getVal() << endl; cout << w>function("n_exp_final_binof_2j_proc_DYTT")>getVal() << endl; // the output will be 1.44, and 1.879, as expected // Now we change the value of r to be 2.0... we'd expect the ggH yield to grow to 2.88 and the DYTT to remain the same w>var("r")>setVal(2.0); cout << w>function("n_exp_final_binof_2j_proc_ggH")>getVal() << endl; // > 2.88 cout << w>function("n_exp_final_binof_2j_proc_DYTT")>getVal() << endl; // > 1.879 // We can also get the number of observed events, that should match the "observation" line in the datacard cout << w>data("data_obs")>sumEntries() << endl; // > 24 %ENDCODE%
Onedimensional fits and profile likelihoodNow that we have written the likelihood as function of a parameter μ we can proceed further to see how well different values of μ describe the observed data.
The first step is to find the value of μ that describes the data best, i.e. that maximises the likelihood. This value of μ will be denoted as text2workspace.py comb_hww.txt m 125.7 P HiggsAnalysis.CombinedLimit.PhysicsModel:strictSMLikeHiggs o comb_hww.root combine M MultiDimFit comb_hww.root m 125.7 n HWW_FIT The output of the program will be >>> including systematics >>> method used to compute upper limit is MultiDimFit >>> random number generator seed is 123456 Computing limit starting from observation  MultiDimFit  best fit parameter values: r : +0.599 Done in 0.01 min (cpu), 0.02 min (real) The output file from combine will contain a tree with a single entry, corresponding to this value. %CODE{"cpp"}% // root.exe b l higgsCombineHWW_FIT.MultiDimFit.mH125.7.root // Attaching file higgsCombineHWW_FIT.MultiDimFit.mH125.7.root as _file0... root [1] limit>Scan("r"); //************************ //* Row * r * //************************ //* 0 * 0.5986483 * //************************ %ENDCODE%
Now, to compare different values of μ, we will use the negative log of the profile likelihood function λ
We can now use combine to compute q(μ) for different values of μ. Using the same input file as before, let's scan 100 values of mu along a line (i.e. a 1dimensional grid), in the range between 0 and 3. We will tell combine to call the output file combine M MultiDimFit comb_hww.root m 125.7 n HWW_SCAN_MU algo=grid points=100 setPhysicsModelParameterRanges r=0,3
This command will be slower than the previous one, since the code now has to maximise the likelihood 101 times: once first to get the best fit μ then 100 times one for each point along the grid. Some of those fits will not converge at the first attempt, so the putput could contain some lines like
We can inspect the output from ROOT: the output tree in the file %CODE{"cpp"}% // root l higgsCombineHWW_SCAN_MU.MultiDimFit.mH125.7.root limit>GetEntries() // (const Long64_t)101 limit>Scan("r:2*deltaNLL") //************************************ //* Row * r * 2*deltaNL * //************************************ //* 0 * 0.5986473 * 0 * //* 1 * 0.0149999 * 5.9730005 * //* 2 * 0.0450000 * 5.3444776 * //* 3 * 0.0750000 * 4.7531828 * //* 4 * 0.1049999 * 4.1995768 * //* 5 * 0.1350000 * 3.6845009 * limit>Draw("2*deltaNLL:r"); %ENDCODE%
The quality of the default plot looks awful, so we will retrieve that information as a TGraph object and make it a bit more pretty; also, we will zoom into the region closer to the minimum which is more interesting. %CODE{"cpp"}% // Draw limit>Draw("2*deltaNLL:r"); // Make a copy of the graph TGraph *graph = gROOT>FindObject("Graph")>Clone(); // sort the points for increasing values of r graph>Sort(); // set the style graph>SetLineWidth(2); graph>SetLineColor(214); graph>SetMarkerStyle(7); graph>SetTitle("Likelihood scan as function of #mu;#mu;2 #Delta ln L"); graph>GetXaxis()>SetRangeUser(0,2); graph>GetYaxis()>SetRangeUser(0,8); c1>SetGridy(); // draw as a smooth line plus the points graph>Draw("APC"); %ENDCODE%
Now, from the scan you could read the boundaries of the Â¡À1σ interval by looking at where q(μ) crosses the horizontal line at 1. However, since that's a very common task, for models with just one parameter combine can do that for you using the MaxLikelihoodFit module combine M MaxLikelihoodFit comb_hww.root m 125.7 n HWW_FIT_MU justFit robustFit=1
The two options on the command line instruct combine to do just a fit for μ ( >>> including systematics >>> method used to compute upper limit is MaxLikelihoodFit >>> random number generator seed is 123456 Computing limit starting from observation Searching for crossing at nll = 18788 in the interval 0.598648, 20 r lvlhere lvlthere stepping 2.538783 11.83519 +0.49447 1.940135 1.180689 1.63413 11.83519 0.582041 0.773260 +0.23757 1.63413 0.174612 0.947873 0.38172 +0.23757 0.174612 0.825644 +0.07707 0.38172 0.052384 0.878028 0.13010 +0.07707 0.052384 0.841359 +0.02136 0.13010 0.015715 0.857074 0.04154 +0.02136 0.015715 0.846074 +0.00325 0.04154 0.004715 0.850788 0.01552 +0.00325 0.004715 0.847488 0.00232 0.01552 0.001414 0.846498 +0.00158 0.00232 0.000424 0.846922 0.00008 +0.00158 0.000424 Searching for crossing at nll = 18788 in the interval 0.598648, 0 r lvlhere lvlthere stepping 0.538783 +0.45835 +0.49447 0.059865 0.478919 +0.35676 +0.45835 0.059865 0.419054 +0.19666 +0.35676 0.059865 0.359189 0.04147 +0.19666 0.059865 0.401094 +0.13250 0.04147 0.017959 0.383135 +0.06353 +0.13250 0.017959 0.365175 0.01304 +0.06353 0.017959 0.377747 +0.04198 0.01304 0.005388 0.372359 +0.01921 +0.04198 0.005388 0.366971 0.00477 +0.01921 0.005388 0.370743 +0.01215 0.00477 0.001616 0.369127 +0.00497 +0.01215 0.001616 0.367510 0.00232 +0.00497 0.001616 0.368642 +0.00280 0.00232 0.000485 0.368157 +0.00061 +0.00280 0.000485 0.367672 0.00158 +0.00061 0.000485 0.368011 0.00005 0.00158 0.000145  MaxLikelihoodFit  Best fit r: 0.598648 0.230684/+0.248194 (68% CL) nll S+B > 8.65939 nll B > 1 Done in 0.11 min (cpu), 0.12 min (real)
So, from this combination we have measured
Exercise for the readerDo a fit of the signal strength separately for the 0jet and 2jet datacards of H→WW, and compute the weighted average using the textbook formulas for averaging numbers accounting for their uncertainties (for this approximate computation, you can deal with the asymmetric uncertainties by just taking the one that goes in the direction of the average result). How different it is from what obtained from the combined datacard?
</twistyPlugin twikiMakeVisibleInline> First we convert the datacards in likelihoods text2workspace.py m 125.7 P HiggsAnalysis.CombinedLimit.PhysicsModel:strictSMLikeHiggs hwwof_0j_shape_8TeV.txt o hww0j.root text2workspace.py m 125.7 P HiggsAnalysis.CombinedLimit.PhysicsModel:strictSMLikeHiggs hwwof_2j_shape_8TeV.txt o hww2j.root Then we run the fits on them
The average computed by hand is
</twistyPlugin>
Expectations: the Asimov datasetDetermining what is the expected outcome of a fit if the data were to match the SM Higgs expectations can be useful for various purposes, e.g. which to quantify the sensitivity of a given analysis in a way that is independent of the presence of fluctuations in the observed data (or when the data is still blinded, or not yet collected). The behaviour ofa fit with a complex model on the Asimov is also easier to understand with respect to the fit to the observed data, especially if the data departs significantly from the expectations. Normally, the procedure used to determine these expectations is to generate a large sample of possible experimental outcomes, all under the SM Higgs hypothesis, and use those to determine the median outcome of the analysis. This however would require substantial computational resources. Fortunately, a much cheaper but good approximation exists, through what is called the "Asimov dataset" In a parallel reality imagined by the science fiction writer I. Asimov, politics was run in a peculiar way: instead of mobilizing millions of people to cast their vote to deliberate on something, an algorithm was used to select an individual "average" person, and then this person was asked to take the decision on that matter. In more practical terms, an Asimov dataset can be defined as the one that matches perfectly the median expectations from the SM Higgs (i.e. the event yields in each bin are exactly as expected). The idea is simple and intuitive, and apart from a few subtleties like the need of having a noninteger number of observed events in each bin and of dealing efficiently with multidimensional models, which we won't discuss in this exercise.
When running on a datacard, combine can be instructed to use an Asimov dataset instead of the observed one simply by providing the options We can for example produce the expected likelihood scan vs μ for H→WW with combine M MultiDimFit comb_hww.root m 125.7 n HWW_SCAN_MU_ASIMOV algo=grid points=100 setPhysicsModelParameterRanges r=0,3 t 1 expectSignal=1
A comparison of the two likelihood scans, observed and Asimov, is below: the observed data is shown as a thick line, the Asimov as a light dotted line. The shapes are similar, but the Asimov is centered at μ=1 (by construction, since that's the SM expectation), while the observed data is not. </twistyPlugin twikiMakeVisibleInline> %CODE{"cpp"}% /// root l higgsCombineHWW_SCAN_MU.MultiDimFit.mH125.7.root higgsCombineHWW_SCAN_MU_ASIMOV.MultiDimFit.mH125.7.root // Attaching file higgsCombineHWW_SCAN_MU.MultiDimFit.mH125.7.root as _file0... // Attaching file higgsCombineHWW_SCAN_MU_ASIMOV.MultiDimFit.mH125.7.root as _file1... _file0>cd(); limit>Draw("2*deltaNLL:r"); TGraph *graph = gROOT>FindObject("Graph")>Clone(); graph>Sort(); _file1>cd(); limit>Draw("2*deltaNLL:r"); TGraph *graphA = gROOT>FindObject("Graph")>Clone(); graphA>Sort(); graph>SetLineWidth(3); graph>SetLineColor(1); graphA>SetLineWidth(1); graphA>SetLineColor(4); graphA>SetLineStyle(2); graph>Draw("AC"); graphA>Draw("C SAME"); graph>SetTitle("Likelihood scan as function of #mu;#mu;2 #Delta ln L"); graph>GetXaxis()>SetRangeUser(0,2); graph>GetYaxis()>SetRangeUser(0,8); %ENDCODE%
</twistyPlugin>
Twodimensional fits: μ_{ggH,ttH} vs μ_{VBF,VH}Now we will look at a more elaborate model, which has two parameter:
%CODE{"python"}% class RvRfXSHiggs(SMLikeHiggsModel): "Float ggH and ttH together and VH and qqH together" def __init__(self): SMLikeHiggsModel.__init__(self) # not using 'super(x,self).__init__' since I don't understand it self.floatMass = False def setPhysicsOptions(self,physOptions): ## [ ... code for floating the Higgs mass, irrelevant for us now .... ] def doParametersOfInterest(self): """Create POI out of signal strength and MH""" #  Signal Strength as only POI  self.modelBuilder.doVar("RV[1,5,15]") self.modelBuilder.doVar("RF[1,4,8]") if self.floatMass: # [ ... not interesting ... ] else: # [ ... not interesting ... ] self.modelBuilder.doSet("POI",'RV,RF') def getHiggsSignalYieldScale(self,production,decay, energy): if production in ['ggH', 'ttH']: return 'RF' if production in ['qqH', 'WH', 'ZH', 'VH']: return 'RV' raise RuntimeError, "Unknown production mode '%s'" % production %ENDCODE%
Analogously to the onedimensional case, we can define a likelihood function that depends on μ_{ggH,ttH} ( Let's apply this analysis to the H→γγ search, to find first the best fit point and then do a twodimensional scan of q. text2workspace.py comb_hgg.txt m 125.7 P HiggsAnalysis.CombinedLimit.PhysicsModel:rVrFXSHiggs o rVrF_hgg.root combine M MultiDimFit rVrF_hgg.root m 125.7 n HGG_RVRF_FIT >>> including systematics >>> method used to compute upper limit is MultiDimFit >>> random number generator seed is 123456 ModelConfig 'ModelConfig' defines more than one parameter of interest. This is not supported in some statistical methods. Computing limit starting from observation  MultiDimFit  best fit parameter values: RV : +0.259 RF : +1.229 Done in 0.01 min (cpu), 0.01 min (real) Now, for the 2D scan we'll do 400 points (20x20), using the ranges [1,3] and [2,5] for the two parameters. This requires 400+1 fits, so it will take a few minutes. combine M MultiDimFit rVrF_hgg.root m 125.7 n HGG_RVRF_SCAN_2D algo=grid points=400 setPhysicsModelParameterRanges RF=1,3:RV=2,5 >>> including systematics >>> method used to compute upper limit is MultiDimFit >>> random number generator seed is 123456 ModelConfig 'ModelConfig' defines more than one parameter of interest. This is not supported in some statistical methods. Set Range of Parameter RF To : (1,3) Set Range of Parameter RV To : (2,5) Computing limit starting from observation Point 0/400, (i,j) = (0,0), RV = 1.825000, RF = 0.900000 Point 2/400, (i,j) = (0,2), RV = 1.825000, RF = 0.500000 Point 4/400, (i,j) = (0,4), RV = 1.825000, RF = 0.100000 Point 6/400, (i,j) = (0,6), RV = 1.825000, RF = 0.300000 Point 8/400, (i,j) = (0,8), RV = 1.825000, RF = 0.700000 Point 10/400, (i,j) = (0,10), RV = 1.825000, RF = 1.100000 Point 12/400, (i,j) = (0,12), RV = 1.825000, RF = 1.500000 [...] Point 388/400, (i,j) = (19,8), RV = 4.825000, RF = 0.700000 Point 390/400, (i,j) = (19,10), RV = 4.825000, RF = 1.100000 Point 392/400, (i,j) = (19,12), RV = 4.825000, RF = 1.500000 Point 394/400, (i,j) = (19,14), RV = 4.825000, RF = 1.900000 Point 396/400, (i,j) = (19,16), RV = 4.825000, RF = 2.300000 Point 398/400, (i,j) = (19,18), RV = 4.825000, RF = 2.700000 Done in 1.52 min (cpu), 1.53 min (real) A simple plot from the output of the bidimensional scan can be done with %CODE{"cpp"}% // plot the likelihood in the 20x20 grid, selecting the interesting region of small deltaNLL limit>Draw("2*deltaNLL:RV:RF>>h2(20,1,3,20,2,5)","deltaNLL > 0 && deltaNLL < 11","PROF COLZ"); // make the plot less horrible gStyle>SetOptStat(0); h2>SetContour(200); h2>SetTitle("Profile Likelihood in 2D;#mu_{ggH,ttH};#mu_{VBF,VH}"); /// Now we add the best fit point limit>Draw("RV:RF","deltaNLL == 0","P SAME"); TGraph *bestFit = gROOT>FindObject("Graph"); bestFit>SetMarkerStyle(34); bestFit>SetMarkerSize(3); c1>Update(); %ENDCODE%
Contours for different confidence levels can be constructed starting from the 2d likelihood scan, using the thresholds for a chisquare with two degrees of freedom (pdf review). In particular, the 1σ region is given by q ≤ 2.30, the 95% one by q ≤ 5.99 and the 3σ one by q ≤ 11.83; the C++ function ROOT provides a way to extract a smooth contour out of a twodimensional histogram, and this functionality is packaged in a script test/plotting/contours2D.cxx. The top level function, that you will call directly, is %CODE{"cpp"}% void contour2D(TString xvar, int xbins, float xmin, float xmax, TString yvar, int ybins, float ymin, float ymax, float smx=1.0, float smy=1.0, TFile *fOut=0, TString name="contour2D") ; %ENDCODE%
We can now use this macro interactively to make a nicer plot of the previous 2D scan. For convenience, copy the %CODE{"cpp"}% // root l higgsCombineHGG_RVRF_SCAN_2D.MultiDimFit.mH125.7.root contours2D.cxx // Attaching file higgsCombineHGG_RVRF_SCAN_2D.MultiDimFit.mH125.7.root as _file0... // Processing contours2D.cxx... // before we had done // limit>Draw("2*deltaNLL:RV:RF>>h2(20,1,3,20,2,5)","deltaNLL > 0 && deltaNLL < 11","PROF COLZ"); // now we do contour2D("RF",20,1.,3., "RV",20,2.,5., 1.,1.); %ENDCODE%
Fitting only a subset of the parameters of a modelSometimes, one is interested in determining confidence regions for one or two parameters in a model that has more parameters. There are two conceptually different ways in which this can be done: the other parameters can be either kept fixed at some nominal value (e.g. the SM prediction), or they can be profiled (i.e. left freely floating in the fit)
Still using the H→γγ analysis with the μ_{ggH,ttH} vs μ_{VBF,VH} model as an example, we can compare the two approaches by making 1d likelihood scans as function of μ_{ggH,ttH} keeping μ_{VBF,VH} fixed or profiled: we use the option combine M MultiDimFit rVrF_hgg.root m 125.7 algo=grid points=50 setPhysicsModelParameterRanges RF=1,3 P RF floatOtherPOI=0 n HGG_RVRF_SCAN_1D_RF_FIX combine M MultiDimFit rVrF_hgg.root m 125.7 algo=grid points=50 setPhysicsModelParameterRanges RF=1,3 P RF floatOtherPOI=1 n HGG_RVRF_SCAN_1D_RF_PROF %CODE{"cpp"}% // root l higgsCombineHGG_RVRF_SCAN_1D_RF_* /// Attaching file higgsCombineHGG_RVRF_SCAN_1D_RF_FIX.MultiDimFit.mH125.7.root as _file0... /// Attaching file higgsCombineHGG_RVRF_SCAN_1D_RF_PROF.MultiDimFit.mH125.7.root as _file1... _file0>cd(); limit>Draw("2*deltaNLL:RF"); TGraph *graphFix = gROOT>FindObject("Graph")>Clone(); graphFix>Sort(); _file1>cd(); limit>Draw("2*deltaNLL:RF"); TGraph *graphProf = gROOT>FindObject("Graph")>Clone(); graphProf>Sort(); graphProf>SetLineWidth(3); graphProf>Draw("AC"); graphFix>SetLineWidth(2); graphFix>SetLineStyle(9); graphFix>SetLineColor(206); graphFix>Draw("C SAME"); %ENDCODE%
You can see that the two give different results, both in terms of the preferred value and in terms of the uncertainties. The interpretation of the two results is also different from the physics point of view:
First fits to Higgs search results
</twistyPlugin twikiMakeVisibleInline> First, run the mkcomb.sh to create the combined datacards
bash mkcomb.sh . If you want to get also the combination of the ttH searches, you have to add a line combineCards.py $TTH > comb_ttH.txt Then, for each datacard you can do text2workspace.py m 125.7 P HiggsAnalysis.CombinedLimit.PhysicsModel:strictSMLikeHiggs datacard.txt o datacard.root combine M MaxLikelihoodFit justFit robustFit=1 m 125.7 datacard.root or directly combine M MaxLikelihoodFit justFit robustFit=1 m 125.7 datacard.txt
</twistyPlugin>
</twistyPlugin twikiMakeVisibleInline>
for X in h{bb,tt,gg,ww,zz}; do text2workspace.py comb_${X}.txt m 125.7 P HiggsAnalysis.CombinedLimit.PhysicsModel:rVrFXSHiggs o rVrF_${X}.root forfit ; done for X in h{bb,tt,gg,ww,zz}; do combine M MultiDimFit rVrF_${X}.root m 125.7 n ${X}_RVRF_SCAN_2D algo=grid3x3 points=400 setPhysicsModelParameterRanges RF=1,3:RV=2,5; done To see a single plot, interactively in ROOT, %CODE{"cpp"}% /// root l tdrstyle.cc contours2D.cxx higgsCombinehww_RVRF_SCAN_2D.MultiDimFit.mH125.7.root // Processing tdrstyle.cc... // Processing contours2D.cxx... // Attaching file higgsCombinehww_RVRF_SCAN_2D.MultiDimFit.mH125.7.root as _file0... contour2D("RF",60,1,3, "RV",60,2,5, 1.0,1.0); %ENDCODE%
A macro to make all the plots including the summary one is below: %CODE{"cpp"}% // run with root.exe b l tdrstyle.cc contours2D.cxx plotall_rvrf.cxx q void plotall_rvrf() { TCanvas *c1 = new TCanvas("c1","c1"); TFile *fOut = TFile::Open("rvrf_all.root", "RECREATE"); const int decays = 5; const char *decay[decays] = { "hww", "hzz", "hgg", "hbb", "htt" }; const int color[decays] = { 4 , 2 , 209 , 67 , 221 }; for (int i = 0; i < decays; ++i) { gFile = TFile::Open(Form("higgsCombine%s_RVRF_SCAN_2D.MultiDimFit.mH125.7.root", decay[i])); contour2D("RF",60,1,3,"RV",60,2,5,1.0,1.0,fOut,Form("rvrf_%s",decay[i])); c1>Print(Form("output_rvrf_%s.png",decay[i])); gFile>Close(); } fOut>cd(); TH2F *frame = new TH2F("frame",";#mu_{ggH,ttH};#mu_{VBF,VH}",60,1.,3.,60,2.,5.); frame>Draw("AXIS"); for (int i = 0; i < decays; ++i) { TGraph *best = fOut>Get(Form("rvrf_%s_best",decay[i])); best>SetMarkerColor(color[i]); best>SetMarkerStyle(34); // apparently ROOT forgets this best>SetMarkerSize(2.0); // so we do this again best>Draw("P SAME"); TList *c68 = fOut>Get(Form("rvrf_%s_c68",decay[i])); styleMultiGraph(c68, /*color=*/color[i], /*width=*/3, /*style=*/1); c68>Draw("L SAME"); } TMarker m; m.SetMarkerSize(3.0); m.SetMarkerColor(97); m.SetMarkerStyle(33); m.DrawMarker(1.0,1.0); m.SetMarkerSize(1.8); m.SetMarkerColor(89); m.SetMarkerStyle(33); m.DrawMarker(1.0,1.0); c1>Print("output_rvrf_all.png"); fOut>Close(); } %ENDCODE%
</twistyPlugin>
Higgs Coupling models
The previous physics models were considering only some common signal strength modifiers aff Under the hypothesis that the boson observed is a narrow scalar state with the same structure of couplings as the SM Higgs boson, the Higgs production and decay modes decouple and we can parameterize the expected yields in terms of production cross sections and decay widths: To search for possible deviations in the data from the expectations for the SM Higgs boson, we introduce some coupling modifiers κ_{X}, and express the cross sections and partial widths as function of these κ_{X} (where κ_{X}=1 corresponds to the SM Higgs). We will consider different models, each characterized by a different set of parameters and by different assumptions.
The κ_{V}, κ_{f} benchmark model.In this model, we introduce just two parameters κ_{V}, κ_{f} that scale the couplings between the Higgs boson and the electroweak gauge bosons and between the Higgs boson and the fermions. The loopinduced effective couplings of the Higgs boson to gluons and to photons are computed in terms of κ_{V}, κ_{f} by assuming that only the SM particles participate to the loop.
In this model we assume that there are no BSM decays, so
Exercises for the reader
</twistyPlugin twikiMakeVisibleInline>
It is instructive to note that in this benchmark model the search for VH, H→bb provides exactly the same kind of measurement as ggH, H→ZZ (or ggH, H→VV). The same is true also for another channel which is often intuitively associated with fermion couplings, VBF H→ττ. </twistyPlugin>
</twistyPlugin twikiMakeVisibleInline>
Among the channels considered, the ones with the strongest dependency on κ_{V}^{2} and κ_{f}^{2} are respectively VBF H→γγ and ttH, H→bb. More in general, you can deduce that the strongest dependency is achieved when considering final states with the same coupling both at production and decay (VBF and VH production with H→γγ,VV; ggH and ttH production with H→bb, τtau;), while the channels with mixed productions and decays have smaller coefficients since there is a partial cancellation between the κ_{f}^{2} at numerator and the Â¾κ_{f}^{2} at denominator (from κ_{H}^{2}). </twistyPlugin>
Implementation in combineThe PhysicsModel that implements the κ_{V} and κ_{f} model is CvCfHiggs (when the model was first concieved, κ_{V} was called C_{V} and κ_{f} was called C_{F}). The relevant python code is below (for clarity, the parts of the code which allow the possibility of leaving the Higgs mass floating, and of setting the ranges for CV and CF, as well have been hidden). A few more comments have also been added here to make it more easy to understand. %CODE{"python"}% class CvCfHiggs(SMLikeHiggsModel): def __init__(self): SMLikeHiggsModel.__init__(self) # not using 'super(x,self).__init__' since I don't understand it self.floatMass = False self.cVRange = ['0','2'] self.cFRange = ['2','2'] def doParametersOfInterest(self): """Create POI out of signal strength and MH""" #  Signal Strength as only POI  self.modelBuilder.doVar("CV[1,%s,%s]" % (self.cVRange[0], self.cVRange[1])) self.modelBuilder.doVar("CF[1,%s,%s]" % (self.cFRange[0], self.cFRange[1])) if self.floatMass: #[...] else: #[...] self.modelBuilder.doSet("POI",'CV,CF') ## Create an object "SMHiggsBuilder" that can compute the SM predictions for the Higgs ## (e.g. it encodes SM BRs and the the alpha and beta parameters for the diphoton decay) self.SMH = SMHiggsBuilder(self.modelBuilder) self.setup() def setup(self): ## Define the four possible modes according to which a decay can scale: the two simple ones (HVV and Hff, which also works for the gluon decay) ## and the two complex ones that require to know what particles contribute in the SM loops (diphoton, Zgamma) self.decayScaling = 'hgg':'hgg', 'hzg':'hzg', 'hww':'hvv', 'hzz':'hvv', 'hbb':'hff', 'htt':'hff', } ## Define which coupling scales which production cross section (always trivial) self.productionScaling = { 'ggH':'CF', 'ttH':'CF', 'qqH':'CV', 'WH':'CV', 'ZH':'CV', 'VH':'CV', } ## Use tables provided by the theory community to compute the scaling for the diphoton ## and Zgamma decay as function of the various couplings self.SMH.makeScaling('hgg', Cb='CF', Ctop='CF', CW='CV', Ctau='CF') self.SMH.makeScaling('hzg', Cb='CF', Ctop='CF', CW='CV', Ctau='CF') ## partial widths, normalized to the SM one, for decays scaling with F, V and total for d in [ "htt", "hbb", "hcc", "hww", "hzz", "hgluglu", "htoptop", "hgg", "hzg", "hmm", "hss" ]: self.SMH.makeBR(d) self.modelBuilder.factory_('expr::CvCf_Gscal_sumf("@0*@0 * (@1+@2+@3+@4+@5+@6+@7)", CF, SM_BR_hbb, SM_BR_htt, SM_BR_hcc, SM_BR_htoptop, SM_BR_hgluglu, SM_BR_hmm, SM_BR_hss)') self.modelBuilder.factory_('expr::CvCf_Gscal_sumv("@0*@0 * (@1+@2)", CV, SM_BR_hww, SM_BR_hzz)') self.modelBuilder.factory_('expr::CvCf_Gscal_gg("@0 * @1", Scaling_hgg, SM_BR_hgg)') self.modelBuilder.factory_('expr::CvCf_Gscal_Zg("@0 * @1", Scaling_hzg, SM_BR_hzg)') ## Total width self.modelBuilder.factory_('sum::CvCf_Gscal_tot(CvCf _Gscal_sumf, CvCf _Gscal_sumv, CvCf _Gscal_gg, CvCf _Gscal_Zg)') ## BRs, normalized to the SM ones: they scale as (coupling/coupling_SM)^2 / (totWidth/totWidthSM) self.modelBuilder.factory_('expr::CvCf_BRscal_hgg("@0/@1", Scaling_hgg, CvCf _Gscal_tot)') self.modelBuilder.factory_('expr::CvCf_BRscal_hzg("@0/@1", Scaling_hzg, CvCf _Gscal_tot)') self.modelBuilder.factory_('expr::CvCf_BRscal_hff("@0*@0/@1", CF, CvCf _Gscal_tot)') self.modelBuilder.factory_('expr::CvCf_BRscal_hvv("@0*@0/@1", CV, CvCf _Gscal_tot)') ## Print out everything self.modelBuilder.out.Print() def getHiggsSignalYieldScale(self,production,decay,energy): ## create a function that provides the scale factor for a given (production,decay) pair, unless it exists already name = "CvCf_XSBRscal_%s_%s" % (production,decay) if self.modelBuilder.out.function(name): return name XSscal = self.productionScaling[production] BRscal = self.decayScaling[decay] self.modelBuilder.factory_('expr::%s("@0*@0 * @1", %s, CvCf _BRscal_%s)' % (name, XSscal, BRscal)) return name ## This lines below are instead in HiggsCouplings.py from HiggsAnalysis.CombinedLimit.HiggsBenchmarkModels.VectorsAndFermionsModels import CvCfHiggs, CvCfXgHiggs, CfXgHiggs cVcF = CvCfHiggs() %ENDCODE%
Working example: H→WWLet's now investigate the constraints in the (κ_{V},κ_{f}) provided by the H→WW analysis separately for 0 jets and 2 jets, and for the combination of the two We can create the three workspaces: text2workspace.py hwwof_0j_shape_8TeV.txt m 125.7 P HiggsAnalysis.CombinedLimit.HiggsCouplings:cVcF o cVcF_hww0j.root forfits text2workspace.py hwwof_2j_shape_8TeV.txt m 125.7 P HiggsAnalysis.CombinedLimit.HiggsCouplings:cVcF o cVcF_hww2j.root forfits text2workspace.py comb_hww.txt m 125.7 P HiggsAnalysis.CombinedLimit.HiggsCouplings:cVcF o cVcF_hww.root forfits and then we run the scans on the Asimov dataset, since they are easier to interpret for X in hww hww0j hww2j; do ( combine M MultiDimFit cVcF_${X}.root m 125.7 n ${X}_CVCF_SCAN_2D_ASIMOV algo=grid3x3 points=225 setPhysicsModelParameterRanges CF=0,2:CV=0,2 t 1 expectSignal=1 > log.cvf$X 2>&1 & ); done
The resulting plot is the following: </twistyPlugin twikiMakeVisibleInline> %CODE{"cpp"}% // run with root.exe b l tdrstyle.cc contours2D.cxx plotcvcf_hww.cxx q void fillMultiGraph(TList tmg, int fillColor, int fillStyle) { for (int i = 0; i < tmg>GetSize(); ++i) { TGraph *g = (TGraph) tmg>At(i); g>SetFillColor(fillColor); g>SetFillStyle(fillStyle); } } void plotcvcf_hww() { TCanvas *c1 = new TCanvas("c1","c1"); TFile *fOut = TFile::Open("kvkf_plots_hww.root", "RECREATE"); const int states = 3; const char *state[states] = { "hww", "hww0j", "hww2j" }; for (int i = 0; i < states; ++i) { gFile = TFile::Open(Form("higgsCombine%s_CVCF_SCAN_2D_ASIMOV.MultiDimFit.mH125.7.root", state[i])); contour2D("CV",45,0.,2., "CF",45,0.,2., 1.0,1.0, fOut,Form("kvkf_%s",state[i])); gFile>Close(); } fOut>cd(); TH2F *frame = new TH2F("frame",";#kappa_{V};#kappa_{f}", 45,0.,2., 45,0.,2.); frame>GetXaxis()>SetDecimals(1); frame>GetYaxis()>SetDecimals(1); frame>Draw("AXIS"); TList c68_comb = (TList) fOut>Get("kvkf_hww_c68"); styleMultiGraph(c68_comb, /*color=*/1, /*width=*/3, /*style=*/1); fillMultiGraph(c68_comb, /*color=*/kGray, /*style=*/1001); TList c68_0j = (TList) fOut>Get("kvkf_hww0j_c68"); styleMultiGraph(c68_0j, /*color=*/4, /*width=*/3, /*style=*/1); fillMultiGraph(c68_0j, /*color=*/4, /*style=*/3005); TList c68_2j = (TList) fOut>Get("kvkf_hww2j_c68"); fillMultiGraph(c68_2j, /*color=*/2, /*style=*/3004); styleMultiGraph(c68_2j, /*color=*/2, /*width=*/3, /*style=*/1); c68_2j>Draw("F SAME"); c68_0j>Draw("F SAME"); c68_comb>Draw("F SAME"); c68_2j>Draw("L SAME"); c68_0j>Draw("L SAME"); c68_comb>Draw("L SAME"); TGraph *best = fOut>Get("kvkf_hww_best"); best>SetMarkerColor(1); best>SetMarkerStyle(34); // apparently ROOT forgets this best>SetMarkerSize(2.0); // so we do this again best>Draw("P SAME"); TMarker m; m.SetMarkerSize(3.0); m.SetMarkerColor(97); m.SetMarkerStyle(33); m.DrawMarker(1.0,1.0); m.SetMarkerSize(1.8); m.SetMarkerColor(89); m.SetMarkerStyle(33); m.DrawMarker(1.0,1.0); TLegend *leg = new TLegend(0.20,0.80,0.45,0.95); leg>SetTextFont(42); leg>SetShadowColor(0); leg>SetFillColor(0); leg>AddEntry(best, "Comb.", "P"); leg>AddEntry(c68_comb>At(0), "(68% CL)", "LF"); leg>AddEntry(c68_0j>At(0), "0 jets", "LF"); leg>AddEntry(c68_2j>At(0), "2 jets", "LF"); leg>Draw(); c1>Print("output_kvkf_hww.png"); fOut>Close(); } %ENDCODE%
</twistyPlugin>
The shape of the H→WW + 2 jets can be explained considering that the expected signal is mostly from VBF. As computed in the previous questions from the reader, the expected scale factor for VBF H→WW is The shape of the H→WW + 0 jets is also understandable, considering that the signal is mostly gluon fusion: if κ_{f} is large enough, we can again neglect the κ_{V} at denominator, but for this channel now the κ_{f} at denominator cancels with the one in the numerator (from gluon fusion) and what is left is a measure of κ_{V} alone, giving an allowed region shaped as a vertical band at constant κ_{V}. However if κ_{f} is instead small with respect to κ_{V}, as it happens in the bottomright part of the plot, it is now κ_{V} to dominate the denominator; this κ_{V} will cancel the one at the numerator (from the Γ_{WW}) leaving just a measure of κ_{f} (i.e. a horizontal band).
C6: a general sixparameter model for the couplings of a SMlike Higgs bosonThe κ_{V}, κ_{f} model is quite restrictive, assuming e.g. the same coupling strength modifier for both quarks and leptons, and for both uptype and downtype fermions, assumptions which are not true in many BSM scenarios (e.g. in supersymmetry the couplings to the top and bottom quarks have different modifiers). We will consider therefore a more generic model which relies on a much smaller set of assumptions, namely:
In this model we therefore have six parameters:
Exercise for the reader
</twistyPlugin twikiMakeVisibleInline> Γ_{tot}/Γ_{tot}_{SM} = 0.566Â¡¤κ_{b}^{2} + 0.2541Â¡¤κ_{V}^{2} + 0.0851Â¡¤κ_{g}^{2} + 0.0623Â¡¤κ_{τ}^{2} + 0.0285Â¡¤κ_{t}^{2} + 0.00388Â¡¤κ_{γ}^{2}
</twistyPlugin>
<
Implementation in combineThe PhysicsModel that implements the c6 model is C6. The relevant python code is below (for clarity, the parts of the code which allow the possibility of leaving the Higgs mass floating, as well have been hidden, and a few more comments have also been added here to make it more easy to understand (and some misleading obsolete comments removed...).
%CODE{"python"}% class C6(SMLikeHiggsModel): def __init__(self): SMLikeHiggsModel.__init__(self) # not using 'super(x,self).__init__' since I don't understand it self.floatMass = False ## This model has the option to have a separate coupling for the HZgamma vertex, we won't use it in this exercise self.doHZg = False def setPhysicsOptions(self,physOptions): for po in physOptions: if po.startswith("higgsMassRange="): # [...] if po ## Create the SMHiggsBuilder, to get the SM BRs self.SMH = SMHiggsBuilder(self.modelBuilder) self.setup() def setup(self): # SM BRs, needed to compute the modified total width as funciton of the partial widths for d in [ "htt", "hbb", "hcc", "hww", "hzz", "hgluglu", "htoptop", "hgg", "hzg", "hmm", "hss" ]: self.SMH.makeBR(d) ## Partial decay witdhs, normalized to the SM one self.modelBuilder.factory_('expr::c6_Gscal_Vectors("@0*@0 * (@1+@2)", kV, SM_BR_hzz, SM_BR_hww)') self.modelBuilder.factory_('expr::c6_Gscal_tau("@0*@0 * (@1+@2)", ktau, SM_BR_htt, SM_BR_hmm)') self.modelBuilder.factory_('expr::c6_Gscal_top("@0*@0 * (@1+@2)", ktop, SM_BR_htoptop, SM_BR_hcc)') self.modelBuilder.factory_('expr::c6_Gscal_bottom("@0*@0 * (@1+@2)", kbottom, SM_BR_hbb, SM_BR_hss)') self.modelBuilder.factory_('expr::c6_Gscal_gluon("@0*@0 * @1", kgluon, SM_BR_hgluglu)') if not self.doHZg: self.modelBuilder.factory_('expr::c6_Gscal_gamma("@0*@0 * (@1+@2)", kgamma, SM_BR_hgg, SM_BR_hzg)') else: self.modelBuilder.factory_('expr::c6_Gscal_gamma("@0*@0 *@1+@2*@2*@3", kgamma, SM_BR_hgg, kZgamma, SM_BR_hzg)') ## Modifier for the total width, from the sum of the partial widths self.modelBuilder.factory_('sum::c6_Gscal_tot(c6_Gscal_Vectors, c6_Gscal_tau, c6_Gscal_top, c6_Gscal_bottom, c6_Gscal_gluon, c6_Gscal_gamma)') ## BRs, normalized to the SM ones: they scale as (coupling/coupling_SM)^2 / (total/total_SM) self.modelBuilder.factory_('expr::c6_BRscal_hvv("@0*@0/@1", kV, c6_Gscal_tot)') self.modelBuilder.factory_('expr::c6_BRscal_htt("@0*@0/@1", ktau, c6_Gscal_tot)') self.modelBuilder.factory_('expr::c6_BRscal_hbb("@0*@0/@1", kbottom, c6_Gscal_tot)') self.modelBuilder.factory_('expr::c6_BRscal_hgg("@0*@0/@1", kgamma, c6_Gscal_tot)') if self.doHZg: self.modelBuilder.factory_('expr::c6_BRscal_hzg("@0*@0/@1", kZgamma, c6_Gscal_tot)')
def getHiggsSignalYieldScale(self,production,decay,energy): name = "c6_XSBRscal_%s_%s" % (production,decay) print '[LOFullParametrization::C6]' print name, production, decay, energy if self.modelBuilder.out.function(name) ## This lines below are instead in HiggsCouplings.py from HiggsAnalysis.CombinedLimit.LOFullParametrization import C5, C6, C7, PartialWidthsModel c6 = C6() %ENDCODE%
Working example 1: κ_{γ}
We will take the full combination but excluding the H→inv search, and extract the κ_{γ} modifier, profiling the others. However, to profile them correctly we need to select ranges that are large enough so that the parameter doesn't end on the boundary. We can look at the past CMS Higgs combination, HIG13005, to see what was the sensitivity to the other parameters, e.g from the summary plot In addition to the observed outcome, we will compute also the Asimov one text2workspace.py comb.txt m 125.7 P HiggsAnalysis.CombinedLimit.HiggsCouplings:c6 o c6_comb.root forfit RANGES="setPhysicsModelParameterRanges kbottom=0,5:ktop=0,8:ktau=0,4:kgluon=0,4:kgamma=0,3" combine c6_comb.root m 125.7 M MultiDimFit algo=grid P kgamma floatOtherPOI=1 $RANGES points=40 n C6_comb_SCAN_1D_kgamma /// note: 40 points take about 4 minutes (on lxplus) combine c6_comb.root m 125.7 M MultiDimFit algo=grid P kgamma floatOtherPOI=1 $RANGES points=40 n C6_comb_SCAN_1D_kgamma_ASIMOV t 1 expectSignal=1 /// note: 40 points take about 20 minutes (on lxplus)
Speeding up things: running scans in parallel
When the model becomes complex, evaluating all the points of the scan in a single job can be inefficient. For this purpose, MultiDimFit allows one to fraction the task by specifying a range of points to run: different ranges can be run in parallel, and at the end the root files can be merged with combine c6_comb.root m 125.7 M MultiDimFit algo=grid P kgamma floatOtherPOI=1 $RANGES points=40 n C6_comb_SCAN_1D_kgamma_ASIMOV.0 t 1 expectSignal=1 firstPoint 0 lastPoint 7 combine c6_comb.root m 125.7 M MultiDimFit algo=grid P kgamma floatOtherPOI=1 $RANGES points=40 n C6_comb_SCAN_1D_kgamma_ASIMOV.1 t 1 expectSignal=1 firstPoint 8 lastPoint 15 combine c6_comb.root m 125.7 M MultiDimFit algo=grid P kgamma floatOtherPOI=1 $RANGES points=40 n C6_comb_SCAN_1D_kgamma_ASIMOV.2 t 1 expectSignal=1 firstPoint 16 lastPoint 23 combine c6_comb.root m 125.7 M MultiDimFit algo=grid P kgamma floatOtherPOI=1 $RANGES points=40 n C6_comb_SCAN_1D_kgamma_ASIMOV.3 t 1 expectSignal=1 firstPoint 24 lastPoint 31 combine c6_comb.root m 125.7 M MultiDimFit algo=grid P kgamma floatOtherPOI=1 $RANGES points=40 n C6_comb_SCAN_1D_kgamma_ASIMOV.4 t 1 expectSignal=1 firstPoint 32 lastPoint 39 hadd higgsCombineC6_comb_SCAN_1D_kgamma_ASIMOV.MultiDimFit.mH125.7.root higgsCombineC6_comb_SCAN_1D_kgamma_ASIMOV.*.MultiDimFit.mH125.7.root where the first 5 commands can be executed in parallel. For convenience, there is even already a script that takes care of running all the jobs in parallel splitting the points, suitable for running on a multicore machine: test/parallelScan.py. For the previous case, we could for example do cp s ${CMSSW_BASE}/src/HiggsAnalysis/CombinedLimit/test/parallelScan.py . parallelScan.py j 5 c6_comb.root m 125.7 M MultiDimFit algo=grid P kgamma floatOtherPOI=1 $RANGES points=40 n C6_comb_SCAN_1D_kgamma_ASIMOV t 1 expectSignal=1 hadd higgsCombineC6_comb_SCAN_1D_kgamma_ASIMOV.MultiDimFit.mH125.7.root higgsCombineC6_comb_SCAN_1D_kgamma_ASIMOV.*.MultiDimFit.mH125.7.root
The option
Working example 2: κ_{g} vs κ_{b} degeneracyThe channels with the best sensitivity to the signal strength are mostly driven by the gluon fusion production cross section (e.g. H→WW and H→ZZ, H→γγboosted H→ττ). Conversely, the H→bb is the dominant decay mode, so that the total width is mostly sensitive to κ_{b}. The combination of these two features results in a strong correlation between the uncertainties of the two measurements of κ_{g} and κ_{b}: a simultaneous increase of both couplings would produce an increase in the gluon fusion production cross section but also a decrease of the branching ratio in the other modes (WW, ZZ, γγ and ττ), leaving the expected σÂ¡¤BR unchanged, and so provides still a good fit to the data. We can see this feature with a 2D likelihood scan: ./parallelScan.py c6_comb.root m 125.7 M MultiDimFit algo=grid P kgluon P kbottom floatOtherPOI=1 $RANGES points=400 n C6_comb_SCAN_2D_kgluon_kbottom
Coupling fits from the CMS combination
</twistyPlugin twikiMakeVisibleInline> The commands to run to create the workspaces and do the scans are below. Running a 2D scan takes about 2 minutes for a single channel, and about 20 minutes for the full combination text2workspace.py comb.txt m 125.7 P HiggsAnalysis.CombinedLimit.HiggsCouplings:cVcF o cVcF_comb.root forfit ; for X in h{bb,tt,gg,ww,zz}; do text2workspace.py comb_${X}.txt m 125.7 P HiggsAnalysis.CombinedLimit.HiggsCouplings:cVcF o cVcF_${X}.root forfit ; done for X in h{bb,tt,gg,ww,zz,comb}; do combine M MultiDimFit cVcF_${X}.root m 125.7 n ${X}_CVCF_SCAN_2D algo=grid3x3 points=225 setPhysicsModelParameterRanges CV=0,2:CF=0,2; done A macro to make the summary plot is below: %CODE{"cpp"}% // run with root.exe b l tdrstyle.cc contours2D.cxx plotcvcf_all.cxx q void fillMultiGraph(TList tmg, int fillColor, int fillStyle) { for (int i = 0; i < tmg>GetSize(); ++i) { TGraph *g = (TGraph) tmg>At(i); g>SetFillColor(fillColor); g>SetFillStyle(fillStyle); } } void plotcvcf_all() { TCanvas c1 = new TCanvas("c1","c1"); TFile *fOut = TFile::Open("kvkf_plots_all.root", "RECREATE"); const int decays = 6; const char *decay[decays] = { "hbb", "htt", "hzz", "hww", "hgg", "comb" }; const int color[decays] = { 67 , 221 , 2 , 4 , 209 , 1 }; const int fill[decays] = { 228 , 223 , 208 , 216 , 211 , 203 }; TList *c68[decays] = { 0,0,0,0,0, 0 }; for (int i = 0; i < decays; ++i) { gFile = TFile::Open(Form("higgsCombine%s_CVCF_SCAN_2D.MultiDimFit.mH125.7.root", decay[i])); contour2D("CV",45,0.,2., "CF",45,0.,2., 1.0,1.0, fOut,Form("kvkf_%s",decay[i])); gFile>Close(); } fOut>cd(); fOut>ls(); for (int i = 0; i < decays; ++i) { c68[i] = (TList) fOut>Get(Form("kvkf_%s_c68",decay[i])); styleMultiGraph(c68[i], color[i], 3, 1); fillMultiGraph(c68[i], fill[i], 1001); } TH2F *frame = new TH2F("frame",";#kappa_{V};#kappa_{f}", 45,0.,2., 45,0.,2.); frame>GetXaxis()>SetDecimals(1); frame>GetYaxis()>SetDecimals(1); frame>Draw("AXIS"); for (int i = 0; i < decays; ++i) { c68[i]>Draw("F SAME"); } for (int i = 0; i < decays; ++i) { c68[i]>Draw("L SAME"); } TGraph *best = fOut>Get("kvkf_comb_best"); best>SetMarkerColor(1); best>SetMarkerStyle(34); // apparently ROOT forgets this best>SetMarkerSize(2.0); // so we do this again best>Draw("P SAME"); TMarker m; m.SetMarkerSize(3.0); m.SetMarkerColor(97); m.SetMarkerStyle(33); m.DrawMarker(1.0,1.0); m.SetMarkerSize(1.8); m.SetMarkerColor(89); m.SetMarkerStyle(33); m.DrawMarker(1.0,1.0); c1>Print("output_kvkf_all.png"); fOut>Close(); } %ENDCODE%
</twistyPlugin>
</twistyPlugin twikiMakeVisibleInline> combine c6_comb.root M MultiDimFit m 125.7 $RANGES >>> including systematics >>> method used to compute upper limit is MultiDimFit >>> random number generator seed is 123456 ModelConfig 'ModelConfig' defines more than one parameter of interest. This is not supported in some statistical methods. Set Range of Parameter kbottom To : (0,5) Set Range of Parameter ktop To : (0,8) Set Range of Parameter ktau To : (0,4) Set Range of Parameter kgluon To : (0,4) Set Range of Parameter kgamma To : (0,3) Computing limit starting from observation  MultiDimFit  best fit parameter values: kV : +1.044 ktau : +1.274 ktop : +2.053 kbottom : +1.401 kgluon : +1.039 kgamma : +1.143 Done in 0.11 min (cpu), 0.11 min (real)
combine c6_comb.root M MultiDimFit algo=singles m 125.7 $RANGES >>> including systematics >>> method used to compute upper limit is MultiDimFit >>> random number generator seed is 123456 ModelConfig 'ModelConfig' defines more than one parameter of interest. This is not supported in some statistical methods. Set Range of Parameter kbottom To : (0,5) Set Range of Parameter ktop To : (0,8) Set Range of Parameter ktau To : (0,4) Set Range of Parameter kgluon To : (0,4) Set Range of Parameter kgamma To : (0,3) Computing limit starting from observation Running Minos for POI Real time 0:00:11, CP time 11.700 Running Minos for POI Real time 0:00:11, CP time 11.700 Running Minos for POI Real time 0:00:11, CP time 11.700 Running Minos for POI Real time 0:00:11, CP time 11.700 Running Minos for POI Real time 0:00:11, CP time 11.700 Running Minos for POI Real time 0:00:11, CP time 11.700  MultiDimFit  best fit parameter values and profilelikelihood uncertainties: kV : +1.044 0.234/+0.204 (68%) ktau : +1.274 0.497/+0.504 (68%) ktop : +2.053 0.897/+1.702 (68%) kbottom : +1.401 0.674/+1.227 (68%) kgluon : +1.039 0.324/+0.811 (68%) kgamma : +1.143 0.339/+0.388 (68%) Done in 1.55 min (cpu), 1.64 min (real)
</twistyPlugin>
Going beyond: introducing H→invis decays into C6The C6 model assumes that there are no BSM decays of the Higgs. We will now take one step forward, and allow the possibility of Higgs decays into invisible particles, relying on the searches in the VBF mode to constrain this. We will introduce a new parameter BR_{Inv}, representing the BR of the Higgs boson decays to invisible particles.
</twistyPlugin twikiMakeVisibleInline> The total width is given by the sum of the partial widths of the SMlike decays (dependent on κ) plus the invisible one, and by definition and consequently dividing by the SM total width we can express this in terms of SM BRs </twistyPlugin>
</twistyPlugin twikiMakeVisibleInline>
%CODE{"python"}% class C6I(SMLikeHiggsModel): "as C6 but including direct searches for H>invis" def __init__(self): SMLikeHiggsModel.__init__(self) # not using 'super(x,self).__init__' since I don't understand it self.floatMass = False self.doHZg = False def setPhysicsOptions(self,physOptions): for po in physOptions: if po.startswith("higgsMassRange="): self.floatMass = True self.mHRange = po.replace("higgsMassRange=","").split(",") print 'The Higgs mass range:', self.mHRange if len(self.mHRange) = 2: raise RuntimeError, "Higgs mass range definition requires two extrema" elif float(self.mHRange[0]) >= float(self.mHRange[1]): raise RuntimeError, "Extrama for Higgs mass range defined with inverterd order. Second must be larger the first" if po def setup(self): # SM BR for d in [ "htt", "hbb", "hcc", "hww", "hzz", "hgluglu", "htoptop", "hgg", "hzg", "hmm", "hss" ]: self.SMH.makeBR(d) ## total witdhs, normalized to the SM one self.modelBuilder.factory_('expr::c6i_Gscal_Vectors("@0*@0 * (@1+@2)", kV, SM_BR_hzz, SM_BR_hww)') self.modelBuilder.factory_('expr::c6i_Gscal_tau("@0*@0 * (@1+@2)", ktau, SM_BR_htt, SM_BR_hmm)') self.modelBuilder.factory_('expr::c6i_Gscal_top("@0*@0 * (@1+@2)", ktop, SM_BR_htoptop, SM_BR_hcc)') self.modelBuilder.factory_('expr::c6i_Gscal_bottom("@0*@0 * (@1+@2)", kbottom, SM_BR_hbb, SM_BR_hss)') self.modelBuilder.factory_('expr::c6i_Gscal_gluon("@0*@0 * @1", kgluon, SM_BR_hgluglu)') if not self.doHZg: self.modelBuilder.factory_('expr::c6i_Gscal_gamma("@0*@0 * (@1+@2)", kgamma, SM_BR_hgg, SM_BR_hzg)') else: self.modelBuilder.factory_('expr::c6i_Gscal_gamma("@0*@0 *@1+@2*@2*@3", kgamma, SM_BR_hgg, kZgamma, SM_BR_hzg)') ## SMlike sum of partial widths self.modelBuilder.factory_('sum::c6i_Gscal_totSM(c6i_Gscal_Vectors, c6i_Gscal_tau, c6i_Gscal_top, c6i_Gscal_bottom, c6i_Gscal_gluon, c6i_Gscal_gamma)') ## Divide by 1/(1BRInv) to get the total width self.modelBuilder.factory_('expr::c6i_Gscal_tot("@0/(1@1)",c6i_Gscal_totSM,BRInv)'); ## BRs, normalized to the SM ones: they scale as (partial/partial_SM)^2 / (total/total_SM)^2 self.modelBuilder.factory_('expr::c6i_BRscal_hvv("@0*@0/@1", kV, c6i_Gscal_tot)') self.modelBuilder.factory_('expr::c6i_BRscal_htt("@0*@0/@1", ktau, c6i_Gscal_tot)') self.modelBuilder.factory_('expr::c6i_BRscal_hbb("@0*@0/@1", kbottom, c6i_Gscal_tot)') self.modelBuilder.factory_('expr::c6i_BRscal_hgg("@0*@0/@1", kgamma, c6i_Gscal_tot)') if self.doHZg: self.modelBuilder.factory_('expr::c6i_BRscal_hzg("@0*@0/@1", kZgamma, c6i_Gscal_tot)') ## The invisible BR is by construction BRInv self.modelBuilder.factory_('expr::c6i_BRscal_hinv("@0", BRInv)')
def getHiggsSignalYieldScale(self,production,decay,energy): name = "c6i_XSBRscal_%s_%s" % (production,decay) if self.modelBuilder.out.function(name) c6i = C6I() %ENDCODE%
</twistyPlugin>
</twistyPlugin twikiMakeVisibleInline> text2workspace.py hinv_vbf_8TeV.txt m 125.7 P HiggsAnalysis.CombinedLimit.LOFullParametrization:c6i o c6i_vbfinv.root forfit combine m 125.7 M MultiDimFit c6i_vbfinv.root n C6I_vbfinv_SCAN_2D_kV_BRInv P kV P BRInv floatOtherPOI=0 points=400 algo=grid3x3 $RANGES
text2workspace.py comb_hww.txt m 125.7 P HiggsAnalysis.CombinedLimit.LOFullParametrization:c6i o c6i_hww.root forfit ./parallelScan.py M MultiDimFit c6i_hww.root n C6I_hww_SCAN_2D_kV_BRInv P kV P BRInv floatOtherPOI=0 points=100 algo=grid3x3 $RANGES hadd f higgsCombineC6I_hww_SCAN_2D_kV_BRInv.MultiDimFit.mH125.7.root higgsCombineC6I_hww_SCAN_2D_kV_BRInv.*.MultiDimFit.mH125.7.root
HWW="hww0j_8TeV=hwwof_0j_shape_8TeV.txt hww2j_8TeV=hwwof_2j_shape_8TeV.txt" HIN="hinv_8TeV=hinv_vbf_8TeV.txt" combineCards.py S $HWW $HIN > comb_test2.txt text2workspace.py comb_test2.txt m 125.7 P HiggsAnalysis.CombinedLimit.LOFullParametrization:c6i o c6i_test2.root forfit ./parallelScan.py m 125.7 M MultiDimFit c6i_test2.root n C6I_test2_SCAN_2D_kV_BRInv P kV P BRInv floatOtherPOI=0 points=100 algo=grid3x3 $RANGES hadd f higgsCombineC6I_test2_SCAN_2D_kV_BRInv.MultiDimFit.mH125.7.root higgsCombineC6I_test2_SCAN_2D_kV_BRInv.*.MultiDimFit.mH125.7.root
</twistyPlugin>
</twistyPlugin twikiMakeVisibleInline>
text2workspace.py comb.txt m 125.7 P HiggsAnalysis.CombinedLimit.LOFullParametrization:c6i o c6i_comb.root forfit; text2workspace.py combx.txt m 125.7 P HiggsAnalysis.CombinedLimit.LOFullParametrization:c6i o c6i_combx.root forfit; ./parallelScan.py m 125.7 M MultiDimFit c6i_comb.root n C6I_comb_SCAN_2D_kV_BRInv P kV P BRInv floatOtherPOI=1 points=400 algo=grid $RANGES hadd ./parallelScan.py m 125.7 M MultiDimFit c6i_combx.root n C6I_combx_SCAN_2D_kV_BRInv P kV P BRInv floatOtherPOI=1 points=400 algo=grid $RANGES hadd
</twistyPlugin>
</twistyPlugin twikiMakeVisibleInline> Best fit and onedimensional uncertainties (observed): takes O(1h) combine m 125.7 M MultiDimFit c6i_combx.root n C6I_comb_SINGLES algo=singles $RANGES robustFit=1  MultiDimFit  best fit parameter values and profilelikelihood uncertainties: kV : +1.146 0.207/+0.199 (68%) ktau : +1.399 0.466/+0.428 (68%) ktop : +2.000 0.683/+0.000 (68%) kbottom : +1.428 0.537/+0.572 (68%) kgluon : +1.075 0.265/+0.407 (68%) kgamma : +1.231 0.306/+0.345 (68%) BRInv : +0.177 0.177/+0.157 (68%) Done in 54.14 min (cpu), 55.02 min (real) Onedimensional expected uncertainties: one parameter at a time in parallel, since it takes O(1h) per parameter. for D in k{V,gamma,gluon,bottom,top,tau} BRInv; do ( combine m 125.7 M MultiDimFit c6i_combx.root n C6I_comb_SINGLES_${D}_Asimov algo=singles $RANGES robustFit=1 t 1 P $D floatOtherPOI=1 expectSignal=1 2>&1  tee log.C6I_Asimov_$D > /dev/null & ); done tail n 2 log.C6I_Asimov_* ==> log.C6I_Asimov_BRInv <== BRInv : +0.039 0.039/+0.190 (68%) Done in 58.08 min (cpu), 65.68 min (real) ==> log.C6I_Asimov_kV <== kV : +1.020 0.201/+0.196 (68%) Done in 62.85 min (cpu), 70.62 min (real) ==> log.C6I_Asimov_kbottom <== kbottom : +1.020 0.433/+0.585 (68%) Done in 66.84 min (cpu), 74.38 min (real) ==> log.C6I_Asimov_kgamma <== kgamma : +1.021 0.225/+0.255 (68%) Done in 61.08 min (cpu), 68.59 min (real) ==> log.C6I_Asimov_kgluon <== kgluon : +1.017 0.241/+0.457 (68%) Done in 74.38 min (cpu), 81.99 min (real) ==> log.C6I_Asimov_ktau <== ktau : +1.022 0.366/+0.323 (68%) Done in 74.65 min (cpu), 82.36 min (real) ==> log.C6I_Asimov_ktop <== ktop : +1.019 1.019/+0.750 (68%) Done in 57.58 min (cpu), 65.44 min (real) Onedimensional likelihood scans (observed and expected) for D in k{V,gamma,gluon,bottom,top,tau} BRInv; do ./parallelScan.py m 125.7 M MultiDimFit c6i_combx.root n C6I_combx_SCAN_1D_$D P ${D} floatOtherPOI=1 $RANGES algo=grid points 50 hadd; ./parallelScan.py m 125.7 M MultiDimFit c6i_combx.root n C6I_combx_SCAN_1D_${D}_ASIMOV P ${D} floatOtherPOI=1 $RANGES algo=grid points 30 t 1 expectSignal=1 hadd; done Results:
</twistyPlugin twikiMakeVisibleInline> %CODE{"cpp"}% void plot1ds_c6i() { TCanvas c1 = new TCanvas("c1","c1"); gStyle>SetOptStat(0); TFile *fOut = TFile::Open("plot1d_c6i.root", "RECREATE"); const int params = 7; const char *param[params] = { "kV", "kbottom", "ktop", "ktau", "kgluon", "kgamma", "BRInv" }; const float min[params] = { 0, 0, 0, 0, 0, 0, 0, }; const float max[params] = { 2, 5, 8, 4, 4, 3, 1, }; const char *label[params] = { "#kappa_{V}", "#kappa_{b}", "#kappa_{t}", "#kappa_{#tau}", "#kappa_{g}", "#kappa_{#gamma}","BR_{Inv}" }; for (int i = 0; i < params; ++i) { cout << param[i] << endl; gFile = TFile::Open(Form("higgsCombineC6I_combx_SCAN_1D_%s.MultiDimFit.mH125.7.root", param[i])); TTree *limit = (TTree) gFile>Get("limit"); limit>Draw(Form("2*deltaNLL:%s",param[i]),""); TGraph gO = gROOT>FindObject("Graph")>Clone(); gO>Sort(); gFile>Close(); gFile = TFile::Open(Form("higgsCombineC6I_combx_SCAN_1D_%s_ASIMOV.MultiDimFit.mH125.7.root", param[i])); limit = (TTree) gFile>Get("limit"); limit>Draw(Form("2*deltaNLL:%s",param[i]),""); TGraph *gA = gROOT>FindObject("Graph")>Clone(); gA>Sort(); gFile>Close(); gO>SetName(Form("c6i_1d_%s_obs", param[i])); gA>SetName(Form("c6i_1d_%s_asimov",param[i])); fOut>WriteTObject(gO); fOut>WriteTObject(gA); TH1F *frame = new TH1F("frame",Form(";%s; 2 #Delta ln L",label[i]), 10, min[i], max[i]); frame>Draw(); frame>GetYaxis()>SetRangeUser(0,7); gO>SetLineWidth(3); gO>SetLineColor(1); gA>SetLineWidth(2); gA>SetLineColor(1); gA>SetLineStyle(2); gO>Draw("C SAME"); gA>Draw("C SAME"); c1>Print(Form("output_C6I_1D_%s.png", param[i])); delete frame; } fOut>Close(); } %ENDCODE%
</twistyPlugin>
</twistyPlugin>
Higgs J^{CP}IntroductionThe narrow resonance with an invariant mass of approximately 125 GeV that has been discovered by ATLAS and CMS, are mainly via diboson decay modes γγ, WW and ZZ. The determination of the quantum numbers of this Higgslike particle, such as its spin and parity, is currently one of the most important tasks in high energy physics and thus is important for future studies.
KinematicsThe scattering amplitudes of Higgs or any exotic boson decays to the WW final state have been performed in several literatures. It was demonstrated that the WW chaneel has a good sensitivity to distinguish between the standard model Higgs boson hypothesis and a spin2 resonance, which couples to the dibosons through minimal couplings, referred to as 2+. Some sensitivity in this final state can also distinguish between the min0+ and the pseudoscalar 0− boson hypotheses. The typical experimental observables are m(ll), m_{T}(ll,E_{T}^{miss}), and Δφ(ll).
The tdefinition of transverse mass m_{T}(ll,E_{T}^{miss}) used in this analysis is
Exercise for the readerTwo datasets of simulated X→WW→2l2ν events have been produced, with X being a SM Higgs and spin 2 graviton, and a very basic event selection has already been applied to retain only the events with at least two highpt leptons, and no hadronic jets. What you can do is to run on those simulated events and make plots of the kinematic variables like the ones above (or even others, e.g. ΔR(ll)). You can do this exercise either starting from simple ROOT trees, or from PAT files (for the latter, use CMSSW_5_3_14), using any of the analysis tools you should have learned from the preexercises (root, fwlite, python, ...).
cmsStageIn /store/cmst3/group/das2014/HiggsProperties/FILE.root . (or, temporarily, copy them from /afs/cern.ch/work/g/gpetrucc/GENS/CMSSW_5_3_14/src/ ). The files have been produced from these two datasets:
/afs/cern.ch/user/g/gpetrucc/public/CMSDAS2014CERNtoy_HWW.py (note: this is just a toy example, the real H→WW selection has many more selection requirements, and use more objects and tools... see HIG13023 for more information)
</twistyPlugin twikiMakeVisibleInline> Here are many possible solutions, depending on whether you run on the Trees or on the PAT, and what software you want to use. Here we will give four.
Using Trees, in python %CODE{"python"}% from math import * import sys, ROOT from DataFormats.FWLite import Handle, Events def fillPlots(tree,histos): l1 = ROOT.TLorentzVector() l2 = ROOT.TLorentzVector() met = ROOT.TLorentzVector() print "Making plots for one sample: ", for i in xrange(tree.GetEntries()): # read tree.GetEntry(i) l1.SetPtEtaPhiM(tree.lep1_pt, tree.lep1_eta, tree.lep1_phi, tree.lep1_mass) l2.SetPtEtaPhiM(tree.lep2_pt, tree.lep2_eta, tree.lep2_phi, tree.lep2_mass) met.SetPtEtaPhiM(tree.met_pt, 0.0, tree.met_phi, 0.0) #  m(ll) p4ll = l1 + l2 histos['mll'].Fill(p4ll.M()) #  delta phi(ll) dphi = abs(l1.Phi()  l2.Phi()) if dphi > pi: dphi = 2*pidphi histos['dphill'].Fill(dphi) #  mT mT = sqrt(2*p4ll.Pt()*met.Pt()*(1cos(p4ll.Phi()met.Phi()))) histos['mT'].Fill(mT) #  2d histos['m2d'].Fill(mT,p4ll.M()) if (i+1) % 1000 == 0: sys.stdout.write("."); sys.stdout.flush() print " done." def stackPlots(plots): c1 = ROOT.TCanvas("c1") for var,plotsm in plots["sm"].iteritems(): plot2p = plots["2p"][var] if "TH1F" in plotsm.ClassName(): # common things for p in plotsm, plot2p: p.SetLineWidth(2) p.Scale(1.0/p.Integral()) # different things plotsm.SetLineColor(1) plot2p.SetLineColor(2) # plot plotsm.Draw("HIST ") plot2p.Draw("HIST SAME") plotsm.GetYaxis().SetRangeUser(0, 1.4*max(plotsm.GetMaximum(),plot2p.GetMaximum())) # legend leg = ROOT.TLegend(0.2, 0.77, 0.55, 0.92) leg.SetFillColor(0) leg.SetTextFont(42) leg.SetTextSize(0.05) leg.AddEntry(plotsm, "SM Higgs", "L") leg.AddEntry(plot2p, "2^{+}_{m} Graviton", "L") leg.Draw(); # print c1.Print("plot_jcp_"+var+".png") elif "TH2F" in plotsm.ClassName(): plotsm.Draw("COLZ") c1.Print("plot_jcp_"+var+"_sm.png") plot2p.Draw("COLZ") c1.Print("plot_jcp_"+var+"_2p.png") ROOT.gROOT.ProcessLine(".x tdrstyle.cc") ROOT.gROOT.SetBatch(1) ROOT.gStyle.SetOptStat(0) output = ROOT.TFile("plots.root","RECREATE") plots = {} for spin in "2p", "sm": plots[spin] = { "mll" :ROOT.TH1F("mll_" +spin, ";m(ll) (GeV)",40,0,120), "mT" :ROOT.TH1F("mT_" +spin, ";m_{T}(ll,E_{T}^{miss}) (GeV)",40,0,200), "dphill":ROOT.TH1F("dphill_"+spin, ";#Delta#phi(ll) (rad)",31,0,pi), "m2d" :ROOT.TH2F("m2d_"+spin, ";m_{T}(ll,E_{T}^{miss}) (GeV);m(ll) (GeV)",6,60,140,5,12,100), } smhiggs_file = ROOT.TFile.Open("HWW_Tree_SMHiggs.root") graviton_file = ROOT.TFile.Open("HWW_Tree_Graviton.root") smhiggs = smhiggs_file.Get("tree/probe_tree") graviton = graviton_file.Get("tree/probe_tree") fillPlots(graviton, plots["2p"]) fillPlots(smhiggs, plots["sm"]) stackPlots(plots) %ENDCODE%
Using PAT Objects, in python (with FWLite) %CODE{"python"}% from math import * import sys, ROOT from DataFormats.FWLite import Handle, Events def fillPlots(events,histos): muons = Handle("std::vector") electrons = Handle("std::vector") mets = Handle("std::vector") print "Making plots for one sample: ", for i,event in enumerate(events): event.getByLabel("cleanPatMuons", muons) event.getByLabel("cleanPatElectrons", electrons) event.getByLabel("patMETs", mets) # list of leptons sorted by pT leptons = [ mu.p4() for mu in muons.product() ] + [ el.p4(el.candidateP4Kind()) for el in electrons.product() ] leptons.sort(key = lambda lepton : lepton.Pt(), reverse=True) # missing energy met = mets.product()[0].p4() # reconstruct the various kinematic variables #  m(ll) p4ll = leptons[0] + leptons[1] histos['mll'].Fill(p4ll.mass()) #  delta phi(ll) dphi = abs(leptons[0].Phi()  leptons[1].Phi()) if dphi > pi: dphi = 2*pidphi histos['dphill'].Fill(dphi) #  mT mT = sqrt(2*p4ll.Pt()*met.Pt()*(1cos(p4ll.Phi()met.Phi()))) histos['mT'].Fill(mT) #  2d histos['m2d'].Fill(mT,p4ll.M()) if (i+1) % 1000 == 0: sys.stdout.write("."); sys.stdout.flush() print " done." def stackPlots(plots): c1 = ROOT.TCanvas("c1") for var,plotsm in plots["sm"].iteritems(): plot2p = plots["2p"][var] if "TH1F" in plotsm.ClassName(): # common things for p in plotsm, plot2p: p.SetLineWidth(2) p.Scale(1.0/p.Integral()) # different things plotsm.SetLineColor(1) plot2p.SetLineColor(2) # plot plotsm.Draw("HIST ") plot2p.Draw("HIST SAME") plotsm.GetYaxis().SetRangeUser(0, 1.4*max(plotsm.GetMaximum(),plot2p.GetMaximum())) # legend leg = ROOT.TLegend(0.2, 0.77, 0.55, 0.92) leg.SetFillColor(0) leg.SetTextFont(42) leg.SetTextSize(0.05) leg.AddEntry(plotsm, "SM Higgs", "L") leg.AddEntry(plot2p, "2^{+}_{m} Graviton", "L") leg.Draw(); # print c1.Print("plot_jcp_"+var+".png") elif "TH2F" in plotsm.ClassName(): plotsm.Draw("COLZ") c1.Print("plot_jcp_"+var+"_sm.png") plot2p.Draw("COLZ") c1.Print("plot_jcp_"+var+"_2p.png") ROOT.gROOT.ProcessLine(".x tdrstyle.cc") ROOT.gROOT.SetBatch(1) ROOT.gStyle.SetOptStat(0) output = ROOT.TFile("plots.root","RECREATE") plots = {} for spin in "2p", "sm": plots[spin] = { "mll" :ROOT.TH1F("mll_" +spin, ";m(ll) (GeV)",40,0,120), "mT" :ROOT.TH1F("mT_" +spin, ";m_{T}(ll,E_{T}^{miss}) (GeV)",40,0,200), "dphill":ROOT.TH1F("dphill_"+spin, ";#Delta#phi(ll)",31,0,pi), "m2d" :ROOT.TH2F("m2d_"+spin, ";m_{T}(ll,E_{T}^{miss}) (GeV);m(ll) (GeV)",6,60,140,5,12,100), } smhiggs = Events("HWW_SMHiggs.root") graviton = Events("HWW_Graviton.root") fillPlots(graviton, plots["2p"]) fillPlots(smhiggs, plots["sm"]) stackPlots(plots) %ENDCODE%
Using Trees, in C++ (with ROOT)
%CODE{"cpp"}% #include #include #include #include #include #include #include #include #include #include void fillPlots(TTree *tree, TH1 * hmll, TH1 * hdphill, TH1 * hmT, TH2 * hm2d) { Float_t lep1_pt, lep1_eta, lep1_phi, lep1_mass; Float_t lep2_pt, lep2_eta, lep2_phi, lep2_mass; Float_t met_pt, met_eta = 0, met_phi, met_mass = 0; tree>SetBranchAddress("lep1_pt", &lep1_pt); tree>SetBranchAddress("lep1_eta", &lep1_eta); tree>SetBranchAddress("lep1_phi", &lep1_phi); tree>SetBranchAddress("lep1_mass", &lep1_mass); tree>SetBranchAddress("lep2_pt", &lep2_pt); tree>SetBranchAddress("lep2_eta", &lep2_eta); tree>SetBranchAddress("lep2_phi", &lep2_phi); tree>SetBranchAddress("lep2_mass", &lep2_mass); tree>SetBranchAddress("met_pt", &met_pt); tree>SetBranchAddress("met_phi", &met_phi); TLorentzVector l1,l2,met; printf("Making plots for one sample: "); for (Long64_t i = 0, n = tree>GetEntries(); i < n; ++i) { tree>GetEntry(i); l1.SetPtEtaPhiM(lep1_pt, lep1_eta, lep1_phi, lep1_mass); l2.SetPtEtaPhiM(lep2_pt, lep2_eta, lep2_phi, lep2_mass); met.SetPtEtaPhiM(met_pt, met_eta, met_phi, met_mass); TLorentzVector p4ll = l1+l2; hmll>Fill(p4ll.M()); float dphi = fabs(l1.Phi()  l2.Phi()); if (dphi > TMath::Pi()) { dphi = 2*TMath::Pi()dphi; } hdphill>Fill(dphi); float mT = sqrt(2*p4ll.Pt()*met.Pt()*(1cos(p4ll.Phi()met.Phi()))); hmT>Fill(mT); hm2d>Fill(mT,p4ll.M()); if ((i+1) % 1000 == 0) { putchar('.'); fflush(stdout); } } printf("done.\n"); } void stackPlots(TH1 *plotsm, TH1 *plot2p, TString var) { TCanvas *c1 = new TCanvas("c1"); plotsm>SetLineWidth(2); plotsm>Scale(1.0/plotsm>Integral()); plot2p>SetLineWidth(2); plot2p>Scale(1.0/plot2p>Integral()); plotsm>SetLineColor(1); plot2p>SetLineColor(2); // plot plotsm>Draw("HIST "); plot2p>Draw("HIST SAME"); plotsm>GetYaxis()>SetRangeUser(0, 1.4*TMath::Max(plotsm>GetMaximum(),plot2p>GetMaximum())); // legend TLegend *leg = new TLegend(0.2, 0.77, 0.55, 0.92); leg>SetFillColor(0); leg>SetTextFont(42); leg>SetTextSize(0.05); leg>AddEntry(plotsm, "SM Higgs", "L"); leg>AddEntry(plot2p, "2^{+}_{m} Graviton", "L"); leg>Draw(); // print c1>Print("plot_jcp_"+var+".png"); c1>Close(); } void stackPlots(TH2 *plotsm, TH2 *plot2p, TString var) { TCanvas *c1 = new TCanvas("c1"); plotsm>Draw("COLZ"); c1>Print("plot_jcp_"+var+"_sm.png"); plot2p>Draw("COLZ"); c1>Print("plot_jcp_"+var+"_2p.png"); c1>Close(); } void makePlots_TREE() { gROOT>ProcessLine(".x tdrstyle.cc"); gROOT>SetBatch(1); gStyle>SetOptStat(0); TFile *output = TFile::Open("plots.root","RECREATE"); TH1F * hmll_sm = new TH1F("mll_sm", ";m(ll) (GeV)",40,0,120); TH1F * hmT_sm = new TH1F("mT_sm", ";m_{T}(ll,E_{T}^{miss}) (GeV)",40,0,200); TH1F * hdphill_sm = new TH1F("dphill_sm", ";#Delta#phi(ll) (rad)",31,0,TMath::Pi()); TH2F * hm2d_sm = new TH2F("m2d_sm", ";m_{T}(ll,E_{T}^{miss}) (GeV);m(ll) (GeV)",6,60,140,5,12,100); TH1F * hmll_2p = new TH1F("mll_2p", ";m(ll) (GeV)",40,0,120); TH1F * hmT_2p = new TH1F("mT_2p", ";m_{T}(ll,E_{T}^{miss}) (GeV)",40,0,200); TH1F * hdphill_2p = new TH1F("dphill_2p", ";#Delta#phi(ll) (rad)",31,0,TMath::Pi()); TH2F * hm2d_2p = new TH2F("m2d_2p", ";m_{T}(ll,E_{T}^{miss}) (GeV);m(ll) (GeV)",6,60,140,5,12,100); TTree smhiggs = (TTree) TFile::Open("HWW_Tree_SMHiggs.root")>Get("tree/probe_tree"); TTree graviton = (TTree) TFile::Open("HWW_Tree_Graviton.root")>Get("tree/probe_tree"); fillPlots(graviton, hmll_2p, hdphill_2p, hmT_2p, hm2d_2p); fillPlots(smhiggs, hmll_sm, hdphill_sm, hmT_sm, hm2d_sm); stackPlots(hmll_sm, hmll_2p, "mll"); stackPlots(hdphill_sm, hdphill_2p, "dphill"); stackPlots(hmT_sm, hmT_2p, "mT"); stackPlots(hm2d_sm, hm2d_2p, "m2d"); output>Close(); } %ENDCODE%
Using Trees, in C++ (with FWLite, always compiled) %CODE{"cpp"}% //// == rootlogon === { TString makeshared(gSystem>GetMakeSharedLib()); TString dummy = makeshared.ReplaceAll("W ", ""); dummy.ReplaceAll("g++ ","g++ std=c++0x "); dummy.ReplaceAll("Wshadow "," "); dummy.ReplaceAll("Wall ","Wall Wnodeprecateddeclarations "); gSystem>SetMakeSharedLib(dummy); gSystem>Load("libGenVector.so"); // strangely this is needed if (getenv("CMSSW_BASE")) { gSystem>Load("libFWCoreFWLite.so"); gSystem>Load("libDataFormatsFWLite.so"); gSystem>SetIncludePath("I$ROOFITSYS/include"); AutoLibraryLoader::enable(); } } %ENDCODE%
%CODE{"cpp"}% ////
// hide all from ROOT #if defined( CINT) && defined( MAKECINT) #include void fillPlots(TFile *file, TH1 * hmll, TH1 * hdphill, TH1 * hmT, TH2 * hm2d) { // hide all from ROOT #if defined( CINT) && defined( MAKECINT) using namespace std; fwlite::Event ev(file); edm::Handle > muons; edm::Handle > electrons; edm::Handle > mets; edm::InputTag muonsTag("cleanPatMuons"); edm::InputTag electronsTag("cleanPatElectrons"); edm::InputTag metsTag("patMETs"); printf("Making plots for one sample: "); int iev = 0; for( ev.toBegin(); ! ev.atEnd(); ++ev, ++iev) { ev.getByLabel(muonsTag, muons); ev.getByLabel(electronsTag, electrons); ev.getByLabel(metsTag, mets); // now we pick the two leading leptons, by making a list of (pt,pointer) vector > list; for (unsigned int i = 0, n = muons>size(); i < n; ++i) { const pat::Muon &mu = (*muons)[i]; list.push_back(pair(mu.pt(), &mu)); } for (unsigned int i = 0, n = electrons>size(); i < n; ++i) { const pat::Electron &el = (*electrons)[i]; list.push_back(pair(el.pt(), &el)); } sort(list.begin(), list.end()); reco::Particle::LorentzVector l1 = list[0].second>p4(); reco::Particle::LorentzVector l2 = list[1].second>p4(); reco::Particle::LorentzVector met = (*mets)[0].p4(); reco::Particle::LorentzVector p4ll = l1+l2; hmll>Fill(p4ll.M()); float dphi = deltaPhi(l1.Phi(), l2.Phi()); hdphill>Fill(dphi); float mT = sqrt(2*p4ll.Pt()*met.Pt()*(1cos(p4ll.Phi()met.Phi()))); hmT>Fill(mT); hm2d>Fill(mT,p4ll.M()); if ((iev+1) % 1000 == 0) { putchar('.'); fflush(stdout); } } printf("done.\n"); #endif } void stackPlots(TH1 *plotsm, TH1 *plot2p, TString var) { TCanvas *c1 = new TCanvas("c1"); plotsm>SetLineWidth(2); plotsm>Scale(1.0/plotsm>Integral()); plot2p>SetLineWidth(2); plot2p>Scale(1.0/plot2p>Integral()); plotsm>SetLineColor(1); plot2p>SetLineColor(2); // plot plotsm>Draw("HIST "); plot2p>Draw("HIST SAME"); plotsm>GetYaxis()>SetRangeUser(0, 1.4*TMath::Max(plotsm>GetMaximum(),plot2p>GetMaximum())); // legend TLegend *leg = new TLegend(0.2, 0.77, 0.55, 0.92); leg>SetFillColor(0); leg>SetTextFont(42); leg>SetTextSize(0.05); leg>AddEntry(plotsm, "SM Higgs", "L"); leg>AddEntry(plot2p, "2^{+}_{m} Graviton", "L"); leg>Draw(); // print c1>Print("plot_jcp_"+var+".png"); c1>Close(); } void stackPlots(TH2 *plotsm, TH2 *plot2p, TString var) { TCanvas *c1 = new TCanvas("c1"); plotsm>Draw("COLZ"); c1>Print("plot_jcp_"+var+"_sm.png"); plot2p>Draw("COLZ"); c1>Print("plot_jcp_"+var+"_2p.png"); c1>Close(); } void makePlots_PAT() { gSystem>Load("libFWCoreFWLite.so"); AutoLibraryLoader::enable(); gROOT>ProcessLine(".x tdrstyle.cc"); gROOT>SetBatch(1); gStyle>SetOptStat(0); TFile *output = TFile::Open("plots.root","RECREATE"); TH1F * hmll_sm = new TH1F("mll_sm", ";m(ll) (GeV)",40,0,120); TH1F * hmT_sm = new TH1F("mT_sm", ";m_{T}(ll,E_{T}^{miss}) (GeV)",40,0,200); TH1F * hdphill_sm = new TH1F("dphill_sm", ";#Delta#phi(ll) (rad)",31,0,TMath::Pi()); TH2F * hm2d_sm = new TH2F("m2d_sm", ";m_{T}(ll,E_{T}^{miss}) (GeV);m(ll) (GeV)",6,60,140,5,12,100); TH1F * hmll_2p = new TH1F("mll_2p", ";m(ll) (GeV)",40,0,120); TH1F * hmT_2p = new TH1F("mT_2p", ";m_{T}(ll,E_{T}^{miss}) (GeV)",40,0,200); TH1F * hdphill_2p = new TH1F("dphill_2p", ";#Delta#phi(ll) (rad)",31,0,TMath::Pi()); TH2F * hm2d_2p = new TH2F("m2d_2p", ";m_{T}(ll,E_{T}^{miss}) (GeV);m(ll) (GeV)",6,60,140,5,12,100); TFile *smhiggs = TFile::Open("HWW_SMHiggs.root"); TFile *graviton = TFile::Open("HWW_Graviton.root"); fillPlots(graviton, hmll_2p, hdphill_2p, hmT_2p, hm2d_2p); fillPlots(smhiggs, hmll_sm, hdphill_sm, hmT_sm, hm2d_sm); stackPlots(hmll_sm, hmll_2p, "mll"); stackPlots(hdphill_sm, hdphill_2p, "dphill"); stackPlots(hmT_sm, hmT_2p, "mT"); stackPlots(hm2d_sm, hm2d_2p, "m2d"); output>Close(); } /* Note: you can also compile it with gcc, though it's a bit a pain in the neck gcc std=c++0x Wnodeprecated Wnodeprecateddeclarations lstdc++ $(rootconfig cflags ldflags libs) lGenVector I $CMSSW_RELEASE_BASE/src L $CMSSW_RELEASE_BASE/lib/$SCRAM_ARCH lDataFormatsPatCandidates lDataFormatsFWLite lDataFormatsCommon lFWCoreUtilities lFWCoreFWLite lDataFormatsMath I$(scramv1 tool tag clhepheader INCLUDE) makePlots_PAT.cxx ; ./a.out */ int main(int argc, char **argv) { makePlots_PAT(); } %ENDCODE%
</twistyPlugin>
StatisticsThe statistics of hypotheses testing is based on the likelihood ratio between the hypotheses being tested.
Likelihood ratioConsider the null (N) and alternate (A) hypotheses which correspond in this case to the SMHlike 0^{+}+background (N) and the alternative J^{CP}+background (A). The likelihood ratio is defined as q = 2 ln( L(xA) / L(xN) ), where x represents a set of events. This set of events, x, can be:
You can find the example of such distributions below for the ZZ+WW combination performed in HIG13005:
Measures of separation and the CLs criteriumGiven the distributions above, one can then characterise the expectations and observation in a number of ways:
Prefit and postfit expectationsA final complication relates to the normalisation of the processes entering the N and A hypotheses, especially the signal normalisation. We consider the "prefit" and "postfit" cases:
It can be seen that the when going from left to right, the distributions P(qA) and P(qN) become closer, less discernible, and the separation is smaller. This is due to the fact that the WW channel has a deficit of signal. Finally, in the following table you can find the results of the hypothesis test performed for HIG13005, including all the quantities discussed above:
Exercise for the reader
What does it mean to show 0.9σ in the Table above? </twistyPlugin twikiMakeVisibleInline> The answer is deceivingly simple: since P( q < qobs  N ) < 0.5, one needs to have signed deviations to note that qobs is to the left of the median of P( q  N ). In this case, since P( q < qobs  N ) < 0.5 and P( q > qobs  A ) > 0.5, it means that the observation is somewhere in between the qexp(A) and qexp(N) as can be seen in the following figure:
which is part of the plots for the individual ZZ and WW channels.
</twistyPlugin>
PracticeFor the spin2 (2^{+}_{m} in this case) vs spin0 (0^{+}) hypotheses test, the full H→WW→2a„¡°2ν analysis includes 4 channels corresponding to {0jet,1jet}⊗{7 TeV, 8 TeV}. For simplicity we will perform the exercise on the 0jet 8 TeV data card.
head jcp_hwwof_0j_8TeV.txt [...] Observation 5747 shapes * * jcp_hwwof_0j.input_8TeV.root histo_$PROCESS histo_$PROCESS_$SYSTEMATIC shapes data_obs * jcp_hwwof_0j.input_8TeV.root histo_Data bin j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev j0of8tev process qqbarH_ALT ggH_ALT ZH WH qqH ggH qqWW ggWW VV Top Zjets WjetsE Wgamma Wg3l Ztt WjetsM qqWW2j process 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 rate 129.869 146.427 1.728 5.833 3.033 234.884 3973.542 210.835 131.912 499.856 0.000 279.465 115.142 166.923 46.430 327.212 0.288 You will note that besides the usual SM signal contributions (ggH, qqH, ZH, and WH) there are two new processes (qqbarH_ALT and ggH_ALT) corresponding to the production of 2^{+}_{m} via quarkantiquark annihilation and gluonfusion. Let's look at the shape of the templates:
root l jcp_hwwof_0j.input_8TeV.root %CODE{"cpp"}% //Compare main signal and background histo_ggH>GetXaxis()>SetRangeUser(1,1) histo_ggH>SetLineColor(kBlack) histo_ggH>DrawNormalized("H") histo_qqWW>SetLineColor(kGray) histo_qqWW>DrawNormalized("Hsame") %ENDCODE%
You will notice multiple peaks. These correspond to an unrolled 2D histogram encoding the information of mT and mll as explained in https://twiki.cern.ch/twiki/bin/view/CMSPublic/Hig12042TWiki Basically, inside each peak mll is increasing, while different peaks correspond to larger mT values. You can compare the plots you obtained with the following:
Let's now compare the different signal model shapes:
root l jcp_hwwof_0j.input_8TeV.root %CODE{"cpp"}% //Define a region where it is easy to see the differences between the alternate signals histo_ggH>;GetXaxis()>SetRangeUser(1,0) histo_ggH>SetLineColor(kBlack) histo_ggH>DrawNormalized("H") histo_ggH_ALT>SetLineColor(kBlue) histo_ggH_ALT>DrawNormalized("Hsame") histo_qqbarH_ALT>SetLineColor(kRed) histo_qqbarH_ALT>DrawNormalized("Hsame") %ENDCODE%
You can see that the shapes are different, with different hypotheses having more or less signal in the different "peaks" (which correspond to different mT values), as shown in:
Building a nested modelThe hypothesis test is performed by constructing a nested model which has two relevant parameters:
You can immediately see that this general (nested) signal model encodes several interesting limiting cases:
The PhysicsModel that encodes the signal model above is the twoHypothesisHiggs. Let's create the binary workspace. Suppose we blindly run:
text2workspace.py jcp_hwwof_0j_8TeV.txt P HiggsAnalysis.CombinedLimit.HiggsJPC:twoHypothesisHiggs m 125.7 PO verbose o jcp_hww.root MH (not there before) will be assumed to be 125.7 Process ggH_ALT will get norm x Process ZH will get norm not_x Process qqH will get norm not_x Process qqbarH_ALT will get norm x Process WH will get norm not_x Process ggH will get norm not_x Process ggH_ALT will get norm x Process ZH will get norm not_x Process qqH will get norm not_x Process qqbarH_ALT will get norm x Process WH will get norm not_x Process ggH will get norm not_x See how the ggH_ALT and qqbarH_ALT are both being scaled with x? That is not what we want, since we would like each of the processes to be differentiated via fqq. To do this properly we need to tell the model to also include fqq:
text2workspace.py jcp_hwwof_0j_8TeV.txt P HiggsAnalysis.CombinedLimit.HiggsJPC:twoHypothesisHiggs m 125.7 PO verbose PO fqqIncluded o jcp_hww.root Will consider fqq = fraction of qqH in Alt signal (signal strength will be left floating) MH (not there before) will be assumed to be 125.7 Process ggH_ALT will get scaled by r_times_x_times_not_fqq Process ZH will get scaled by r_times_not_x Process qqH will get scaled by r_times_not_x Process qqbarH_ALT will get scaled by r_times_x_times_fqq Process WH will get scaled by r_times_not_x Process ggH will get scaled by r_times_not_x Process ggH_ALT will get scaled by r_times_x_times_not_fqq Process ZH will get scaled by r_times_not_x Process qqH will get scaled by r_times_not_x Process qqbarH_ALT will get scaled by r_times_x_times_fqq Process WH will get scaled by r_times_not_x Process ggH will get scaled by r_times_not_x In this case you see how ggH_ALT is scaled with r_times_x_times_not_fqq, i.e. r*x*(1fqq), and qqbarH_ALT is scaled with r_times_x_times_fqq, i.e. r*x*fqq.
Running toysLet's dive immediately into throwing toys for the three cases of interest:
GEN_OPTS="m 125.7 jcp_hww.root seed 8192" WHAT_OPTS="testStat=TEV singlePoint 1 saveHybridResult" HOW_OPTS="M HybridNew T 500 i 2 fork 6 clsAcc 0 fullBToys" combine $GEN_OPTS $WHAT_OPTS $HOW_OPTS generateExt=1 generateNuis=0 expectedFromGrid 0.5 n jcp_hww_prefit_exp &> jcp_hww_prefit_exp.log combine $GEN_OPTS $WHAT_OPTS $HOW_OPTS frequentist expectedFromGrid 0.5 n jcp_hww_postfit_exp &> jcp_hww_postfit_exp.log combine $GEN_OPTS $WHAT_OPTS $HOW_OPTS frequentist n jcp_hww_postfit_obs &> jcp_hww_postfit_obs.log grep H "^CLs" jcp_hww_{pre,post}*.log While the commands are running (they'll take a few minutes), let's decode the command lines (taken from "combine help"). There is a first set of common parameters which relate to what we are doing:
jcp_hww_prefit_exp.log:CLs = 0.0582329 +/ 0.0108136 jcp_hww_postfit_exp.log:CLs = 0.206827 +/ 0.0203793 jcp_hww_postfit_obs.log:CLs = 0.089172 +/ 0.0171273 You can see two things:
The difference between the postfit expected and the prefit expected is related to fact that the data does not accommodate a large signal.
We can check this also by performing fits for the signal strength for the two hypotheses: we create a workspace that has the signal strength as an additional parameter of interest, and then run fits for the signal strength text2workspace.py jcp_hwwof_0j_8TeV.txt P HiggsAnalysis.CombinedLimit.HiggsJPC:twoHypothesisHiggs m 125.7 PO verbose PO fqqIncluded o jcp_hww_2poi.root PO muAsPOI combine jcp_hww_2poi.root M MultiDimFit m 125.7 P r floatOtherPOI=0 setPhysicsModelParameters x=0 n MU_SCALAR  MultiDimFit  best fit parameter values: r : +0.673 combine jcp_hww_2poi.root M MultiDimFit m 125.7 P r floatOtherPOI=0 setPhysicsModelParameters x=1 n MU_SPIN2 best fit parameter values: r : +0.988 This implies that our postfit expectation for the signal yield is less than the SM Higgs prediction, especially for the spin 0 case, and consequently also our postfit expectation for the separation between the two hypotheses is worse (with less events, it's harder to separate them).
While the approach with toys and CLs is statistically more sound (or at least more in line with the conventions used in particle physics), we can see how well the two models fit the data also by looking at the likelihood as function of combine jcp_hww_2poi.root M MultiDimFit m 125.7 P x floatOtherPOI=1 algo=grid n JCP_X_SCAN points=200 combine jcp_hww_2poi.root M MultiDimFit m 125.7 P x floatOtherPOI=1 algo=grid n JCP_X_SCAN_Asimov points=200 t 1 expectSignal=1 setPhysicsModelParameters x=0,r=1
In all the results above, f_{qq} is not floated: it is a constant parameter set at zero. This means that everything we have done up to now tests gg→2^{+}_{m} vs 0^{+}.
Exercise for the reader
Redo the same tests for the fqq=1 case? </twistyPlugin twikiMakeVisibleInline> The option allowing to set a model parameter value is setPhysicsModelParameters.
To rerun CLs, one can do combine $GEN_OPTS $WHAT_OPTS $HOW_OPTS generateExt=1 generateNuis=0 expectedFromGrid 0.5 n jcp_hww_prefit_exp1 setPhysicsModelParameters fqq=1 &> jcp_hww_prefit_exp1.log combine $GEN_OPTS $WHAT_OPTS $HOW_OPTS frequentist expectedFromGrid 0.5 n jcp_hww_postfit_exp1 setPhysicsModelParameters fqq=1 &> jcp_hww_postfit_exp1.log combine $GEN_OPTS $WHAT_OPTS $HOW_OPTS frequentist n jcp_hww_postfit_obs setPhysicsModelParameters fqq=1 &> jcp_hww_postfit_obs1.log
The output is for the two cases is:
This means that:  a 2^{+}_{m} graviton produced in qqbar annihilation would be easier to distinguish from the SM Higgs compared to a 2^{+}_{m} graviton produced in gluon fusion  also for the qqbar case the postfit sensitivity is worse than the prefit one (that's understandable since we had already seen that the postfit signal strength for the SM is below unity, and since the 2^{+}_{m} are not that different)  also for the qqbar case, the observed exclusion is stronger than the expectation, meaning that the data is more SMlike than gravitonlike (it's the same data, and the expected distributions for the graviton in the two production modes lie on the same side of the SM, both preferring smaller mT and larger m(ll) )
Again, one can see that also from looking at the likelihood as function of x in th two cases:
</twistyPlugin>
Plotting
For simplicity, let's work with the results of the postfit observed run above which are saved in To do this we need to use a macro that will extract the teststatistic values from the HypoTestResult and save them in a ROOT TTree:
root l q b higgsCombinejcp_hww_postfit_obs.HybridNew.mH125.7.8192.root "${CMSSW_BASE}/src/HiggsAnalysis/CombinedLimit/test/plotting/hypoTestResultTree.cxx(\"jcp_hww_postfit_obs.qvals.root\",125.7,1,\"x\")" &> jcp_hww_postfit_obs.qvals.log cat jcp_hww_postfit_obs.qvals.log Attaching file higgsCombinejcp_hww_postfit_obs.HybridNew.mH125.7.8192.root as _file0... Processing /Users/adavid/workspace/CMSSW611master/src/HiggsAnalysis/CombinedLimit/test/plotting/hypoTestResultTree.cxx("jcp_hww_postfit_obs.qvals.root",125.7,1,"x")...  HypoTestResult_mh125.7_x1_2618602738 Saved test statistics distributions for 996 signal toys and 996 background toys to jcp_hww_postfit_obs.qvals.root.
You can see that the macro found 996 toys in total, which correctly matches the parallel work of 6 processors (166*6=996 while 167*6=1002). Open the output file
root l jcp_hww_postfit_obs.qvals.root and let's explore: %CODE{"cpp"}% q>Print() %ENDCODE% ****************************************************************************** *Tree :q : Test statistics * *Entries : 1993 : Total = 42889 bytes File Size = 8799 * * : : Tree compression factor = 4.95 * ****************************************************************************** *Br 0 :q : q/F * *Entries : 1993 : Total Size= 8495 bytes File Size = 7539 * *Baskets : 1 : Basket Size= 32000 bytes Compression= 1.07 * *............................................................................* *Br 1 :mh : mh/F * *Entries : 1993 : Total Size= 8500 bytes File Size = 143 * *Baskets : 1 : Basket Size= 32000 bytes Compression= 56.21 * *............................................................................* *Br 2 :weight : weight/F * *Entries : 1993 : Total Size= 8520 bytes File Size = 146 * *Baskets : 1 : Basket Size= 32000 bytes Compression= 55.08 * *............................................................................* *Br 3 :type : type/I * *Entries : 1993 : Total Size= 8510 bytes File Size = 145 * *Baskets : 1 : Basket Size= 32000 bytes Compression= 55.45 * *............................................................................* *Br 4 :r : r/F * *Entries : 1993 : Total Size= 8495 bytes File Size = 141 * *Baskets : 1 : Basket Size= 32000 bytes Compression= 57.00 * *............................................................................*
The %CODE{"cpp"}% q>Draw("type") gPad>SetLogy() %ENDCODE%
And finally, you can quickly visualise the results of the hypothesis test by running: %CODE{"cpp"}% gPad>SetLogy(kFalse) q>Draw("2*q>>hnull(50,10,10)","type>0","goff") q>Draw("2*q>>halt(50,10,10)", "type<0","goff") q>Draw("2*q>>hobs(50,10,10)", "type==0","goff") hnull>SetLineColor(kBlue) halt>SetLineColor(kOrange) hobs>SetLineColor(kBlack) hobs>Scale(1e3) hobs>SetFillColor(kBlack) hnull>Draw() halt>Draw("same") hobs>Draw("same") %ENDCODE%
Note how the variable
Comments
