Reinterpretation of Data

Good data can be interpreted many ways. (So can bad data of course, but that’s boring.)

This is not a comment on any recent epidemiology modelling, though I suppose there may be some issues in common. Here in my little loft-space, with my various teleconference options now augmented by “Teams”, I am still carrying on with particle physics, while my daughter has online lessons downstairs and my son frets about cancelled A levels. And my wife is still at school teaching.

But yes, particle physics, if you fancy a break.

As experimental particle physicists, we spend a lot of resources (time, effort, money) collecting our data, and analysing them to learn about the fundamental constituents and forces of nature. Sociologically, within the experiments at the Large Hadron Collider, there is something of a divide between those who make “measurements” and those who do “searches” with these data. In both cases, though, you would like to make your analysis as enduringly useful as possible.

Thus, if you make a measurement of, say, the production rate of W bosons in proton-proton collisions, you have a choice. W bosons are not directly measurable – they decay rapidly to other particles, which leave traces of one kind or another in our detectors.

For example, we often deduce the rate of W production by looking only at those W bosons which decay into an electron plus a neutrino. We know what fraction of W bosons do this, because that fraction has been measured rather precisely in other experiments. So if we could measure the total number of electrons and neutrinos, and be sure they all came from W bosons, we could just multiply up to get the total rate.
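In that idealised case the "multiply up" step is a single division. A minimal sketch, assuming a branching fraction of roughly 10.7% for W to electron plus neutrino (an approximate figure; check the Particle Data Group for the current value) and a made-up event count; the function name is mine, not from any experiment's software:

```python
# Approximate fraction of W bosons that decay to an electron plus a neutrino.
# Illustrative value only; look up the current measurement before using it.
BR_W_TO_E_NU = 0.107


def total_w_count(n_e_nu: float, branching_fraction: float = BR_W_TO_E_NU) -> float:
    """Scale the number of W bosons seen in one decay channel up to all decays."""
    if not 0.0 < branching_fraction <= 1.0:
        raise ValueError("branching fraction must be in (0, 1]")
    return n_e_nu / branching_fraction


# Hypothetical input: 1070 perfectly identified W -> e nu decays.
print(total_w_count(1070.0))
```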

But that is a big “if”, right there. In fact we can’t.

First of all, we don’t detect neutrinos; all we know is that there was some missing momentum transverse to the beam. It was probably a neutrino, if you assume the Standard Model is right, but if you are going to make that assumption you have already limited the generality of your result. (It could, in principle, be a dark matter particle you just made, for example!) Also, our detectors can’t measure missing momentum below some minimum value, and cannot see electrons if they have such a low momentum that they don’t pass our online selection, or if they go down the beam-pipe, where we have no detectors.

The way out of this is to measure what we call a “particle-level fiducial cross section”, and then, if you want, reinterpret that in terms of a total W boson rate.

The measured cross section will give a rate for the production of electrons and missing momentum, with momentum values high enough that we can detect them. Defining a cross section like this makes very few assumptions about where the electron or missing momentum came from. You can therefore reinterpret the measurement to get information about possible dark matter production, or other beyond-the-Standard-Model effects, precisely because you have not assumed the Standard Model when making the measurement. For the same reason, your measurement can be used for rather precise tests of the Standard Model itself.
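To make the two steps concrete, here is a rough sketch under assumptions of my own choosing. It uses the common textbook form in which the fiducial cross section is (observed minus background events) divided by (a detector correction factor C times the integrated luminosity), and the optional extrapolation then divides by an acceptance A and the branching fraction. The names, the structure and all the numbers are illustrative, not taken from any real analysis:

```python
from dataclasses import dataclass


@dataclass
class FiducialInputs:
    n_observed: float         # events in data passing the selection
    n_background: float       # estimated events without a genuine electron + missing momentum
    correction_factor: float  # C: detector efficiency/resolution correction inside the fiducial region
    luminosity_pb: float      # integrated luminosity, in inverse picobarns


def fiducial_cross_section_pb(x: FiducialInputs) -> float:
    """Particle-level rate for electron + missing momentum within detector acceptance."""
    return (x.n_observed - x.n_background) / (x.correction_factor * x.luminosity_pb)


def total_cross_section_pb(sigma_fid_pb: float, acceptance: float,
                           branching_fraction: float) -> float:
    """Optional, model-dependent step: extrapolate to all W bosons.

    The acceptance A (fraction of decays landing in the fiducial region) comes
    from a Standard Model calculation, so the model assumption enters here,
    not in the fiducial measurement itself.
    """
    return sigma_fid_pb / (acceptance * branching_fraction)


# Made-up numbers, purely to show the arithmetic.
inputs = FiducialInputs(n_observed=52000, n_background=2000,
                        correction_factor=0.5, luminosity_pb=100.0)
sigma_fid = fiducial_cross_section_pb(inputs)
sigma_tot = total_cross_section_pb(sigma_fid, acceptance=0.5, branching_fraction=0.1)
print(sigma_fid, sigma_tot)
```

The point of keeping the two functions separate is that someone with a different model can take the fiducial number and ignore the second step entirely.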

That’s a measurement. There are related considerations when doing a “search” too. Here we typically pick some theoretical idea (or class of ideas) which would extend the Standard Model, and look to see if there is any evidence for it in the data.

In this case, you would like your search to apply to as wide a variety of theoretical ideas as possible, because you know that most (probably all) of them will be wrong, and generally no one wants to spend roughly a year searching for each individual variation in the model-builders’ heated imagination. Make your search generic, and as independent of the specific model as possible, and you can cover more ground.

Problems can arise, though, if, for example, you use “control regions” to make estimates of the Standard Model background from the data. Using the Standard Model prediction itself sounds as though it would be more theory-dependent, and in some senses it is. But if you take the background expectation from a control region in the data, you are implicitly assuming that your new physics model didn’t contribute in that region, and you are still using the Standard Model to translate from the control region to your “signal” region anyway.
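A toy illustration of that implicit assumption, with invented numbers and a deliberately simplified single "transfer factor" (the ratio of Standard Model predictions in the two regions). Real analyses use fits with many regions and uncertainties, so treat this only as a sketch of the logic:

```python
def background_in_signal_region(n_control_data: float,
                                sm_pred_signal: float,
                                sm_pred_control: float) -> float:
    """Extrapolate the control-region data into the signal region.

    The transfer factor is taken from the Standard Model prediction, so the
    Standard Model is still being used even in a "data-driven" estimate.
    """
    transfer_factor = sm_pred_signal / sm_pred_control
    return n_control_data * transfer_factor


# Hypothetical Standard Model expectations in each region.
SM_CONTROL, SM_SIGNAL = 1000.0, 100.0

# Hypothetical new physics: 20 events in the signal region, 50 leaking
# into the control region.
new_physics_signal, new_physics_control = 20.0, 50.0

observed_signal = SM_SIGNAL + new_physics_signal
observed_control = SM_CONTROL + new_physics_control

estimated_background = background_in_signal_region(observed_control, SM_SIGNAL, SM_CONTROL)
apparent_excess = observed_signal - estimated_background

# The control-region contamination inflates the background estimate, so the
# apparent excess comes out smaller than the true 20 signal events.
print(estimated_background, apparent_excess)
```

If the model you care about populates the control region differently from the one the search was designed around, the published limit does not carry over without redoing this step, which is why the region definitions need to be published too.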

A good search, or a good measurement, will take these (and more) issues into account, as well as publishing enough information that the assumptions can be checked, or sometimes even modified, independently. The impetus for this first blog post of the year was the appearance on the arXiv today of a collection of recommendations and advice, born of experience, on how to do this. One big reason for the lack of posts so far this year is the fact that I have been having a lot of fun working on the measurement side of this challenge; see Contur, and lots of updates in the Les Houches proceedings.

I realise this is a bit of a technical post and perhaps not as clear as I would like. Sorry. But it seems important to get started on writing again and click “publish”, after a few failed attempts over the past months. I will try to do better soon, perhaps well enough that I can write something worthy of the Cosmic Shambles network.

Meanwhile, if you’ve made it this far, let me advertise their upcoming Stay At Home Festival, which looks set to have plenty of interesting science and much more besides.

About Jon Butterworth

UCL Physics prof, works on LHC, writes (books, Cosmic Shambles and elsewhere). Citizen of England, UK, Europe & Nowhere, apparently.
This entry was posted in Particle Physics, Physics, Science.