Unfolding

Here we will set up and test an unfolding problem. You may use pyunfold (as in the lecture) or another unfolding package of your choice for this exercise. An efficiency correction or a plain matrix inversion on its own is not acceptable. Submit the executed version of your notebook, and label all plots clearly.

The samples you will use to unfold are listed below. They come from the Pythia Monte Carlo (MC) event generator, which generates random proton-proton collisions. Momenta are in GeV (billion electron volts). Dataset 1 has two columns: the truth momentum and the reconstructed momentum. Dataset 2 contains only the reconstructed momentum.

The objects are jets, and the reconstructed momentum has been smeared in a realistic way relative to the truth momentum.

Generate the response matrix with:

(dataset 1) https://courses.physics.illinois.edu/phys398dap/fa2023/data/unf_response.csv

and use this as the ‘data’ to unfold:

(dataset 2) https://courses.physics.illinois.edu/phys398dap/fa2023/data/unf_data.csv

Note that the spectra fall steeply with increasing momentum, so you might find it better to use logarithmically spaced bins in the histograms. You can do that straightforwardly by setting up the problem in terms of log(momentum). In any case, be very clear about what you have done.
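As a minimal sketch of the two equivalent ways to get logarithmic binning, the example below uses a toy power-law sample, not the course data. The 20 to 500 GeV range, the number of bins, and the variable names are all assumptions chosen for illustration.

```python
import numpy as np

# Toy stand-in for a steeply falling momentum spectrum (NOT the course data):
# a power law starting at 20 GeV, drawn with a fixed seed.
rng = np.random.default_rng(0)
pt = 20.0 * (1.0 + rng.pareto(4.0, size=10_000))

# Option A: bin edges equally spaced in log10(pT), applied to pT itself.
edges = np.logspace(np.log10(20.0), np.log10(500.0), 16)  # 15 bins
counts, _ = np.histogram(pt, bins=edges)

# Option B: equal-width bins applied to log10(pT).
# This selects the same events per bin, up to floating-point edge effects.
log_edges = np.linspace(np.log10(20.0), np.log10(500.0), 16)
log_counts, _ = np.histogram(np.log10(pt), bins=log_edges)

# Dividing by the bin width turns counts into a density (dN/dpT), which is
# what should be plotted when the bins have unequal widths.
widths = np.diff(edges)
density = counts / widths
```

Whichever option you pick, state it explicitly and keep the same bin edges for the truth histogram, the reconstructed histogram, and the response matrix.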

Part 1:

Write a paragraph summarizing what you’ve done. (This is the first part of the homework but do it last. This is for me to understand what you’ve done as I read through your work.)

Part 2:

Plot the truth and reconstructed quantities on the same plot; then take a ratio of those two histograms. Comment on what you see.
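One way to form the ratio safely is sketched below, again on toy data rather than the course files. The 10% Gaussian smearing, the bin edges, and the simple uncertainty estimate (which treats the two histograms as independent even though they are built from the same events) are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy (truth, reco) pairs; NOT the course data.
truth = 20.0 * (1.0 + rng.pareto(4.0, size=50_000))
reco = truth * rng.normal(1.0, 0.10, size=truth.size)

# Use identical bin edges for both histograms so the ratio is bin-by-bin.
edges = np.logspace(np.log10(20.0), np.log10(500.0), 11)  # 10 bins
truth_hist, _ = np.histogram(truth, bins=edges)
reco_hist, _ = np.histogram(reco, bins=edges)

# Ratio reco/truth; bins with no truth entries are left as NaN.
ratio = np.divide(reco_hist, truth_hist,
                  out=np.full(truth_hist.shape, np.nan),
                  where=truth_hist > 0)

# Rough Poisson uncertainty on the ratio, ignoring the correlation
# between the two histograms.
valid = (truth_hist > 0) & (reco_hist > 0)
ratio_err = np.full(ratio.shape, np.nan)
ratio_err[valid] = ratio[valid] * np.sqrt(1.0 / reco_hist[valid]
                                          + 1.0 / truth_hist[valid])
```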

Generate a response matrix using the simulations provided. Plot the response matrix both before and after normalization.

Everything in this part is using (dataset 1).
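The sketch below shows one common convention for building and normalizing a response matrix with NumPy. It uses toy data, and the choice of normalization (each truth column divided by the number of truth events in that bin) is an assumption; other conventions exist, so say which one you use.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy (truth, reco) pairs; NOT the course data. The 10% Gaussian smearing
# is an arbitrary illustrative choice.
truth = 20.0 * (1.0 + rng.pareto(4.0, size=50_000))
reco = truth * rng.normal(1.0, 0.10, size=truth.size)

edges = np.logspace(np.log10(20.0), np.log10(500.0), 11)  # 10 bins

# histogram2d(x, y) returns H[i, j] with i indexing x bins and j indexing
# y bins. Passing reco first gives counts with reco in bin i, truth in bin j.
counts_2d, _, _ = np.histogram2d(reco, truth, bins=[edges, edges])

# Column-normalize: divide each truth column by the number of truth events
# in that bin, so response[i, j] estimates P(reco bin i | truth bin j).
truth_hist, _ = np.histogram(truth, bins=edges)
response = np.divide(counts_2d, truth_hist,
                     out=np.zeros_like(counts_2d),
                     where=truth_hist > 0)

# Column sums are the per-truth-bin efficiencies; they fall below 1 when
# the reconstructed value lands outside the binned range.
efficiencies = response.sum(axis=0)
```

With this convention, `counts_2d` is the matrix before normalization and `response` is the matrix after it; both can be drawn with a 2D color plot using the same bin edges on each axis.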

Part 3:

Unfold the measured (reconstructed) distribution with the response matrix; this is the technical closure test. Justify the convergence criteria used in the unfolding. Either get convergence within the uncertainties on the sample statistics or explain what you tried in order to improve the convergence (showing plots with multiple convergence criteria would be very illustrative).

Everything in this part is using (dataset 1).
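To make the role of the convergence criterion concrete, here is a minimal NumPy sketch of a D'Agostini-style iterative unfolding on a hand-made 3-bin toy. It is not pyunfold's implementation; the function name `iterative_unfold_sketch`, the stopping rule (largest relative change between iterations below `tol`), and the toy numbers are all assumptions for illustration.

```python
import numpy as np

def iterative_unfold_sketch(data, response, prior, tol=1e-3, max_iter=100):
    """Iterative Bayesian-style unfolding (hypothetical helper).

    response[i, j] = P(reco bin i | truth bin j); column sums are efficiencies.
    Stops when the largest relative change between iterations is below tol.
    Returns the unfolded estimate and the number of iterations used.
    """
    eff = response.sum(axis=0)
    estimate = np.asarray(prior, dtype=float)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        folded = response @ estimate  # expected reco counts for this estimate
        ratio = np.divide(data, folded,
                          out=np.zeros_like(folded), where=folded > 0)
        updated = estimate * (response.T @ ratio)
        updated = np.divide(updated, eff,
                            out=np.zeros_like(updated), where=eff > 0)
        change = np.max(np.abs(updated - estimate) / np.maximum(estimate, 1e-12))
        estimate = updated
        if change < tol:
            break
    return estimate, n_iter

# Toy closure test: fold a known truth spectrum, then unfold it again.
response = np.array([[0.8, 0.1, 0.0],
                     [0.1, 0.7, 0.2],
                     [0.0, 0.1, 0.7]])
truth = np.array([1000.0, 300.0, 100.0])
data = response @ truth  # noise-free "measured" spectrum

flat_prior = np.full(3, data.sum() / 3)
unfolded, n_iter = iterative_unfold_sketch(data, response, flat_prior)
refolded = response @ unfolded
```

Rerunning with several values of `tol` and comparing `unfolded` to `truth` (and `refolded` to `data`) is one way to show how the result depends on the stopping criterion.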

Part 4:

Use the setup from Part 3 to unfold the reconstructed sample (dataset 2). Is this unfolding adequate? Why or why not?