import math
import matplotlib.pyplot as plt
import numpy as np
import random
import seaborn as sns
import pandas as pd
!pip install lmfit
from lmfit import Model
import os.path
import subprocess
from numpy.fft import fft, ifft
def wget_data(url):
    """Download a file into ./tmp_data, skipping it if already present (-nc)."""
    local_path = './tmp_data'
    subprocess.run(["wget", "-nc", "-P", local_path, url])

Fourier Methods

Fourier Methods involve two pieces:

  • Fourier Series

  • Fourier Transforms

Both are extremely powerful for finding patterns in data, and the topic is very broad.

Fourier Series

We can write a Fourier series as:

\[f(x) = \frac{a_0}{2} + \sum_{n=1}^{∞}a_n \cos nx + \sum_{n=1}^{∞}b_n \sin nx\]

for a periodic function \(f(x)\) which has a finite number of finite discontinuities and only a finite number of extrema over the range 0 to \(2\pi\). The coefficients come from the integrals:

\[a_n = \frac{1}{\pi} \int_0^{2\pi} f(t)\cos(nt)\,dt\]

and

\[b_n = \frac{1}{\pi} \int_0^{2\pi} f(t)\sin(nt)\,dt\]

The sine and cosine terms form a complete orthogonal set, so \(f(x)\) can be specified to any desired accuracy by knowing these coefficients.
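As a quick sanity check of these formulas, here is a minimal sketch (my own toy example, not from the notes) that evaluates the first few coefficients numerically for a square wave, where the exact answer is \(b_n = 4/(n\pi)\) for odd \(n\) and zero otherwise.

# Toy example: numerically evaluate a_n and b_n for a square wave on [0, 2*pi].
# The exact result is a_n = 0 and b_n = 4/(n*pi) for odd n, 0 for even n.
import numpy as np

t = np.linspace(0.0, 2.0*np.pi, 20000, endpoint=False)
f = np.where(t < np.pi, 1.0, -1.0)        # square wave: +1 then -1

for n in range(1, 6):
    # the integral over [0, 2*pi] is about 2*pi * mean, so (1/pi)*integral ~ 2*mean
    a_n = 2.0 * np.mean(f * np.cos(n * t))
    b_n = 2.0 * np.mean(f * np.sin(n * t))
    exact_b = 4.0/(n*np.pi) if n % 2 == 1 else 0.0
    print(f"n={n}: a_n={a_n:+.4f}  b_n={b_n:+.4f}  exact b_n={exact_b:+.4f}")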

There are many such periodic functions in physics and so this has wide applicability.

Here we’re going to focus on extracting these correlations from data. I’m going to use the specific example of heavy-ion collisions (my field) but the idea of extracting information from a limited, fluctuating dataset is not unique.

Physics Motivation: In relativistic collisions between two nuclei, the Fourier coefficients, \(v_n\), are proportional to the eccentricities of the initial state of the overlap region between the two nuclei. Here is a simulation of such a collision projected onto the x-y plane.

[Figure: simulated collision of two nuclei projected onto the x-y plane]

The constant of proportionality is related to the viscosity of the matter created in these collisions (the exact relation comes from relativistic viscous hydrodynamics). Measuring these \(v_n\) values is the most direct experimental method to constrain the viscosity of the matter created in heavy-ion collisions.

You can think of expanding the shape of the overlap region in terms of \(n\)-th order eccentricities.

[Figure: decomposition of the overlap region into \(n\)-th order eccentricities]
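If it helps to visualize these shapes, here is a small sketch (my own illustration, using the common parameterization \(r(\phi) = R\,[1 + \varepsilon_n \cos(n\phi)]\), which is an assumption rather than the exact figure above) that draws a boundary with a single \(n\)-th order eccentricity for a few values of \(n\).

# Illustration only: draw boundaries r(phi) = R*(1 + eps*cos(n*phi)) for a few n.
import numpy as np
import matplotlib.pyplot as plt

phi = np.linspace(0.0, 2.0*np.pi, 400)
R, eps = 1.0, 0.3                       # assumed size and eccentricity

fig, axes = plt.subplots(1, 4, figsize=(12, 3), subplot_kw={'aspect': 'equal'})
for ax, n in zip(axes, [2, 3, 4, 5]):
    r = R * (1.0 + eps * np.cos(n * phi))
    ax.plot(r * np.cos(phi), r * np.sin(phi))
    ax.set_title(f"n = {n}")
plt.show()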

In reality, determining the Fourier coefficients can be unstable. Here is an example of how that can happen.

Let’s say we’ve got \(N\) particles in an event. We can clearly describe the particle distribution by a Fourier series:

\[\frac{dN}{d\phi} \propto 1 + \sum_{n=1}^{\infty} 2v_n\cos(n(\phi-\Psi_n))\]

Instead of separate \(\sin\) and \(\cos\) terms we now have \(n\) phases, \(\Psi_n\), and just the \(\cos\) term.

However, we don’t know what \(\Psi_n\) is, so we have to determine it from the data itself. Determining \(v_n\) directly as above is difficult, mainly because each event is different from every other event and each single event has a finite number of particles.
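To get a feeling for the problem, here is a toy sketch (my own, with an assumed \(v_2 = 0.1\) and a random \(\Psi_2\)) that samples a single event of 1000 particles from \(1 + 2v_2\cos(2(\phi-\Psi_2))\) by accept-reject; the resulting histogram fluctuates enough that the modulation is hard to pick out by eye.

# Toy single event (assumed v2 = 0.1, random Psi_2): sample 1000 particles
# from dN/dphi ~ 1 + 2*v2*cos(2*(phi - psi2)) with accept-reject.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
v2, n_particles = 0.1, 1000
psi2 = rng.uniform(-np.pi, np.pi)

phis = []
while len(phis) < n_particles:
    phi = rng.uniform(-np.pi, np.pi)
    # accept phi with probability proportional to the target distribution
    if rng.uniform(0.0, 1.0 + 2.0*v2) < 1.0 + 2.0*v2*np.cos(2.0*(phi - psi2)):
        phis.append(phi)

plt.hist(phis, bins=30)
plt.xlabel(r'$\phi$')
plt.ylabel('counts')
plt.show()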

Here is a plot of \(\frac{dN}{d\phi}\) from real data (paper). Three specific events are shown; each event has about 1000 particles (error bars are the \(\sqrt{N}\) errors we have discussed). It is impossible to constrain the \(v_n\) and \(\Psi_n\) values in these events.

[Figure: \(\frac{dN}{d\phi}\) distributions for three individual events]

Determining Coefficients from Data

A more direct way to deal with this issue is to use correlations. If we can describe the angular distribution of single particles as:

\[\frac{dN}{d\phi} \propto 1 + \sum_{n=1}^{\infty} 2v_n\cos(n(\phi-\Psi_n))\]

then we can describe the distribution of pairs of particles as:

\[\frac{dN^{ij}}{d\phi_i\, d\phi_j} \propto \left[1 + \sum_{n=1}^{\infty} 2v_n\cos(n(\phi_i-\Psi_n))\right]\left[1 + \sum_{n=1}^{\infty} 2v_n\cos(n(\phi_j-\Psi_n))\right]\]

\[\frac{dN^{ij}}{d\phi_i\, d\phi_j} \propto 1 + \sum_{n=1}^{\infty} 2v_n^i v_n^j \cos(n(\phi_i - \phi_j))\]

Now this is in terms of things we can directly measure: the angles of individual particles. The right-hand side only depends on the difference between the angles, so we can define \(\Delta\phi \equiv \phi_i - \phi_j\) and rewrite the equation as:

\[\frac{dN^{ij}}{d\Delta\phi} \propto 1 + \sum_{n=1}^{\infty} 2v_n^i v_n^j \cos(n\,\Delta\phi)\]
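A useful consequence of this last form: the average of \(\cos(n\,\Delta\phi)\) over all distinct same-event pairs is an estimator of \(v_n^i v_n^j\) (just \(v_n^2\) when every particle carries the same \(v_n\)). Here is a minimal numerical check of that statement on one large toy event, with an assumed \(v_2 = 0.25\) and \(\Psi_2 = 0\).

# Check that the pair average of cos(2*dphi) comes out near v2**2 = 0.0625.
# Toy event: assumed v2 = 0.25, Psi_2 = 0, 20000 particles.
import numpy as np

rng = np.random.default_rng(0)
v2, n_particles = 0.25, 20000

# sample phi from p(phi) ~ 1 + 2*v2*cos(2*phi) with vectorized accept-reject
phi = np.empty(0)
while phi.size < n_particles:
    trial = rng.uniform(-np.pi, np.pi, 50000)
    keep = rng.uniform(0.0, 1.0 + 2.0*v2, trial.size) < 1.0 + 2.0*v2*np.cos(2.0*trial)
    phi = np.concatenate([phi, trial[keep]])[:n_particles]

# sum over i != j of cos(2*(phi_i - phi_j)) = (sum cos)^2 + (sum sin)^2 - N,
# so no explicit double loop is needed
c, s = np.cos(2.0*phi).sum(), np.sin(2.0*phi).sum()
pair_avg = (c**2 + s**2 - n_particles) / (n_particles*(n_particles - 1))
print(pair_avg, v2**2)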

Here are the same three events from above, now showing the \(\frac{dN}{d\Delta\phi}\) distribution as well. This looks clearer (1000 data points yield about 1000000 pairs).

[Figure: \(\frac{dN}{d\Delta\phi}\) distributions for the same three events]

So we’ve eliminated a hard-to-determine phase by only looking at the angular difference between pairs of particles. By looping over every pair of distinct particles, we can construct \(\frac{dN^{ij}}{d\Delta \phi}\). This is a correlation function. It is well suited to constraining the coefficients of the Fourier series (as we’ll see below), but correlation functions are used in many scenarios (for a more formal description see Pruneau, Data Analysis Techniques for Physical Scientists).

For our purposes we don’t even care about the normalization of this correlation function. Exactly this quantity was measured in lead-lead collisions at the LHC, and the first eight terms of the Fourier series were extracted (paper). Here \(V_{n\Delta} = v_n^i v_n^j\) in our notation.

[Figure: extracted \(V_{n\Delta}\) Fourier coefficients from lead-lead collisions at the LHC]

This is amazing: under the assumption that \(v_n \propto \varepsilon_n\), we are really measuring the shape of the nuclear overlap region and decomposing it in terms of the various order harmonics.

However, when two-particle correlations are measured, what they are actually sensitive to is \(v_n^iv_n^j\). This causes some complications in extracting \(v_n\).
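One way to see the complication: pairs measure \(\langle v_n^2\rangle\) averaged over events, so if \(v_2\) fluctuates event-by-event with mean \(\mu\) and width \(\sigma\), taking a square root returns roughly \(\sqrt{\mu^2+\sigma^2}\) rather than \(\mu\). A quick numerical illustration (the numbers anticipate the second dataset below):

# Event-by-event Gaussian fluctuations in v2 push the pair-based estimate
# up to sqrt(<v2^2>) = sqrt(mu^2 + sigma^2) instead of the mean mu.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.25, 0.10
v2_events = rng.normal(mu, sigma, 100000)

print("mean v2:                 ", v2_events.mean())
print("sqrt(<v2^2>):            ", np.sqrt(np.mean(v2_events**2)))
print("analytic sqrt(mu^2+s^2): ", np.sqrt(mu**2 + sigma**2))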

First, let’s take a look at some data generated with:

  • 100 events

  • 50 particles in each event

  • every particle in each event having exactly the same \(v_2\) value (\(v_2 = 0.25\))

  • all other \(v_n\) values are zero

Here is some (inefficient) code to pair every particle with every other particle from the same event. We only pair particles from the same event because those are the pairs the math above describes.


wget_data('https://courses.physics.illinois.edu/phys398dap/fa2023/data/vn_sigma00.txt')
data = pd.read_csv('tmp_data/vn_sigma00.txt', header=None)
print(data)
import itertools

# Build the list of pair angle differences, dphi = phi_i - phi_j.
# Each row of `data` is one particle, stored as a single space-separated
# string; field 0 is the event index and field 2 is the azimuthal angle phi.
dphi_vals = np.array([])
for row1, row2 in itertools.combinations(data.iterrows(), 2):
    for index, value1 in row1[1].items():
        arr1 = value1.split(' ')
    for index, value2 in row2[1].items():
        arr2 = value2.split(' ')
    # only pair particles from the same event
    if arr1[0] != arr2[0]:
        continue
    dphi = float(arr1[2]) - float(arr2[2])
    # wrap the difference into the range (-pi, pi)
    if dphi > np.pi:
        dphi = dphi - 2.0*np.pi
    if dphi < -np.pi:
        dphi = dphi + 2.0*np.pi
    dphi_vals = np.append(dphi_vals, dphi)
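As an aside, the same list of \(\Delta\phi\) values can be built much faster with numpy, without the explicit double loop. Here is a sketch, assuming the same row layout the loop above parses (a single space-separated string per particle, with the event index in field 0 and \(\phi\) in field 2):

# Vectorized alternative to the pair loop above (same file-layout assumptions).
rows = [line.split(' ') for line in data[0]]
events = np.array([r[0] for r in rows])
phis = np.array([float(r[2]) for r in rows])

dphi_fast = []
for ev in np.unique(events):
    p = phis[events == ev]                  # phi values for one event
    i, j = np.triu_indices(len(p), k=1)     # all unordered pairs in that event
    dphi_fast.append(p[i] - p[j])
dphi_fast = np.concatenate(dphi_fast)
# wrap the differences into (-pi, pi]
dphi_fast = (dphi_fast + np.pi) % (2.0*np.pi) - np.pi

print(len(dphi_fast), len(dphi_vals))       # the pair counts should agree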

Going back to our least squares fitting from earlier in the semester, we can extract the modulation.


nbins = 65
bins = np.linspace(-np.pi, np.pi, nbins)
counts, bins, bars = plt.hist(dphi_vals, bins=bins)

uncertainties = np.sqrt(counts)   # Poisson errors on the bin counts
bins = bins[:-1]                  # keep the left edge of each bin as the x value
num_bins = len(bins) - 1

x = bins
y = counts


def cos_mod(x, amp, v2):
    """Pair-correlation shape: amp * (1 + 2*v2^2*cos(2x))."""
    return amp*(1.0 + 2.0*v2*v2 * np.cos(2*x))

gmodel = Model(cos_mod)
result = gmodel.fit(y, x=x, weights=1.0/uncertainties, amp = 3000, v2=0.15)

print(result.fit_report())

plt.plot(x, y, 'o')
plt.plot(x, result.init_fit, '--', label='initial fit')
plt.plot(x, result.best_fit, '-', label='best fit')
plt.legend()
plt.show()

The \(v_2\) value we extract is consistent with the input value of \(v_2 = 0.25\). This is good and reasonably convincing that we don’t have bugs in the code itself.

Now, let’s look at a more interesting scenario:

  • 100 events

  • 50 particles in each event

  • every particle in a specific event having exactly the same \(v_2\) value but the \(v_2\) value of each event drawn from a Gaussian centered at 0.25 with a \(\sigma\) of 0.10

  • all other \(v_n\) values are zero

wget_data('https://courses.physics.illinois.edu/phys398dap/fa2023/data/vn_sigma10.txt')
data = pd.read_csv('tmp_data/vn_sigma10.txt', header=None)
print(data)
import itertools

# Same pairing loop as above, now for the dataset with event-by-event
# fluctuations in v2.
dphi_vals_sigma01 = np.array([])
for row1, row2 in itertools.combinations(data.iterrows(), 2):
    for index, value1 in row1[1].items():
        arr1 = value1.split(' ')
    for index, value2 in row2[1].items():
        arr2 = value2.split(' ')
    # only pair particles from the same event
    if arr1[0] != arr2[0]:
        continue
    dphi = float(arr1[2]) - float(arr2[2])
    # wrap the difference into the range (-pi, pi)
    if dphi > np.pi:
        dphi = dphi - 2.0*np.pi
    if dphi < -np.pi:
        dphi = dphi + 2.0*np.pi
    dphi_vals_sigma01 = np.append(dphi_vals_sigma01, dphi)


nbins = 65
bins = np.linspace(-np.pi, np.pi, nbins)
counts, bins, bars = plt.hist(dphi_vals_sigma01, bins=bins)

uncertainties = np.sqrt(counts)   # Poisson errors on the bin counts
bins = bins[:-1]                  # keep the left edge of each bin as the x value
num_bins = len(bins) - 1

x = bins
y = counts