Different Monte Carlo event generators, most notably Pythia and Herwig, employ distinct physics models for parton showering, hadronization, and the underlying event. These modeling choices lead to systematic differences in predicted distributions that directly impact physics measurements and searches for new phenomena at the LHC. Quantifying these inter-generator biases, and their discrepancies with data, is crucial for assigning robust systematic uncertainties. This project applies machine learning classifiers to perform a systematic, high-dimensional comparison of Pythia, Herwig, and (where available) unfolded experimental data. Rather than examining observables one at a time, ML classifiers can detect subtle multi-dimensional correlations that distinguish generators. The classifier output itself, and the features driving its decisions, reveal which regions of phase space carry the largest modeling uncertainties. The student will train classifiers (boosted decision trees, neural networks, or similar) to discriminate between generator samples and/or data, then extract interpretable information about the nature and location of biases using techniques such as SHAP values, feature importance, and learned reweighting.
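As a rough illustration of the classifier-based comparison and learned reweighting mentioned above, the following minimal sketch trains a logistic-regression classifier on two toy samples and converts its output into per-event weights. Everything in it is an assumption made for illustration: the two Gaussian samples are hypothetical stand-ins for generator output (not real Pythia or Herwig events), the names `sample_a`, `sample_b`, `fit_logistic`, and `predict_proba` are invented here, and plain NumPy logistic regression replaces the boosted decision trees or neural networks the project would actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for two generator samples (hypothetical, NOT real generator output).
# Feature 0 is shifted between the samples; feature 1 is identically distributed.
n = 5000
sample_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
sample_b = rng.normal(loc=[0.3, 0.0], scale=1.0, size=(n, 2))

X = np.vstack([sample_a, sample_b])
y = np.concatenate([np.zeros(n), np.ones(n)])  # 0 = sample A, 1 = sample B


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.5, n_steps=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_steps):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(X, w):
    """Classifier output: estimated probability that an event comes from sample B."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return sigmoid(Xb @ w)


coef = fit_logistic(X, y)

# With balanced classes, p / (1 - p) estimates the density ratio p_B(x) / p_A(x).
# Applying it as a per-event weight to sample A should pull A towards B.
p_a = predict_proba(sample_a, coef)
lr_weights = p_a / (1.0 - p_a)

mean_a = sample_a[:, 0].mean()
mean_b = sample_b[:, 0].mean()
mean_a_reweighted = np.average(sample_a[:, 0], weights=lr_weights)

print("coefficients (feature 0, feature 1, bias):", coef)
print("mean of feature 0: A =", mean_a, " B =", mean_b, " A reweighted =", mean_a_reweighted)
```

In this toy setup the coefficient of feature 0 should be the larger of the two, which is the simplest form of the "which features drive the decision" question. On real samples, the same role would be played by feature importances or SHAP values from a more flexible model.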
Total project length: 175 or 350 hours.
Difficulty level: Intermediate/Advanced
Please use this link to access the test for this project.
Please DO NOT contact mentors directly by email. Instead, please email ml4-sci@cern.ch with Project Title and include your CV and test results. The mentors will then get in touch with you.