ML-Based Simulation Bias Analysis - Pythia vs Herwig vs Data

Description

Different Monte Carlo event generators, most notably Pythia and Herwig, employ distinct physics models for parton showering, hadronization, and the underlying event. These modeling choices lead to systematic differences in predicted distributions that directly impact physics measurements and searches for new phenomena at the LHC. Quantifying these inter-generator biases, and their discrepancies with data, is crucial for assigning robust systematic uncertainties.

This project applies machine learning classifiers to perform a systematic, high-dimensional comparison of Pythia, Herwig, and (where available) unfolded experimental data. Rather than examining observables one at a time, ML classifiers can detect subtle multi-dimensional correlations that distinguish generators. The classifier output itself, and the features driving its decisions, reveal which regions of phase space carry the largest modeling uncertainties. The student will train classifiers (boosted decision trees, neural networks, or similar) to discriminate between generator samples and/or data, then extract interpretable information about the nature and location of biases using techniques such as SHAP values, feature importance, and learned reweighting.
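As a rough illustration of the classifier-based comparison and learned reweighting described above, the following minimal sketch uses only the Python standard library. Everything in it is an assumption made for illustration: the two "generators" are toy Gaussian samples rather than real Pythia or Herwig output, the two features are hypothetical, and a hand-written logistic regression stands in for the boosted decision trees or neural networks the project would actually use. It relies on the standard result that, for equally sized samples, a well-trained classifier output p(x) approximates p_B(x) / (p_A(x) + p_B(x)), so p / (1 - p) estimates the density ratio between the two samples. The coefficient magnitudes serve only as a crude proxy for feature importance and are not a substitute for SHAP values.

```python
import math
import random


def make_toy_sample(n, mean, rng):
    """Draw n toy events with two features.

    Feature 0 is shifted by `mean` (a mismodelled observable);
    feature 1 is identical in both samples (a well-modelled one).
    """
    return [(rng.gauss(mean, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Classifier output: estimated probability that x came from sample B."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(events, labels, epochs=100, lr=0.5):
    """Fit logistic regression with full-batch gradient descent."""
    w = [0.0] * len(events[0])
    b = 0.0
    n = len(events)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(events, labels):
            err = predict(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)
gen_a = make_toy_sample(2000, 0.0, rng)  # toy stand-in for one generator
gen_b = make_toy_sample(2000, 0.5, rng)  # toy stand-in for the other

events = gen_a + gen_b
labels = [0] * len(gen_a) + [1] * len(gen_b)
w, b = train_logistic(events, labels)

# Learned reweighting: per-event weights p / (1 - p) morph sample A towards B.
weights = [predict(w, b, x) / (1.0 - predict(w, b, x)) for x in gen_a]

mean_a = sum(x[0] for x in gen_a) / len(gen_a)
mean_b = sum(x[0] for x in gen_b) / len(gen_b)
mean_rw = sum(wt * x[0] for wt, x in zip(weights, gen_a)) / sum(weights)

print("coefficients (crude feature importance):", w)
print("feature 0 mean in A, in B, in reweighted A:", mean_a, mean_b, mean_rw)
```

In this toy setup the classifier should place most of its weight on feature 0, the only observable that differs between the samples, and the reweighted mean of sample A should move towards that of sample B. With real generator output the same pattern applies, but the features, model, and validation procedure would need to be chosen for the physics at hand.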

Duration

Total project length: 175/350 hours.

Task ideas

Expected results:

Difficulty level

Intermediate/Advanced

Requirements

Test

Please use this link to access the test for this project.

Mentors

Please DO NOT contact mentors directly by email. Instead, please email ml4-sci@cern.ch with Project Title and include your CV and test results. The mentors will then get in touch with you.

Corresponding Project

Participating Organizations