Monte Carlo event generators such as Pythia and Herwig are indispensable tools in high-energy physics, but their predictions do not perfectly reproduce experimental data. A common strategy is to apply event-level reweighting: assigning each simulated event a weight so that the reweighted simulation better matches observed distributions. Currently, these weights are often derived from binned ratios in only one or two observables, which can miss important correlations.

This project aims to use symbolic regression to discover closed-form, interpretable reweighting functions that operate on event-level features (jet multiplicities, kinematic variables, angular correlations, etc.). Unlike black-box ML reweighting, symbolic regression returns human-readable mathematical expressions, enabling physicists to understand why certain regions of phase space require correction and potentially revealing deficiencies in the underlying physics modeling.

The student will implement a symbolic regression pipeline (using frameworks such as PySR, gplearn, or equivalent) that ingests Monte Carlo truth-level event data alongside reference distributions (from higher-order calculations or unfolded data) and outputs analytic reweighting formulae. The pipeline should support configurable complexity penalties, dimensional analysis constraints, and validation through closure tests.
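As an illustration of the closure-test step only, the sketch below applies a closed-form weight to a toy simulated sample and checks whether the reweighted histogram agrees with a toy reference. Everything in it is an assumption made for the example: the single observable `x`, the exponential shapes, the slopes `TAU_SIM` and `TAU_REF`, and the function names. The symbolic regression fit itself is not shown; `candidate_weight` is a hand-written stand-in for an expression that a tool such as PySR or gplearn might return.

```python
import math
import random

# Toy assumption: simulation and reference are both exponential in a single
# observable x, with different slopes. Real inputs would be event-level features.
TAU_SIM = 1.0  # hypothetical slope of the simulated spectrum
TAU_REF = 1.2  # hypothetical slope of the reference spectrum


def candidate_weight(x):
    """Stand-in for a closed-form expression returned by symbolic regression.

    For this toy it is the exact density ratio p_ref(x) / p_sim(x).
    """
    return (TAU_SIM / TAU_REF) * math.exp(x * (1.0 / TAU_SIM - 1.0 / TAU_REF))


def fill_histogram(values, weights, lo, hi, n_bins):
    """Return per-bin sums of weights and of squared weights (uniform bins)."""
    sum_w = [0.0] * n_bins
    sum_w2 = [0.0] * n_bins
    width = (hi - lo) / n_bins
    for v, w in zip(values, weights):
        if lo <= v < hi:
            i = min(int((v - lo) / width), n_bins - 1)
            sum_w[i] += w
            sum_w2[i] += w * w
    return sum_w, sum_w2


def chi2(sim, sim_weights, ref, lo=0.0, hi=5.0, n_bins=10):
    """Chi-square between weighted simulation and unweighted reference.

    Assumes both samples have the same number of events, so no extra
    normalisation is applied.
    """
    s, s2 = fill_histogram(sim, sim_weights, lo, hi, n_bins)
    r, r2 = fill_histogram(ref, [1.0] * len(ref), lo, hi, n_bins)
    return sum(
        (a - b) ** 2 / (va + vb)
        for a, b, va, vb in zip(s, r, s2, r2)
        if va + vb > 0
    )


rng = random.Random(0)  # fixed seed so the toy is reproducible
n_events = 20000
# random.expovariate takes the rate, i.e. 1 / slope
sim = [rng.expovariate(1.0 / TAU_SIM) for _ in range(n_events)]
ref = [rng.expovariate(1.0 / TAU_REF) for _ in range(n_events)]

weights = [candidate_weight(x) for x in sim]
chi2_before = chi2(sim, [1.0] * n_events, ref)
chi2_after = chi2(sim, weights, ref)
mean_weight = sum(weights) / n_events

print(f"chi2 before reweighting: {chi2_before:.1f}")
print(f"chi2 after reweighting:  {chi2_after:.1f}")
print(f"mean weight:             {mean_weight:.3f}")
```

A chi-square per bin of order one after reweighting, together with a mean weight close to one, would indicate closure in this toy. In an actual pipeline the fitted expression would take the place of `candidate_weight`, and the test would presumably be run on a held-out sample and on observables that were not used in the fit.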
Total project length: 175/350 hours.
Difficulty level: Intermediate
Please use this link to access the test for this project.
Please DO NOT contact mentors directly by email. Instead, please email ml4-sci@cern.ch with Project Title and include your CV and test results. The mentors will then get in touch with you.