Symbolic Regression for Observable Event-Level Reweighting

Description

Monte Carlo event generators such as Pythia and Herwig are indispensable tools in high-energy physics, but their predictions do not perfectly reproduce experimental data. A common strategy is event-level reweighting: each simulated event is assigned a weight so that the reweighted simulation better matches the observed distributions. These weights are often derived from binned ratios in only one or two observables, which can miss important correlations between observables.

This project aims to use symbolic regression to discover closed-form, interpretable reweighting functions that operate on event-level features (jet multiplicities, kinematic variables, angular correlations, etc.). Unlike black-box ML reweighting, symbolic regression returns human-readable mathematical expressions. This lets physicists understand why certain regions of phase space require correction and may reveal deficiencies in the underlying physics modeling.

The student will implement a symbolic regression pipeline (using frameworks such as PySR, gplearn, or equivalent) that ingests Monte Carlo truth-level event data alongside reference distributions (from higher-order calculations or unfolded data) and outputs analytic reweighting formulae. The pipeline should support configurable complexity penalties, dimensional analysis constraints, and validation through closure tests.
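To make the workflow concrete, here is a minimal sketch in standard-library Python, not a proposed implementation. It assumes a toy setup that is not part of the project: one observable x, a "simulation" drawn from an exponential with rate 1.0, and a "reference" drawn from an exponential with rate 1.5. The binned ratio serves as the regression target. A brute-force search over three hand-written candidate expressions, scored by mean squared error plus a complexity penalty, stands in for the evolutionary search that PySR or gplearn would perform. All names (`histogram`, `fit_line`, `PENALTY`, the complexity values) are illustrative assumptions.

```python
import math
import random

random.seed(0)
N = 100_000
sim = [random.expovariate(1.0) for _ in range(N)]  # toy "generator" sample (assumed)
ref = [random.expovariate(1.5) for _ in range(N)]  # toy "reference" sample (assumed)

LO, HI, NBINS = 0.0, 4.0, 16
WIDTH = (HI - LO) / NBINS
centers = [LO + (i + 0.5) * WIDTH for i in range(NBINS)]

def histogram(values, weights=None):
    """Weighted counts in NBINS equal bins on [LO, HI); overflow is dropped."""
    counts = [0.0] * NBINS
    for k, v in enumerate(values):
        if LO <= v < HI:
            counts[int((v - LO) / WIDTH)] += 1.0 if weights is None else weights[k]
    return counts

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Step 1: the conventional binned ratio becomes the regression target.
h_sim, h_ref = histogram(sim), histogram(ref)
target = [r / s for r, s in zip(h_ref, h_sim)]

# Step 2: fit each candidate closed form; each entry is (complexity, function).
mean_t = sum(target) / NBINS
a_lin, b_lin = fit_line(centers, target)
a_exp, b_exp = fit_line(centers, [math.log(t) for t in target])
candidates = {
    "constant": (1, lambda x: mean_t),
    "linear": (3, lambda x: a_lin + b_lin * x),
    "exponential": (4, lambda x: math.exp(a_exp + b_exp * x)),
}

PENALTY = 5e-4  # configurable cost per unit of expression complexity (assumed value)

def score(name):
    """Mean squared error against the binned ratio plus the complexity penalty."""
    complexity, f = candidates[name]
    mse = sum((f(c) - t) ** 2 for c, t in zip(centers, target)) / NBINS
    return mse + PENALTY * complexity

best = min(candidates, key=score)
weight = candidates[best][1]

# Step 3: closure test. Reweight the simulation and compare normalized bin
# contents with the reference, before and after reweighting.
def fractions(counts):
    total = sum(counts)
    return [c / total for c in counts]

def max_gap(counts):
    """Largest absolute difference in bin fractions relative to the reference."""
    return max(abs(p - q) for p, q in zip(fractions(counts), fractions(h_ref)))

gap_before = max_gap(h_sim)
gap_after = max_gap(histogram(sim, [weight(x) for x in sim]))

print(f"selected form: {best}, w(x) = exp({a_exp:.3f} + {b_exp:.3f} * x)")
print(f"max bin-fraction gap: before {gap_before:.4f}, after {gap_after:.4f}")
```

In this toy case the true density ratio is 1.5 * exp(-0.5 * x), so the exponential candidate can recover it even though it carries the highest complexity cost. The closure check here reuses the sample the weights were derived from; a statistically independent sample would give a stricter test. A real pipeline would also work on many event-level features at once, which is where a dedicated symbolic regression framework becomes necessary.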

Duration

Total project length: 175 or 350 hours.

Task ideas

Expected results:

Difficulty level

Intermediate

Requirements

Test

Please use this link to access the test for this project.

Mentors

Please DO NOT contact mentors directly by email. Instead, please email ml4-sci@cern.ch with the project title and include your CV and test results. The mentors will then get in touch with you.

Corresponding Project

Participating Organizations