Transformer Models for Symbolic Regression

Description

Symbolic regression can rapidly provide solutions to scientific problems that have large computational complexity or may even be intractable. It can be used to discover a symbolic expression that describes data, such as a physical law. Previous work has explored Transformer models combined with genetic algorithms or reinforcement learning. Future work on this project might extend those approaches, but could also explore alternatives such as the incorporation of Kolmogorov-Arnold layers or novel LLM-based approaches. As a concrete testbed for these new algorithms, the project will focus on predicting physical quantities such as cross sections in high-energy physics, which quantify the probability that a particular process takes place in the interaction of elementary particles. A measured cross section provides a testable link between theory and experiment. Theoretically, it is obtained mainly by calculating the squared amplitude.
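To make the idea of discovering a symbolic expression from data concrete, here is a minimal sketch in plain Python. It is not the project's method: it swaps the Transformer for an exhaustive search over a tiny, hypothetical candidate library (`CANDIDATES`), and the hidden law `1/x**2` is chosen purely for illustration.

```python
import math
import random

# Hypothetical candidate library (an assumption for this sketch):
# expression string -> callable of one variable.
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x ** 2,
    "sin(x)": lambda x: math.sin(x),
    "exp(x)": lambda x: math.exp(x),
    "1/x**2": lambda x: 1.0 / x ** 2,
}


def mean_squared_error(f, xs, ys):
    """Average squared difference between f(x) and the observed y."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def best_expression(xs, ys):
    """Return the candidate expression string with the lowest error."""
    return min(
        CANDIDATES,
        key=lambda name: mean_squared_error(CANDIDATES[name], xs, ys),
    )


# Synthetic "measurements" generated from a hidden law, y = 1/x**2.
rng = random.Random(0)
xs = [rng.uniform(0.5, 3.0) for _ in range(50)]
ys = [1.0 / x ** 2 for x in xs]

found = best_expression(xs, ys)
print(found)
```

Exhaustive search only works for a handful of candidates. The space of expressions grows combinatorially with length, which is one motivation for learned models that propose expressions directly.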

Duration

Total project length: 175 or 350 hours.

Task ideas and expected results

Develop symbolic regression models based on next-generation Transformer architectures
Benchmark these models on synthetic and high-energy physics datasets
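One possible way to score a model's output when benchmarking is sketched below. It assumes, hypothetically, that expressions are represented as prefix-notation token lists with the token names shown (`add`, `mul`, `pow`, and so on), and it compares a prediction to a target in two ways: exact token match and numerical agreement on sample points. Neither the token vocabulary nor the metrics are taken from the project; they are illustrative assumptions.

```python
import math

# Hypothetical token vocabulary for this sketch.
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
    "pow": lambda a, b: a ** b,
}
UNARY = {"sin": math.sin, "cos": math.cos, "exp": math.exp}


def eval_prefix(tokens, env):
    """Evaluate a prefix-notation token list given variable values in env."""

    def walk(pos):
        tok = tokens[pos]
        if tok in BINARY:
            left, pos = walk(pos + 1)
            right, pos = walk(pos)
            return BINARY[tok](left, right), pos
        if tok in UNARY:
            arg, pos = walk(pos + 1)
            return UNARY[tok](arg), pos
        if tok in env:
            return env[tok], pos + 1
        return float(tok), pos + 1  # numeric constant

    value, end = walk(0)
    if end != len(tokens):
        raise ValueError("unused trailing tokens")
    return value


def sequence_match(pred, target):
    """Strict metric: the token sequences are identical."""
    return pred == target


def numerically_equivalent(pred, target, points, rel_tol=1e-9):
    """Lenient metric: both expressions agree at every sample point."""
    return all(
        math.isclose(
            eval_prefix(pred, {"x": x}),
            eval_prefix(target, {"x": x}),
            rel_tol=rel_tol,
        )
        for x in points
    )


target = ["mul", "x", "add", "x", "1"]  # x * (x + 1)
pred = ["add", "pow", "x", "2", "x"]    # x**2 + x
points = [0.5, 1.0, 2.0, 3.0]

print(sequence_match(pred, target))
print(numerically_equivalent(pred, target, points))
```

Reporting both metrics separates predictions that are wrong from predictions that are correct but written in a different algebraic form.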

Requirements

Significant experience with Transformer machine learning models in Python (preferably using PyTorch).

Difficulty Level

Intermediate

Test

Please use this link to access the test for this project.

Mentors

Eric Reinhardt (University of Alabama)
Harrison Prosper (Florida State University)
Marco Knipfer (University of Alabama)
Dinesh Ramakrishnan (University of Alabama)
François Charton

Please DO NOT contact mentors directly by email. Instead, email ml4-sci@cern.ch with the project title, and include your CV and test results. The mentors will then get in touch with you.

Corresponding Project

SYMBA