LM-JEPA for Symbolic Regression

Description

Symbolic regression can rapidly provide solutions to scientific problems that have large computational complexity or may even be intractable. It can be used to discover a symbolic expression that describes data, such as a physical law. Previous work has explored combinations of Transformer models with genetic algorithms or reinforcement learning. This project will explore language model joint embedding predictive architectures (LM-JEPA) for symbolic regression.
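To make the architecture concrete, below is a minimal, hypothetical sketch of a JEPA over token sequences (e.g. tokenized symbolic expressions) in PyTorch. All names (`TinyLMJEPA`, the layer sizes, the masking scheme) are illustrative assumptions, not part of this project's specification: a context encoder sees a masked sequence, an exponential-moving-average (EMA) target encoder sees the full sequence, and a predictor regresses the target embeddings of the masked positions in latent space.

```python
import torch
import torch.nn as nn

class TinyLMJEPA(nn.Module):
    """Illustrative JEPA sketch for token sequences (hypothetical, not the
    project's prescribed design). Predicts latent embeddings of masked
    tokens rather than the tokens themselves."""

    def __init__(self, vocab_size=32, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.context_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        # Target encoder: same architecture, updated only via EMA, never by gradient.
        self.target_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        for p in self.target_encoder.parameters():
            p.requires_grad = False
        self.predictor = nn.Linear(dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    @torch.no_grad()
    def ema_update(self, momentum=0.99):
        # Slowly track the context encoder's weights.
        for pt, pc in zip(self.target_encoder.parameters(),
                          self.context_encoder.parameters()):
            pt.mul_(momentum).add_(pc.detach(), alpha=1.0 - momentum)

    def forward(self, tokens, mask):
        # mask: bool (batch, seq); True where the token is hidden from the context.
        x = self.embed(tokens)
        ctx_in = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        ctx = self.context_encoder(ctx_in)
        with torch.no_grad():
            tgt = self.target_encoder(self.embed(tokens))
        pred = self.predictor(ctx)
        # Loss only on masked positions: predict target embeddings, not tokens.
        return nn.functional.mse_loss(pred[mask], tgt[mask])

# Toy usage with random "expression" tokens.
model = TinyLMJEPA()
tokens = torch.randint(0, 32, (2, 10))
mask = torch.rand(2, 10) < 0.3
mask[0, 0] = True  # guarantee at least one masked position
loss = model(tokens, mask)
loss.backward()
model.ema_update()
```

The key design point of a JEPA, as opposed to a standard masked language model, is that the prediction target lives in embedding space, so the model is not forced to reconstruct exact tokens.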

Duration

Total project length: 175 or 350 hours.

Task ideas and expected results

Requirements

Significant experience developing models in Python (preferably with PyTorch) is required. Experience with joint embedding predictive architectures and/or LLM development is preferred.

Test

Please use this link to access the test for this project.

Difficulty Level

Advanced

Mentors

Please DO NOT contact mentors directly by email. Questions should instead be directed to ml4-sci@cern.ch, which is forwarded to the mentors. To submit your proposal, CV, and test task solutions, please use this Google form.

Corresponding Project

Participating Organizations