LM-JEPA for Symbolic Regression

Description

Symbolic regression can rapidly provide solutions to scientific problems that have large computational complexity or may even be intractable. It can be used to discover a symbolic expression that describes data, such as a physical law. Previous work has explored combinations of Transformer models with genetic algorithms or reinforcement learning. This project will explore language model joint embedding predictive architectures (LM-JEPA) for symbolic regression.
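To make the architecture concrete, below is a minimal, hypothetical sketch of a JEPA over token sequences (e.g. tokenized symbolic expressions) in PyTorch. All names (`TinyLMJEPA`, the layer sizes, the masking scheme) are illustrative assumptions, not part of this project's specification: a context encoder sees a masked sequence, an exponential-moving-average (EMA) target encoder sees the full sequence, and a predictor regresses the target embeddings of the masked positions in latent space.

```python
import torch
import torch.nn as nn

class TinyLMJEPA(nn.Module):
    """Illustrative JEPA sketch for token sequences (hypothetical, not the
    project's prescribed design). Predicts latent embeddings of masked
    tokens rather than the tokens themselves."""

    def __init__(self, vocab_size=32, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.context_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        # Target encoder: same architecture, updated only via EMA, never by gradient.
        self.target_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        for p in self.target_encoder.parameters():
            p.requires_grad = False
        self.predictor = nn.Linear(dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    @torch.no_grad()
    def ema_update(self, momentum=0.99):
        # Slowly track the context encoder's weights.
        for pt, pc in zip(self.target_encoder.parameters(),
                          self.context_encoder.parameters()):
            pt.mul_(momentum).add_(pc.detach(), alpha=1.0 - momentum)

    def forward(self, tokens, mask):
        # mask: bool (batch, seq); True where the token is hidden from the context.
        x = self.embed(tokens)
        ctx_in = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        ctx = self.context_encoder(ctx_in)
        with torch.no_grad():
            tgt = self.target_encoder(self.embed(tokens))
        pred = self.predictor(ctx)
        # Loss only on masked positions: predict target embeddings, not tokens.
        return nn.functional.mse_loss(pred[mask], tgt[mask])

# Toy usage with random "expression" tokens.
model = TinyLMJEPA()
tokens = torch.randint(0, 32, (2, 10))
mask = torch.rand(2, 10) < 0.3
mask[0, 0] = True  # guarantee at least one masked position
loss = model(tokens, mask)
loss.backward()
model.ema_update()
```

The key design point of a JEPA, as opposed to a standard masked language model, is that the prediction target lives in embedding space, so the model is not forced to reconstruct exact tokens.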

Duration

Total project length: 175 or 350 hours.

Task ideas and expected results

Requirements

Significant experience developing models in Python (preferably with PyTorch) is required. Experience with joint embedding predictive architectures and/or LLM development is preferred.

Test

Please use this link to access the test for this project.

Difficulty Level

Advanced

Mentors

Please DO NOT contact mentors directly by email. Questions should instead be directed to ml4-sci@cern.ch, which is forwarded to the mentors. To submit your proposal, CV, and test task solutions, please use this Google form.

Corresponding Project

Participating Organizations