Symbolic regression can rapidly provide solutions to scientific problems that have high computational complexity or are otherwise intractable. It discovers a symbolic expression describing data, such as a physical law. Previous work has explored combinations of Transformer models with genetic algorithms or reinforcement learning. This project will explore language-model joint embedding predictive architectures (JEPAs) for symbolic regression.
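To make the task concrete, here is a toy illustration of what symbolic regression means: search a space of candidate expressions for one that reproduces the data. This brute-force sketch is purely illustrative (the grammar, coefficients, and target law below are invented for this example); the project itself concerns Transformer/JEPA approaches, not exhaustive search.

```python
import itertools
import math

# Data generated by a hidden "physical law": y = 3*x^2 + 2.
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [3 * x**2 + 2 for x in xs]

# Tiny expression grammar: candidate expressions are a*f(x) + b,
# where f is one basis function and a, b are small integer coefficients.
basis = {"x": lambda x: x, "x^2": lambda x: x**2, "sin(x)": math.sin}
coeffs = range(0, 5)

best = None  # (mse, expression string)
for name, f in basis.items():
    for a, b in itertools.product(coeffs, coeffs):
        mse = sum((a * f(x) + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if best is None or mse < best[0]:
            best = (mse, f"{a}*{name} + {b}")

print(best[1])  # recovers the generating expression: 3*x^2 + 2
```

Real symbolic regression systems replace this exhaustive loop with learned or evolutionary search over much richer expression spaces.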
Total project length: 175/350 hours.
Significant experience developing models in Python (preferably using PyTorch) is required. Experience with joint embedding predictive architectures and/or LLM development is preferred.
Please use this link to access the test for this project.
Advanced
Please DO NOT contact mentors directly by email. Questions should instead be directed to ml4-sci@cern.ch which is forwarded to mentors. To submit your proposal, CV, and test task solutions, please use this Google form.