PCQM4Mv2 is a molecular property regression benchmark in which inputs are molecules represented by SMILES strings and 2D graph structure, and the target is related to the HOMO-LUMO gap. This version uses scaffold-based splits and a closed-data setup, so the task emphasizes generalization across molecular families without 3D geometry.
Starting from the PCQM4Mv2 train and dev splits (SMILES strings and 2D graph structure, no 3D geometry), the agent must train a molecular property predictor under a 50M parameter cap and deliver checkpoints plus a compliant predict.py script. The model predicts a quantum-chemistry target related to each molecule's HOMO-LUMO gap.
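As a rough illustration of the 50M parameter cap, the sketch below counts parameters from a mapping of tensor names to shapes. The names `count_parameters`, `within_cap`, and `example_shapes` are hypothetical, and the task does not say how the verifier counts parameters or whether a model with exactly 50M parameters passes; the inclusive comparison here is an assumption.

```python
import math
from typing import Mapping, Sequence

PARAM_CAP = 50_000_000  # the 50M cap stated in the task description


def count_parameters(shapes: Mapping[str, Sequence[int]]) -> int:
    """Sum the element counts of all tensors, given their shapes."""
    return sum(math.prod(shape) for shape in shapes.values())


def within_cap(shapes: Mapping[str, Sequence[int]], cap: int = PARAM_CAP) -> bool:
    """Assumption: the cap is inclusive; the real verifier may differ."""
    return count_parameters(shapes) <= cap


# Hypothetical shapes for a tiny model, for illustration only.
example_shapes = {
    "embed.weight": (128, 256),
    "layer0.linear.weight": (256, 256),
    "layer0.linear.bias": (256,),
    "head.weight": (1, 256),
    "head.bias": (1,),
}

if __name__ == "__main__":
    print(count_parameters(example_shapes), within_cap(example_shapes))
```

Running a check like this before saving a checkpoint is a cheap way to avoid a submission that is zeroed for size alone.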
The verifier measures mean absolute error (MAE) on the hidden test set, then maps that loss into the public score with an exponential transform so that higher is better. Hard checks on parameter count, inference-time budget, and trace policy take precedence: a submission that fails any of them is zeroed before the transformed score is reported.
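The scoring flow described above can be sketched as follows. The task only says the transform is exponential, so the specific form exp(-MAE / scale), the `scale` parameter, and the function names are assumptions for illustration, not the verifier's actual formula.

```python
import math
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |y_true - y_pred| over all examples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def public_score(mae: float, checks_passed: bool, scale: float = 1.0) -> float:
    """Map MAE to a higher-is-better score.

    Assumed form: exp(-mae / scale). Any failed hard check zeroes the score.
    """
    if not checks_passed:
        return 0.0
    if math.isnan(mae) or mae < 0:
        raise ValueError("MAE must be a non-negative number")
    return math.exp(-mae / scale)


if __name__ == "__main__":
    mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
    print(mae, public_score(mae, checks_passed=True))
```

Under this assumed form, a perfect model scores 1, the score decays smoothly toward 0 as MAE grows, and a failed hard check overrides the MAE entirely.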
Agents get a single H100, 8 CPU cores, 64 GB RAM, and an eight-hour budget. The checked-in fixture is tiny for repository convenience, but the production task is built around a much larger PCQM4Mv2-style dataset and hidden holdout.