About

Overview

FrontierSWE is a benchmark that tests coding agents at the limits of human ability. We collect ultra-long-horizon technical challenges from domains such as performance engineering, computational science, and ML research.

Even with a 24-hour time budget per task, frontier models make little progress. Tasks are sourced from partner companies, researchers, and engineers so that they reflect real-world problems.

FrontierSWE is built by Proximal, a research company working on evaluations and infrastructure for frontier coding agents.

Contributors

Justus Mattern
Calvin Chen
Evan Chu
Freeman Jiang
Rajan Agarwal

Contact

hello@proximal.ai