About

Overview

FrontierSWE is a benchmark that tests coding agents at the limits of human ability. We collect ultra-long-horizon technical challenges from domains such as performance engineering, computational science, and ML research.

Even with a 24-hour time budget per task, frontier models make little progress. Tasks are sourced from partner companies, researchers, and engineers so that they reflect real-world problems.

FrontierSWE is built by Proximal, a research company working on evaluations and infrastructure for frontier coding agents.

Contributors

  • Justus Mattern
  • Calvin Chen
  • Evan Chu
  • Freeman Jiang
  • Rajan Agarwal