Pyright 1.1.400 is a production static analyzer implemented as a large TypeScript codebase. Its runtime is shaped by parsing, binding, type evaluation, caching, and large-codebase traversal, while its external contract is the exact set of diagnostics it reports.
This is a performance task on a real TypeScript codebase, not a clean-room rewrite. The agent starts inside Pyright 1.1.400 and must make it faster while preserving exact diagnostic behavior.
The verifier applies a hard correctness gate first, then measures geometric-mean speedup on public and hidden workloads using ABBA-style paired timing. Any one of a build failure, a Jest failure, diagnostic drift, an anti-cheat failure, or missing benchmark data zeroes the reward.
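The gate-then-measure scoring can be illustrated with a minimal TypeScript sketch. Everything in it is an assumption made for illustration: the names `GateResult`, `AbbaBlock`, `abbaSpeedup`, `geometricMean`, and `score` are hypothetical, the real verifier's code and exact reward formula are not described here, and "ABBA" is read as the order baseline, candidate, candidate, baseline, so that slow drift in machine state affects both sides about equally.

```typescript
// Hypothetical shapes; none of these names come from the actual verifier.
interface GateResult {
    buildOk: boolean;
    jestOk: boolean;
    diagnosticsMatch: boolean;
    antiCheatOk: boolean;
}

// One ABBA block of wall-clock timings in milliseconds, assumed to be
// ordered: baseline, candidate, candidate, baseline.
type AbbaBlock = [number, number, number, number];

// Speedup for one block: total baseline time over total candidate time.
function abbaSpeedup(block: AbbaBlock): number {
    const [a1, b1, b2, a2] = block;
    return (a1 + a2) / (b1 + b2);
}

// Geometric mean computed in log space to avoid overflow on long lists.
function geometricMean(values: number[]): number {
    const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
    return Math.exp(logSum / values.length);
}

// Any gate failure, or missing benchmark data, yields a score of zero.
// Otherwise the score is the geometric mean of per-workload speedups.
function score(gate: GateResult, workloads: AbbaBlock[]): number {
    const gatePassed =
        gate.buildOk && gate.jestOk && gate.diagnosticsMatch && gate.antiCheatOk;
    if (!gatePassed || workloads.length === 0) {
        return 0;
    }
    return geometricMean(workloads.map(abbaSpeedup));
}
```

Under these assumptions, a 2x speedup on one workload and a 4x speedup on another combine to about 2.83x rather than the arithmetic 3x, so a single outlier workload cannot dominate the result, and a speedup on one workload cannot offset a failed gate.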
Agents work offline in an 8-CPU, 32 GB environment with all npm dependencies preinstalled. The task is intentionally CPU-bound: the engineering challenge is understanding and reshaping a real codebase, not outsourcing work to extra hardware.