dart_style is a production formatter whose behavior is defined by both syntax rules and a large accumulated golden corpus. The reference implementation includes separate short-style and tall-style pipelines tied to different Dart language versions.
In this task, the formatter source tree and test corpus act as the operative specification. Rewriting the tool in Haskell requires reproducing the formatter's decisions byte-for-byte across both formatting regimes.
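As an illustration of what byte-for-byte agreement means in practice, the following is a minimal sketch of a golden-case check. The names `GoldenCase` and `passes` are hypothetical and are not taken from the dart_style test harness or the verifier. The formatter is modeled as a plain function from source text to formatted text, which is an assumption made for the sake of the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    """One hypothetical golden test: input source and exact expected output."""
    name: str
    source: str
    expected: str


def passes(case: GoldenCase, format_fn: Callable[[str], str]) -> bool:
    """Return True only if the formatter output matches the golden output exactly.

    Comparison is on encoded bytes, so differences in trailing newlines,
    indentation, or any other whitespace count as failures.
    A formatter that raises is treated as failing the case.
    """
    try:
        actual = format_fn(case.source)
    except Exception:
        return False
    return actual.encode("utf-8") == case.expected.encode("utf-8")
```

Under this kind of check, output that is semantically equivalent but differs by a single space or line break still fails, which is why the golden corpus rather than a prose style guide ends up defining correct behavior.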
Starting from the Dart dart_style formatter source tree and a Haskell toolchain, the agent must rebuild the formatter as a standalone Haskell executable. The task is not limited to a handful of pretty-printer rules; it covers both of the formatter's modern pipelines, the short-style and tall-style regimes used by different Dart language versions.
The verifier runs a large golden suite derived from formatter tests and additional corpus-sourced or fuzzed files. Anti-cheat, build, and formatter-discovery failures zero the result; otherwise the score is the plain pass rate on the hidden tests.
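The scoring rule can be sketched as a small function. This is a reading of the rule as stated, not the verifier's actual code: the function name, the parameter names, and the choice to return 0.0 for an empty result list are all assumptions.

```python
from typing import Sequence


def score(
    anti_cheat_ok: bool,
    build_ok: bool,
    formatter_found: bool,
    results: Sequence[bool],
) -> float:
    """Hypothetical scoring sketch.

    Any gate failure (anti-cheat, build, or formatter discovery) zeroes
    the result. Otherwise the score is the unweighted fraction of hidden
    test cases that passed.
    """
    if not (anti_cheat_ok and build_ok and formatter_found):
        return 0.0
    if not results:
        # Assumption: an empty suite scores zero rather than raising.
        return 0.0
    return sum(1 for passed in results if passed) / len(results)
```

The gates are checked before any pass rate is computed, so a submission that fails to build scores zero regardless of how many cases it might otherwise have passed.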
No model was able to complete this task successfully, so we used the overall test pass rate as a partial reward to rank models.
The task runs without internet access in a CPU-only environment with a preinstalled Haskell toolchain. Agents have the reference formatter source tree and enough tooling to build a standalone executable, but they cannot look up external grammar or library documentation while they work.