04 Implementation

Dart → Haskell

Results

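| # | Model (agent) | Success Rate | Avg | Best |
|---|---|---|---|---|
| 1 | GPT-5.4 (Codex) | 0/5 | 14% | 21% |
| 2 | Claude Opus 4.6 (Claude Code) | 0/5 | 3.0% | 3.8% |
| 3 | Gemini 3.1 Pro (Gemini CLI) | 0/5 | 2.2% | 11% |
| 4 | Kimi K2.5 (Kimi CLI) | 0/5 | 1.1% | 3.8% |
| 5 | Qwen3.6-Plus (Qwen Code) | 0/5 | 0.2% | 0.8% |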

Background

dart_style is a production formatter whose behavior is defined by both syntax rules and a large accumulated golden corpus. The reference implementation includes separate short-style and tall-style pipelines tied to different Dart language versions.

In this task, the formatter source tree and test corpus act as the operative specification. Rewriting the tool in Haskell requires reproducing the formatter's decisions byte-for-byte across both formatting regimes.

Task

Starting from the Dart dart_style formatter source tree and a Haskell toolchain, the agent must rebuild the formatter as a standalone Haskell executable. The task is not limited to a handful of pretty-printer rules; it covers both of the formatter's modern pipelines, the short-style and tall-style regimes used by different Dart language versions.

  • Match the command-line contract of the formatter.
  • Preserve byte-for-byte formatting behavior on a large hidden golden suite.
  • Handle the language-version split that changes how formatting decisions are made.
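
The language-version split can be pictured as a small dispatch step that runs before any formatting. The sketch below is illustrative only, written in Python for brevity rather than in the Haskell the task requires. It assumes that files at language version 3.7 or later take the tall-style pipeline and that a file can pin its version with a `// @dart=X.Y` comment; both assumptions should be checked against the reference source tree. The names `language_version` and `select_pipeline` are hypothetical.

```python
import re

# Assumption: language version 3.7 and later use the tall style; earlier
# versions use the short style. Confirm the cutoff in the reference sources.
TALL_STYLE_MIN_VERSION = (3, 7)

# Assumption: a file may pin its own version with a "// @dart=X.Y" comment.
_VERSION_COMMENT = re.compile(
    r"^\s*//\s*@dart\s*=\s*(\d+)\.(\d+)\s*$", re.MULTILINE
)


def language_version(source: str, default: tuple[int, int]) -> tuple[int, int]:
    """Return the version pinned in the file, or the caller's default."""
    match = _VERSION_COMMENT.search(source)
    if match is None:
        return default
    return int(match.group(1)), int(match.group(2))


def select_pipeline(source: str, default: tuple[int, int] = (3, 7)) -> str:
    """Pick which formatting regime should handle this file."""
    version = language_version(source, default)
    # Tuples compare element-wise, so (3, 10) correctly sorts after (3, 7).
    return "tall" if version >= TALL_STYLE_MIN_VERSION else "short"
```

Comparing versions as integer tuples rather than strings avoids treating 3.10 as older than 3.7, a mistake that would silently route files to the wrong pipeline.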

Evaluation

The verifier runs a large golden suite derived from the formatter's tests plus additional corpus-sourced or fuzzed files. An anti-cheat, build, or formatter-discovery failure zeroes the result; otherwise the score is simply the pass rate on the hidden suite.

  • Hidden files cover both short-style and tall-style formatting behavior.
  • Performance-oriented files are included, but the public score is still reported as pass rate rather than runtime speed.
  • Output must match the reference formatter byte-for-byte.
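
The scoring rule described above can be sketched as follows. This is a minimal illustration and not the actual verifier: the `RunResult` type, its field names, and the choice to score an empty suite as zero are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    # Gate flags (hypothetical names): any False value zeroes the whole run.
    anti_cheat_ok: bool
    build_ok: bool
    formatter_found: bool
    # Pairs of (actual output, expected golden output), as raw bytes.
    cases: list[tuple[bytes, bytes]]


def score(run: RunResult) -> float:
    """Return the pass rate in [0, 1], or 0.0 if any gate failed."""
    if not (run.anti_cheat_ok and run.build_ok and run.formatter_found):
        return 0.0
    if not run.cases:
        # Assumption: an empty suite scores zero rather than raising.
        return 0.0
    # Byte-for-byte comparison: no whitespace or newline normalization.
    passed = sum(1 for actual, expected in run.cases if actual == expected)
    return passed / len(run.cases)
```

Comparing raw bytes means that a single missing trailing newline fails a file, which is what "byte-for-byte" implies for a formatter.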

No model was able to complete this task successfully, so we used overall test pass rate as a partial reward to rank models.
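
As a rough illustration of that ranking, the sketch below sorts the reported results by full successes and then by average pass rate. The tie-breaking order is an assumption; it happens to reproduce the published table order, whereas sorting by best pass rate would not.

```python
# (model, full successes out of 5, average pass rate %, best pass rate %),
# copied from the results table in deliberately shuffled order.
results = [
    ("Kimi K2.5", 0, 1.1, 3.8),
    ("GPT-5.4", 0, 14.0, 21.0),
    ("Qwen3.6-Plus", 0, 0.2, 0.8),
    ("Gemini 3.1 Pro", 0, 2.2, 11.0),
    ("Claude Opus 4.6", 0, 3.0, 3.8),
]


def rank(rows):
    """Sort by successes, then by average pass rate, both descending."""
    # Assumption: the average pass rate is the tie-breaker when no run
    # fully succeeds; the best pass rate is not used for ordering.
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


for position, (model, wins, avg, best) in enumerate(rank(results), start=1):
    print(f"{position}. {model}: {wins}/5, avg {avg}%, best {best}%")
```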

Environment And Constraints

The task runs without internet access in a CPU-only environment with a preinstalled Haskell toolchain. Agents have the reference formatter source tree and enough tooling to build a standalone executable, but they cannot look up external grammar or library documentation while they work.