09 Performance

libexpat to x86-64 Assembly

Results

#  Model                          Correctness  Avg    Best
1  Claude Opus 4.6 (Claude Code)  0/5          0.120  0.227
2  Kimi K2.5 (Kimi CLI)           0/5          0.009  0.025
3  Qwen3.6-Plus (Qwen Code)       0/5          0.002  0.005
4  Gemini 3.1 Pro (Gemini CLI)    0/5          0.000  0.000
5  GPT-5.4 (Codex)                0/5          0.000  0.000

Background

libexpat 2.6.4 is a mature C XML parser exposed as a shared library with a stable ABI. In this task, the library itself is the spec: existing C callers should be able to link against the submission without changing their code.

The task combines XML parsing semantics, exported symbol compatibility, shared-library behavior, and handwritten x86-64 assembly in one implementation problem.

Task

The agent receives the libexpat 2.6.4 reference library and headers and must deliver an ABI-compatible x86-64 assembly replacement. In other words, it is not writing a tiny XML parser from scratch; it is rebuilding a real shared library that existing C callers can load and use unchanged.

  • Match the exported C ABI of the reference library.
  • Preserve parsing behavior across the hidden correctness suite.
  • Optimize at a low level so that the assembly implementation is also competitive with the reference C build on speed.
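
To make "existing C callers can load and use unchanged" concrete, here is a minimal sketch of a caller that loads a libexpat-compatible shared library at runtime and counts start tags. The function name `count_elements`, the local typedefs, and any library path passed in are illustrative assumptions; the symbol names looked up (`XML_ParserCreate`, `XML_SetUserData`, `XML_SetElementHandler`, `XML_Parse`, `XML_ParserFree`) are part of Expat's public API, and the typedefs assume the default build where `XML_Char` is `char`.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

/* Local mirrors of the Expat types this sketch needs (assumes XML_Char == char). */
typedef struct XML_ParserStruct *XML_Parser;
typedef void (*StartHandler)(void *user_data, const char *name, const char **atts);
typedef void (*EndHandler)(void *user_data, const char *name);

typedef XML_Parser (*ParserCreateFn)(const char *encoding);
typedef void (*SetUserDataFn)(XML_Parser parser, void *user_data);
typedef void (*SetElementHandlerFn)(XML_Parser parser, StartHandler start, EndHandler end);
typedef int (*ParseFn)(XML_Parser parser, const char *s, int len, int is_final);
typedef void (*ParserFreeFn)(XML_Parser parser);

/* Start-tag callback: bump the counter that user_data points to. */
static void on_start(void *user_data, const char *name, const char **atts) {
    (void)name;
    (void)atts;
    ++*(int *)user_data;
}

static void on_end(void *user_data, const char *name) {
    (void)user_data;
    (void)name;
}

/* Hypothetical helper: returns the number of start tags in xml, or -1 if the
 * library cannot be loaded, a symbol is missing, or parsing fails. */
int count_elements(const char *lib_path, const char *xml) {
    void *lib = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    if (lib == NULL) {
        return -1;
    }

    ParserCreateFn create = (ParserCreateFn)dlsym(lib, "XML_ParserCreate");
    SetUserDataFn set_user_data = (SetUserDataFn)dlsym(lib, "XML_SetUserData");
    SetElementHandlerFn set_handlers =
        (SetElementHandlerFn)dlsym(lib, "XML_SetElementHandler");
    ParseFn parse = (ParseFn)dlsym(lib, "XML_Parse");
    ParserFreeFn destroy = (ParserFreeFn)dlsym(lib, "XML_ParserFree");

    int result = -1;
    if (create && set_user_data && set_handlers && parse && destroy) {
        XML_Parser parser = create(NULL);
        if (parser != NULL) {
            int count = 0;
            set_user_data(parser, &count);
            set_handlers(parser, on_start, on_end);
            /* XML_Parse returns XML_STATUS_ERROR (0) on failure. */
            if (parse(parser, xml, (int)strlen(xml), 1) != 0) {
                result = count;
            }
            destroy(parser);
        }
    }
    dlclose(lib);
    return result;
}
```

Calling `count_elements` once with the path of the reference build and once with the path of the assembly build, on the same input such as `<a><b/><c/></a>`, should give the same count if the replacement preserves both the exported symbols and the parsing behavior. On older glibc versions the program may need to be linked with `-ldl`.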

Evaluation

The verifier computes a weighted correctness score and a capped performance score, then blends them with 80% weight on correctness and the remaining 20% on performance. If the assembly library never achieves any correctness, the final score is zero.

  • Correctness comes from an upstream-style parser test suite.
  • Performance comes from hidden XML parsing benchmarks against the reference C build.
  • Performance only matters after the parser earns nonzero correctness credit.
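
The scoring rule above can be sketched in a few lines of C. This is an illustration under stated assumptions, not the verifier's actual code: it assumes both component scores are normalized to the range 0 to 1, that the performance cap is 1.0, and that the blend is linear. The name `blended_score` and the constants are hypothetical; only the 80% correctness weight and the zero-correctness gate come from the description.

```c
#include <assert.h>

/* The 80% weight is stated in the task description. */
#define CORRECTNESS_WEIGHT 0.8
/* Assumption: the performance score is capped at 1.0. */
#define PERFORMANCE_CAP 1.0

/* Restrict x to the closed interval [lo, hi]. */
static double clamp(double x, double lo, double hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Hypothetical blend: zero when no correctness credit was earned,
 * otherwise a linear mix of correctness and capped performance. */
double blended_score(double correctness, double performance) {
    /* Reject NaN inputs (NaN is the only value not equal to itself). */
    assert(correctness == correctness && performance == performance);

    double c = clamp(correctness, 0.0, 1.0);
    if (c <= 0.0) {
        return 0.0; /* performance only counts after nonzero correctness */
    }
    double p = clamp(performance, 0.0, PERFORMANCE_CAP);
    return CORRECTNESS_WEIGHT * c + (1.0 - CORRECTNESS_WEIGHT) * p;
}
```

Under these assumptions, `blended_score(0.0, 1.0)` is 0 regardless of speed, while `blended_score(0.5, 2.0)` evaluates to 0.6: the performance term is capped at 1.0 and contributes 0.2, and correctness contributes 0.4.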

Environment And Constraints

The task runs in a Modal container with 4 CPUs, 8 GB RAM, no GPU, and no internet access. The container image includes the Expat source, assembler and linker toolchain, and timer files. There is little spare headroom beyond that, which keeps the focus on assembly and ABI engineering rather than on brute-force search or external dependencies.