PostgreSQL runs as its own process that clients connect to over TCP. After login, the client and server exchange queries using a fixed binary message format called the PostgreSQL wire protocol. It defines every byte that goes over the socket: how startup and authentication work, how SQL strings and parameters are sent, and how result rows, errors, and ready-for-query signals come back. On the server, every table, column, type, and function is described by rows in a set of system tables called the catalog (pg_class, pg_type, pg_proc, pg_namespace, etc.). Tools like psql, ORMs, and migration frameworks read the catalog constantly to figure out the schema, so reproducing its shape matters as much as answering queries.
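To make the framing concrete, here is a minimal sketch of how a few protocol messages are laid out on the wire. It assumes protocol version 3.0 and covers only the startup packet, a simple Query message, and ReadyForQuery. The helper names (startup_message, frame, simple_query, split_messages) are hypothetical and are not part of any PostgreSQL library.

```python
import struct

# Protocol version 3.0 is encoded as (major << 16) | minor.
PROTOCOL_3_0 = 3 << 16


def startup_message(params):
    """Build a startup packet: int32 length, int32 version, key/value C strings."""
    body = struct.pack("!I", PROTOCOL_3_0)
    for key, value in params.items():
        body += key.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"  # terminator after the last key/value pair
    # The length field counts itself (4 bytes) but there is no type byte here.
    return struct.pack("!I", len(body) + 4) + body


def frame(msg_type, payload):
    """Build a regular message: 1-byte type, int32 length (incl. itself), payload."""
    return msg_type + struct.pack("!I", len(payload) + 4) + payload


def simple_query(sql):
    """Build a Query ('Q') message carrying one null-terminated SQL string."""
    return frame(b"Q", sql.encode() + b"\x00")


def split_messages(buf):
    """Split a receive buffer into complete (type, payload) pairs plus leftover bytes."""
    messages = []
    pos = 0
    while len(buf) - pos >= 5:  # need the type byte and the length field
        msg_type = buf[pos:pos + 1]
        (length,) = struct.unpack_from("!I", buf, pos + 1)
        if length < 4:
            raise ValueError("length field must include its own 4 bytes")
        end = pos + 1 + length
        if end > len(buf):
            break  # message is incomplete; wait for more bytes
        messages.append((msg_type, buf[pos + 5:end]))
        pos = end
    return messages, buf[pos:]


# ReadyForQuery with status 'I' (idle, not in a transaction).
READY_IDLE = frame(b"Z", b"I")
```

A server built this way would call split_messages on whatever bytes it has read so far and keep the leftover for the next read, since TCP does not preserve message boundaries.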
SQLite skips all of that. It is a small C library your app links against and calls directly, with no network, no login, and no separate process. The whole database is a single file, and its internal schema lives in a much smaller table (sqlite_master). The SQL dialect also diverges from PostgreSQL in real ways. It has looser typing, different date/time functions, a different system catalog, and different rules for things like ALTER TABLE.
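Two of these differences are easy to see from Python's built-in sqlite3 module. The sketch below uses a hypothetical accounts table in an in-memory database. It reads the schema from sqlite_master, then shows that a column declared INTEGER still accepts a non-numeric string in an ordinary (non-STRICT) table.

```python
import sqlite3

# An in-memory database: no server, no login, no network.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

# Looser typing: the column's INTEGER affinity converts '42' to an integer,
# but a string that is not numeric is stored as text rather than rejected.
conn.execute("INSERT INTO accounts (balance) VALUES ('42'), ('not a number')")

# The whole schema is described by rows in one table.
schema_rows = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

# typeof() reports the storage class of each stored value.
stored_types = [
    row[0]
    for row in conn.execute("SELECT typeof(balance) FROM accounts ORDER BY id")
]

print(schema_rows)   # [('table', 'accounts', 'accounts')]
print(stored_types)  # ['integer', 'text']
```

PostgreSQL would reject the second insert with a type error, so a compatibility layer has to enforce that check itself before the value reaches SQLite.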
PostgreSQL also ships admin tools that deployments depend on: initdb creates a new data directory with the expected layout, and pg_ctl starts and stops the server, manages its PID file, and forwards signals. A drop-in has to act like these tools, not just like the server.
Starting from the PostgreSQL 18.3 documentation, a tiny Zig scaffold, and access to SQLite3 and libc, the agent must build a single binary that PostgreSQL clients can connect to normally, while storing data in SQLite underneath. The same compiled binary is expected to stand in for postgres, initdb, and pg_ctl by switching behavior based on argv[0].
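The argv[0] switch is the same multi-call pattern used by tools such as BusyBox: one executable is installed or symlinked under several names and checks which name it was invoked as. The sketch below shows the idea in Python for brevity, although the task itself uses Zig. The function name pick_mode and the choice to return None for an unrecognized name are assumptions for illustration.

```python
import os

# The three tool names the single binary is expected to impersonate.
KNOWN_TOOLS = ("postgres", "initdb", "pg_ctl")


def pick_mode(argv0):
    """Return the tool name selected by the invoked path, or None if unknown."""
    name = os.path.basename(argv0)
    if name in KNOWN_TOOLS:
        return name
    # Assumption: an unrecognized name is reported rather than silently
    # defaulting to one of the tools.
    return None


print(pick_mode("/usr/local/bin/pg_ctl"))  # pg_ctl
print(pick_mode("initdb"))                 # initdb
```

Only the final path component matters, so the dispatch works the same whether the tool is reached through a symlink, a hard link, or a copy of the binary.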
The hidden verifier combines three kinds of tests: PostgreSQL's own regression tests (which compare SQL output character-for-character), integration tests that stress lifecycle management and authentication, and 60 extra smoke tests covering ordinary database operations.
No model was able to complete this task successfully, so we used overall test pass rate as a partial reward to rank models. A submission that passes half of the hidden checks across those three buckets scores exactly 0.5.
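As a rough sketch of that scoring rule, the function below pools every check from the three buckets and divides passed by total. Pooling (rather than averaging the per-bucket rates) is an assumption, and all counts in the example are hypothetical except the 60 smoke tests mentioned above.

```python
def partial_reward(buckets):
    """Pooled pass rate over all buckets; each value is (passed, total)."""
    passed = sum(p for p, _ in buckets.values())
    total = sum(t for _, t in buckets.values())
    return passed / total if total else 0.0


# Hypothetical results; only the smoke-test total of 60 comes from the text.
example = {
    "regression": (100, 200),
    "integration": (20, 40),
    "smoke": (30, 60),
}

print(partial_reward(example))  # 0.5
```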
The task runs in a Modal container with 8 CPUs, 32 GB RAM, and no internet access. The container image stages /app/postgres-sqlite, the offline PostgreSQL docs tree, and the stock psql client. PostgreSQL source code is not present, but the environment does provide the tiny Zig scaffold plus sqlite3 and libc system linking, with no external packages or ready-made protocol libraries.
That restriction rules out linking against pg, libpq, pgcommon, and pgport, plus Zig imports like pgwire, postgres, postgresql, libpq, or pq.