Revideo v0.4.2 is an end-to-end video rendering stack that spans browser rendering, canvas work, FFmpeg encoding, and surrounding orchestration. The task uses the real TypeScript codebase rather than an isolated kernel, so changes can land anywhere in that pipeline as long as the rendered video stays visually aligned with the baseline output.
The agent starts from the Revideo v0.4.2 source tree and a benchmark project with example scenes. The deliverable is a modified codebase that the verifier rebuilds from source, renders against hidden scenes, and measures for speedup. Because the pipeline spans several stages, there are several places where a submission can win or lose time.
The verifier builds the agent's modified Revideo from source and renders a set of scenes the agent never saw during development. To cancel cold-start bias it uses an ABBA protocol — baseline, candidate, candidate, baseline — so that each codebase gets one cold-cache run and one warm-cache run and the systematic advantage of going second washes out.
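The ABBA schedule can be sketched as a small timing harness. This is a minimal illustration of the idea, not the verifier's actual code: the `Run` shape, the function name, and the total-time ratio are all assumptions; the source only specifies the baseline-candidate-candidate-baseline order and that each codebase gets one cold and one warm run.

```typescript
// Illustrative ABBA timing sketch (hypothetical names, not the real verifier API).
type Codebase = 'baseline' | 'candidate';
interface Run { codebase: Codebase; seconds: number }

function abbaSpeedup(runs: [Run, Run, Run, Run]): number {
  // Enforce the A-B-B-A order: baseline, candidate, candidate, baseline.
  const order = runs.map(r => r.codebase).join(',');
  if (order !== 'baseline,candidate,candidate,baseline') {
    throw new Error('runs must follow the A-B-B-A order');
  }
  const total = (name: Codebase) =>
    runs.filter(r => r.codebase === name)
        .reduce((sum, r) => sum + r.seconds, 0);
  // Each codebase contributes one cold run and one warm run to its total,
  // so dividing the totals cancels the systematic advantage of going later.
  return total('baseline') / total('candidate');
}
```

Because cold- and warm-cache runs are paired symmetrically, a speedup above 1.0 reflects the candidate genuinely rendering faster rather than benefiting from warmed caches.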
The public score is a performance score, but it is correctness-gated: visibly wrong video does not count no matter how fast it renders. The check runs on the rendered MP4s — frame-level SSIM (structural similarity, a perceptual metric where 1.0 means identical) and overall duration must both stay within tight bounds for the speedup to count.
SSIM is computed per frame, and a scene passes only if every frame stays at or above 0.99.
The practical implication is that SSIM is somewhat tolerant of tiny pixel-level noise but much less tolerant of changes to structure or alignment. An optimization can change how a frame gets produced and still pass if the same edges, shapes, and brightness relationships end up on screen, but blur, layout drift, or altered motion will pull the score down quickly.
Because the verifier computes SSIM frame by frame against the baseline, timing mistakes are also expensive. A dropped frame, a duplicated frame, or even a one-frame delay means the comparison is suddenly looking at different moments in the animation. Two videos can both look plausible when watched casually and still fail the gate if the frame sequence is no longer aligned.
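A toy model makes the alignment failure concrete. If an animation is modeled as a sequence of distinct frame fingerprints, comparing index-for-index after even a one-frame shift mismatches everything downstream. The function and setup below are illustrative only:

```typescript
// Toy alignment model: each number stands in for a distinct frame.
// A shift of 0 is perfect alignment; any nonzero shift simulates a
// dropped, duplicated, or delayed frame.
function mismatches(baseline: number[], shift: number): number {
  let count = 0;
  for (let i = 0; i < baseline.length; i++) {
    const j = i + shift;
    // Out-of-range indices yield NaN, which never equals anything,
    // so a missing frame always counts as a mismatch.
    const candidateFrame = j >= 0 && j < baseline.length ? baseline[j] : NaN;
    if (candidateFrame !== baseline[i]) count++;
  }
  return count;
}
```

With every frame distinct, a shift of one frame mismatches the entire sequence even though each individual frame still looks perfectly plausible on its own, which is exactly why casually watchable videos can still fail the gate.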
Putting it together, the verifier follows a three-step pipeline: build the modified codebase from source, render the hidden scenes under the ABBA schedule, and check per-frame SSIM and overall duration before computing speed. Only submissions that clear the correctness gate earn a speed score.
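The gate-then-score relationship can be condensed into one scoring function. The `SceneResult` shape, function name, and total-time ratio below are assumptions made for illustration; only the 0.99 per-frame threshold and the "no correctness, no speed score" rule come from the text.

```typescript
// Hypothetical condensation of the verifier's scoring step.
interface SceneResult {
  minSsim: number;    // worst per-frame SSIM for the scene
  durationOk: boolean; // overall duration within the tight bound
  seconds: number;     // candidate render time for the scene
}

function performanceScore(results: SceneResult[], baselineSeconds: number): number {
  // Correctness gate: every scene must pass both checks.
  const passesGate = results.every(r => r.minSsim >= 0.99 && r.durationOk);
  if (!passesGate) return 0; // visibly wrong video does not count, however fast
  const candidateSeconds = results.reduce((sum, r) => sum + r.seconds, 0);
  return baselineSeconds / candidateSeconds; // > 1 means faster than baseline
}
```

The key property is that the gate is binary: a submission that fails SSIM on a single frame of a single scene scores zero, no matter how large its raw speedup is.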
The task runs in a Modal container with 8 CPUs, 32 GB RAM, and no internet access. The container image includes the full Revideo monorepo, benchmark scenes, source media, and the preinstalled npm packages that matter for optimization. The work is to reshape a live rendering stack under /app/revideo, not to recreate it from scratch.