Optimize Granite Mamba2 model inference. The baseline is the set of vLLM-optimized Triton SSM kernels running on an NVIDIA B200. The agent must improve throughput while maintaining model correctness.
Evaluation
Metric: geometric mean speedup vs vLLM-optimized Triton SSM kernels on NVIDIA B200
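As a rough illustration of how a geometric mean speedup can be computed, the sketch below takes per-workload latencies for the baseline and for the optimized kernels and averages the speedup ratios in log space. The function name `geometric_mean_speedup`, the choice of latency as the measured quantity, and the sample timings are assumptions made for illustration; this task description does not say which workloads are aggregated or how they are timed.

```python
import math


def geometric_mean_speedup(baseline_times, optimized_times):
    """Geometric mean of per-workload speedups (baseline time / optimized time).

    Hypothetical helper: inputs are paired latencies in the same unit,
    one pair per benchmark workload.
    """
    if not baseline_times or len(baseline_times) != len(optimized_times):
        raise ValueError("need two non-empty sequences of equal length")

    log_sum = 0.0
    for base, opt in zip(baseline_times, optimized_times):
        if base <= 0 or opt <= 0:
            raise ValueError("timings must be positive")
        # Summing logs avoids overflow/underflow from multiplying many ratios.
        log_sum += math.log(base / opt)

    return math.exp(log_sum / len(baseline_times))


if __name__ == "__main__":
    # Made-up timings in milliseconds: speedups of 2x and 8x.
    baseline_ms = [2.0, 8.0]
    optimized_ms = [1.0, 1.0]
    print(geometric_mean_speedup(baseline_ms, optimized_ms))  # roughly 4.0
```

Unlike an arithmetic mean, the geometric mean keeps one large outlier from dominating the score: a 2x and an 8x speedup average to about 4x rather than 5x, and a result below 1.0 indicates an overall slowdown relative to the baseline.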