The Two Models That Never Met. Both Measured at the Same Depth.
May 01, 2026
Read more Insights, updates, and thought leadership on artificial intelligence, RAG systems, and enterprise AI management.
z-lab's DFlash drafter was trained against stock Gemma 4 31B. We dropped it on top of our QLoRA fine-tune and it captured 92% of the published speedup with no drafter retraining. Here is the math, the vLLM patch we had to upstream to make it run, and the prod-cutover numbers (~15× faster, ~4× cheaper).
Read Article