
vLLM

Posts tagged "vLLM" (1 post)

Speculative Decoding for Free: Pairing DFlash with our DFO-Tuned Gemma 4 31B

z-lab's DFlash drafter was trained against stock Gemma 4 31B. We dropped it on top of our QLoRA fine-tune, and without any drafter retraining it captured 92% of the published speedup. Here are the math, the vLLM patch we had to upstream to make it run, and the production cutover numbers (~15× faster, ~4× cheaper).
