vLLM

Posts tagged "vLLM" (1 post)

May 09, 2026

Research

Speculative Decoding for Free: Pairing DFlash with our DFO-Tuned Gemma 4 31B

z-lab's DFlash drafter on our QLoRA fine-tune captured 92% of the published speedup with no retraining. ~15x faster, ~4x cheaper in production.

DFlash, Speculative Decoding, Gemma 4, vLLM, Inference, H100, QLoRA, DFO