Speculative Decoding for Free: Pairing DFlash with our DFO-Tuned Gemma 4 31B
Paired with our QLoRA fine-tune, z-lab's DFlash drafter captured 92% of the published speedup with no retraining. The result: ~15x faster and ~4x cheaper in production.