z-lab's DFlash drafter was trained against stock Gemma 4 31B. We dropped it on top of our QLoRA fine-tune and it captured 92% of the published speedup with no drafter retraining. Here is the math (a back-of-envelope sketch follows below), the vLLM patch we had to upstream to make it run, and the prod-cutover numbers (~15× faster, ~4× cheaper).
DFlash · Speculative Decoding · Gemma 4 · vLLM · Inference · H100 · QLoRA · DFO

Read More
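For readers who want the shape of the argument before clicking through, here is a back-of-envelope sketch of the standard speculative-decoding speedup model (per Leviathan et al.), not DFlash's exact derivation; `alpha`, `gamma`, and the drafter-cost ratio below are illustrative assumptions, not the post's measured values.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verification step when each drafted
    token is accepted with probability alpha and the drafter proposes
    gamma tokens per step (geometric series 1 + alpha + ... + alpha^gamma)."""
    if alpha >= 1.0:
        return gamma + 1.0  # degenerate case: every draft token accepted
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def end_to_end_speedup(alpha: float, gamma: int, drafter_cost: float) -> float:
    """Speedup over vanilla decoding; drafter_cost is one drafter forward
    pass expressed as a fraction of one target-model forward pass."""
    step_cost = 1.0 + gamma * drafter_cost  # one verify pass + gamma draft passes
    return expected_tokens_per_step(alpha, gamma) / step_cost


# Illustrative numbers only: a cheap drafter (~2% of target cost), gamma = 8,
# and the acceptance rate dipping slightly once the target is a QLoRA fine-tune.
for alpha in (0.90, 0.85):
    print(f"alpha={alpha}: {end_to_end_speedup(alpha, 8, 0.02):.2f}x")
```

In this model, the "92% of the published speedup with no drafter retraining" claim reduces to one question: how little does alpha drop when the verifier moves from stock Gemma to the fine-tune? The ~15× headline presumably folds in batching and kernel-level effects beyond this single-sequence sketch.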
We ran a 200-item RAG arena on the AskTheDoctor corpus across three models and two retrieval configurations. The headline (v2-atd ≈ Llama 4 Scout, both at ~0.58) is interesting. The methodology footnote is more interesting: we then re-judged 415 of those answers with two different LLM judges and got Spearman ρ = 0.55 between them (a recomputation sketch follows below). That number is the case for human calibration.
RAG-Arena · ScoredQA · RAG Routing · EXIT · LLM-as-Judge · Spearman · Evaluation · QLoRA
Read More
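A minimal recomputation sketch of the judge-agreement number: Spearman ρ is Pearson correlation computed on ranks, and `scipy.stats.spearmanr` computes it directly. The score arrays below are made-up placeholders, not the arena's 415 re-judged answers.

```python
from scipy.stats import spearmanr

# Placeholder 1-5 quality scores from two LLM judges over the same answers;
# the post computes this over 415 re-judged arena answers.
judge_a = [5, 4, 4, 2, 3, 5, 1, 2, 4, 3]
judge_b = [4, 5, 3, 2, 4, 4, 2, 1, 3, 3]

rho, p_value = spearmanr(judge_a, judge_b)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

A ρ near 1.0 would mean the two judges rank answers almost identically; 0.55 means only moderate agreement, so neither judge can stand in for ground truth without human spot-checks.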