Research We Tested Headroom Against Our EXIT RAG Compressor By Mike Mooring • June 14, 2026
Open-source Headroom compresses RAG context with an ONNX model. We wired it into our pipeline and raced it against a 50-line in-process extractor. Neither won outright.
Read the article
Research notes from the Divinci AI team
The Divinci AI blog publishes original research, engineering deep-dives, and field notes from our work building production retrieval-augmented generation systems for regulated industries. We write about the vector indexes (we call them vIndexes) that power our RAG architecture, the routing logic that selects the right one for each question, the evaluation harness we use to benchmark them against frontier LLMs, and the hard-won lessons of deploying these systems where correctness, citations, and audit trails actually matter.
Our coverage spans three threads. First, technical work: how we build, score, and route across our open library of vIndexes published on Hugging Face; how we calibrate AI judges against human ground truth; what we learned shipping speculative decoding and FP8 quantization to production. Second, applied research: comparing retrieval strategies head-to-head in the RAG Arena, measuring grounding rates, and the economics of inference on commodity hardware. Third, perspective: where the field is going, what regulators are asking for next, and how teams should think about AI systems as the boundary between training and inference dissolves.
Most posts target an audience already fluent in AI engineering — engineers, researchers, and technical buyers — though we try to keep the framing accessible to anyone responsible for AI decisions at their organization. New posts publish irregularly when we have something genuinely worth saying. You can also find our open models, datasets, and vIndexes on our Hugging Face profile, and the source for our research tooling on GitHub.