Category: Research | Divinci AI

July 17, 2026

Research

WWW-RAG: Making the Open Web Chattable, From One Machine

92 websites, 129,931 chunks, each site its own AI assistant — crawled, embedded, and published by a Rust daemon running on one machine. Here's how WWW-RAG works and where it's going.

WWW-RAG, Web Crawling, RAG, Rust, Embeddings, Turso, libSQL, Local-First AI

June 28, 2026

Research

Prominence Is Not Relevance: Teaching a Storefront Assistant What to Recommend

An e-commerce assistant linked products correctly but recommended the wrong ones. The fix was discovering that decoration, ranking, and recommendation are three different problems — and that prominence is only ever a tiebreaker on relevance.

RAG, Product Recommendations, E-commerce, BM25, Prominence, Storefront, LLM Generation, Building in Public

June 21, 2026

Research

We Made Our RAG Pipeline Parse PDFs 20–50× Faster

We swapped OpenParse for LiteParse in our RAG ingestion pipeline. The headline is 20–50× faster parsing. The useful part is the four ways we got it wrong first.

LiteParse, OpenParse, RAG, PDF Parsing, Document Ingestion, Cloud Run, Cloudflare Workers, Evaluation

June 14, 2026

Research

We Tested Headroom Against Our EXIT RAG Compressor

Open-source Headroom compresses RAG context with an ONNX model. We wired it into our pipeline and raced it against a 50-line in-process extractor. Neither won outright.

Headroom, RAG, Context Compression, EXIT, LLM-as-Judge, Cloud Run, Evaluation

May 09, 2026

Research

Speculative Decoding for Free: Pairing DFlash with our DFO-Tuned Gemma 4 31B

z-lab's DFlash drafter on our QLoRA fine-tune captured 92% of the published speedup with no retraining. In production: ~15× faster and ~4× cheaper.

DFlash, Speculative Decoding, Gemma 4, vLLM, Inference, H100, QLoRA, DFO

May 01, 2026

Research

The Two Models That Never Met. Both Measured at the Same Depth.

Two natively-trained 1-bit LLMs converge on the same activation anomaly without ever sharing weights. A note on convergence under pressure.

LarQL, Interpretability, CKA, Cross-Model, Mechanistic Interpretability, Universal Constants

April 27, 2026

Research

When the Circuit Dissolves

Two natively-trained 1-bit LLMs lose the four-stage circuit that organizes fp16 transformers. Behavior survives; structure dissolves.

LarQL, Interpretability, Quantization, BitNet, Bonsai, Mechanistic Interpretability

April 26, 2026

Research

Inside the RAG Arena: When the Judges Don't Agree

A 200-item RAG arena tied at the mean, but two LLM judges only agreed at Spearman ρ=0.55. They aren't measuring the same thing.

RAG-Arena, ScoredQA, RAG Routing, EXIT, LLM-as-Judge, Spearman, Evaluation, QLoRA

April 25, 2026

Research

Deleting Paris from a Language Model

A LarQL patch deletes 'Paris is the capital of France' from a frozen LLM. The receipt is a 100-byte JSON file with a SHA-256 checksum.

LarQL, Interpretability, Knowledge Editing, Unlearning, Mechanistic Interpretability

April 23, 2026

Research

The Architecture Every Language Model Converges To

Every modern LLM converges on the same four-stage circuit. We map the architecture and what its absence in 1-bit models means.

LarQL, Interpretability, Transformers, Machine Learning, Mechanistic Interpretability