Spearman

Posts tagged "Spearman" (2 posts)

April 27, 2026

Product

Calibrating the Judge: The Grader Gets Graded

ScoredQA Calibration: a domain expert rates 50 answers, we compute Spearman ρ vs each LLM judge, and pick the judge that actually agrees.

ScoredQA, Calibration, Evaluation, Spearman, RAG Routing, LLM-as-Judge, Human-in-the-Loop
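
The calibration idea in the summary above (correlate one expert's ratings with each judge's scores, then keep the judge with the highest ρ) can be sketched in a few lines. This is a minimal illustration, not the post's actual implementation: the function names, judge names, and toy ratings are all hypothetical, and ρ is computed here as the Pearson correlation of average ranks, which is the standard way to handle ties.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("rating lists must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one rating list is constant")
    return cov / (vx * vy) ** 0.5


def pick_judge(expert, judges):
    """Score every judge against the expert and return the best one."""
    scores = {name: spearman_rho(expert, s) for name, s in judges.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: six answers instead of the 50 mentioned above.
expert = [5, 3, 4, 1, 2, 4]
judges = {
    "judge_a": [4.5, 3.0, 4.0, 1.5, 2.0, 3.5],
    "judge_b": [3, 4, 3, 3, 4, 2],
}
best, scores = pick_judge(expert, judges)
print(best, scores)
```

Because ρ depends only on rank order, a judge that scores on a different scale than the expert is not penalized for it; only disagreements about which answers are better count.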

April 26, 2026

Research

A 200-item RAG arena tied at the mean, but two LLM judges only agreed at Spearman ρ=0.55. They aren't measuring the same thing.

RAG-Arena, ScoredQA, RAG Routing, EXIT, LLM-as-Judge, Spearman, Evaluation, QLoRA