Inside the RAG Arena: When the Judges Don't Agree
A 200-item RAG arena tied at the mean, but two LLM judges only agreed at Spearman ρ=0.55. They aren't measuring the same thing.