LLM-as-judge

Definition

Using a strong language or multimodal model to grade another model's output automatically, scaling evaluation beyond what human review can cover.

LLM-as-judge asks a capable model to score or compare outputs against explicit criteria, giving fast, cheap, repeatable evaluation of transcripts, summaries, or video answers. It needs carefully written prompts and regular spot-checks against human ratings, since the judge can be biased (for example, toward whichever answer is shown first, or toward longer answers) or simply wrong.
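
As a rough illustration, a pairwise comparison with a basic bias check and a human spot-check might look like the sketch below. Everything named here is an assumption made for the example: the `Judge` callable stands in for whatever model API is actually used, the prompt wording is illustrative, and `always_a` and `keyword_judge` are toy stubs rather than real models. The sketch asks the judge twice with the answer order swapped and keeps a verdict only if it survives the swap, then measures how often judge verdicts match human labels.

```python
from typing import Callable, Sequence

# Hypothetical interface: a judge takes a prompt string and returns its raw reply.
# In practice this would wrap a call to a hosted or local model.
Judge = Callable[[str], str]

# Illustrative prompt; real criteria would be more specific to the task.
PROMPT = (
    "You are grading two answers to the same question.\n"
    "Question: {question}\n"
    "Answer A: {a}\n"
    "Answer B: {b}\n"
    "Reply with exactly one letter: A or B."
)


def pairwise_verdict(judge: Judge, question: str, first: str, second: str) -> str:
    """Return 'first', 'second', or 'tie', asking the judge in both orders."""
    # Round 1 shows `first` as Answer A; round 2 swaps the positions.
    r1 = judge(PROMPT.format(question=question, a=first, b=second)).strip().upper()
    r2 = judge(PROMPT.format(question=question, a=second, b=first)).strip().upper()
    vote1 = {"A": "first", "B": "second"}.get(r1)
    vote2 = {"A": "second", "B": "first"}.get(r2)
    # Keep a verdict only if it is the same in both orders; otherwise call it a tie.
    if vote1 is not None and vote1 == vote2:
        return vote1
    return "tie"


def agreement_rate(judge_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Fraction of items where the judge's label equals the human label."""
    if not judge_labels or len(judge_labels) != len(human_labels):
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)


# Toy stub: a judge that always prefers whatever is shown first.
def always_a(prompt: str) -> str:
    return "A"


# Toy stub: a judge that prefers whichever answer mentions "Paris".
def keyword_judge(prompt: str) -> str:
    line_a = next(l for l in prompt.splitlines() if l.startswith("Answer A:"))
    return "A" if "Paris" in line_a else "B"


question = "What is the capital of France?"
biased = pairwise_verdict(always_a, question, "Paris", "Lyon")
consistent = pairwise_verdict(keyword_judge, question, "Paris", "Lyon")
```

With these stubs, the position-biased judge contradicts itself once the order is swapped, so its verdict is discarded as a tie, while the content-sensitive stub picks the same answer both times. An agreement rate computed on a small human-labelled sample is only a coarse check; a chance-corrected statistic may be more appropriate when the labels are imbalanced.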

Also known as

LLM as a judge, model-based evaluation