Evaluations - The Floating Droid

The limits of black-box evaluations: two hypotheticals

Apr 11, 2025 · 6 min read · Evaluations

A prominent approach to AI safety goes by the name of "evals" or "evaluations". These are a critical component of the plans of various major labs, such as Anthropic