Emma moves from President to Executive Director, and Beth moves to Head of Research.
A collection of resources for evaluating potentially dangerous autonomous capabilities of frontier models.
METR has published a standard way to define tasks for evaluating the capabilities of AI agents.
A summary of what METR accomplished in 2023 – our first full year of operation.
METR (formerly ARC Evals) is looking for (1) ideas, (2) detailed specifications, and (3) well-tested implementations for tasks to measure the performance of autonomous LLM agents.
ARC Evals is wrapping up our incubation period at ARC and spinning off into our own standalone nonprofit.