DeepMind Wants to Measure AGI Like a Psychologist — And Offers $200K to Help
Everyone in AI talks about AGI. Almost nobody agrees on what it means, let alone how to measure progress toward it. Google DeepMind just dropped a paper that tries to change that — and they're putting $200,000 behind it.
The Problem: We Don't Know What We're Measuring
Right now, the AI industry measures intelligence with benchmarks like MMLU, HumanEval, and ARC. But these tests are narrow by design. Acing a coding benchmark doesn't mean a system can plan a dinner party, understand sarcasm, or know when it's wrong about something. It's like judging human intelligence solely on SAT scores: you'll catch some smart people, but you'll miss most of the picture.
DeepMind's new paper, Measuring Progress Toward AGI: A Cognitive Taxonomy, argues that we need to borrow from the fields that have been studying human intelligence for over a century: psychology and cognitive science.
Ten Cognitive Abilities That Matter
The framework identifies ten key cognitive abilities that general intelligence requires:
- Perception — extracting information from the environment
- Generation — producing text, speech, and actions
- Attention — focusing on what matters
- Learning — acquiring knowledge from experience
- Memory — storing and retrieving information over time
- Reasoning — drawing valid logical conclusions
- Metacognition — knowing what you know (and don't know)
- Executive Functions — planning, flexibility, inhibition
- Problem Solving — finding effective domain-specific solutions
- Social Cognition — reading social situations and responding appropriately
Think of it like a spider diagram of intelligence. Current AI models spike far out along some axes (reasoning, generation) and barely register along others (metacognition, social cognition). The framework doesn't just list these abilities; it proposes a three-stage evaluation protocol: test AI systems across all ten dimensions, collect human baselines from representative adult populations, and map AI performance onto the human distribution. A toy version of that final mapping step is sketched below.
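To make the protocol concrete, here's a minimal Python sketch of the third stage: placing per-ability AI scores within a human baseline sample. Everything in it (the ability keys, the scores, the five-person baseline) is an illustrative stand-in of ours, not the paper's actual methodology:

```python
from typing import Dict, List

ABILITIES = [
    "perception", "generation", "attention", "learning", "memory",
    "reasoning", "metacognition", "executive_functions",
    "problem_solving", "social_cognition",
]

def percentile_vs_humans(ai_score: float, human_scores: List[float]) -> float:
    """Fraction of the human baseline sample the AI score meets or beats."""
    return sum(h <= ai_score for h in human_scores) / len(human_scores)

def cognitive_profile(ai_scores: Dict[str, float],
                      human_baselines: Dict[str, List[float]]) -> Dict[str, float]:
    """Stage three: map each per-ability AI score onto the human distribution."""
    return {a: percentile_vs_humans(ai_scores[a], human_baselines[a])
            for a in ABILITIES}

# Toy data: a tiny stand-in baseline and an AI that is strong at reasoning
# but weak at metacognition, echoing the lopsided spider diagram above.
human_baselines = {a: [50.0, 60.0, 70.0, 80.0, 90.0] for a in ABILITIES}
ai_scores = {a: 75.0 for a in ABILITIES}
ai_scores["reasoning"] = 95.0
ai_scores["metacognition"] = 40.0

profile = cognitive_profile(ai_scores, human_baselines)
print(profile["reasoning"])      # 1.0: above every human in this toy sample
print(profile["metacognition"])  # 0.0: below every human in this toy sample
```

The mapping itself is simple bookkeeping; stages one and two, actually building the evaluations and collecting representative human baselines, are where the hard work lives.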
The $200K Kaggle Hackathon
Theory is nice, but DeepMind wants action. They've partnered with Kaggle to launch a hackathon called Measuring Progress Toward AGI: Cognitive Abilities, with a $200,000 prize pool. Participants are invited to design evaluations for the five abilities where the measurement gap is largest: learning, metacognition, attention, executive functions, and social cognition.
This is a clever move. Building good cognitive evaluations is genuinely hard — it requires expertise that spans AI, psychology, and experimental design. By crowdsourcing it through Kaggle's community of over 15 million users, DeepMind gets diverse perspectives while distributing the workload.
Why This Matters More Than Another Benchmark
We're at a weird inflection point in AI development. Models keep getting better at benchmarks, but the benchmarks themselves are becoming less meaningful. When GPT-5, Claude Opus, and Gemini all score 90%+ on the same tests, those tests stop telling us much. We need evaluations that probe qualitatively different capabilities, not just harder versions of the same tasks.
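One way to see why saturation matters: a benchmark score is a binomial estimate, and near the ceiling its error bars swallow the gaps between models. A quick back-of-the-envelope calculation, with illustrative numbers of our own choosing rather than real leaderboard figures:

```python
import math

def standard_error(accuracy: float, n_questions: int) -> float:
    """Standard error of a benchmark accuracy, treated as a binomial proportion."""
    return math.sqrt(accuracy * (1 - accuracy) / n_questions)

# Hypothetical: three models scoring 91%, 92%, and 93% on a 500-question benchmark.
for acc in (0.91, 0.92, 0.93):
    se = standard_error(acc, 500)
    print(f"{acc:.0%} +/- {1.96 * se:.1%}")  # approximate 95% confidence interval

# Each interval spans roughly +/- 2 to 2.5 points, so the three scores are
# statistically indistinguishable: the benchmark has stopped discriminating.
```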
DeepMind's cognitive framework is the most serious attempt yet to build that kind of evaluation infrastructure. Instead of asking "can AI solve this specific problem?" it asks "does AI exhibit this fundamental cognitive ability?" — a much more useful question for tracking real progress.
The AI industry has been measuring models like we measure racehorses — by speed alone. DeepMind wants to measure them like psychologists measure humans — across the full spectrum of cognition.
Key Takeaways
- DeepMind proposes a 10-ability cognitive framework for measuring AGI progress
- $200,000 Kaggle hackathon invites the community to build evaluations
- Focus areas: learning, metacognition, attention, executive functions, social cognition
- Three-stage protocol benchmarks AI against representative human performance
- Addresses the growing inadequacy of narrow AI benchmarks
Our Take
This is exactly the kind of work the AI field needs and rarely does. Everyone's so busy racing to the next model release that the question of how we evaluate progress gets treated as an afterthought. DeepMind's cognitive taxonomy won't settle the AGI debate — nothing will — but it provides a structured, scientifically grounded way to track capabilities that current benchmarks completely miss. The focus on metacognition and social cognition is particularly important; these are the abilities that separate "impressive tool" from "general intelligence," and we currently have almost no way to test for them. The Kaggle hackathon is a smart play: decentralize the hard work of building evaluations while getting buy-in from the research community. If even a handful of good cognitive evaluations come out of this, the entire field benefits.