DeepMind Wants to Measure AGI Like a Psychologist — And Offers $200K to Help
Everyone in AI talks about AGI. Almost nobody agrees on what it means, let alone how to measure progress toward it. Google DeepMind just dropped a paper that tries to change that — and they're putting $200,000 behind it.
The Problem: We Don't Know What We're Measuring
Right now, the AI industry measures intelligence with benchmarks like MMLU, HumanEval, and ARC. But these tests are narrow by design. Acing a coding benchmark doesn't mean a system can plan a dinner party, understand sarcasm, or know when it's wrong about something. It's like judging human intelligence solely on SAT scores: you'll catch some smart people, but you'll miss most of the picture.
DeepMind's new paper, Measuring Progress Toward AGI: A Cognitive Taxonomy, argues that we need to borrow from the fields that have been studying human intelligence for over a century: psychology and cognitive science.
Ten Cognitive Abilities That Matter
The framework identifies ten key cognitive abilities that general intelligence requires:
- Perception — extracting information from the environment
- Generation — producing text, speech, and actions
- Attention — focusing on what matters
- Learning — acquiring knowledge from experience
- Memory — storing and retrieving information over time
- Reasoning — drawing valid logical conclusions
- Metacognition — knowing what you know (and don't know)
- Executive Functions — planning, flexibility, inhibition
- Problem Solving — finding effective domain-specific solutions
- Social Cognition — reading social situations and responding appropriately
Think of it like a spider diagram of intelligence. Current AI models spike far out along some axes (reasoning, generation) and barely register along others (metacognition, social cognition). The framework doesn't just list these abilities; it proposes a three-stage evaluation protocol: test AI systems across all ten dimensions, collect human baselines from representative adult populations, and map AI performance onto the human distribution. A toy version of that final mapping step is sketched below.
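To make the protocol concrete, here's a minimal Python sketch of the third stage: placing per-ability AI scores within a human baseline sample. Everything in it (the ability keys, the scores, the five-person baseline) is an illustrative stand-in of ours, not the paper's actual methodology:

```python
from typing import Dict, List

ABILITIES = [
    "perception", "generation", "attention", "learning", "memory",
    "reasoning", "metacognition", "executive_functions",
    "problem_solving", "social_cognition",
]

def percentile_vs_humans(ai_score: float, human_scores: List[float]) -> float:
    """Fraction of the human baseline sample the AI score meets or beats."""
    return sum(h <= ai_score for h in human_scores) / len(human_scores)

def cognitive_profile(ai_scores: Dict[str, float],
                      human_baselines: Dict[str, List[float]]) -> Dict[str, float]:
    """Stage three: map each per-ability AI score onto the human distribution."""
    return {a: percentile_vs_humans(ai_scores[a], human_baselines[a])
            for a in ABILITIES}

# Toy data: a tiny stand-in baseline and an AI that is strong at reasoning
# but weak at metacognition, echoing the lopsided spider diagram above.
human_baselines = {a: [50.0, 60.0, 70.0, 80.0, 90.0] for a in ABILITIES}
ai_scores = {a: 75.0 for a in ABILITIES}
ai_scores["reasoning"] = 95.0
ai_scores["metacognition"] = 40.0

profile = cognitive_profile(ai_scores, human_baselines)
print(profile["reasoning"])      # 1.0: above every human in this toy sample
print(profile["metacognition"])  # 0.0: below every human in this toy sample
```

The mapping itself is simple bookkeeping; stages one and two, actually building the evaluations and collecting representative human baselines, are where the hard work lives.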
The $200K Kaggle Hackathon
Theory is nice, but DeepMind wants action. They've partnered with Kaggle to launch a hackathon called Measuring Progress Toward AGI: Cognitive Abilities, with a $200,000 prize pool. Participants are invited to design evaluations for the five abilities where the measurement gap is largest: learning, metacognition, attention, executive functions, and social cognition.
This is a clever move. Building good cognitive evaluations is genuinely hard — it requires expertise that spans AI, psychology, and experimental design. By crowdsourcing it through Kaggle's community of over 15 million users, DeepMind gets diverse perspectives while distributing the workload.
Why This Matters More Than Another Benchmark
We're at a weird inflection point in AI development. Models keep getting better at benchmarks, but the benchmarks themselves are becoming less meaningful. When GPT-5, Claude Opus, and Gemini all score 90%+ on the same tests, those tests stop telling us much. We need evaluations that probe qualitatively different capabilities, not just harder versions of the same tasks.
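One way to see why saturation matters: a benchmark score is a binomial estimate, and near the ceiling its error bars swallow the gaps between models. A quick back-of-the-envelope calculation, with illustrative numbers of our own choosing rather than real leaderboard figures:

```python
import math

def standard_error(accuracy: float, n_questions: int) -> float:
    """Standard error of a benchmark accuracy, treated as a binomial proportion."""
    return math.sqrt(accuracy * (1 - accuracy) / n_questions)

# Hypothetical: three models scoring 91%, 92%, and 93% on a 500-question benchmark.
for acc in (0.91, 0.92, 0.93):
    se = standard_error(acc, 500)
    print(f"{acc:.0%} +/- {1.96 * se:.1%}")  # approximate 95% confidence interval

# Each interval spans roughly +/- 2 to 2.5 points, so the three scores are
# statistically indistinguishable: the benchmark has stopped discriminating.
```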
DeepMind's cognitive framework is the most serious attempt yet to build that kind of evaluation infrastructure. Instead of asking "can AI solve this specific problem?" it asks "does AI exhibit this fundamental cognitive ability?" — a much more useful question for tracking real progress.
The AI industry has been measuring models like we measure racehorses — by speed alone. DeepMind wants to measure them like psychologists measure humans — across the full spectrum of cognition.
Key Takeaways
- DeepMind proposes a 10-ability cognitive framework for measuring AGI progress
- $200,000 Kaggle hackathon invites the community to build evaluations
- Focus areas: learning, metacognition, attention, executive functions, social cognition
- Three-stage protocol benchmarks AI against representative human performance
- Addresses the growing inadequacy of narrow AI benchmarks
Our Take
This is exactly the kind of work the AI field needs and rarely does. Everyone's so busy racing to the next model release that the question of how we evaluate progress gets treated as an afterthought. DeepMind's cognitive taxonomy won't settle the AGI debate — nothing will — but it provides a structured, scientifically grounded way to track capabilities that current benchmarks completely miss. The focus on metacognition and social cognition is particularly important; these are the abilities that separate "impressive tool" from "general intelligence," and we currently have almost no way to test for them. The Kaggle hackathon is a smart play: decentralize the hard work of building evaluations while getting buy-in from the research community. If even a handful of good cognitive evaluations come out of this, the entire field benefits.