Intuitions vs. Formulas
Key Takeaway: Simple statistical formulas — even crude equal-weight models scribbled on the back of an envelope — consistently outperform expert clinical judgment across ~200 studies spanning medicine, criminal justice, hiring, wine pricing, and infant survival, because formulas are perfectly consistent while human judges are incorrigibly variable; but intuition retains value when it follows disciplined data collection, as Kahneman's own army interview redesign demonstrated.
Chapter 21: Intuitions vs. Formulas
← Chapter 20 | Thinking, Fast and Slow - Book Summary | Chapter 22 →
Summary
Paul Meehl's 1954 "disturbing little book," Clinical vs. Statistical Prediction, delivered one of the most consequential and most resisted findings in the history of social science. Meehl reviewed 20 studies comparing clinical predictions (subjective impressions of trained professionals) against #statisticalprediction (simple formulas combining a few scores). Of the roughly 200 studies now available, about 60% show algorithms significantly outperforming experts, and the rest are ties — which are effectively algorithm wins, because formulas cost almost nothing to apply. "There is no controversy in social science which shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one." No reliably documented exception exists.
The range of domains is staggering: cancer patient longevity, hospital stays, cardiac diagnosis, sudden infant death syndrome, new business success, credit risk, foster parent suitability, juvenile recidivism, violent behavior, scientific presentation quality, football game winners, and — most memorably — the future prices of Bordeaux wine. Princeton economist Orley Ashenfelter built a formula using three weather variables (summer temperature, harvest rain, winter rain) that predicts wine prices with a correlation above .90 — vastly better than the world's most prestigious wine experts. The French wine establishment responded with "violent and hysterical" hostility.
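To make the wine formula concrete, here is a minimal Python sketch of an Ashenfelter-style linear model. The three predictors are the weather variables named above; the intercept and coefficients are illustrative placeholders rather than Ashenfelter's published estimates, and only their signs follow his logic (warm growing seasons and wet winters help, rain at harvest hurts).

```python
# A toy Ashenfelter-style vintage model: quality as a linear function
# of three weather variables. Coefficients are ILLUSTRATIVE ONLY; only
# their signs reflect the formula's logic.

def predict_vintage_quality(growing_season_temp_c: float,
                            harvest_rain_mm: float,
                            winter_rain_mm: float) -> float:
    INTERCEPT = -10.0        # placeholder
    W_TEMP = 0.60            # warmer growing seasons -> better vintages
    W_HARVEST = -0.004       # rain at harvest dilutes the grapes
    W_WINTER = 0.001         # winter rain feeds the vines
    return (INTERCEPT
            + W_TEMP * growing_season_temp_c
            + W_HARVEST * harvest_rain_mm
            + W_WINTER * winter_rain_mm)

# Same inputs always yield the same output: the formula never has a
# bad day, which is exactly why it beats the tasters on average.
print(predict_vintage_quality(17.1, 38.0, 600.0))
```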
Two reasons explain the superiority of #algorithmsvsexperts. First, experts try to be clever: they think outside the box, consider complex feature interactions, and weigh contextual nuances — all of which reduce rather than increase validity in low-predictability environments. "Human decision makers are inferior to a prediction formula even when they are given the score suggested by the formula" — because they override it with additional information that is more often harmful than helpful. Meehl's "broken-leg rule" identifies the rare exception: you can override a formula that predicts whether someone will go to the movies if you learn they broke their leg today. But broken legs are very rare and decisive — most "override" situations are neither.
Second, and more fundamentally, humans are #incorrigiblyinconsistent. Experienced radiologists contradict themselves 20% of the time when re-evaluating the same chest X-ray. Auditors, pathologists, psychologists, and organizational managers show similar inconsistency. "Unreliable judgments cannot be valid predictors of anything." The inconsistency stems from System 1's extreme context dependence: a cool breeze, the time since lunch (the Israeli parole judges study), and countless unnoticed environmental primes shift judgments from moment to moment. Formulas are perfectly consistent: same input, same output, always.
Robyn Dawes's landmark finding about #equalweighting elevates this from interesting to revolutionary. You don't even need optimal statistical weights. Simple formulas that give equal weight to a handful of valid predictors perform just as well as — and often better than — optimally weighted regression equations, because equal-weight models aren't distorted by accidents of sampling. Dawes's marital stability formula is unforgettable: frequency of lovemaking minus frequency of quarrels. "You don't want your result to be a negative number." The practical implication: you can build a useful algorithm on the back of an envelope without any prior statistical research.
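A back-of-envelope sketch of Dawes's "improper linear model" in Python: standardize each predictor across cases, orient it so that higher always means better, and sum with equal weights. The function name and the toy couples data are invented for illustration; the method is the one described above.

```python
from statistics import mean, stdev

def equal_weight_scores(cases: dict[str, dict[str, float]],
                        signs: dict[str, int]) -> dict[str, float]:
    """Dawes-style improper linear model: z-score each predictor
    across cases, flip it so higher = better (signs[t] is +1 or -1),
    then sum with equal weights. No regression fitting required."""
    totals = {name: 0.0 for name in cases}
    for trait, sign in signs.items():
        vals = [c[trait] for c in cases.values()]
        mu, sd = mean(vals), (stdev(vals) or 1.0)  # guard zero spread
        for name, c in cases.items():
            totals[name] += sign * (c[trait] - mu) / sd
    return totals

# Dawes's marital-stability example: lovemaking minus quarrels.
couples = {
    "A": {"lovemaking_per_week": 3.0, "quarrels_per_week": 1.0},
    "B": {"lovemaking_per_week": 1.0, "quarrels_per_week": 4.0},
}
print(equal_weight_scores(
    couples, {"lovemaking_per_week": +1, "quarrels_per_week": -1}))
```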
The Apgar score is the chapter's most inspiring example. Before 1953, physicians used subjective clinical judgment to assess newborn distress — different practitioners focused on different cues, danger signs were often missed, and babies died. Virginia Apgar jotted down five variables (heart rate, respiration, reflex, muscle tone, color) with scores of 0-2 each. The resulting 10-point scale gave delivery rooms a consistent standard. The #apgarscore is credited with saving hundreds of thousands of infant lives and is still used in every delivery room today. It exemplifies the principle: simple, standardized scoring beats even well-intentioned expert judgment.
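The arithmetic of the Apgar score fits in a few lines. A sketch, with the caveat that the 0-2 rating criteria for each sign are clinical standards this code does not encode:

```python
def apgar(heart_rate: int, respiration: int, reflex: int,
          muscle_tone: int, color: int) -> int:
    """Sum of five signs, each rated 0-2 by the clinician at the
    bedside. This encodes only the arithmetic; how to rate each sign
    is a clinical judgment outside this sketch."""
    signs = (heart_rate, respiration, reflex, muscle_tone, color)
    if any(s not in (0, 1, 2) for s in signs):
        raise ValueError("each Apgar sign must be rated 0, 1, or 2")
    return sum(signs)

# A high score is generally reassuring; a low one signals distress.
print(apgar(heart_rate=2, respiration=2, reflex=1,
            muscle_tone=2, color=1))  # -> 8
```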
Kahneman's own army interview redesign provides the chapter's most nuanced lesson. Applying Meehl's principles, he replaced the old unstructured interview (which was "almost useless") with a #structuredinterview: six traits evaluated independently using factual questions, each scored on a 1-5 scale before proceeding to the next, with a formula combining the scores. The interviewers protested: "You are turning us into robots!" So Kahneman compromised: after completing the structured protocol, interviewers could "close your eyes" and give a global intuitive score. The results showed the structured scores dramatically outperformed the old method — and, surprisingly, the "close your eyes" intuitive score performed equally well. The lesson: "Intuition adds value even in the justly derided selection interview, but only after a disciplined collection of objective information and disciplined scoring of separate traits." Intuition is rehabilitated — but only as the final step in a structured process, never as the first.
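A minimal sketch of the protocol's scoring logic in Python. The trait names are placeholders (Kahneman's army list differed), and folding the "close your eyes" rating in as a seventh equal-weight score is one reasonable reading of the compromise, since the book reports that score was tracked separately rather than summed.

```python
# Placeholder traits, NOT Kahneman's actual army list.
TRAITS = ("responsibility", "sociability", "energy",
          "orderliness", "punctuality", "initiative")

def candidate_score(ratings: dict[str, int],
                    close_eyes: int | None = None) -> int:
    """Sum of six 1-5 trait ratings, taken one trait at a time and in
    order (sequential scoring blocks the halo effect). An optional
    global 'close your eyes' 1-5 rating, given only AFTER the
    structured pass, is folded in as a seventh equal-weight score."""
    missing = set(TRAITS) - set(ratings)
    if missing:
        raise ValueError(f"every trait must be rated; missing: {missing}")
    if not all(1 <= r <= 5 for r in ratings.values()):
        raise ValueError("trait ratings must be on a 1-5 scale")
    total = sum(ratings[t] for t in TRAITS)
    if close_eyes is not None:
        total += close_eyes  # intuition last, never first
    return total
```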
The #hostilitytoalgorithms section explains why resistance persists despite overwhelming evidence. Clinicians described the statistical method as "mechanical, atomistic, cut and dried, artificial, dead, pedantic, sterile" while lauding clinical judgment as "dynamic, global, meaningful, holistic, subtle, rich, deep, genuine, sensitive, living." The moral dimension is revealing: "the story of a child dying because an algorithm made a mistake is more poignant than the story of the same tragedy occurring as a result of human error." We prefer human judgment not because it's better but because human error feels more forgivable than algorithmic error. Meehl and others argued the opposite: "it is unethical to rely on intuitive judgments for important decisions if an algorithm is available that will make fewer mistakes."
For the library, this chapter provides the most direct operational framework yet: the six-step hiring procedure in the "Do It Yourself" section is immediately implementable. Select six independent traits, compose factual questions for each, score each on a 1-5 scale sequentially (never skip around — this prevents halo effects), sum the scores, and hire the highest scorer. "You are much more likely to find the best candidate if you use this procedure than if you do what people normally do." This maps directly to Wickman's people management in The EOS Life and to Hormozi's hiring frameworks across $100M Leads.
Key Insights
- Simple Formulas Beat Expert Judgment in Low-Validity Environments — Across ~200 studies spanning decades and domains, algorithms win 60% of the time and tie the rest. No reliable exception exists. The finding is the most robust in social science.
- Equal-Weight Models Are Nearly As Good As Optimal Ones — You don't need regression analysis. Simply identify 4-6 valid predictors, standardize them, and weight them equally. The resulting back-of-envelope formula will outperform most experts and match most optimized models.
- Human Inconsistency Is the Fatal Flaw — Even experts contradict themselves 20% of the time on identical cases. Inconsistency destroys predictive validity regardless of expertise. Algorithms eliminate inconsistency entirely.
- Intuition Has Value — But Only After Structure — The "close your eyes" exercise in Kahneman's interview system performed well — but only because it followed disciplined, structured data collection. Intuition as the first and only step fails; intuition as the capstone of a structured process succeeds.
- Hostility to Algorithms Is Emotional, Not Rational — People prefer human judgment to algorithmic judgment because human error feels more forgivable, not because it's less frequent. This moral preference perpetuates inferior decision processes.
Key Frameworks
- Clinical vs. Statistical Prediction (Meehl) — Clinical: holistic, subjective impressions of trained professionals. Statistical: simple formulas combining a few scores or ratings. Across ~200 studies, statistical predictions match or exceed clinical predictions in every domain tested. The finding has been consistent for 70+ years.
- The Equal-Weight Model (Dawes) — Select a set of valid predictors, standardize them, and combine with equal weights. This "improper linear model" performs nearly as well as optimally weighted regression and dramatically outperforms expert judgment. Implication: useful algorithms require no statistical training to build.
- The Structured Interview Protocol (Kahneman) — Six steps: (1) Select 4-6 independent traits relevant to the role. (2) Compose factual questions for each trait. (3) Score each trait on a 1-5 scale sequentially — never skip around. (4) Complete all traits for one candidate before moving to the next. (5) Optionally, add a "close your eyes" global intuitive score at the end. (6) Hire the candidate with the highest total score, resisting the urge to override the formula.
- The Broken-Leg Rule (Meehl) — The only justified reason to override a formula is information that is both very rare and decisively relevant — like learning someone broke their leg when the formula predicts they'll go to the movies. Most "overrides" don't meet this standard and make predictions worse.
Direct Quotes
[!quote]
"Whenever we can replace human judgment by a formula, we should at least consider it."
[source:: Thinking, Fast and Slow] [author:: Daniel Kahneman] [chapter:: 21] [theme:: algorithmsvsexperts]
[!quote]
"Intuition adds value even in the justly derided selection interview, but only after a disciplined collection of objective information."
[source:: Thinking, Fast and Slow] [author:: Daniel Kahneman] [chapter:: 21] [theme:: structuredinterview]
[!quote]
"Unreliable judgments cannot be valid predictors of anything."
[source:: Thinking, Fast and Slow] [author:: Daniel Kahneman] [chapter:: 21] [theme:: consistency]
[!quote]
"Do not simply trust intuitive judgment — your own or that of others — but do not dismiss it, either."
[source:: Thinking, Fast and Slow] [author:: Daniel Kahneman] [chapter:: 21] [theme:: intuition]
Action Points
- [ ] Build a structured scoring system for your next hire: Select 5-6 traits, compose factual questions, score each 1-5 sequentially, sum the scores, and hire the highest scorer. This single change will dramatically improve hiring quality over unstructured interviews.
- [ ] Replace holistic "gut feel" evaluations with trait-level scoring across all assessments: Whether evaluating vendors, partnerships, investment opportunities, or marketing campaigns, decompose the assessment into independent dimensions, score each separately, and combine with equal weights. The formula will beat your holistic impression.
- [ ] Resist the urge to override formulas with "additional information": When a scoring system says candidate A is best but your gut says candidate B, remember that overriding formulas with intuition makes predictions worse, not better, except in broken-leg situations (very rare, decisively relevant information).
- [ ] Create your own Apgar scores for recurring decisions: Identify the 3-5 most diagnostic variables for decisions you make repeatedly (evaluating content, assessing leads, prioritizing projects), assign simple scoring criteria, and apply consistently. Consistency alone will improve decision quality.
- [ ] Add a "close your eyes" step at the END of structured processes: After completing all objective scoring, allow yourself one holistic intuitive assessment — and give it weight equal to (not greater than) the structured scores. Intuition is valuable when it follows structure, not when it replaces it.
Questions for Further Exploration
- If equal-weight models match optimally weighted ones, what does this imply about the entire field of predictive analytics? Are we overinvesting in algorithmic complexity when simplicity would suffice?
- The Apgar score transformed neonatal medicine. What other domains have obvious "Apgar score" opportunities — simple standardized scoring systems that could replace subjective expert judgment and save lives?
- Kahneman's interviewers protested that structured scoring made them "robots." How should organizations manage the psychological resistance to algorithmic decision-making among skilled professionals?
- If overriding formulas with additional information usually makes things worse, what are the characteristics of the rare "broken-leg" exceptions? Can we identify them in advance rather than relying on post-hoc judgment about when the exception applies?
- The hostility to algorithms is partly moral: algorithmic errors feel worse than human errors. As AI-driven decision-making expands, how should society renegotiate this moral intuition?
Personal Reflections
Space for your own thoughts, connections, disagreements, and applications.
Themes & Connections
Tags in this chapter:
- #algorithmsvsexperts — Simple formulas consistently outperform expert clinical judgment across ~200 studies
- #clinicalprediction — Holistic, subjective expert assessment; inferior to statistical approaches in low-validity environments
- #statisticalprediction — Formula-based combination of a few scores; superior to clinical prediction
- #equalweighting — Dawes's finding that equal-weight formulas match optimally weighted ones
- #apgarscore — The paradigmatic example of a simple scoring system saving lives
- #structuredinterview — Kahneman's army interview: factual questions, trait-level scoring, sequential assessment
- #hostilitytoalgorithms — Emotional and moral resistance to replacing human judgment with formulas
- #consistency — The fatal advantage of algorithms: same input always produces same output
- Algorithms vs Experts — New major concept: the clinical vs. statistical prediction debate
- Structured Decision Making — New concept: the practical framework for decomposed, scored evaluation
- Consistency — The meta-principle: reliability is a prerequisite for validity
- The EOS Life Ch 2-3 — Wickman's People Analyzer tool (core values + GWC scoring) is essentially a Kahneman-style structured evaluation: decompose assessment into independent traits, score each separately, combine for a decision
- $100M Leads Ch 12-14 — Hormozi's hiring and team evaluation benefits from the structured interview protocol: replace "I liked them" with trait-level scoring
- Getting to Yes Ch 4-5 — Fisher's objective criteria framework is the negotiation equivalent of algorithmic decision-making: replace subjective impressions with standardized evaluation
- Six-Minute X-Ray Ch 1-5 — Hughes's behavioral profiling uses structured observation categories (comfort/discomfort displays, illustrators, manipulators) — essentially a behavioral Apgar score that decomposes "reading people" into scorable components
- What Every Body Is Saying Ch 2-4 — Navarro's emphasis on baselining and systematic observation of specific body regions mirrors the structured interview principle: observe specific traits independently, don't form global impressions
- Influence — Cialdini's six principles function as an equal-weight model for predicting compliance: assess each principle's presence, combine scores, predict the outcome