The history of IQ testing is short, controversial, and ongoing. The first scientific intelligence test was published in 1905. Within 15 years, IQ testing was being used to make life-altering decisions about millions of people. By the 1920s, it was being used as a basis for forced sterilisations and immigration restrictions in multiple countries, practices later condemned as ethical disasters. Today the test has both responsible uses (diagnosing learning disabilities, identifying gifted students) and disputed ones (across-group comparisons, employment screening). This guide walks through the major figures, the milestone tests, the science that emerged, and the harms that came with them.
The First Test: Binet and Simon, 1905
French psychologist Alfred Binet was commissioned in 1904 by the French government to develop a method to identify schoolchildren who needed special educational support. Working with his colleague Théodore Simon, Binet published the first scientific intelligence test in 1905.
The Binet-Simon test was designed for a specific, narrow purpose: identifying children whose cognitive development was lagging so they could get extra help. It used age-appropriate tasks (basic reasoning, memory, vocabulary) and compared each child's performance to what was typical for their chronological age. The result was a "mental age": the age level at which the child was performing.
Binet himself was cautious about what his test measured. He explicitly warned that intelligence was too complex to be captured by a single number and that the test should never be used to rank children definitively. He died in 1911, before his test was adopted by others who ignored these warnings.
The IQ Concept: Stern, 1912
German psychologist William Stern proposed in 1912 that "mental age" could be expressed as a ratio to "chronological age." A child whose mental age was 10 but chronological age was 8 would be performing above their years; Stern's formula made this a single number:
IQ = (Mental Age / Chronological Age) × 100
This gave the "Intelligence Quotient", a single number that could be compared across children. The 100 baseline (mental age equals chronological age) is where the modern "average IQ = 100" convention started.
Stern's ratio worked well for children but broke down for adults: there's no obvious "mental age" past about 18. Later tests replaced the ratio formula with the modern standardised-score approach (mean 100, standard deviation 15), but the term "IQ" stuck.
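As a quick check on the arithmetic, Stern's ratio can be written as a few lines of Python. This is only an illustrative sketch; the function name `ratio_iq` is ours, not part of any historical test.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# The example from the text: mental age 10, chronological age 8.
print(ratio_iq(10, 8))  # 125.0
# Mental age equal to chronological age gives the 100 baseline.
print(ratio_iq(8, 8))   # 100.0
```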
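The standardised-score idea can be sketched in the same way: a raw score is compared with the mean and spread of a reference (norm) group, then rescaled to mean 100 and standard deviation 15. The sketch below assumes a simple linear conversion and normally distributed scores; real tests use published norm tables for each age band, and the norm-group numbers here are hypothetical.

```python
from statistics import NormalDist

IQ_MEAN = 100.0
IQ_SD = 15.0


def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Rescale a raw score to the IQ metric (mean 100, SD 15)."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw_score - norm_mean) / norm_sd  # distance from the norm mean in SDs
    return IQ_MEAN + IQ_SD * z


def percentile(iq: float) -> float:
    """Percentage of the norm group expected to score at or below this IQ,
    assuming scores follow a normal distribution."""
    return NormalDist(IQ_MEAN, IQ_SD).cdf(iq) * 100


# Hypothetical norm group: mean raw score 40, standard deviation 8.
iq = deviation_iq(raw_score=48, norm_mean=40, norm_sd=8)
print(iq)                        # 115.0
print(round(percentile(iq), 1))  # 84.1
```

Because the score is defined relative to a same-age norm group rather than a "mental age", the same scale works for adults as well as children.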
Americanisation: Terman and the Stanford-Binet, 1916
Stanford psychologist Lewis Terman adapted Binet's test for American populations and published the Stanford-Binet Intelligence Scales in 1916. The test became the dominant IQ instrument in the English-speaking world for decades.
Terman was less cautious than Binet had been. He believed strongly that intelligence was largely innate, largely heritable, and largely fixed for life. He championed using IQ to track students into educational paths, identify gifted children for special programs, and screen workers for cognitive demands.
Terman's 1916 book The Measurement of Intelligence contains passages that would be unprintable today: speculations about race, immigration, and the cognitive future of America that reflected the eugenicist consensus of his era and were later thoroughly discredited. The science he helped establish outgrew his views; the harm those views caused is part of the historical record.
Mass Testing: The Army Alpha and Beta, World War I
The first mass administration of IQ tests came in 1917-1918, when the U.S. Army commissioned American psychologists (led by Robert Yerkes) to develop tests to classify millions of new recruits.
Two tests were created:
- Army Alpha: for literate, English-speaking recruits
- Army Beta: a non-verbal, picture-based version for illiterate or non-English-speaking recruits
About 1.75 million American soldiers were tested. The results, and the way they were interpreted, drove much of the next two decades of social policy. The Army findings were used to argue (incorrectly, on shaky methodology) that recent immigrants from southern and eastern Europe had lower cognitive ability than those from northern Europe. These conclusions fed directly into the U.S. Immigration Act of 1924, which sharply restricted immigration from those regions.
This is one of the most consequential, and most damaging, uses of IQ testing in history. The methodology has been thoroughly debunked: the tests measured cultural familiarity and English fluency as much as cognitive ability, and many of the recruits had been in the U.S. for only weeks or months when tested. The harm from the policies that followed lasted generations.
The Wechsler Tests: Reframing IQ for Adults, 1939
David Wechsler, working at Bellevue Hospital in New York, recognised that the Stanford-Binet, designed primarily for children, was a poor fit for adult clinical work. In 1939 he published the Wechsler-Bellevue Intelligence Scale, the predecessor of today's WAIS (Wechsler Adult Intelligence Scale).
Wechsler's innovations shaped modern IQ testing:
- Separated verbal IQ from performance (non-verbal) IQ, allowing different cognitive profiles to emerge
- Replaced Stern's age-ratio formula with the modern standardised-score approach (mean 100, SD 15)
- Built in multiple subtests measuring different specific abilities, recognising that intelligence isn't unitary
- Designed specifically for adults, with norms across the adult lifespan
The WAIS (long used in its fourth edition, with a fifth edition released in 2024) is the standard adult IQ test today. Modern tests like the WISC (Wechsler Intelligence Scale for Children) extend the same approach across ages.
The Eugenics Disaster
The early 20th century saw IQ testing weaponised in service of eugenicist policies in multiple countries:
- U.S. forced sterilisation laws. 32 American states passed laws permitting compulsory sterilisation of people deemed "feebleminded", a label often determined by IQ scores. About 60,000 Americans were sterilised between the 1900s and 1970s, disproportionately women, poor people, and people of colour. The Supreme Court upheld these laws in Buck v. Bell (1927), with Justice Oliver Wendell Holmes's notorious line "three generations of imbeciles are enough."
- Nazi Germany. The 1933 Nazi sterilisation law, modelled partly on U.S. eugenics legislation, used IQ-style assessments to identify "lives unworthy of life," eventually leading to the murder of hundreds of thousands of disabled people.
- Sweden, other Nordic countries, and Switzerland. Forced sterilisation programs ran into the 1970s, with intellectual assessment as one criterion.
The science of IQ did not require these policies; the policies were chosen for ideological reasons and IQ scores were used as justification. But the history is part of why modern psychometricians are deeply cautious about how IQ research is communicated and used.
The Cyril Burt Fraud and the Heritability Wars
British psychologist Cyril Burt was one of the most influential figures in mid-20th-century intelligence research. From the 1940s through the 1960s he published twin studies that purported to show very high heritability of IQ. After his death in 1971, investigators discovered that many of his data points appeared fabricated: his "co-authors" couldn't be found, and his correlations stayed suspiciously constant across studies that should have varied.
The Burt scandal triggered decades of intense scrutiny of heritability research. Modern twin studies, done with proper transparency and pre-registration, replicate the basic finding (IQ heritability ~0.5-0.8 in modern adults), but Burt's specific contributions are now disregarded as unreliable.
The Bell Curve Controversy, 1994
Charles Murray and Richard Herrnstein's The Bell Curve (1994) was the most controversial book about intelligence published in the late 20th century. Its empirical claims about IQ score distributions were largely uncontroversial. Its claims about group differences, social policy implications, and the unalterable nature of cognitive ability were heavily contested by other psychologists and have not held up well.
The book triggered a major public reassessment of IQ research and led to important methodological responses (the APA's 1995 "Intelligence: Knowns and Unknowns" report) that clarified what the field actually agreed on. The controversy is still cited as a cautionary case in how scientific findings about cognition can be politicised.
Modern IQ Testing in 2026
Where IQ testing stands today:
- Standard tools. The Wechsler scales (WAIS-IV for adults, WISC-V for children) are the most-used clinical IQ tests. Stanford-Binet (now in its 5th edition) is still administered. Raven's Progressive Matrices is widely used for culture-fair fluid-intelligence measurement.
- Appropriate uses. Diagnosing learning disabilities, identifying gifted students for enrichment, assessing cognitive impairment after head injury or stroke, and certain neuropsychological evaluations are all considered appropriate.
- Contested uses. Employment screening (still common but increasingly questioned for fairness), educational tracking, and across-group comparison remain controversial.
- Online tests. Most free online "IQ tests" use unvalidated formats and inflate scores. They're entertainment, not measurement. Genuine cognitive assessment requires professional administration.
- Ongoing research. The frontier includes genome-wide association studies (polygenic scores currently explain only ~10-15% of IQ variance), neuroimaging correlates of g (some specific brain regions and white-matter tracts show consistent links), and the Flynn-effect reversal in some countries.
What the History Teaches
- The test is more useful than the interpretation. Binet's original purpose, identifying children needing extra support, is still one of the cleanest uses. Many of the harms came from extending IQ scores far beyond what the data actually supported.
- Within-group findings don't transfer to between-group claims. This was the methodological error behind both 1920s immigration policy and the 1990s Bell Curve controversy, and it's still misunderstood.
- The score is an estimate, not a truth. Even modern professionally administered tests have roughly ±5 IQ points of measurement error. Treating any specific score as a precise measurement is a mistake the field has been making, and gradually unlearning, for a century.
- Science can be misused without being wrong. The IQ data Terman collected was largely valid. The policies he advocated were not. Separating findings from their applications is a core ethical task in the social sciences.
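The point about measurement error can be made concrete with the textbook standard-error-of-measurement formula, SEM = SD × sqrt(1 − reliability). The sketch below is a simplified illustration: the reliability of 0.97 is an assumed value chosen for the example, not a figure from this guide or from any specific test manual, and published tests build their confidence intervals with more care.

```python
import math


def score_band(observed_iq: float, reliability: float = 0.97,
               sd: float = 15.0, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band around an observed IQ score.

    Uses the classical formula SEM = sd * sqrt(1 - reliability).
    The default reliability (0.97) is an assumption for illustration.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    half_width = z * sem
    return observed_iq - half_width, observed_iq + half_width


low, high = score_band(100)
print(round(low, 1), round(high, 1))  # 94.9 105.1
```

Under these assumptions an observed score of 100 is better read as "somewhere around 95 to 105" than as exactly 100, and a less reliable test would widen the band further.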
For a structured contemporary measure of your reasoning across multiple cognitive subscales, built on the Wechsler tradition of measuring multiple specific abilities rather than collapsing them into a single number, our free IQ test takes 20 questions and gives an instant breakdown across numerical, verbal, logical, and pattern-recognition reasoning.
Frequently Asked Questions
Who invented IQ testing?
Alfred Binet (with Théodore Simon) developed the first scientific intelligence test in 1905 in France. The "IQ" concept itself was introduced by William Stern in 1912.
What was the first IQ test used for?
Binet designed it specifically to identify French schoolchildren who needed extra educational support. He explicitly warned against using it to rank children definitively, a warning that was ignored once the test was adopted elsewhere.
Why is the Stanford-Binet called that?
Lewis Terman at Stanford University adapted Binet's French test for American populations and published the Stanford-Binet Intelligence Scales in 1916. The name reflects both the institution and the original author.
Was IQ testing used by the Nazis?
Yes. Nazi Germany's 1933 sterilisation law and later "T4" euthanasia program used intellectual assessments to identify victims. The U.S. eugenics movement, which heavily used IQ testing, partly inspired Nazi legislation.
Is IQ testing still controversial?
Within psychology, the basic measurement is well-established. Specific applications (employment screening, group comparisons, public-policy uses) remain contested. The history of misuse keeps the field cautious about how findings are communicated.
