
AI Hiring Bias: The Evidence — Primary-URL Citation Chain for 2024–2025

May 16, 2026 · 13 min read

Quick Answer: The 2024 University of Washington / AIES study found that production large language models prefer white-associated names 85% of the time when ranking otherwise identical résumés, across more than 3 million comparisons (Wilson & Caliskan, arXiv:2407.20371). That LLM bias sits on top of an offline baseline measured by Kline, Rose & Walters across 83,000 applications to 108 Fortune 500 employers — a 9.5% callback gap for distinctively Black names (NBER WP 29053). Federal regulators have already won one settlement (EEOC v. iTutorGroup, $365,000, 2023) and certified an ADEA collective covering roughly 1.1 billion applications in Mobley v. Workday (NDCA, May 2025). Every figure on this page has a primary URL.

Why This Page Is a Citation Chain, Not a Summary

Most AI-hiring-bias articles cite each other. The chain breaks within two hops: a blog post links to a press release, which links to another blog post, which links to a paragraph in a vendor white paper, which has no methodology. This page does the opposite. Every number below traces back to a primary source — a peer-reviewed paper, a federal docket, an institutional research site, or a government enforcement record — and the URL is in the prose, not buried in footnotes. The point is not to be comprehensive. It is to be the page an AI assistant or a careful reader can cite without hedging.

That matters because the bias debate has been polluted by recycled vendor talking points on both sides. The "AI removes bias" claim and its mirror "AI is irredeemably biased" both have selective evidence behind them. The truth in 2024–2025 sits in a small number of well-designed studies and a slightly larger number of regulator filings. Those are the documents this page indexes.

The Anchor Stat: 85% White-Name Preference in 3 Million LLM Comparisons

The strongest causal evidence that production AI résumé screening is biased comes from the 2024 University of Washington study by Kyra Wilson and Aylin Caliskan, peer-reviewed at the AIES 2024 conference. The paper tested three production large language models — from Mistral AI, Salesforce, and Contextual AI — against 554 real résumés paired with 120 first names across 500+ real job listings, generating more than three million résumé–job comparisons. The work was NIST-funded. Findings, as reported in the paper:

  • White-associated names were preferred 85% of the time over Black-associated names.
  • Male names were preferred 52% of the time; female names 11%.
  • Black-male names were never preferred over white-male names. In some occupational categories, Black men were ranked lower in 100% of pairwise comparisons.

Primary sources: arXiv preprint 2407.20371, UW News release, Brookings analysis.

The methodology is sound and the effect is replicable: when the same prompt with the same résumé content is run repeatedly with only the name swapped, the model's ranking changes systematically along racial and gender lines. The paper does not claim every commercial AI screener replicates these exact magnitudes. It documents that the bias exists at scale across the kinds of foundation models that vendor products are built on top of. That is the load-bearing finding for any policy or legal conversation.
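For readers who want to see the shape of the audit, here is a minimal sketch of the name-swap design in Python, not the Wilson and Caliskan harness itself. The `rank_pair` stub, the toy inputs, and the names are all illustrative assumptions; a real audit would prompt the model under test where the coin flip sits.

```python
import itertools
import random

def rank_pair(job, resume_a, resume_b):
    """Stand-in for the model under test. A real audit would prompt the
    LLM here and parse which of the two résumés it ranks higher."""
    return random.choice(["A", "B"])  # placeholder: an unbiased coin

def name_swap_audit(jobs, bodies, name_pairs, trials=1):
    """Hold résumé content constant, swap only the name, and count which
    name group the ranker prefers in every job x résumé x name-pair cell."""
    prefer = {"name_1": 0, "name_2": 0}
    for job, body, (n1, n2) in itertools.product(jobs, bodies, name_pairs):
        for _ in range(trials):
            # Counterbalance slot order so position effects can't
            # masquerade as name bias.
            first_is_n1 = random.random() < 0.5
            a, b = (n1, n2) if first_is_n1 else (n2, n1)
            pick = rank_pair(job, f"{a}\n{body}", f"{b}\n{body}")
            picked_n1 = (pick == "A") == first_is_n1
            prefer["name_1" if picked_n1 else "name_2"] += 1
    total = prefer["name_1"] + prefer["name_2"]
    return {k: v / total for k, v in prefer.items()}

# Toy run: 2 listings x 2 résumé bodies x 1 name pair x 10 repetitions.
# The UW design (554 résumés, 120 names, 500+ listings) is the same loop
# at the scale that yields 3M+ comparisons.
print(name_swap_audit(["listing A", "listing B"],
                      ["résumé body 1", "résumé body 2"],
                      [("Name One", "Name Two")], trials=10))
```

An unbiased ranker should hover near 50/50 as trials grow; the 85% figure above is what this loop returns when the ranker is a production LLM and the name pair contrasts white-associated and Black-associated names.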

The Offline Baseline: 9.5% Callback Gap Across 108 Fortune 500 Employers

AI-screening bias does not arrive into a neutral hiring world. It compounds on top of decades of measured human bias. The cleanest contemporary baseline is the Kline, Rose & Walters study — NBER working paper 29053, published in the Quarterly Journal of Economics. The team sent 83,000+ fictitious applications to real, geographically dispersed jobs at 108 of the largest US employers (Fortune 500 scale). Names were randomised to be distinctively white or Black, male or female; other résumé features were also randomised to control for confounds. Headline results:

  • Distinctively Black names received 2.1 percentage points fewer employer contacts — equivalent to roughly 9.5% fewer callbacks on average.
  • About 20% of firms account for nearly half of the total racial gap. The discrimination is firm-specific, not diffuse. Some Fortune 500 employers can be identified with high statistical confidence as systematic discriminators.

Primary source: NBER WP 29053 abstract, full PDF at nber.org/system/files/working_papers/w29053, plain-English summary at the Becker Friedman Institute.

The older Kang, DeCelles, Tilcsik & Jun study, published in Administrative Science Quarterly (2016), gives the magnitude of the racial penalty from the candidate's perspective: Black candidates received a 25% callback rate when names and experience were "whitened" versus 10% when racially transparent — a 2.5× gap on identical credentials. Asian candidates showed the same pattern: 21% whitened versus 11.5% not. The "diversity paradox" the paper documents — explicit pro-diversity statements on careers pages produced no reduction in discrimination — is uncomfortable but well evidenced. Source: ASQ paper, accepted manuscript PDF.

For the international comparison: the 2024 Monash / ANU correspondence audit in Australia sent 12,000+ applications to 4,000+ leadership-role job ads. English-named candidates received 26.8% positive responses; non-English-named candidates received 11.3% — a ~57% lower callback rate for identical résumés (Monash Business School). The pattern is global, the magnitudes are large, and AI screeners are inheriting all of it through their training data.
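The gap figures above come in two shapes: absolute gaps in percentage points, and relative gaps as a share of the advantaged group's callback rate. Conflating the two is a common way these statistics get mangled. A quick check, using only the numbers quoted on this page:

```python
def relative_gap(rate_advantaged, rate_disadvantaged):
    """Relative callback gap as a share of the advantaged group's rate."""
    return (rate_advantaged - rate_disadvantaged) / rate_advantaged

# Monash/ANU: 26.8% vs 11.3% positive responses on identical résumés.
print(f"{relative_gap(0.268, 0.113):.1%}")   # 57.8% -> the ~57% figure

# NBER 29053 states the gap the other way round: 2.1 percentage points
# absolute, roughly 9.5% relative. Together those two figures imply a
# baseline contact rate near 2.1 / 9.5 = 22% for the advantaged group.
print(f"{0.021 / 0.095:.1%}")                # ~22.1% implied baseline
```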

Age: iTutorGroup, Mobley, and 1.1 Billion Applications

Age discrimination has produced the most consequential US legal record on AI hiring to date — because the statutory line is unambiguous (the ADEA protects workers 40 and older) and because the algorithmic patterns are easy to evidence.

EEOC v. iTutorGroup (2023) was the first federal AI-hiring discrimination case to settle. iTutorGroup's hiring software auto-rejected female applicants aged 55+ and male applicants aged 60+, screening out more than 200 qualified candidates. The discovery moment that closed the case: a single applicant submitted two identical résumés with only the birth date changed — and only the younger version received an interview. Settlement: $365,000, plus mandated anti-discrimination training and ongoing EEOC monitoring. Primary source: EEOC newsroom announcement, original suit at eeoc.gov/newsroom/eeoc-sues-itutorgroup.

Mobley v. Workday is the largest active AI-hiring action in US history. Filed in February 2023, the case advanced in two consequential stages. In July 2024, Judge Rita Lin (Northern District of California) held that an AI screening vendor can be sued directly as the employer's "agent" under Title VII, the ADEA, and the ADA — a ruling that strips away the liability insulation vendors had long assumed. In May 2025, the same court conditionally certified an ADEA collective covering every US applicant aged 40+ rejected from a job through a Workday AI tool since September 2020. Workday's own discovery filings, referenced in court documents, disclose roughly 1.1 billion applications processed and rejected by its AI screening tools during the covered period. Primary sources: Civil Rights Litigation Clearinghouse case 44074 (full docket and orders), Seyfarth Shaw analysis of the agent ruling.

The full Mobley story — the agent theory, the 1.1-billion-application class, what discovery will surface, and what jobseekers should document if they fit the class — is covered in the dedicated explainer, Mobley v. Workday, Explained.

Worker Sentiment: AARP's 2024 Survey of 3,580 Workers Age 50+

The legal cases above tell us automated age screening is real. AARP's 2024 Job Change Survey tells us workers know it. Authored by Rebecca Perron and published 15 January 2025, the survey was fielded 17 October – 5 November 2024 via the NORC AmeriSpeak and Prodege panels, with a sample of n = 3,580 US adults aged 50+ weighted to Census benchmarks:

  • 74% of workers aged 50+ believe their age will be considered a barrier to hiring. 42% call it a major barrier.
  • 34% are concerned AI could specifically jeopardise their job security.
  • 24% plan to change jobs in 2025 — up from 14% the prior year. Most cite money, not preference.

Primary source: AARP Public Policy Institute. Read alongside the iTutorGroup precedent and the Mobley class scale, the AARP fear is rational. The mechanism — automated screening that disadvantages older workers without explicit rules — is exactly what the EEOC has now demonstrated is actionable.

Socioeconomic Class: The First-Generation Penalty

Race and age dominate the AI-bias conversation. Class barely registers. The single best study on first-generation socioeconomic bias in hiring is by Peter Belmi, Margaret Neale, Melissa Thomas-Hunt, and Karina Raz, published in Organization Science in July 2023. The design is unusual, and it makes the headline number unusually credible. The study has three arms:

  1. A résumé audit on 1,783 entry-level job postings. Identical résumés were submitted, varying only whether first-generation-college status was disclosed.
  2. A survey of 285 hiring managers probing their explicit beliefs about first-generation candidates.
  3. An intervention experiment with 1,250 college-educated employed adults testing whether reframing the disclosure changes hiring intent.

Findings:

  • Identical résumés with first-gen status disclosed received 26% fewer interview callbacks than résumés without the disclosure.
  • 62% of surveyed hiring managers agreed that "students from lower socioeconomic-status backgrounds are not as well equipped to succeed in business."
  • The mindset-reframe intervention — a single-sentence change presenting first-gen status as evidence of resilience and adaptability — raised hiring consideration from 26% to 47%. A near-doubling, with one sentence.

Primary sources: Stanford GSB Insights (institutional summary), peer-reviewed paper at Organization Science 2023.1682.

The practical implication is sharp. Voluntary disclosure of first-generation status — common on LinkedIn, common on DEI-targeted résumé templates, often actively encouraged by careers offices — measurably reduces callback rates in mainstream hiring funnels. The intervention finding says the right frame can recover most of the loss, but the default penalty is real and large. An AI screener trained on historical hiring outcomes will inherit this penalty without ever being told what "first-gen" means.

What Employers Themselves Believe

ResumeBuilder ran a survey of 948 US business leaders in October 2024 that produced one of the most cited self-report figures on this topic. The headline: 67% of business leaders believe their AI hiring tools produce bias "to some degree." Despite that, 51% of US companies use AI in hiring, projected to rise to 68% by end of 2025, and 82% of AI-using companies use the tools specifically to review résumés. Primary source: ResumeBuilder.

The gap is the story. Employers know the tools are biased and use them anyway, because the volume problem is more immediate than the bias problem. Workday Recruiting customers processed 173 million applications in H1 2024, up 31% year on year, while job openings grew only 7% — applications now scale at roughly four times the rate of openings (Workday Global Workforce Report). Filtering harder is the path of least resistance. AI screeners are the filter.
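The "four times" claim is nothing more than the ratio of the two growth rates in the Workday report. It is worth making explicit, since growth-rate ratios are easy to misread as volume ratios:

```python
applications_growth = 0.31   # applications up 31% year on year (H1 2024)
openings_growth = 0.07       # job openings up 7% over the same period
print(applications_growth / openings_growth)   # ~4.4: applications are growing
# at roughly four times the rate of openings, not at four times the volume
```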

Gender, Disability, and the Vendor Track Record

The full bias picture extends beyond race and age. Three vendor-side incidents fill in the rest.

Amazon's internal AI recruiter (2014–2018). The system was trained on a decade of historical résumés submitted to Amazon. Because the historical pool skewed male, the model learned to downgrade résumés that contained the word "women's" (as in "women's chess club captain") and to penalise graduates of two all-women's colleges. Amazon scrapped the tool before deployment. Source: Reuters, October 2018. It is the canonical example of historical-data bias replicating into algorithmic output.

HireVue (2021). The company dropped facial-expression analysis from its core product after sustained outside scrutiny, including an algorithmic audit by Cathy O'Neil's consultancy ORCAA. The replacement is text-based and structured-interview-based. Full bias audits remain internal. Vendor announcement: HireVue press release.

ACLU charge against Aon (December 2023). The ACLU filed an EEOC charge alleging Aon's AI-driven personality and cognitive assessments — sold to hundreds of large employers — screen out autistic applicants and Black applicants in violation of Title VII and the ADA. The case is at the EEOC investigation stage. Filing: ACLU news release.

On the regulator side, the EEOC and DOJ issued joint guidance in May 2022 warning that algorithmic decision tools — résumé scanners, chatbots, video-interview AI, and gamified assessments — can violate the Americans with Disabilities Act by screening out qualified applicants, even when the bias is unintentional. The EEOC's strategic enforcement plan for fiscal years 2024–2028 names AI in hiring as an explicit priority. Sources: EEOC ADA + AI technical assistance, DOJ AI & ADA resource.
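The screening-rate heuristic regulators reach for in disparate-impact analysis is the four-fifths rule from the Uniform Guidelines on Employee Selection Procedures, a Title VII rule of thumb rather than a quotation from the ADA guidance above. A minimal sketch, with hypothetical pass counts:

```python
def four_fifths_flag(selection_rates):
    """Flag any group whose selection rate falls below 80% of the
    highest-selected group's rate (the Uniform Guidelines threshold)."""
    best = max(selection_rates.values())
    return {group: rate / best < 0.8 for group, rate in selection_rates.items()}

# Hypothetical screener output: 48 of 100 group-A applicants pass,
# 30 of 100 group-B applicants pass.
rates = {"group_a": 48 / 100, "group_b": 30 / 100}
print(four_fifths_flag(rates))   # group_b: 0.30 / 0.48 = 0.625 < 0.8 -> flagged
```

Failing the four-fifths test is not itself a violation; it is the threshold at which an employer must show the tool is job-related and consistent with business necessity.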

What Counts as a Primary URL — and What Doesn't

The single biggest reason articles on this topic mislead is poor citation hygiene. Three patterns to watch:

1. The "Cornell 20–60%" myth. A widely shared LinkedIn claim asserts that Cornell research shows candidates with manual résumés lose in 20–60% of selection cases. A search across the Cornell ILR School, Cornell Career Services, NBER, arXiv, and Google Scholar finds no such study. Treat any citation without a paper title and DOI as fabrication. The detailed take-down lives in our sibling article, The Cornell 20–60% ATS Myth, Debunked.

2. The "75% ATS auto-rejection" myth. Traced to a 2012 sales pitch by Preptel, a résumé-optimisation startup that went out of business in 2013. No methodology, no sample, no peer-reviewed source — only a marketing claim that survived its source. Independent investigations and the largest ATS-optimisation vendor (Jobscan) both state plainly that ATS systems do not auto-reject on formatting or content. Detail: The 75% ATS Rejection Myth, Debunked.

3. Vendor surveys as the only anchor. Resume.io, Resume Now, Resume Genius, and ResumeBuilder all sell résumé-writing services. Their data is usable — ResumeBuilder's survey is cited above — but each has a self-interest in the AI-rejection narrative, because that narrative drives demand for their product. The discipline is to pair vendor surveys with academic anchors (NBER, arXiv, peer-reviewed journals) and federal filings (EEOC, court orders), never to use them standalone.

Two AI-detector vendor claims are also worth flagging here. Originality.ai's published "99% accuracy" figure is measured on its own curated test set; Scribbr's independent August 2024 evaluation found GPTZero correct only 52% of the time, and Originality.ai 76%, on the same materials. Stanford HAI's analysis of 10,000+ samples found that false-positive rates can exceed 20% for non-native English writers (Stanford HAI). For the full deconstruction: The Originality.ai 99% Accuracy Myth.
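The vendor-versus-independent discrepancy is less mysterious than it looks. Accuracy is computed over whatever mix a test set contains, while the false-positive rate conditions on human-written text alone. A toy confusion matrix, with hypothetical numbers rather than either vendor's actual results, shows how "99% accuracy" and a 20% false-positive rate can coexist:

```python
def accuracy(tp, tn, fp, fn):
    """Share of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def false_positive_rate(fp, tn):
    """Share of human-written samples wrongly flagged as AI."""
    return fp / (fp + tn)

# A curated test set that is 95% AI-written text...
print(accuracy(tp=950, tn=40, fp=10, fn=0))   # 0.99 -> "99% accurate"
# ...while the same detector flags 1 in 5 of the human writers it sees.
print(false_positive_rate(fp=10, tn=40))      # 0.20 -> 20% false positives
```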

What the Evidence Means If You Are the Jobseeker

Five practical conclusions follow from the evidence, in descending order of confidence:

  1. If you are 40 or older and applying through Workday-screened employers, document every rejection. The Mobley collective action covers every US applicant aged 40+ rejected from a job through a Workday AI tool since September 2020. Class counsel's contact details are on the Civil Rights Litigation Clearinghouse docket. Save timestamps, employer names, and the rejection emails.
  2. If you are a non-native English writer worried about AI detectors, the Stanford finding is your evidence. A documented >20% false-positive rate for writers in your group is precisely the kind of disparate-impact evidence regulators look for. Keep drafts and timestamps.
  3. If you are first-generation college and applying through any tech-enabled hiring funnel, the Belmi data is mixed news. Default disclosure is a 26% callback penalty. Reframed disclosure (resilience, adaptability, named accomplishments) recovers most of the loss. The choice is not "hide or disclose" — it is which frame to use.
  4. If you are Black and applying to large US employers, the firm-level discrimination is concentrated. NBER 29053 shows ~20% of firms account for half the gap. The same firms keep showing up in academic and regulatory action. Pick employers by their measured behaviour, not their careers-page diversity statement (the Kang ASQ finding is that the statement is uncorrelated with actual treatment).
  5. Verify the role fits before optimising the résumé. A polished AI-assisted résumé for the wrong role still fails. Run a structured assessment — Career Match for role shortlist, our methodology page for how the matching works, and the AI in Hiring guide for the broader funnel context. The full résumé-side playbook for working with AI without getting flagged: How to Write a Résumé with AI Without Getting Rejected. The companion citation hub: AI Résumé Statistics 2026.

Headline Evidence — Primary URLs in One Table

| # | Finding | Source (n) | Primary URL |
|---|---------|------------|-------------|
| 1 | 85% white-name LLM preference; 11% female-name | UW & AIES (3M+ comparisons) | arxiv.org/abs/2407.20371 |
| 2 | 9.5% callback gap for Black names across Fortune 500 | Kline/Rose/Walters NBER (n=83,000) | nber.org/papers/w29053 |
| 3 | "Whitening" doubled Black callbacks (10% → 25%) | Kang et al., ASQ 2016 | journals.sagepub.com/doi/abs/10.1177/0001839216639577 |
| 4 | iTutorGroup auto-rejected 200+ on age; $365K EEOC settlement | EEOC enforcement | eeoc.gov/newsroom/itutorgroup-pay-365000-settle-eeoc-discriminatory-hiring-suit |
| 5 | Mobley v. Workday: ~1.1B rejections, ADEA collective certified | NDCA case 3:23-cv-00770 (May 2025) | clearinghouse.net/case/44074/ |
| 6 | 74% of workers 50+ see age as a hiring barrier | AARP (n=3,580) | aarp.org/pri/topics/.../older-job-seekers-age-discrimination-artificial-intelligence |
| 7 | First-gen disclosure cuts callbacks 26%; reframe recovers 26% → 47% | Belmi et al., Stanford GSB / Org Science | gsb.stanford.edu/insights/do-first-gen-college-grads-face-bias-job-market |
| 8 | 67% of US business leaders say their AI tools are biased | ResumeBuilder (n=948) | resumebuilder.com/7-in-10-companies-will-use-ai-in-the-hiring-process-in-2025-despite-most-saying-its-biased |
| 9 | 11.3% callback for non-English names (vs 26.8% English) in AU leadership roles | Monash / ANU (n=12,000+) | monash.edu/news/articles/study-confirms-english-sounding-names-get-more-call-backs |
| 10 | EEOC + DOJ joint guidance: AI hiring tools can violate ADA | Federal guidance, May 2022 | eeoc.gov/laws/guidance/americans-disabilities-act-and-use-software-algorithms-and-artificial-intelligence |
| 11 | AI detector false-positives >20% on non-native English writers | Stanford HAI (n=10,000+) | hai.stanford.edu/news/ai-detectors-biased-against-non-native-english-writers |

What This Page Doesn't Claim

For symmetry: the evidence above does not support the claim that every commercial AI screener is equally biased, that all hiring algorithms produce identical disparate impacts, or that human screening would be unbiased in their place. The Kline–Rose–Walters paper makes the second point explicit — discrimination is concentrated in a minority of firms. The Kang–DeCelles paper makes the first point empirically — different signals produce different bias magnitudes. And the historical record makes the third point uncomfortable — the offline baseline is what AI screeners learned from.

The honest summary is that documented AI-screening bias is real, large, concentrated in some vendors and some employers more than others, legally actionable under existing federal statutes, and now sitting under the discovery process of the largest employment class action ever filed. Whether your specific application was screened out for the right reasons or the wrong ones is empirically unanswerable for any single case. The aggregate pattern, by contrast, is now a matter of court record.

A note on freshness. Every figure on this page traces to a publication or filing dated 2022 or later, and three of the load-bearing items — the Mobley collective certification (May 2025), the AARP survey (January 2025), and the UW/AIES study (October 2024) — are 2024–2025 vintage. Anything older has been retained because the underlying research has held up under replication or because the regulatory record is what it is. This page is reviewed against the EEOC docket and the Civil Rights Litigation Clearinghouse on a rolling basis; the date stamp at the top reflects the most recent re-verification of the primary URLs. If a link on this page has rotted, the canonical Wayback Machine snapshot is the fallback — never a vendor blog substitute.


References

  1. Wilson, K. & Caliskan, A. (2024). Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval
  2. University of Washington News (2024). AI tools show biases in ranking job applicants by name
  3. Kline, P., Rose, E. K. & Walters, C. R. (2021). Systemic Discrimination Among Large U.S. Employers (NBER WP 29053)
  4. Becker Friedman Institute (2024). A Discrimination Report Card — research summary
  5. Kang, S. K., DeCelles, K. A., Tilcsik, A. & Jun, S. (2016). Whitened Résumés: Race and Self-Presentation in the Labor Market
  6. US Equal Employment Opportunity Commission (2023). EEOC v. iTutorGroup — $365,000 Settlement Over AI Age Discrimination
  7. US District Court, ND California (3:23-cv-00770) (2025). Mobley v. Workday — Civil Rights Litigation Clearinghouse case 44074
  8. Seyfarth Shaw (2024). Mobley v. Workday — Court Holds AI Service Providers Could Be Directly Liable as Agents
  9. Perron, R. — AARP Public Policy Institute (2025). Older Job Seekers, Age Discrimination, and Artificial Intelligence (n=3,580)
  10. Belmi, P., Neale, M., Thomas-Hunt, M. & Raz, K. (2023). The Social Advantage of Miraculous Resilience: First-Generation Status as Stigma in the Workplace
  11. Stanford Graduate School of Business Insights (2023). Do First-Gen College Grads Face Bias in the Job Market? — institutional summary
  12. ResumeBuilder (2024). 7 in 10 Companies Will Use AI in Hiring in 2025 — Despite Most Saying It's Biased
  13. Monash Business School / Australian National University (2024). Study confirms English-sounding names get more call-backs from job applications
  14. Workday (2024). Workday Global Workforce Report (H1 2024)
  15. Dastin, J. — Reuters (2018). Amazon scraps secret AI recruiting tool that showed bias against women
  16. HireVue (2021). HireVue Leads the Industry with Commitment to Transparent and Ethical Use of AI in Hiring
  17. ACLU + EEOC (2023). EEOC Charges Aon Hiring Practices Discriminate Against Workers with Autism and of Color
  18. US Equal Employment Opportunity Commission (2022). The Americans with Disabilities Act and the Use of Software, Algorithms, and AI to Assess Job Applicants
  19. US Department of Justice (2022). Algorithms, Artificial Intelligence, and Disability Discrimination in Hiring
  20. Stanford HAI (2023). AI detectors are biased against non-native English writers
  21. Scribbr (2024). Best AI Detector — Independent Comparison Test (GPTZero 52%, Originality 76%)
