The Reliability Question
You take a personality test on Monday and discover you are an INTJ. You retake it on Friday and discover you are an INFP. Did your personality change in four days? Of course not. What changed was the test's measurement — and this inconsistency points to a fundamental issue called test-retest reliability.
Test-retest reliability is perhaps the most important quality metric for any personality assessment. If a test cannot give you consistent results, nothing it tells you can be trusted. Understanding this concept helps you evaluate which assessments deserve your attention and which are entertainment at best.
How Test-Retest Reliability Is Measured
Researchers administer the same test to the same group of people at two different time points (usually 2-8 weeks apart), then calculate the correlation between the two sets of scores. A perfect correlation (1.0) means everyone keeps exactly the same relative standing on both occasions. A correlation of zero means the first score tells you nothing about the second.
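As a minimal sketch of that calculation, the Python snippet below computes a Pearson correlation between two administrations of the same test. The function name pearson_r and the six pairs of scores are hypothetical and chosen only for illustration; published studies use much larger samples and sometimes other statistics, such as intraclass correlations.

```python
import math

def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_a = sum(first) / len(first)
    mean_b = sum(second) / len(second)
    # Sum of cross-products and sums of squared deviations from each mean
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical Extraversion scores for six people, tested twice a few weeks apart
time_1 = [62, 45, 71, 38, 55, 80]
time_2 = [60, 48, 69, 41, 52, 83]
print(round(pearson_r(time_1, time_2), 2))
```

Each position in the two lists must belong to the same person, since the correlation is computed over matched pairs.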
In personality testing, reliability coefficients above 0.80 are considered good, those between 0.70 and 0.80 acceptable, and those below 0.60 concerning, with the range in between a gray area. These numbers might seem abstract, so here is what they mean in practice: at 0.90 reliability, your scores will be very similar each time. At 0.70, there will be noticeable variation. At 0.50, roughly half of the variation in scores reflects noise rather than stable differences between people, so an individual result can swing widely from one sitting to the next.
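One way to make these coefficients concrete is the standard error of measurement from classical test theory, SEM = SD * sqrt(1 - reliability), which estimates the typical size of the error in a single score. The sketch below assumes scores are reported on a scale with a standard deviation of 10 (as with T-scores); that scale is an assumption made for illustration, not something any particular test is claimed to use.

```python
import math

def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory estimate of the typical error in one observed score."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)

# Assumed scale: standard deviation of 10 points
for r in (0.90, 0.70, 0.50):
    sem = standard_error_of_measurement(10.0, r)
    print(f"reliability {r:.2f}: typical error of about {sem:.1f} points")
# reliability 0.90: typical error of about 3.2 points
# reliability 0.70: typical error of about 5.5 points
# reliability 0.50: typical error of about 7.1 points
```

On that assumed scale, dropping from 0.90 to 0.50 reliability more than doubles the typical error in a single score.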
How Major Tests Compare
Big Five (NEO-PI-R)
Test-retest reliability: 0.79-0.83 over six months. This is strong for so long an interval: your Big Five scores today are likely to be very similar to your scores six months from now. Importantly, the Big Five measures traits on continuous scales, so small score variations do not change your overall profile. Moving from the 72nd to the 68th percentile in Extraversion is not a meaningful change.
MBTI
Test-retest reliability is the MBTI's most cited weakness. Studies report that 36-76% of test-takers receive a different four-letter type when retested after five weeks. This is because the MBTI forces continuous traits into binary categories. If your Thinking-Feeling preference is 51% Thinking, you are labeled T. Next week, if it shifts to 49% Thinking, you are suddenly labeled F — even though virtually nothing changed.
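The cutoff effect can be sketched in a few lines of Python. The helper names, the 50% cutoff handling, and the 0.85 per-scale agreement rate below are all hypothetical values chosen for illustration, and the second function assumes the four scales flip independently of one another, which real data may not satisfy.

```python
def letter(thinking_pct, cutoff=50.0):
    """Collapse a continuous Thinking percentage into a binary T/F label."""
    # Assumption: a score exactly at the cutoff is labeled T
    return "T" if thinking_pct >= cutoff else "F"

def same_type_probability(per_scale_agreement, n_scales=4):
    """Chance that all scales keep the same letter, assuming independent scales."""
    return per_scale_agreement ** n_scales

# A two-point shift in the underlying score changes the label
print(letter(51), letter(49))  # T F

# Illustrative only: 85% agreement on each of four scales
print(round(same_type_probability(0.85), 2))  # 0.52
```

Under these assumptions, even fairly consistent individual scales leave only about a coin-flip chance of receiving the same four-letter type twice, because a change on any one scale changes the whole type.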
Enneagram
Research on Enneagram reliability is more limited, but available studies suggest moderate test-retest reliability (0.65-0.85 depending on the instrument). The Riso-Hudson Enneagram Type Indicator (RHETI) has the strongest data, with about 72% of test-takers receiving the same core type on retest.
DISC
DISC reliability varies significantly by provider, as there is no single standard DISC instrument. High-quality versions (like Wiley's Everything DiSC) report reliability above 0.80. Lower-quality versions may have much weaker reliability. Without a standard instrument, it is difficult to make general claims about DISC reliability.
What Causes Inconsistent Results?
Mood effects: Your current emotional state can shift your responses. If you are feeling anxious, your Neuroticism score may rise. If you just had a great social weekend, your Extraversion score may increase.
Context effects: Where and when you take the test matters. Taking a career-focused personality test at work versus at home can produce slightly different results because you are thinking about different versions of yourself.
Measurement error: All psychological measurement contains some random noise. Well-designed tests minimize this, but it is never zero.
Actual change: Over longer time periods (years, not weeks), personality can genuinely shift — particularly Neuroticism (which tends to decrease with age) and Agreeableness (which tends to increase).
How to Get the Most Reliable Results
- Take the test when you are in a neutral emotional state — not during a crisis or celebration
- Answer based on your general patterns, not your current moment
- Choose tests with continuous scoring (Big Five) over categorical systems (MBTI) for important decisions
- Take the test twice, a few weeks apart, and look for consistent patterns
- Use multiple different assessments and trust the convergent findings
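For the advice to take a test twice and look for consistent patterns, a rough sketch of the comparison might look like the following. The function name compare_profiles, the percentile scores, and the 10-point tolerance are hypothetical; the tolerance in particular is an arbitrary illustrative threshold, not a published standard.

```python
def compare_profiles(first, second, tolerance=10):
    """Split traits into stable and shifted, based on the change between two sittings."""
    stable, shifted = {}, {}
    for trait, score in first.items():
        if trait not in second:
            continue  # skip traits that were not measured both times
        diff = second[trait] - score
        if abs(diff) <= tolerance:
            stable[trait] = diff
        else:
            shifted[trait] = diff
    return stable, shifted

# Hypothetical percentile scores from two sittings a few weeks apart
monday = {"Openness": 70, "Extraversion": 72, "Neuroticism": 40}
later = {"Openness": 66, "Extraversion": 68, "Neuroticism": 58}

stable, shifted = compare_profiles(monday, later)
print(stable)   # {'Openness': -4, 'Extraversion': -4}
print(shifted)  # {'Neuroticism': 18}
```

Traits that land in the stable group are the ones worth relying on; a trait that shifts substantially is a prompt to consider mood or context before drawing conclusions from either score.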
Take Reliable Assessments
- Big Five Personality Test: the most consistent reliability evidence of the frameworks compared above
- MBTI Assessment: use for exploration, not definitive classification
- Enneagram Test: moderate reliability with rich qualitative insight