The Dunning-Kruger Effect May Be a Statistical Illusion

Research suggests the effect may be a statistical artifact combined with other psychological factors.

Posted April 12, 2020 | Reviewed by Gary Drevitch

The Dunning-Kruger Effect enjoys a similar kind of fame as the Marshmallow Experiment, capturing the fancy of people interested in predictors of performance and success and shaping our ideas and expectations in these important areas.

While the Marshmallow Experiment, long taken as truth but recently questioned, suggests that future achievement is predicted by the early-life ability to delay gratification for future reward, the Dunning-Kruger Effect (DK effect) asserts that experts are more likely to have an accurate read on their own capabilities, whereas less experienced people are more likely to overestimate theirs.

It is generally taken as a cautionary tale to dissuade the less experienced from hubris and cement humility and respect for uncertainty in the advanced. The presumed explanation is that experts are better able to self-assess as a result of their expertise. But what if it isn't true?

Psych 101

In a recent study in the journal Intelligence, authors Gignac and Zajenkowski (2020) were skeptical of the DK effect. They noted that the original study doesn’t control for two common sources of distortion from other psychological and statistical influences.

First, the better-than-average effect. Research shows that regardless of ability, the majority (95 percent) judge themselves as better than they actually are. For instance, average IQ is, by definition, 100 points. If you sample the general population and ask them to guess their own IQs, and then take the average, it doesn’t come out to 100; it comes out to 115. We judge ourselves as better-than-average regardless of intelligence. If we err on the side of believing in self-efficacy, it helps performance because we’ll be more persistent.

The second statistical concept that could confound the DK effect is regression toward the mean. It arises whenever two measures are imperfectly correlated: people who score at the extreme on one measure tend, on average, to score closer to the middle on the other. The bell curve makes this likely, because there are many more data points in the middle than at either extreme, so an extreme score on one measure is more often paired with a more average score on the other.
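Regression toward the mean is easy to see in a small simulation. The sketch below is illustrative only: the correlation of 0.3 between the two measures, the sample size, and the function name are assumptions chosen for the example, not values from the study.

```python
import numpy as np

def simulate_regression_to_mean(n=100_000, r=0.3, seed=0):
    """Return the mean z-score on each measure for the top 10% on the first."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                  # first measure, as z-scores
    noise = rng.standard_normal(n)
    y = r * x + np.sqrt(1 - r**2) * noise       # second measure, correlated ~r with x
    top = x >= np.quantile(x, 0.9)              # people who are extreme on the first measure
    return x[top].mean(), y[top].mean()

top_x, top_y = simulate_regression_to_mean()
print(f"Top 10% on first measure: mean x = {top_x:.2f}, mean y = {top_y:.2f}")
```

The group that is far above average on the first measure is still above average on the second, but much less so, even though nothing psychological was built into the data.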

We tend to ascribe intention to random events, because it helps us make sense of the world by being able to predict deliberate actions. This bias backfires when mathematical patterns are taken to be something they are not and we read psychological meaning where there is none, often harmfully.

For the DK effect to hold, there would have to be more error at the lower range of the measurements and less at the higher end, taking regression into account. In other words, the size of the error would have to change across scores, with less error at higher scores. This can be examined statistically by testing for “heteroscedasticity,” a measure of how much the regression residuals vary across the range of scores. A residual is the difference between the observed score and the predicted score. Residuals would be smaller at higher scores if the DK effect is there. If the residuals are about the same size across the range, the DK effect does not appear to be present.
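One simple way to check for this pattern is to fit a straight line, take the absolute size of each residual, and see whether it correlates with the objective score. The sketch below does that on made-up data. It is not the authors' analysis; the function name, the sample size, and every numeric parameter are assumptions for illustration.

```python
import numpy as np

def abs_residual_correlation(objective, self_assessed):
    """Correlate |residual| with the predictor after a straight-line fit."""
    slope, intercept = np.polyfit(objective, self_assessed, 1)
    residuals = self_assessed - (slope * objective + intercept)
    return np.corrcoef(objective, np.abs(residuals))[0, 1]

rng = np.random.default_rng(1)
n = 5000
objective = rng.normal(100, 15, n)              # hypothetical measured IQ

# Case A: error is the same size at every ability level (no DK-style pattern).
constant_error = 110 + 0.3 * (objective - 100) + rng.normal(0, 12, n)

# Case B: error shrinks as ability rises (what the DK account would predict).
shrinking_sd = np.clip(12 - 0.4 * (objective - 100), 2, None)
shrinking_error = 110 + 0.3 * (objective - 100) + rng.standard_normal(n) * shrinking_sd

r_const = abs_residual_correlation(objective, constant_error)
r_shrink = abs_residual_correlation(objective, shrinking_error)
print(f"constant error:  r = {r_const:.2f}")
print(f"shrinking error: r = {r_shrink:.2f}")
```

A correlation near zero means the residuals are about the same size everywhere. A clearly negative correlation means errors get smaller as scores rise, which is the signature a real DK effect would leave.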

Likewise, if the DK effect is there, the correlation between scores will smoothly rise going from low to high values as accuracy increases. A nonlinear regression would effectively remove false rises in the curve due to statistical issues. If no rise is seen after this analysis, the DK effect is not likely.
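A common way to test for a curve like this is to compare a straight-line fit with one that also includes a squared term. If the squared term explains almost nothing extra, the relationship is effectively linear. The sketch below uses invented data, and the helper name and parameters are assumptions for the example rather than the study's procedure.

```python
import numpy as np

def quadratic_gain(objective, self_assessed):
    """Extra R^2 gained by adding a squared term to a straight-line fit."""
    x = (objective - objective.mean()) / objective.std()   # standardize for stability

    def r_squared(degree):
        coeffs = np.polyfit(x, self_assessed, degree)
        fitted = np.polyval(coeffs, x)
        ss_res = np.sum((self_assessed - fitted) ** 2)
        ss_tot = np.sum((self_assessed - self_assessed.mean()) ** 2)
        return 1 - ss_res / ss_tot

    return r_squared(2) - r_squared(1)

rng = np.random.default_rng(2)
n = 5000
objective = rng.normal(100, 15, n)              # hypothetical measured IQ
z = (objective - 100) / 15

linear_data = 110 + 5 * z + rng.normal(0, 10, n)             # straight-line relationship
curved_data = 110 + 5 * z + 3 * z**2 + rng.normal(0, 10, n)  # relationship that bends upward

gain_linear = quadratic_gain(objective, linear_data)
gain_curved = quadratic_gain(objective, curved_data)
print(f"extra R^2, linear data: {gain_linear:.4f}")
print(f"extra R^2, curved data: {gain_curved:.4f}")
```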

Two Studies

To address these concerns, the researchers conducted two studies to test the DK effect. One study used simulated data, bearing no relation to actual measurements on human subjects, to see if they could reproduce the DK effect where it could not exist. In the second study, they measured real subjects' intelligence and analyzed the data both as most DK studies have and with controls for the above factors.

Simulated data: Could standard statistical approaches create a false DK effect? They simulated the better-than-average effect to make false IQ scores, using one pool of scores with a higher average, like people's high guesses about their IQ, and comparing them with a simulated data set with a lower average, representing the real scores.

When they ran the same kinds of tests used in classical DK experiments on the simulated data, breaking the data into categories and looking at accuracy at each level, they seemed to see a DK effect, but it was due to the statistical approach used on simulated better-than-average data. The outcome was the same, but it could not have been due to a psychological factor because the data wasn't real.
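The same kind of demonstration can be sketched in a few lines. This is not the authors' simulation: the correlation of 0.3 and the sample size are assumptions for illustration, while the averages of 100 and 115 follow the IQ example given earlier. The error is the same size for everyone by construction, so any apparent DK pattern cannot come from better insight at higher ability.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
r = 0.3  # assumed modest correlation between measured and self-assessed scores

z_measured = rng.standard_normal(n)
z_self = r * z_measured + np.sqrt(1 - r**2) * rng.standard_normal(n)

measured_iq = 100 + 15 * z_measured    # simulated "real" scores, average 100
self_iq = 115 + 15 * z_self            # simulated inflated self-estimates, average 115

# Classic DK-style analysis: split people into quartiles by measured score,
# then compare the average self-estimate with the average measured score.
edges = np.quantile(measured_iq, [0.25, 0.5, 0.75])
quartile = np.digitize(measured_iq, edges)     # 0 = lowest quartile, 3 = highest

gaps = []
for q in range(4):
    in_q = quartile == q
    gap = self_iq[in_q].mean() - measured_iq[in_q].mean()
    gaps.append(gap)
    print(f"quartile {q + 1}: overestimate = {gap:.1f} points")
```

The lowest quartile appears to overestimate itself by a wide margin and the highest quartile by very little, which looks just like the familiar DK chart even though the data contain no psychology at all.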

Actual data: They then conducted experiments with real scores from people to see if the DK effect persisted after testing for heteroscedasticity and nonlinearity, to account for the better-than-average effect and regression toward the mean.

They tested 929 subjects with an objective measure of IQ (Advanced Progressive Matrices) and a 25-item self-assessment. They analyzed these data first using the same kind of approach used in most DK studies, and also using additional techniques to see if the DK effect held up after controlling for factors discussed above.

Does the Effect Hold Up to Statistical Rigor?

Using the traditional approach, the DK effect seemed to be present. At higher IQs, self-assessment and the objective measure were more highly correlated. The curve was smooth from the low end to the high end, and it appeared that smarter people were more accurate in assessing their own intelligence.

However, looking at the residuals showed that the error in guessing was actually the same regardless of intelligence. They found that the residuals themselves had a normal distribution. If the DK effect were actually there, the error would decrease at higher IQs, but it does not.

Finally, they looked to see whether there was a smooth upswing in accuracy, but after nonlinear regression, they showed that the relationship between self-assessed and measured intelligence across low, middle, and high scores was a straight line. If the DK effect were there, it would curve upward as accuracy increased relative to score.

Don't Be So Sure

Future research on the DK effect, the authors conclude, could include statistical controls like the ones they use here. Take the DK effect with a grain of salt, especially if important decisions are being made on the assumption that it is an established psychological law.

Like the Marshmallow Experiment, which could bias decision-making by categorizing a child negatively, with an ensuing self-fulfilling prophecy, the DK effect casts doubt on the self-awareness of less intelligent, less skillful people by assuming they are not attuned to their own abilities, and it exaggerates the accuracy of more experienced people in self-assessing. Doing so without basis could cause issues with how training is conducted and performance evaluated, and undermine the confidence of less experienced people in directing themselves.

For those on the higher end, having a false sense of being more accurate in self-appraisal could interfere with learning and give a false sense of security and insightfulness, reinforcing negative personality traits and problematically biasing perceptions of oneself and others.