Tom and Sally throw a ball through a hoop over two days. In total, they each throw the ball the same number of times.
On Day 1 and on Day 2 alike, Sally’s success rate is greater than Tom’s. Yet when the two days’ totals are combined, Tom walks away the clear winner.
This is an example of Simpson’s paradox (also known as the Yule-Simpson effect). It is confusing, and usually met with disbelief; part of us instinctively rejects the mathematical truth that this can occur, because it jars with our intuitive ways of interpreting data.
It is worth noting, though, that ‘Simpson’s paradox’ is not a paradox at all; it is simply the result of misconceptions about causality and poor interpretation of data.
The University of California, Los Angeles (UCLA) has produced research on the paradox, and it is worth highlighting a distinction made by its computer science professor Judea Pearl.
He argues that ‘Simpson’s Paradox’ is a psychological matter: a reflection of the surprise associated with the data - that feeling of ‘what?’ when you read the riddle at the start.
‘Simpson’s reversal’, by contrast, is a purely arithmetic phenomenon in the calculus of proportions - the mathematical truth that testifies to Tom’s victory.
Below is the data of Tom and Sally’s competition:
When you see the raw data, the problem unravels. The ‘reversal’ comes about because Tom and Sally split their throws unevenly across the two days: each player’s overall success rate is an average of their daily rates, weighted by how many balls they threw on each day. As already noted, Sally’s success rate - the percentage of balls she put through the hoop - is greater than Tom’s on both Day 1 and Day 2, but, in total, Tom wins.
In other words, comparing Tom and Sally on either given day is not an equal comparison, and the ‘success’ needs to be understood in that context.
The alternative is that you see Sally’s victories as signifying more than they should. The ‘paradox’ comes by way of my presentation of the data, and the narrative I created around the competition; a narrative, nonetheless, that was empirically upheld.
This points us to the importance of Simpson’s paradox: there is data, and then there is data.
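The reversal described above can be checked with a few lines of arithmetic. The sketch below uses made-up counts chosen only to satisfy the riddle's conditions (equal total throws, Sally ahead on each day, Tom ahead overall); they are illustrative assumptions, not the figures from the competition table, and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical (made, attempted) counts per day. These are illustrative
# assumptions, NOT the article's data. Each player throws 100 times in total.
throws = {
    "Tom":   {"Day 1": (63, 90), "Day 2": (1, 10)},
    "Sally": {"Day 1": (8, 10),  "Day 2": (18, 90)},
}

def daily_rates(player):
    """Success rate for each day, as exact fractions."""
    return {day: Fraction(made, attempted)
            for day, (made, attempted) in throws[player].items()}

def overall_rate(player):
    """Pooled success rate: total made divided by total attempted."""
    made = sum(m for m, _ in throws[player].values())
    attempted = sum(a for _, a in throws[player].values())
    return Fraction(made, attempted)

for player in throws:
    per_day = ", ".join(f"{day}: {float(r):.0%}"
                        for day, r in daily_rates(player).items())
    print(f"{player}: {per_day}, overall: {float(overall_rate(player)):.0%}")

# Tom: Day 1: 70%, Day 2: 10%, overall: 64%
# Sally: Day 1: 80%, Day 2: 20%, overall: 26%
```

With these assumed numbers Sally beats Tom on both days, yet Tom took most of his throws on the easier day and Sally took most of hers on the harder one, so the pooled totals favour Tom.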
Why Simpson’s Paradox matters: data and decision-making
The obvious antidote to all this is to look at your data more carefully. However, in the business world, the conditions for such a paradox become more complicated, and the paradoxes themselves more opaque.
Here’s a hypothetical example. A women’s shoe retailer is conducting market research for product validation: they have two shoes at the beta phase, and only have the budget to release one. They need to make sure they release the shoe that will sell best.
To reach their target market, the company draws up four demographic qualifiers. Shoe A performs better than Shoe B in all four categories.
The percentages and the figures in parentheses represent the respondents who said they would purchase the shoe.
Yet in total across the four demographics, 1,068 (35.6%) people would buy Shoe A, and 1,790 (59.6%) would buy Shoe B. That is a significant swing in favour of the shoe that performed worse in every individual category. Taken category by category, Shoe A looks like the clear winner.
Across the market in aggregate, Shoe B dominates.
The solution resides in distinction. Do not conflate each subsection or each demographic’s performance with the overall outcome; equally, do not dismiss the subsection results as unrepresentative or skewed.
Shoe A’s performance is relative and valuable, and the comparison with Shoe B is mutually illuminating - it serves to reveal where Shoe A performs well, and its competitor less so.
It is about understanding the data in front of you. What is helpful is to work out the story behind the data, and the reasons why you have these results.
For example, it is important to recognise that if the sample sizes were the same for each demographic category, Shoe A would walk away a clear winner (for more on how sample sizes and data representation can skew data interpretation, see our article on denominator neglect).
It is vital, too, to recognise that, relatively speaking, Shoe A is more popular with Over 25s, but Shoe B is popular with more Over 25s.
If you can make that kind of distinction, then you will avoid the pitfalls and costly business decisions that Simpson’s paradox causes.
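The point about equal sample sizes can be sketched in code. The survey below is entirely hypothetical: the group names, the counts, and the helper functions are assumptions made for illustration and are not the retailer's figures. It compares the pooled purchase rate (every respondent counts once) with an equal-weight rate (every demographic counts once, as if the sample sizes were the same).

```python
from fractions import Fraction

# Hypothetical survey results: (would_buy, respondents) per demographic group.
# Group names and counts are illustrative assumptions, NOT the article's data.
survey = {
    "Shoe A": {"Group 1": (90, 100), "Group 2": (70, 100),
               "Group 3": (120, 300), "Group 4": (100, 500)},
    "Shoe B": {"Group 1": (425, 500), "Group 2": (195, 300),
               "Group 3": (35, 100), "Group 4": (15, 100)},
}

def group_rates(shoe):
    """Purchase rate within each demographic group."""
    return {g: Fraction(buy, n) for g, (buy, n) in survey[shoe].items()}

def pooled_rate(shoe):
    """Aggregate rate: total buyers over total respondents."""
    buyers = sum(buy for buy, _ in survey[shoe].values())
    respondents = sum(n for _, n in survey[shoe].values())
    return Fraction(buyers, respondents)

def equal_weight_rate(shoe):
    """Mean of the group rates, i.e. the rate if every group had the same size."""
    rates = group_rates(shoe).values()
    return sum(rates) / len(rates)

for shoe in survey:
    print(f"{shoe}: pooled {float(pooled_rate(shoe)):.1%}, "
          f"equal-weight {float(equal_weight_rate(shoe)):.1%}")

# Shoe A: pooled 38.0%, equal-weight 55.0%
# Shoe B: pooled 67.0%, equal-weight 50.0%
```

In this made-up survey Shoe A has the higher rate in every group and wins once the groups are weighted equally, while Shoe B wins the pooled total because most of its respondents fall in the groups where both shoes do well. The same data also shows the rate-versus-count distinction: in Group 1, Shoe A has the higher rate (90% against 85%), but Shoe B has more buyers (425 against 90).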