About a year ago, a friend of mine was looking for a house to buy somewhere around Amsterdam. It turned into a great hobby of his, and he sometimes spent hours (and hours) on housing websites. During one of these searches he found a single street with 6 houses for sale. So he told me: “That is not a good sign, there is probably something wrong with that neighborhood.”
Statistics is not something that everyone is equally enthusiastic about, and not something that comes naturally to us all. What does come naturally, and what is much easier for our brains to understand, are stories.
To help my team understand and remember statistical concepts, I've started telling them stories about statistics, and the first one is about the Texas Sharpshooter.
The human brain is basically a pattern recognition machine: even when we have deficient data, our brain tries to find patterns so it can base decisions on them. This strategy made sense in a world where we lived in caves and didn’t have that much data. Our brain had a “better safe than sorry” strategy. But that doesn’t mean our brain is right. The clustering of houses for sale could also just be a coincidence without a cause.
This line of reasoning is known as the Texas sharpshooter fallacy. This 'false cause' fallacy is named after a marksman (who I assume was in or from Texas) who shoots randomly at a barn and then paints a bullseye around the spot where the most bullet holes appear, making it look as if he's a really good shot. But clusters naturally appear by chance and don't necessarily indicate a causal relationship.
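To see how easily clusters show up by pure chance, here is a minimal simulation sketch. The numbers are made up for illustration (1,000 listings scattered at random over 2,000 equally likely streets), and the function name `largest_cluster` is my own, not something from a real housing dataset. Nothing in the setup makes any street "bad", yet some street still ends up with several listings.

```python
import random
from collections import Counter


def largest_cluster(n_listings=1000, n_streets=2000, seed=0):
    """Scatter listings over streets uniformly at random and return
    the number of listings on the busiest street.

    All parameters are illustrative assumptions, not real housing data.
    """
    rng = random.Random(seed)
    # Each listing lands on a random street, independent of all others.
    per_street = Counter(rng.randrange(n_streets) for _ in range(n_listings))
    return max(per_street.values())


if __name__ == "__main__":
    # On average a street has only 0.5 listings, so compare that with
    # the busiest street in a handful of independent random "cities".
    for seed in range(5):
        print(f"seed {seed}: busiest street has {largest_cluster(seed=seed)} listings")
```

The busiest street sits well above the average in every run, even though the process has no cause built into it. Real streets also differ in length, which makes chance clusters even more likely.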
The same thing happened to my friend: his brain saw the clustering of houses for sale, assumed a pattern and even conjured up a reason for it, namely that there would be something wrong with the street. But just as with the bullet holes on the barn, the clustering of houses for sale could also be a coincidence. And even if it’s not, there can be many reasons for it other than a bad neighborhood.
So this is the Texas sharpshooter fallacy, also known as the clustering illusion. And it appears everywhere around us. You’ve probably seen many of these cases in the newspaper. For example, when journalists find a cluster of people with cancer, they often quickly assume the cause has to be something in the environment, like water or air pollution. Or when McDonald's shows you research in which 3 of their top 5 countries are also among the 10 healthiest countries on earth, and concludes that their food must therefore be healthy.
To our brain, this makes perfect sense, which makes this a very tricky fallacy. Without any hypothesis, these journalists take data, look for clusters and make up a reason. But as with all data, outliers will always appear at random, and that doesn’t necessarily mean there is a significant link.
So don’t go around looking for bullet holes and painting bullseyes around them. When you are working in conversion optimization, this means, for example, that when you look at Google Analytics there will always be outliers and clusters. If you had no hypothesis and weren’t looking for these clusters beforehand, you will always need a follow-up study with new data to confirm whether you are looking at a significant connection or just a random fluke.
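The need for a follow-up study can be sketched in code as well. This is an illustration under made-up assumptions: 20 hypothetical traffic segments that all share the exact same true conversion rate of 5%, with 1,000 visitors each per period. The function names are mine and have nothing to do with the Google Analytics API. We pick the "winning" segment in the first period and then look at how that same segment does on fresh data.

```python
import random


def conversion_rates(n_segments, visitors, true_rate, rng):
    """Simulate the observed conversion rate of each segment.

    Every segment has the same true rate, so any differences between
    segments are pure noise.
    """
    return [
        sum(rng.random() < true_rate for _ in range(visitors)) / visitors
        for _ in range(n_segments)
    ]


def top_segment_then_retest(n_segments=20, visitors=1000, true_rate=0.05, seed=1):
    """Pick the best segment in period 1, then measure it again in period 2.

    Returns (segment index, period-1 rate, period-2 rate).
    """
    rng = random.Random(seed)
    first = conversion_rates(n_segments, visitors, true_rate, rng)
    second = conversion_rates(n_segments, visitors, true_rate, rng)
    # Painting the bullseye: choose the segment only after seeing the data.
    best = max(range(n_segments), key=lambda i: first[i])
    return best, first[best], second[best]


if __name__ == "__main__":
    best, rate_1, rate_2 = top_segment_then_retest()
    print(f"segment {best}: period 1 = {rate_1:.1%}, period 2 = {rate_2:.1%}")
```

The segment picked after the fact looks better than the true 5% in the first period simply because it was selected for being the highest. The second period is the honest test, because the hypothesis existed before that data came in.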
So I hope this story will help you be critical about the data you see in your work. Next time I will have another fallacy story for you.
And by the way: when my friend asked the estate agent about the street with 6 houses for sale, she started laughing and said: “It’s a very long street” :)