Darrell Huff published How to Lie with Statistics in 1954. In it, he explained how statistics can be used to deliberately deceive the inattentive consumer of information. This wasn’t a new idea; several of Huff’s predecessors had arrived at the same conclusion:
“There are three kinds of lies: lies, damned lies, and statistics.” – attributed to British Prime Minister Benjamin Disraeli (1804-1881)
“Facts are stubborn things, but statistics are pliable.” – Mark Twain (1835-1910)
“[Some individuals] use statistics as a drunk man uses lamp-posts, for support rather than for illumination.” – Andrew Lang (1844-1912), Scottish writer and anthropologist
Some were even more skeptical of the use of statistics (sorry, statisticians):
“If your experiment needs a statistician, you need a better experiment.” – Ernest Rutherford (1871-1937), best known for establishing the planetary model of the atom
With that in mind, let’s look at some of the lessons from Huff’s book by picking out forms of misleading statistics:
Biased samples (also called ascertainment bias or systematic bias)
Samples can be biased in various ways, either intentionally or unintentionally, but all “systematically favor some outcomes over others.”
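To make the idea concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the population is 10,000 made-up incomes, and the "biased" survey assumes that the chance of being sampled grows in proportion to income. It is one way a sample can systematically favor some outcomes, not a model of any real survey.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 made-up household incomes.
population = [rng.lognormvariate(10, 0.5) for _ in range(10_000)]

# Simple random sample: every household is equally likely to be picked.
random_sample = rng.sample(population, 500)

# Biased sample (assumption for this sketch): the chance of being picked
# is proportional to income, so richer households are over-represented.
biased_sample = rng.choices(population, weights=population, k=500)

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(f"population mean:    {population_mean:,.0f}")
print(f"random sample mean: {random_mean:,.0f}")
print(f"biased sample mean: {biased_mean:,.0f}")
```

The random sample should land close to the population mean, while the biased sample overshoots it, even though both contain the same number of households.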
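Huff describes three kinds of average: the mean, the median, and the mode. In a normally distributed (think “bell curve”) sample, the three are nearly identical. In a skewed sample, such as incomes where a few very large values pull the mean upward, they can differ substantially, so whoever reports “the average” can pick the one that best suits their argument.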
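A small worked example shows how far apart the three averages can drift. The ten salaries below are hypothetical, chosen so that one large value skews the list; the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical salaries at a ten-person company; one executive skews the list.
salaries = [25_000, 25_000, 25_000, 30_000, 30_000,
            35_000, 40_000, 45_000, 60_000, 285_000]

mean_salary = statistics.mean(salaries)      # total divided by head count
median_salary = statistics.median(salaries)  # middle of the sorted list
mode_salary = statistics.mode(salaries)      # most common value

print(f"mean:   {mean_salary:,.0f}")    # 60,000
print(f"median: {median_salary:,.0f}")  # 32,500
print(f"mode:   {mode_salary:,.0f}")    # 25,000
```

All three numbers are a legitimate "average salary," yet the mean is more than twice the mode. A recruiter might quote the first and a union negotiator the last.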
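By distorting elements of a graph or chart, such as the scale of an axis, you can paint a picture very different from the underlying truth. A vertical axis that starts well above zero, for example, makes a small difference look dramatic. Look at the graphs below, and you’ll spot what’s wrong.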
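The effect of a truncated axis can be put into numbers without drawing anything. The sketch below uses two hypothetical sales figures and a helper, `apparent_ratio`, that is made up for this illustration: it compares how tall the two bars would be drawn when the vertical axis starts at a given value.

```python
def apparent_ratio(smaller, larger, axis_start):
    """How many times taller the larger bar is drawn when the axis begins at axis_start."""
    return (larger - axis_start) / (smaller - axis_start)

# Hypothetical sales figures: a real increase of 4%.
last_year, this_year = 50, 52

honest = apparent_ratio(last_year, this_year, axis_start=0)
truncated = apparent_ratio(last_year, this_year, axis_start=49)

print(f"axis from 0:  second bar drawn {honest:.2f}x as tall")
print(f"axis from 49: second bar drawn {truncated:.2f}x as tall")
```

With the axis starting at zero the second bar is 1.04 times as tall as the first, matching the data. Start the axis at 49 and the same bar is drawn three times as tall, although the numbers have not changed.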
“Significant findings” and throwing out data
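Statistical significance is assessed with a hypothesis test: you calculate how likely a result at least as extreme as the one observed would be if there were no real effect, and call the finding significant when that probability falls below a preset threshold (commonly 5%). However, “insignificant findings” are often conflated (perhaps intentionally) with “didn’t support my stance,” and the inconvenient data gets thrown out.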
In the words of Huff: “If you can’t prove what you want to prove, demonstrate something else and pretend that they are the same thing.”
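Throwing out the results you dislike matters because "significant" findings turn up by chance alone. The following sketch is an illustration under stated assumptions, not a method from Huff's book: it flips a fair coin 100 times per experiment, applies a two-sided z-test at the usual 5% level (critical value 1.96, using the normal approximation), and counts how often a coin with no bias at all still looks biased.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

def looks_significant(flips=100, z_crit=1.96):
    """Flip a fair coin and report whether it 'significantly' looks unfair."""
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    # z-score of the head count under the hypothesis that the coin is fair
    z = (heads - flips * 0.5) / (flips * 0.25) ** 0.5
    return abs(z) > z_crit

experiments = 1000
false_alarms = sum(looks_significant() for _ in range(experiments))
print(f"{false_alarms} of {experiments} fair-coin experiments looked significant")

# Chance that at least one of 20 independent tests with no real effect
# crosses the 5% threshold anyway.
p_any = 1 - 0.95 ** 20
print(f"chance of at least one false alarm in 20 tests: {p_any:.0%}")
```

Roughly one experiment in twenty raises a false alarm, so someone who runs twenty comparisons and reports only the one that "worked" has about a 64% chance of having something to show, effect or no effect.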
Correlation vs. Causation
Just because two things are correlated doesn’t necessarily mean that one causes the other. Look here for some fun examples.
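One common way for a correlation to arise without causation is a hidden third factor. The simulation below is a hypothetical example: daily temperature drives both ice cream sales and sunburn cases, and the numbers (slopes of 3 and 2, noise of 10) are invented. The `pearson` helper is written out by hand so the sketch needs only the standard library.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: temperature influences both quantities;
# neither quantity influences the other.
temperature = [rng.uniform(10, 35) for _ in range(500)]
ice_cream = [3 * t + rng.gauss(0, 10) for t in temperature]
sunburns = [2 * t + rng.gauss(0, 10) for t in temperature]

r_raw = pearson(ice_cream, sunburns)

# Remove the temperature effect. The slopes 3 and 2 are known here only
# because we built the simulation; real data would need them estimated.
ice_cream_left = [i - 3 * t for i, t in zip(ice_cream, temperature)]
sunburns_left = [s - 2 * t for s, t in zip(sunburns, temperature)]
r_adjusted = pearson(ice_cream_left, sunburns_left)

print(f"ice cream vs. sunburns:             r = {r_raw:.2f}")
print(f"after accounting for temperature:   r = {r_adjusted:.2f}")
```

The raw correlation is strong, yet it shrinks to nearly zero once temperature is accounted for. Ice cream does not cause sunburn; hot days cause both.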
So, how can you avoid falling victim to the statistical charlatans? Start with these questions:
- Who is making the claim?
- How do they know? Where is the evidence?
- Are we being given the whole picture?
- Has someone changed the topic? Are we still discussing the original question?
- Does the argument make sense?
Now you, too, can fight the good fight against misleading statistics!