This text is a reply to an English-language TV show and is therefore written in English. Its content is nonetheless relevant in Europe as well.

The article is long, I know that, but please consider reading it in full before jumping to conclusions. Thank you.

I am German but I have some rudimentary interest in US politics. The last episode of »Last Week Tonight with John Oliver« ran a story that made me want to discuss statistics a bit. We have all heard about the tragic events in Ferguson and their background of racial discrimination by the local police, especially racial profiling. I once asked some random people on 4chan who defended this practice why they thought it was justified. I got a lot of answers that used very compelling arguments rooted in freely available data. The argument I want to present below is not exactly what I got, but what I made of it with data I could find on my own.

Let us take a look at the FBI criminal statistics, e.g. the murder statistics. The argument would also work with other statistics, but I think we can agree that murder is one of the most serious crimes (if not the most serious). As we can see, in 2011 most of the murder offenders whose race was recorded were either white or black, which is no wonder, since those are the two biggest racial demographics in the US. There were 4729 white murder offenders and 5486 black ones. This means that for every 100 white murder offenders, there were 116 black ones. These numbers get far worse if you compare them to their respective base populations[1]: 0.002% of whites and 0.014% of blacks became murder offenders, i.e. blacks are about 7 times more likely to become murderers than whites. So it is justified that blacks are watched more by the police than whites.

If you think this argument, which we shall call the »murder argument«, is (racist) nonsense, you are right. Most people who opposed the argument on 4chan back then tried to make the (IMHO weak) point that even if blacks are more likely to commit crimes, they should not be watched more. The point I want to make is that the premise itself is false, because it makes three basic statistical errors, which I want to explain now.

The overvaluation of single indicators

People often search for single indicators to assess problems. We see that in companies using so-called »key performance indicators« and in political discussions. The murder argument relies on such a single indicator. But this single indicator fails to paint a comprehensive picture of the situation. If, hypothetically, the black offenders were mainly repeat offenders and the white ones mainly first-time offenders, that would mean that white people are far more prone to murder than previously estimated. So this one single indicator tells us little to nothing.

The correlation-causation fallacy

One of the most basic statistical errors, one that is made even by many scientists, is the idea that correlation implies causation. Simply speaking, two attributes are correlated if they often show up together, like skin color and being a murder offender in the murder argument. But just because two attributes are correlated does not mean that one causes the other.

Most people would find it ridiculous to read the murder argument as implying that being likely to kill someone causes you to have black skin. But for some reason many people find it perfectly reasonable to conclude that having black skin causes you to be more likely to kill someone.

In my own political experience, people like simple answers to complex problems. They like to hear simple cause-and-effect relations, like »being black makes you more likely to kill someone«. But reality is often not that simple, and there is a vast number of different, intertwined causes for each effect. Another possible (and still way too simple) explanation for the numbers in the murder argument could be that black people are more likely to be poor and that poor people are more likely to commit murder. In this case the cause is suddenly not race, but wealth. This is best illustrated by the last error, which we turn to now.

Simpson’s paradox

Simpson’s paradox is an apparent statistical anomaly caused by the two errors we have seen so far. To explain it, we use a simple, fictional textbook example, slightly modified to fit the murder argument. Let’s assume there are 200000 people living in a county, all of whom are either black or white. The murder statistics are as follows:

TOTAL    population    offenders    offenders/population
white        140000           59                  0.042%
black         60000           51                  0.085%

It looks like this county has a murder problem and that blacks are about twice as likely to commit murder as whites. But let’s add another factor to the analysis and roughly group the people into a poor and a rich class. Now the data looks as follows:

POOR     population    offenders    offenders/population
white         50000           50                    0.1%
black         50000           50                    0.1%

RICH     population    offenders    offenders/population
white         90000            9                   0.01%
black         10000            1                   0.01%

As we can see, within each wealth group the relative frequency of offenders is identical for whites and blacks. This seems to contradict the total numbers, but it does not. In the detailed data we see a strong correlation between wealth and the number of murder offenders, but no correlation between race and the number of murder offenders. If we aggregate to the total data and thereby lose the wealth information, the latter correlation suddenly seems to exist. This is because there is also a strong correlation between wealth and race (90% of rich people are white). So the total data falsely attributes the correlation between wealth and the number of murder offenders to race instead.
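For readers who want to check the arithmetic themselves, here is a minimal Python sketch that recomputes the rates from the fictional county numbers above. The variable and function names (`county`, `rate_percent`, and so on) are my own choices for illustration; only the counts come from the tables.

```python
# Counts copied from the fictional county tables above:
# (wealth class, race) -> (population, offenders)
county = {
    ("poor", "white"): (50_000, 50),
    ("poor", "black"): (50_000, 50),
    ("rich", "white"): (90_000, 9),
    ("rich", "black"): (10_000, 1),
}


def rate_percent(cells):
    """Offenders as a percentage of the pooled population of the given cells."""
    population = sum(p for p, _ in cells)
    offenders = sum(o for _, o in cells)
    return 100 * offenders / population


# Rate inside every single (wealth, race) cell.
by_cell = {key: rate_percent([cell]) for key, cell in county.items()}

# Rate per race after throwing the wealth information away.
by_race = {
    race: rate_percent([cell for (_, r), cell in county.items() if r == race])
    for race in ("white", "black")
}

# Rate per wealth class after throwing the race information away.
by_wealth = {
    wealth: rate_percent([cell for (w, _), cell in county.items() if w == wealth])
    for wealth in ("poor", "rich")
}

for label, table in (("cell", by_cell), ("race", by_race), ("wealth", by_wealth)):
    for key, value in table.items():
        print(f"{label:>6} {str(key):<20} {value:.3f}%")
```

Within each wealth class the two races come out with the same rate, while the rates pooled by race differ by roughly a factor of two. The only thing that changed between the two views is whether the wealth column was kept.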

So even when we speak only of correlations and avoid the correlation-causation fallacy, we can still draw false conclusions about correlations when we aggregate the data too much, for example into a single indicator.

Interim conclusion

In the murder argument we saw an example of bad statistics. Even though the numbers are sound, the conclusions drawn from them are not. But this example is not an argument that using statistics for fact-based policy making is impossible. On the contrary, I am all for fact-based policy making. But we cannot aggregate complex problems into simple numbers and draw simple solutions from them; we have to understand the problems' complex cause-and-effect relationships. It is therefore necessary to determine the most important causes (the »signal«) and distinguish them from weak causes and pure randomness (the »noise«). After that, the causes thus determined can be the topic of political debate.

The second example

After all this talk about why crime statistics do not justify racial profiling, you might be surprised that John Oliver’s segment on Ferguson was not the reason I wanted to write this article. The actual reason was the later segment on equal pay.

I wrote the entire first half of this article to illustrate the statistical fallacies with an example that would be easy to follow for a more liberal audience (which I assume to be the major part of Oliver’s crowd, me included; I am a big fan). But it is not only hard right-wing conservatives like those guys I met on 4chan who misuse statistics when it fits their political narrative; liberals do too. Oliver’s equal pay segment is the best example.

The »77 cents on the dollar« figure, which is widely used in the media, is a comparison of median income values and therefore a simplified single indicator that tells us as much as the murder argument does (little to nothing). Oliver actively discourages analysis of the causes of the wage gap, mocks journalists trying to report more sophisticated figures and basically advocates measuring everything by the single indicator (which he adjusts to an 83 cents figure).

For example, he uses a metaphor to explain why it does not matter how big the pay gap is. If someone had taken a dump on his desk, he would not care whether it is 6 inches or 2 inches long; he does not want crap on his desk. This propagates the single-cause fallacy (one person taking one dump on his desk). To be more realistic, there would have to be a huge pile of stuff on his desk, some of it being crap of different shapes and sizes from different people, some being just dust (noise) and some being chocolate (stuff that looks like crap but is not inherently bad). Part of a statistical analysis would be to separate all the different stuff on the desk, find out what is actual crap and work out what we would have to do to get each crapping person to stop crapping. Sounds disgusting? Well, complex problems need complex solutions.

In another segment a fake ad for 83 cent bills to pay female employees was shown. This is statistical nonsense on so many levels, since this 83 cent figure is just a single point estimate. To be statistically sound, each employee would need a special bill, one being paid in 83 cent bills, another in 123 cent bills, the next in 98 cent bills and so on for every employee, male or female.

Advocating that policy be measured against this one single indicator is, scientifically speaking, careless at best. Going back to the murder argument, we could tell white people to commit more murders. This would even out the figure, but would probably be crowned the most stupid political idea in human history. A less stupid but IMHO still very bad idea would be to even out the pay gap figure by introducing legislation that obliges employers to discriminate against men.

The simple fact is that the whole pay gap issue is a very complex problem that has a lot of different causes. We have a very good idea about some of those, but about 5 to 8 cents of the pay gap figure is still unexplained. And by the way, unexplained means that we do not know where it comes from, not that it is the result of gender discrimination. To properly get rid of the pay gap, we have to address a lot of different issues, many of which can hardly be solved by legislation, e.g.

  • we have to encourage girls and women to be willing to take more risks in their career decisions, e.g. majoring in computer science and engineering instead of social sciences and education
  • we have to distribute the burden of child care onto both genders[2]

Final conclusion

I actually think John Oliver and his writers are trying to make a very important point. But advocating the oversimplification of statistical numbers just because it fits a »good« political narrative is wrong. I do not think Last Week Tonight is hostile to science; the segment where they debunked the idea that climate change is a scientifically disputed fact shows otherwise. But that means I was all the more disappointed when they repeated the statistically unsound approach to the pay gap problem. But as I said, even scientists often fail with statistics.

Nevertheless, I think the problem of wrongly interpreting statistical figures needs to be addressed, and this article is my personal take on it.

[1] The demographic data is from 2010 while the crime data is from 2011, so this number is not exact, but should be a pretty good estimate.
[2] Talking about gender in this context means the two major genders. No offense to non-male-female gender people intended.