TL;DR
In this post, we examine several statistical and logical fallacies that seem to arise often in discussions of race and race relations. We will just mention a few in passing, but three of the more problematic fallacies (slippery population, sample demographics, and asymmetry of conditional probability) will be examined in more detail with some examples.
Non-exhaustive list
- Sample size of one – mentioned before here: knowing a mean characteristic of a population tells you precisely nothing about an individual (or the average of a nonrandom sample) drawn from that population.
- Arguing from sample to population – also mentioned before here: knowing a characteristic of an individual (or the average of a nonrandom sample) tells you precisely nothing about the population from which it is drawn.
- Slippery population: a form of argumentation in which the population in question is “stealth-adjusted” to make the results of a sample appear to be more or less significant than they are (discussed below).
- Sample demographics: ascribing false “significance” to a sample having significantly different demographics than the population from which it is drawn (discussed below). The actual conclusion is that the sample is likely nonrandom (systematic).
- Asymmetry of conditional probability (see below): using the false intuition that “the probability of A given B” and “the probability of B given A” are similar in magnitude. If A and B are events of widely differing absolute probability, the two conditional probabilities will differ widely as well.
- Naturalistic fallacy: “is implies ought,” the world as it exists is morally correct (“Natural foods” fallacy).
- Reverse naturalistic fallacy: “ought implies is,” what is morally correct is what exists in the world (Fallacy of wishful thinking).
- Composition fallacy: “moving from part to whole,” what is true of a part (even of all the parts) must be true of the whole – “atoms are not alive, therefore nothing made of atoms is alive.”
- Ad hominem: the fallacious argument that a statement is false because of some intellectual or moral failing of the maker (fallacy because even fools and knaves can make a correct statement).
- Ad auctoritatem: the argument that a statement is true because the maker is an authority (fallacy because even experts can be, and often are, wrong or misunderstood).
- Post hoc ergo propter hoc: the argument that when one event follows another chronologically, the first caused the second (sometimes used even when the two occur at nearly the same time).
- Traitorous critic (ergo decedo): a member of a group who criticizes the group is a traitor, and should leave, or perhaps should have left already if the criticism is true (if the criticism is particularly severe, and leaving is particularly easy, one may reasonably wonder why they stay, but it doesn’t invalidate their criticism).
- Others – there are many such lists; see here and here, for example.
More details –
Here are more detailed looks at the three specifically statistical fallacies that occur most frequently in discussions about race relations.
Slippery population
What I refer to as the slippery population fallacy is not a formal fallacy. Rather, it’s a trick of statistical argumentation or persuasion that, in many cases, is intentional.
First, take the example of Geico and Allstate insurance companies. These two companies have ads that seem to suggest that switching from your current inferior automobile insurance carrier to their vastly superior one will save you significant money on your premiums. In fact, they both tout that there is a substantial average annual amount of money saved by making this switch. So, why not repeatedly switch between the two and drive your annual premium to zero? The answer is that the “average savings” does not apply to all people who check their rates – no, it applies to “customers who switched.” So, some population of people checked their rates – some found that their current insurer was cheaper – they didn’t switch; others found that there was some amount of savings, but not enough to warrant the trouble of switching insurers – they didn’t switch; others found that there was enough savings to warrant changing insurers, so they switched. The average is taken only over them. It’s not surprising that the average savings of a group of people who (by definition) were quoted substantial enough savings to change carriers is, well, substantial! No evidence is given by either insurer as to what percentage of inquiries results in a switch. Nor do they report the average among the phone calls that resulted in, “you must be nuts! That’s $1000 more than I’m paying now!”
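To see how much the “customers who switched” framing can inflate an average, here is a minimal simulation sketch in Python. The quote amounts and the switching threshold are entirely made up for illustration; the point is only the structure of the selection.

```python
import random

random.seed(0)

# Hypothetical quote outcomes: "savings" is current premium minus quoted
# premium; negative means the quote was MORE expensive than what the
# shopper already pays.  These numbers are invented purely for illustration.
savings = [random.gauss(0, 300) for _ in range(10_000)]

SWITCH_THRESHOLD = 200   # assume people only switch for a big enough saving

switchers = [s for s in savings if s > SWITCH_THRESHOLD]

print(f"Average savings over everyone who got a quote: "
      f"${sum(savings) / len(savings):.0f}")
print(f"Average savings over 'customers who switched': "
      f"${sum(switchers) / len(switchers):.0f}")
print(f"Fraction of quotes that led to a switch: "
      f"{len(switchers) / len(savings):.1%}")
```

Even though the average quote saves essentially nothing, the average over the self-selected switchers is, by construction, substantial.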
Second, consider the example of the racial demographics of head coaches in the NBA; specifically the proportion of Black head coaches. This situation is rather fluid (I believe that as I write this, there are 11 out of 30; last year there were 4; nine years ago there were 14) but my point is that even at a low ebb for the league (there was a LOT of criticism at the point where there were only 4) the percentage was 13.3% – very slightly lower than, say, the 2016 general population at 13.4%. And other years (like now at 36.7%) it’s been much higher than the general population. One might think that Ibram Kendi et al. would be ecstatic about the state of the NBA coaching world, but they’re not! Because, though the percentage of Black head coaches compares favorably with the general population, NBA head coaches are not normally pulled out of the general population. Most of them (though not all) played in the NBA. And the demographics (2020 – most recent stats I can find) of NBA players are 74.2% Black. Seen in that light, 36.7% doesn’t look so great. It’s really a matter of which population you wish to use for comparison, though – another population that has been adduced for this purpose is the NBA fan base, which appears to be somewhere between 25% and 40% Black. Measured against that yardstick, the NBA head coaching ranks are a mixed bag – some years are OK, others aren’t.
Choice of the population for comparisons like this will almost always be determined by a narrative, and it isn’t obvious which population choice is “correct.” As a similar case, I mention the sex breakdown among math professors. It’s certainly never been close to general population demographics (approximately 50.5% female at the moment), but having chaired a number of hiring committees over my years in the profession, I can testify that the general population demographics are completely irrelevant to a hiring committee’s work. The demographics that matter are the demographics of new Ph.D.’s at a certain point in the past (how far back is dependent on the rank of the position being filled). A hiring committee can’t hire people who haven’t applied for the position (or who don’t possess the minimum qualifications for the position) and yet, pointing this out was always seen as an excuse for failure in the hiring process, rather than a reason to be patient and focus on grad school recruiting and completion rates.
A third example is a bit different, in that statistics are almost never used – it’s a matter of selective reporting and emphasis. This has to do with police killings of unarmed suspects. I will deal elsewhere with the issue of demographic proportionality – what I want to point out here is that, in any given year, more White unarmed suspects than Black unarmed suspects are killed by police (in 2016, for example, the numbers were 88 and 51, respectively; I have no reason to believe that that year was atypical in any way), and yet almost no press coverage is given to the Whites who are killed in this way. I am not downplaying in any way the injustice done in any of the cases I am about to mention, but consider: we all know the names of George Floyd and Breonna Taylor; have you heard of Tony Timpa, Duncan Lemp, or Nicholas Gilbert? If not, why do you suppose that is? It’s not any significant difference in the facts of the cases, as you will see if you examine them – the population of interest in reporting and decrying police killings of unarmed suspects is clearly not the entire population.
Sample Demographics
The fallacy here is reading too much into a statistically significant difference between the demographics of a sample and the demographics of the population. For a truly random sample, the count of a given demographic group has a hypergeometric distribution (well-approximated by a binomial distribution when the population is large), so whether the sample’s demographic percentage differs significantly from the population’s is easy to assess at any desired confidence level. A significant difference in this test is evidence against randomness of the sample (said differently, evidence in favor of a “systematic,” that is, a nonrandom, sample). It does not, however, by itself, tell you anything about the system by which the sample was chosen other than that it likely wasn’t random. It’s very, very difficult to guard against over-interpreting this sort of test, especially if there is any narrative in place regarding the system.
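As a sketch of this randomness test, here is the binomial approximation in Python (assuming SciPy ≥ 1.7 is available; the sample size and count are hypothetical):

```python
from scipy.stats import binomtest

# Hypothetical numbers: a sample of 500 drawn from a population in which
# a given group makes up 13.4%.  The group appears 95 times in the sample.
sample_size = 500
group_count = 95
population_share = 0.134

result = binomtest(group_count, sample_size, population_share)
print(f"Sample share: {group_count / sample_size:.1%}")
print(f"p-value under 'the sample was drawn at random': {result.pvalue:.2g}")
# A tiny p-value is evidence that the sample is nonrandom (systematic).
# It says nothing, by itself, about WHAT the selection system is.
```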
One of the best ways to avoid this fallacy mentally is to do a mutatis mutandis revision of the test – that is, use the same data but slice it a different way. For example, compare the current racial demographics of the US Federal prison population to those of the US population as a whole: 12.4% of the population is Black*, compared to 38.1% of the prison population, for a relative risk of 3.072, while 61.6% of the population is White, compared to 57.9% of the prison population, for a relative risk of 0.940 – a relative risk factor of 3.27 for Federal prison incarceration associated with being Black versus being White. (Note, though, that census and prison population racial definitions may differ – I’ve tried to make them as comparable as possible here, but that’s often tricky.) Statistics like this are often used to posit that the Federal criminal justice system is biased against Black citizens. Doing the same analysis with sex, though, shows a relative risk factor of 13.76 for Federal prison incarceration associated with being male versus being female. If we believe the argument for racial bias, then we are forced to accept the much stronger argument that the Federal criminal justice system is biased against men. The problem, of course, is that there are many other factors (economic, cultural, geographic) that might have enough explanatory power to erase the apparent anti-Black (or anti-male) bias in these numbers – we just didn’t look at them at all. In the initial analysis, we only saw race because we only looked at race. I hasten to add that nothing I have said here is proof that the criminal justice system is NOT biased in the ways that a first look at the data would seem to imply. Both of the numbers I have calculated show that Federal incarceration is not random, but they don’t prove anything beyond that. They are just indicative that more detailed study should be undertaken to examine the “system” by which citizens are selected for the “sample” of Federal prison inmates.
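The relative-risk arithmetic here is just ratios of shares; a short sketch reproducing the figures quoted above:

```python
def relative_risk(sample_share, population_share):
    """Ratio of a group's share of the sample to its share of the population."""
    return sample_share / population_share

# Shares quoted in the text (Federal prison population vs. US population).
rr_black = relative_risk(0.381, 0.124)   # ~3.07
rr_white = relative_risk(0.579, 0.616)   # ~0.94

print(f"Black relative risk:  {rr_black:.3f}")
print(f"White relative risk:  {rr_white:.3f}")
print(f"Black vs. White risk factor: {rr_black / rr_white:.2f}")  # ~3.27
```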
Thomas Sowell, in The Quest for Cosmic Justice, gives a list of statistical racial and ethnic disparities for which no reasonable case for discrimination can be made (he also gives another, longer such list in The Vision of the Anointed). Here are a few:
- In 1999, 80% of the doughnut shops in California were owned by people of Cambodian ancestry
- In the early 20th century, 80% of the world’s sugar processing machinery was made in Scotland
- As of 1909, Italians in Buenos Aires owned more than twice as many food and drinking establishments as native Argentinians, as well as three times as many shoe stores and 10 times as many barbershops
- In the early 20th century, in the Brazilian state of Rio Grande do Sul, all of the firms in all industries producing trunks, stoves, hats, neckties, paper, leather, soap, glass, watches, beer, confections and carriages were owned by people of German ancestry.
As an explanation for these phenomena, I can’t improve on Sowell’s:
“Why are different groups so disproportionately represented in so many times and places? Perhaps the simplest answer is that there was no reason to have expected them to be statistically similar in the first place. Geographical, historical, demographic, cultural, and other variables make the vision of an even or random distribution of groups one without foundation.”
When a disparity is found that does support a prima facie case of narrative-supported bias, what should be done? Again, disparities are evidence of non-randomness, but usually we didn’t believe that the system as a whole was random anyway. What we can do to (perhaps) locate the bias we now suspect is to analyze the system further, breaking it down into smaller subsystems and examining each to see precisely where the statistical bias is actually introduced, so that we can check whether a charge of “racism” or other social bias is actually plausible.
Extended example (sample demographics)
Here is an extended example of what I mean:
It has frequently been noted that civilians shot (or killed, or both, depending on the particular study) by police are “disproportionately” African-American. This is demonstrated in the following sets of demographic ratios: 2016 US population was 61.3% White, 13.4% Black, 17.7% Hispanic, and 6.3% Asian while the 2016 victims of fatal police shootings were 48.5% White, 24.7% Black, 16.9% Hispanic, and 1.6% Asian. Of course, demographic ratios like this depend heavily on the racial/ethnic definitions in use, but these two seem to be fairly close (in particular, in both cases, there is no overlap between the “Hispanic” and the “White” or “Black” categories), so they are roughly comparable. Relative risk, then, for the various demographic categories is the ratio of the two percentages, so that the relative risks are: 0.79 White, 1.84 Black, 0.95 Hispanic, and 0.25 Asian. These numbers are usually reported by comparing the non-white categories to the White category, so that Black relative risk is 2.3 times White relative risk, Hispanic relative risk is 1.2 times White relative risk, and Asian relative risk (usually not reported) is 0.3 times White relative risk.
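For concreteness, here is a small sketch that reproduces the relative risks above from the quoted shares:

```python
# Shares quoted in the text: 2016 US population vs. 2016 victims of fatal
# police shootings, by group.
population = {"White": 0.613, "Black": 0.134, "Hispanic": 0.177, "Asian": 0.063}
victims    = {"White": 0.485, "Black": 0.247, "Hispanic": 0.169, "Asian": 0.016}

relative_risk = {g: victims[g] / population[g] for g in population}

for group, rr in relative_risk.items():
    print(f"{group:8s} relative risk {rr:.2f}, "
          f"{rr / relative_risk['White']:.2f}x the White figure")
```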
These numbers (which are indisputable) are then used as evidence that the criminal justice system (particularly the police) in the US is systemically biased against Black and Hispanic civilians. As I have pointed out earlier, these numbers are strong evidence that police do not shoot civilians selected from the general population at random – I think we already knew that, though.
Let’s now examine the “system” that “selects” a civilian to be fatally shot by the police (the wording is obviously extremely unfortunate here). It seems to me that it proceeds in at least three steps:
- Step 1: civilians selected from the general population are encountered by the police (selection here is not made by the police)
- Step 2: from the civilians encountered by the police, some are selected to be arrested or stopped (selection here is made by the police)
- Step 3: from the civilians arrested or stopped by the police, some are fatally shot (selection here is made by the police)
We can reasonably hope that Step 3 could be made free from statistical racial bias. Perhaps Step 2 could also be made free from statistical racial bias, though there is the possibility of some intrinsic bias based on physical descriptions of subjects sought by police (e.g., stops made in the process of looking for a White female perpetrator will likely involve civilians more White and more female than the general population). Step 1 has to do with where police are deployed, as well as with population density, so it would seem likely to be free of bias, even though it is not a random sample of the general population. In fact, I claim that it makes sense to compare the demographic results of Steps 2 and 3 not to the demographics of the population as a whole, but rather to the demographics of the population likely to be encountered by police (that is, to an estimate of the output sample of Step 1).
I am not aware of any scholarly attempts to study Steps 1 or 2, but a first cut at a Step 3 analysis was done by Roland Fryer at Harvard, and his results show no statistical racial bias at Step 3 (though he did find bias with regard to other police tactics, just none with shooting). His results have, of course, been heavily criticized, since they go against the prevailing cultural narrative, but as far as I’m aware, all the criticisms amount to saying that he should have either excluded traffic stops or included both Steps 2 and 3 in a single monolithic analysis – my view is that both criticisms are incorrect if we want to assess carefully what happens at Step 3.
Step 2 looks, to me as a criminal justice outsider, to be a very difficult problem to study. I’m not aware of any uniformly collected nationwide data on stop/arrest demographics, so one would need to (as Fryer did in looking at Step 3) do a deep dive into the records of some particular large police force. That would be a lot of work, and the result would be either confirmation of what “everyone” believes already or becoming an academic persona non grata. Not a lot of reward there!
Step 1, on the other hand, looks fairly easy to do, at least at an approximate level. One could take demographic data with some uniform granularity (say, the county level, since that is readily available in the Census) and weight the county demographics by an estimate of the likely number of police-civilian interactions in that county. A reasonable first-order estimate of this would be to use a sort of Brownian motion model: weight each county’s demographics by the product of its civilian population and its police population, divided by its area, and then normalize by the total weight.
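A minimal sketch of this weighting scheme, assuming a hypothetical counties.csv with per-county population, police force size, area, and demographic shares (the file and column names are placeholders, not a real dataset):

```python
import csv
from collections import defaultdict

def weighted_demographics(rows, groups):
    """Weight each county's demographic shares by
    (civilian population * police population) / area,
    then normalize by the total weight."""
    totals = defaultdict(float)
    total_weight = 0.0
    for row in rows:
        weight = (float(row["population"]) * float(row["police"])
                  / float(row["area_sq_mi"]))
        total_weight += weight
        for g in groups:
            totals[g] += weight * float(row[g + "_share"])
    return {g: totals[g] / total_weight for g in groups}

# Hypothetical usage: counties.csv has one row per county with columns
# population, police, area_sq_mi, and <group>_share for each group.
with open("counties.csv", newline="") as f:
    rows = list(csv.DictReader(f))

print(weighted_demographics(rows, ["white", "black", "hispanic", "asian"]))
```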
I have done this (detailed results available on request) with 2016 data from the US Census and the Bureau of Justice Statistics (particularly, the LEMAS survey of law enforcement agencies to find police force sizes) and the result is (warning – counterintuitive results ahead) that the demographics of the 2016 US population, weighted by expected number of police interactions in the county of residence, are 29.6% White, 22.4% Black, 26.5% Hispanic, and 11.3% Asian. This means that taking Steps 2 and 3 together as a unit, and comparing to this weighted population (Step 1 estimate) gives a relative police shooting risk of 1.6 White, 1.1 Black, 0.6 Hispanic, and 0.14 Asian. In this case, one would not expect anyone to relativize these numbers to the White category, but for completeness, Black relative risk is 0.69 times White risk, Hispanic relative risk is 0.38 times White risk, and Asian relative risk is 0.09 times White risk.
Note carefully what this says, and what it does not say: crime rates do NOT figure into this calculation at all, other than indirectly (that is, one would naïvely assume that police density would be generally higher in higher crime counties), so this has nothing to do with the racial demographics of criminals (or, indeed, of crime victims). Nor does it say that police use of deadly force is racially biased against White people. It only says that the lower relative risk of police shooting that White people have on average is explained (actually over-explained) by where they live – not that they necessarily live in lower crime areas, but that they simply live in areas with lower population density and lower police density, where interactions with the police are less common.
I am aware of vaguely similar analyses being done, comparing police shooting rates to poverty rates and various crime/victim demographics, but to me, those all seem to be adding more possibly confounding variables into the mix. I also admit freely that the granularity of the data I used (county level) is much too large to be truly predictive of police interactions, but it is what is readily available to the public.
Note also that this is a statistical analysis – as such, it provides no information about any particular shooting. Nor is it designed to answer whether urban policing is in need of reform, only whether policing as a whole is racially biased in the matter of civilian shootings.
Asymmetry of Conditional Probability
This fallacy is a bit technical to describe, but once you see it, you’ll start seeing it everywhere. I’ll use just a bit of probability notation – if this is unfamiliar, perhaps try here to get a sense of the basic ideas. “Conditional probability” is the notion of the probability of one event, given that another has occurred (or will occur). Specifically, P(B|A) denotes the probability of event B given event A. The principal useful equation is that P(A∩B) = P(B|A) P(A) – in words, “the probability of A and B is equal to the probability of B given A multiplied by the probability of A.” Thus, P(B|A) = P(A∩B)/P(A). Note that the quotient on the right is a probability divided by one at least as large, so it always lies between 0 and 1. (This doesn’t work if P(A) = 0, but then the whole setup is rather vacuous.)
Now, note that P(B|A) = P(A∩B)/P(A) while P(A|B) = P(A∩B)/P(B). In particular, the two are NOT in general equal and, if A and B have very different individual probabilities, the two conditional probabilities will also differ greatly from each other. In fact, putting the two together yields Bayes’ Theorem: P(A|B) = P(B|A) [P(A)/P(B)], so conditional probability is symmetric ONLY when A and B have the same individual absolute probabilities.
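A minimal numeric illustration, with entirely hypothetical counts chosen only to mirror the rare-event structure of the examples below:

```python
# Suppose a population of 1,000,000 people, of whom 50,000 belong to group B
# and only 25 belong to the rare group A; suppose 23 of those 25 are also in B
# (so A is heavily concentrated inside B).  All counts are hypothetical.
N = 1_000_000
n_B = 50_000
n_A = 25
n_AB = 23

P_A, P_B, P_AB = n_A / N, n_B / N, n_AB / N

P_B_given_A = P_AB / P_A   # 23/25     = 0.92     -- very high
P_A_given_B = P_AB / P_B   # 23/50,000 = 0.00046  -- tiny

# Bayes' Theorem check: P(A|B) = P(B|A) * P(A) / P(B)
assert abs(P_A_given_B - P_B_given_A * P_A / P_B) < 1e-12

print(f"P(B|A) = {P_B_given_A:.2f}")
print(f"P(A|B) = {P_A_given_B:.6f}")
```

The two conditional probabilities differ by a factor of P(B)/P(A) = 2000 here, exactly as Bayes’ Theorem says they must.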
My first example here is potentially offensive, but bear with me: the offense is the fallacy – in 2020 (which I have no reason to believe was atypical in any way), there were 25 incidents of terrorism worldwide in which someone was killed. Of those, 23 were perpetrated by Muslim individuals or groups. The probability (in 2020) that you are a Muslim, given that you are a deadly terrorist, is thus quite high. On the other hand, the probability that you are a deadly terrorist, given that you are a Muslim, is quite low – in fact, pretty much indistinguishable from the probability that any individual is a deadly terrorist.
The fallacious confusion of these two conditional probabilities is the reason for whatever anti-Islamic backlash has followed 9/11 (and I truly don’t know how much there has been – the reporting on this has been absolutely abysmal).
Another example: if you are violently attacked in an urban area of the US, the likelihood that your attacker will be a Black male is statistically rather high (for New York in 2020, around 63% for murder, 66% for robbery, 53% for felonious assault). On the other hand, if you meet a Black male in an urban area of the US, the likelihood that he will attack you is quite low. Don’t confuse these two!
And another: if you are confronted by a member of the KKK, the likelihood that he will be a White male is statistically quite high. On the other hand, if you meet a White male, even in a Southern part of the US, the likelihood that he will be a member of the KKK is quite low. Don’t confuse these two!
Postscript: after this essay was written, I ran across this post by Robert Graboyes making some vaguely similar points. I heartily recommend it (especially his exposition of Simpson’s Paradox).
* Note that this is different from the percentage (13.4%) in the previous section. This highlights a difficulty in dealing with all such demographic ratios: the problem of precise definitions. In this section, I am using “Black or African-American alone” demographics since that most accurately corresponds to the available prison demographics. In the previous section (and the next section), I used “Non-Hispanic Black or African-American alone or in combination” because that more accurately corresponds to the public perception of the individuals involved.