Back when I was a statistics major in college, we talked about false positives in probability class. I found the results vaguely disturbing, even though issues with false positives haven't personally affected me. At least not yet. The question is stated roughly like this:
Suppose that a rare disease affects 1% of the population, and there’s a blood test that can determine whether a person is affected by the disease or not. The test isn’t completely accurate; only 95% of people who actually have the disease test positive, and 2% of people without the disease also test positive. If a randomly chosen person’s blood test comes back positive, what’s the probability that the person actually has the disease?
Intuitively, you would think "pretty high," given that the test seems pretty accurate. But since there are numbers involved, it's probably better to actually do some calculations. At least, that's what they taught us in school.
Let D be the event that the person has the disease, and let T be the event that the test comes back positive. From the problem, we know that P(D) = 0.01, P(T|D) = 0.95, and P(T|~D) = 0.02. We want to find P(D|T). By Bayes' theorem, expanding P(T) with the law of total probability:

P(D|T) = P(T|D) * P(D) / P(T)
       = P(T|D) * P(D) / [P(T|D) * P(D) + P(T|~D) * P(~D)]
       = 0.95 * 0.01 / [0.95 * 0.01 + 0.02 * (1 - 0.01)]
       ≈ 0.324

So if a randomly chosen person tests positive, there's only about a one-in-three chance that they actually have the disease.
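Here's a quick sanity check of that arithmetic in Python. The variable names are my own, not from any library:

```python
p_disease = 0.01            # P(D): prevalence of the disease
p_pos_given_disease = 0.95  # P(T|D): true positive rate (sensitivity)
p_pos_given_healthy = 0.02  # P(T|~D): false positive rate

# Law of total probability: P(T) = P(T|D)P(D) + P(T|~D)P(~D)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(D|T) = P(T|D) * P(D) / P(T)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(D|T) = {p_disease_given_pos:.3f}")  # prints P(D|T) = 0.324
```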
The intuitive explanation is that the disease is so rare that the false positives from the large majority of the population that doesn't have the disease are comparable in number to the true positives from the small group that actually does. Concretely, in a group of 10,000 people, about 100 have the disease and roughly 95 of them test positive, while about 198 of the 9,900 healthy people also test positive; only 95 of the 293 positives are real. There was something vaguely unsettling about a test that seems so accurate on the surface producing such untrustworthy positive results. But the math shows it happens!
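And if the math isn't convincing enough, a simulation backs it up. Here's a minimal sketch that draws a synthetic population, applies the test, and counts how many positives are real; the sample size and seed are arbitrary choices of mine:

```python
import random

random.seed(42)
n = 1_000_000
true_positives = false_positives = 0

for _ in range(n):
    has_disease = random.random() < 0.01   # 1% prevalence
    if has_disease:
        if random.random() < 0.95:         # 95% sensitivity
            true_positives += 1
    else:
        if random.random() < 0.02:         # 2% false positive rate
            false_positives += 1

# Fraction of positive tests that are true positives; hovers around 0.324
print(true_positives / (true_positives + false_positives))
```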