# What's wrong with the "exclusion probability"

revised and extended 4 November 1997

## The paternity situation

### a partial story

#### Information in, and not in, A

I'll try to explain why the likelihood ratio L is the right statistic to give in a paternity case, and why the power of exclusion should not be given in addition or instead.

A = Probability of exclusion

(the probability, given the mother and child results, that a non-father would be excluded from paternity by this set of tests)

W = Probability of paternity
L = Paternity index = X/Y.
X = probability of seeing the mother (M), child (C), and alleged father (AF) types, assuming paternity.
Y = probability of seeing the M, C, and AF types, assuming non-paternity.

For the sake of argument I will assume a 50% prior probability of paternity. On another day I would argue against that assumption, or any assumption, but for today it will make the discussion simpler, and will not do any harm.

Under that assumption, W=L/(L+1) and L=W/(1-W). Either of W or L can be computed from the other one. Thus, they convey the same information. So, just for today, I won't argue that one is better or more appropriate than the other.
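The two conversions can be checked numerically. The following is a minimal sketch that assumes the 50% prior used in this discussion; the function names are made up for illustration.

```python
def w_from_l(L):
    """Probability of paternity W from paternity index L, assuming a 50% prior."""
    return L / (L + 1)


def l_from_w(W):
    """Paternity index L recovered from W, again assuming a 50% prior."""
    return W / (1 - W)


# Round trip: each statistic can be recovered from the other,
# which is the sense in which they convey the same information.
for L in (1, 9, 99, 999):
    W = w_from_l(L)
    print(f"L = {L:>4}  ->  W = {W:.4f}  ->  L = {l_from_w(W):.1f}")
```

Because each function undoes the other, reporting W or L is a matter of presentation only, as long as the prior is fixed.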

On the other hand, A contains less information than W or L. For a given paternity case, it would be silly to give A when L is available (and it is).

I claim that A contains less information. Let me explain exactly how. Let's assume that the man is not excluded -- otherwise there is no need for statistics at all.

Usually the "evidence" means:

1. blood types for M
2. blood types for C
3. blood types for AF.

From this information, anybody can easily infer

4. AF is not excluded.

The point is that L is a summary of the information in 1, 2, and 3 (and therefore also includes 4), whereas A is a summary only of the weaker information in 1, 2, and 4. (Item 4 is a little weaker than item 3, mostly because it doesn't tell whether the man has one or two possibly paternal alleles in common with the child.)

#### L and W from exclusion: LA and WA

Suppose I am the lab director, and my assistant who does the lab work refuses to tell me 3, but only gives 1, 2, and 4. Under this limited definition of "evidence" I could compute a likelihood ratio LA and a probability of paternity WA, which turn out (see Morris in Walker 1983) to be:
LA = 1/(1-A), WA = LA/(1+LA) = 1/(2-A).

Note: WA is normally about equal to W, since they estimate the same thing (but from slightly different versions of the evidence). And a little algebra shows that WA = 1/(2-A) = A + (1-A)^2 / (2-A), which shows that WA is also about equal to A. This explains why W and A are about the same.
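The formulas for LA and WA, and the algebraic identity in the note, can be verified with a few lines of code. This is only a sketch: the function names and the sample values of A are invented for illustration.

```python
def la_from_a(A):
    """Likelihood ratio based only on 'not excluded': LA = 1/(1-A)."""
    return 1.0 / (1.0 - A)


def wa_from_a(A):
    """Probability of paternity from LA with a 50% prior: WA = 1/(2-A)."""
    return 1.0 / (2.0 - A)


for A in (0.90, 0.99, 0.998):  # illustrative values, not real case data
    LA = la_from_a(A)
    WA = wa_from_a(A)
    identity = A + (1 - A) ** 2 / (2 - A)  # right-hand side from the note
    print(f"A = {A}  LA = {LA:.1f}  WA = {WA:.6f}  "
          f"A + (1-A)^2/(2-A) = {identity:.6f}  WA - A = {WA - A:.2e}")
```

The gap WA - A equals (1-A)^2/(2-A), so it shrinks roughly like the square of 1-A; that is why the two numbers look nearly identical whenever A is close to 1.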

If I have no choice, then I will report WA or LA as my statistical summary of the evidence (or I can report A, which has the same information).

But, if my lab tech later relents and tells me the whole story, then of course I should make the best computation I can with the increased evidence, and that is W or L. I would not also include WA or LA or A in the report, for the same reason that the company financial officer, when he gives the annual financial statement, would not include an earlier draft version that he prepared based on tentative and incomplete information.
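To see how much the extra item of evidence can matter, here is a single-locus sketch. It relies on assumptions that are not stated in the text above: the paternal allele is unambiguously Q with frequency q, there is no mutation and no blank allele, Hardy-Weinberg proportions hold, and the usual textbook paternity index applies (1/q for an alleged father who is homozygous QQ, 1/(2q) for one who carries a single Q). The value q = 0.1 is invented.

```python
def exclusion_probability(q):
    """A when the paternal allele must be Q and there is no blank allele: 1 - q(2-q)."""
    return 1 - q * (2 - q)


def la(q):
    """Likelihood ratio knowing only that the man is not excluded: 1/(1-A)."""
    return 1 / (1 - exclusion_probability(q))


def l_heterozygous(q):
    """Textbook paternity index when the man carries one Q allele (assumption)."""
    return 1 / (2 * q)


def l_homozygous(q):
    """Textbook paternity index when the man is QQ (assumption)."""
    return 1 / q


q = 0.1  # hypothetical allele frequency
print("A           =", exclusion_probability(q))
print("LA          =", la(q))
print("L (one Q)   =", l_heterozygous(q))
print("L (two Q's) =", l_homozygous(q))
```

With q = 0.1, LA comes out at about 5.26, while the full-evidence L is either 5 or 10 depending on the man's genotype. LA is a single compromise value that cannot distinguish the two cases, which is the information that item 3 supplies.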

#### Another analogy

Or another analogy: Suppose that I want to know how high a person can reach. Suppose there are two measurements on each person:

(i) The height of the person from ground to shoulder.
(ii) The length of the person's arm.

So the total, (i)+(ii), tells me how high the person can reach. This is the best statistic, and is analogous to L or W. The first statistic, (i), by itself may be helpful -- it is like A. But if you can know (i)+(ii) there is no advantage at all in also knowing (i).

### dishonest statistic

By careless use of language, people often refer to a test result as an "exclusion" if the result is inconsistent with paternity.

This is careless because obviously excluding the man from paternity is a decision that is made by people on the basis of evidence; it isn't the evidence itself.

The distinction is material, not just semantic. Because of the possibility of a mutation, most laboratories won't issue an opinion of "paternity excluded" unless there are (typically) at least two tests that have results inconsistent with paternity. The correct point of view is of course likelihood ratios, not counting "exclusions" at all.

For that reason, the statistic as normally quoted is dishonest. When a lab claims "The exclusion chance is 99.8%," they have invariably made a calculation that reflects the possibility of one or more inconsistent results. The true chance to exclude is smaller because, in at least some of the cases with only one inconsistent result, they would in fact not issue an opinion of "exclusion."
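The size of the gap can be sketched with a toy calculation. Everything in it is an assumption made for illustration: the per-test exclusion probabilities are invented, the tests are treated as independent for a non-father, and the lab is imagined to follow a strict policy of never excluding on a single inconsistent result.

```python
from math import prod


def prob_at_least_one(per_test):
    """Chance that a non-father shows one or more inconsistent results."""
    return 1 - prod(1 - a for a in per_test)


def prob_at_least_two(per_test):
    """Chance that a non-father shows two or more inconsistent results."""
    p_none = prod(1 - a for a in per_test)
    p_exactly_one = sum(
        a * prod(1 - b for j, b in enumerate(per_test) if j != i)
        for i, a in enumerate(per_test)
    )
    return 1 - p_none - p_exactly_one


# Hypothetical per-test probabilities of an inconsistent result.
per_test = [0.60, 0.50, 0.70, 0.65, 0.55]
print("quoted 'exclusion chance' (>= 1 inconsistency):", prob_at_least_one(per_test))
print("chance of >= 2 inconsistencies:                ", prob_at_least_two(per_test))
```

The second figure is always the smaller one. A lab that sometimes does exclude on a single inconsistency would have a true exclusion chance somewhere between the two.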

### unreal

The "exclusion" statistic is also an unreal and artificial statistic in that it has at least one bizarre and unreasonable property, as follows.

Suppose the child and mother are both type Q, and consider the possibility that an alleged father is type R. In the old days we used to call this situation an "indirect exclusion" or "apparent opposite homozygosity." The word "apparent" expresses the idea that there may conceivably be a "blank" or silent or unobservable allele O in the genetic system under consideration.

Note that, depending on whether or not you believe in the existence of blank alleles, there are two different formulas for the "probability of exclusion":

| Assumption | Probability of exclusion |
|---|---|
| no blank allele | 1 - q^2 - 2q(1-q), or equivalently, 1 - q(2-q) (see the note just below) |
| blank allele | 1 - h - 2q(1-q) |

Note: Incidentally, whenever the paternal allele is Q, even when the child is heterozygous, the formula is the same. When the paternal allele is ambiguous (child is PQ and mother either is PQ as well, or is not typed), the formula would be unchanged provided that q be taken to be the total frequency of the two alleles.

Now, what is peculiar about the above situation is that the actual frequency of the blank allele occurs nowhere in either formula. If you believe there is no blank allele at all, you use the first formula. If you get word that there is even a single blank allele in the world, even two continents away, then you switch to the second formula.

Consequently, the value of A is a discontinuous function of the blank allele frequency. This is an ordinary situation in pure mathematics, but is an impossible situation in nature. Nature is never discontinuous.
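The jump can be made concrete with a small sketch. It simply encodes the two formulas given above and switches between them according to whether the blank allele frequency is zero. The function name and the numbers are invented, and h is passed in as a given quantity with a made-up value.

```python
def exclusion_probability(q, h, blank_freq):
    """A for the situation above, choosing the formula by belief in a blank allele.

    Note that blank_freq is used only to pick a formula; its actual
    value appears in neither expression.
    """
    if blank_freq == 0:
        return 1 - q * q - 2 * q * (1 - q)  # no blank allele
    return 1 - h - 2 * q * (1 - q)          # blank allele exists


q, h = 0.3, 0.4  # hypothetical inputs for illustration only
for blank_freq in (0.0, 1e-9, 1e-3, 0.1):
    print(f"blank allele frequency = {blank_freq:<8g} A = "
          f"{exclusion_probability(q, h, blank_freq):.4f}")
```

With these inputs A drops from 0.49 to 0.18 the moment the blank frequency leaves zero, and then stays put no matter how common the blank allele becomes.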

Therefore the "exclusion probability" is not a description of nature.

## Other situations

The ideas above are pretty old hat, and I understood them perfectly well in 1982 (Brenner 1983). In recent years several sources, such as NRC I, the court in the OJ Simpson case, the FBI, and R. Chakraborty, have recommended using the probability of exclusion to summarize the evidence in mixed stain cases. I am a little embarrassed to admit that I didn't realize immediately that the issues and reasoning in the mixed stain situation are exactly the same as the familiar reasoning for the paternity situation that I have described above, and that therefore it is a bad idea for exactly the same reasons.

Here's a comparison of various kinds of problems, and the inappropriateness of using the exclusion probability in each case:

• In a simple stain case, the exclusion probability and the likelihood ratio give exactly the same information (almost; see the note at the end).
• In a trio paternity case, discussed above, the exclusion probability gives less information, but to tell the truth not very much less.
• In a still more complicated situation -- a mixed stain case -- the exclusion probability usually discards lots of information compared to the correct, likelihood ratio, approach. But still the exclusion probability may be acceptable sometimes.
• In a very complicated situation, a kinship case where not very many people are typed, the likelihood ratio method is often the only way. In many such cases the exclusion probability is 0, so people who make the mistake of thinking from the "exclusion" point of view are often completely stuck and don't understand how to assess the evidence at all.

Note on the simple stain case: "exactly the same" is not exactly accurate, I guess, because as normally defined a homozygous Q person would not be "excluded" from a QR stain.