Paternity with many hypotheses

Paternity probabilities when there are many hypotheses

Paradoxical (wrong) answer

Suppose two men are each tested for paternity of the same child. It occasionally happens (especially if the men are brothers) that neither man is "excluded".

Maybe the paternity indices are 200 and 500 for the two men. Applying the usual formula for paternity probability, W=PI/(PI+1), gives the result that the first man "has a 99.5% probability of paternity" and the second man "has a 99.8% probability of paternity."
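The arithmetic is easy to check. The following is a minimal sketch in Python; the function name paternity_probability is my own label for illustration, not a term from any standard library.

```python
def paternity_probability(pi: float) -> float:
    """The usual formula quoted in the text: W = PI / (PI + 1)."""
    return pi / (pi + 1)

w_first = paternity_probability(200)
w_second = paternity_probability(500)

print(f"first man:  {w_first:.1%}")             # first man:  99.5%
print(f"second man: {w_second:.1%}")            # second man: 99.8%
print(f"together:   {w_first + w_second:.1%}")  # together:   199.3%
```

The last line is the point: the two "probabilities of paternity" add up to nearly 200%.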

I put "has a ... probability of paternity" in quotes because (a) it is a term commonly used in this situation, so I am quoting, and (b) there is obviously some nonsense hidden here, so I am using quotes of derision.

The lie would be perhaps even more bald if we used more natural English:

The first man is 99.5% to be the father and the second man is 99.8% to be the father of the child.

Unkind words

If you believe that, then there is at least a 99.3% chance that both men are the father. Such a conclusion flies in the face of conventional wisdom about biology. Yet many paternity experts – especially German ones – make such statements all the time. The kindest excuse I can make for them is that, when they use the term "probability" it is a term of art, which is to say that it doesn't mean "probability." That's not a particularly kind excuse, for it omits an explanation of what they do mean. Kinder I won't be, for I don't think they know.
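For the record, the 99.3% figure comes from the elementary lower bound on the probability that two events both occur. Writing A for "the first man is the father" and B for "the second man is the father" (the letters A and B are my own shorthand here), and taking the two quoted numbers at face value:

```latex
P(A \cap B) \;\ge\; P(A) + P(B) - 1 \;=\; 0.995 + 0.998 - 1 \;=\; 0.993
```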

Where does this nonsense come from?

The aforementioned "usual formula for paternity probability, W=PI/(PI+1)" has a built-in "prior probability" assumption of 50% – i.e. the assumption that the non-genetic evidence is not only equally balanced between the two contrary hypotheses of paternity and unrelatedness, but that specifically each hypothesis is 50% to be correct.

That's a problem. For simplicity and definiteness, let's assume the situation that the two tested men are brothers. Then there are three possibilities under consideration:

F – the first man is the father
U – the first man is the uncle (because the second man is the father)
Z – the first man is (and therefore both men are) unrelated to the child

The "usual formula" assumes a priori that the first man is 50% to be the father, and 50% to be unrelated. Moreover, the second man is also assumed to be 50% to be the father. 150% is too much.

Reasonable prior probabilities

Whatever you may think about prior probabilities, however they are assigned or derived, the probabilities of any set of mutually exclusive events can add up to at most 100%. Keeping that in mind, we imagine that through some process or other, prior probabilities Rf, Ru, and Rz are assigned to the three hypotheses, such that Rf+Ru+Rz=1. Suppose we consider Rf=1/4, Ru=1/4, Rz=1/2 based on the facts in a particular case.

Reasonable posterior probabilities

Then through genetic testing we compute some likelihood ratios. Normally we would compare F vs Z, getting for example LR(F,Z)=500, and compare U vs Z, getting LR(U,Z)=100. (Note that it automatically follows that LR(F,U) = 500/100 = 5, and that LR(Z,F) = 1/LR(F,Z), etc.)
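The parenthetical remark can be illustrated with a short sketch. It assumes only the two example values LR(F,Z)=500 and LR(U,Z)=100; the dictionary lr_vs_z and the helper lr are names I made up for the illustration.

```python
from fractions import Fraction

# Likelihood ratios against Z, taken from the example in the text.
lr_vs_z = {"F": Fraction(500), "U": Fraction(100), "Z": Fraction(1)}

def lr(a: str, b: str) -> Fraction:
    """LR(a, b) derived from the ratios against Z: LR(a,Z) / LR(b,Z)."""
    return lr_vs_z[a] / lr_vs_z[b]

print(lr("F", "U"))  # 5
print(lr("Z", "F"))  # 1/500
```

Exact fractions are used rather than floating point so that identities such as LR(F,U) x LR(U,Z) = LR(F,Z) hold exactly.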

Now, we would like to compute some posterior probabilities Pf, Pu, Pz, where Pf+Pu+Pz=1 also.

Maybe the easiest way to think of the answer is to think of the several LR computations in terms of a "triple ratio":

F : U : Z are explained by the data in the proportion 500 : 100 : 1.

It's worth considering carefully what this means. It means that, given F, the chance to see genetic types like those observed is proportional to 500. Given U, the chance is proportional to 100. Etc. What does "proportional" mean in this context? It means that we can replace the 500, 100, and 1 by any other triple of numbers so long as they are in the same ratio.

Hence, the posterior probability Pf is proportional to Rf x 500 = 125.
Pu is proportional to Ru x 100 = 25.
Pz is proportional to Rz x 1 = 1/2.

In other words, Pf=125c, Pu=25c, Pz=0.5c. Since we also know Pf+Pu+Pz=1, we can solve for c and eliminate it.

Pf+Pu+Pz = 150.5 c, or c=1/150.5. Hence the final answer is Pf=125/150.5, Pu=25/150.5, and Pz=0.5/150.5.

Reviewing the above process, the formula turns out to be:

Pf = LR(F,Z) Rf / [ LR(F,Z) Rf + LR(U,Z) Ru + LR(Z,Z) Rz ]

and similarly for Pu or Pz. LR(Z,Z)=1 of course.
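The formula works for any number of hypotheses, not just three. Here is a minimal sketch of it, assuming the priors and likelihood ratios of the running example; the function name posteriors and the dictionary layout are my own choices for illustration.

```python
from fractions import Fraction

def posteriors(priors, lrs):
    """Posterior for each hypothesis: prior times LR, rescaled to sum to 1.

    priors: dict mapping hypothesis -> prior probability (should sum to 1)
    lrs:    dict mapping hypothesis -> likelihood ratio against one common pivot
    """
    weights = {h: priors[h] * lrs[h] for h in priors}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Example values from the text: Rf=1/4, Ru=1/4, Rz=1/2 and 500 : 100 : 1.
priors = {"F": Fraction(1, 4), "U": Fraction(1, 4), "Z": Fraction(1, 2)}
lrs = {"F": Fraction(500), "U": Fraction(100), "Z": Fraction(1)}

post = posteriors(priors, lrs)
for h, p in post.items():
    print(h, p, f"{float(p):.4f}")
# F 250/301 0.8306
# U 50/301 0.1661
# Z 1/301 0.0033
```

Note that 250/301 is the same number as 125/150.5, and that the three posteriors add up to exactly 1.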

In the above derivation, I chose the hypothesis Z as "pivotal" in the sense that I compared each of the others to it. Had I chosen F or U as pivotal, the result would have been the same. The only restriction on the choice of pivot is that the pivotal hypothesis must be possible given the genetic evidence. (Otherwise the LRs will mostly be infinite.) There is no reason that I can see that the hypothesis chosen as pivotal must have a prior>0, however.
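Both claims about the pivot can be checked numerically. The sketch below re-expresses the same evidence against each of the three pivots, and then tries a pivot whose prior is 0. The helper posteriors and the second set of priors (priors_no_z) are hypothetical, made up for the illustration.

```python
from fractions import Fraction

def posteriors(priors, lrs):
    """Prior times LR for each hypothesis, rescaled to sum to 1."""
    weights = {h: priors[h] * lrs[h] for h in priors}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

priors = {"F": Fraction(1, 4), "U": Fraction(1, 4), "Z": Fraction(1, 2)}

# The same evidence expressed against three different pivots.
lr_vs_z = {"F": Fraction(500), "U": Fraction(100), "Z": Fraction(1)}
lr_vs_f = {h: v / lr_vs_z["F"] for h, v in lr_vs_z.items()}  # 1 : 1/5 : 1/500
lr_vs_u = {h: v / lr_vs_z["U"] for h, v in lr_vs_z.items()}  # 5 : 1 : 1/100

same = (posteriors(priors, lr_vs_z)
        == posteriors(priors, lr_vs_f)
        == posteriors(priors, lr_vs_u))
print(same)  # True

# A pivot with prior 0: Z is ruled out a priori but still serves as the pivot.
priors_no_z = {"F": Fraction(1, 2), "U": Fraction(1, 2), "Z": Fraction(0)}
post_no_z = posteriors(priors_no_z, lr_vs_z)
print(post_no_z["F"], post_no_z["U"], post_no_z["Z"])  # 5/6 1/6 0
```

Changing the pivot multiplies every weight by the same constant, which the rescaling step then removes; that is why the posteriors cannot depend on the choice.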
