Proof of a Mixed Stain Formula of Weir

Charles H. Brenner

Abstract

The genetic markers ("alleles") of an evidence stain may be identical to the alleles of a reference sample (from a suspect, for example). The likelihood ratio for the evidentiary strength favoring association is then simply the inverse of the profile frequency. However, the evidence stain is often complicated by the presence of additional alleles, variously from additional known or unknown suspects or victims. The likelihood ratio is more complicated in such cases, but Weir et al. (1997) present without proof a general and elegant formula for the probabilities that occur as numerator or denominator. In this paper we give a proof.

Key words: forensic science, mixed stains, DNA profiles, likelihood ratios

Notation


Let E be the set of alleles observed in the evidence for some discrete-allele system. Of these some may be attributable to known parties; the remainder $U \subseteq E$ are to be explained by x people with two alleles (not necessarily distinct) each. Explained means that $U \subseteq X \subseteq E$, where X is the set of all alleles in the x people. More generally, for a subset $S \subseteq U$, we will say that people with alleles X exactly explain S if $X \subseteq E$ and $X \cap U = S$, or equivalently, putting $W = U \setminus S$, if $U \setminus W \subseteq X \subseteq E \setminus W$.

The set-notation symbols are to be understood according to the standard conventions: $U \subseteq E$ means that the alleles of U are among those of E, including the possibility that U = E. $X \cap U$ is the intersection -- the set of alleles that are both in X and in U. $U \setminus S$ is the set difference -- the set of alleles that are in U excluding those that are also in S. The cardinality (size, in number of alleles) of a set J is written |J|. The symbol $\in$ denotes membership; $j \in U$ means that j is an allele of the set U.

Following Weir et al., we write $P_x(U|E)$ for the probability that x random people explain U. Let $J \subseteq U$ be a set of alleles. We will be interested in sets of people whose alleles omit the set J. Let

(1) $T_J$ = the total of the frequencies of the alleles in $E \setminus J$.

Theorem

Weir discovered that

(2) $P_x(U|E) = \sum_{J \subseteq U} (-1)^{|J|} \, T_J^{2x}$

but did not supply a proof.
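Formula (2) can be evaluated mechanically by summing over every subset J of U. The following is a minimal sketch rather than a reference implementation; the function names and the allele frequencies are hypothetical, chosen only for illustration, and exact fractions are used so that no rounding enters the result.

```python
from fractions import Fraction
from itertools import combinations


def subsets(alleles):
    """Yield every subset of `alleles` as a frozenset, smallest first."""
    items = sorted(alleles)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def weir_probability(freq, E, U, x):
    """Evaluate formula (2): sum over J in U of (-1)^|J| * T_J^(2x)."""
    total = Fraction(0)
    for J in subsets(U):
        # Definition (1): T_J is the total frequency of the alleles in E \ J.
        T_J = sum((freq[a] for a in set(E) - J), Fraction(0))
        total += (-1) ** len(J) * T_J ** (2 * x)
    return total


# Hypothetical stain: alleles a, b, c observed; a and b unexplained; two unknown people.
freq = {"a": Fraction(1, 10), "b": Fraction(1, 5), "c": Fraction(3, 10)}
print(weir_probability(freq, E={"a", "b", "c"}, U={"a", "b"}, x=2))  # 31/625
```

Here the four terms are (6/10)^4 - (5/10)^4 - (4/10)^4 + (3/10)^4 = 496/10000 = 31/625.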

Proof

A proof seems worthwhile. The general idea is clear enough -- (2) is an instance of the principle of inclusion and exclusion (Hall, 1967). From the definition (1) and the assumption of a discrete allele system, $T_J^{2x}$ is the probability that the 2x alleles of x people are all in $E \setminus J$. As the basis for the inclusion-exclusion analysis, we note that

(3) $T_J^{2x} = \sum_{J \subseteq W \subseteq U} P_x(U \setminus W \mid E \setminus W)$

because any set of people whose alleles are among $E \setminus J$ exactly explains one and only one subset, $U \setminus W$, of $U \setminus J$. The summation is taken over all sets of alleles W that satisfy $J \subseteq W \subseteq U$. Introduction of the sets W, as a means of effectively classifying the various positive and negative contributions to the sum in (2), is the key idea in the proof.
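The partition argument behind (3) can be checked by brute force on a small case. The sketch below is an illustration under stated assumptions: the frequencies are hypothetical, the allele "d" stands in for everything not seen in the stain, the 2x alleles are drawn independently, and the helper names are invented here. It enumerates every ordered draw of 2x alleles, computes $P_x(U \setminus W \mid E \setminus W)$ directly from the definition of "explain", and compares the sum over W with the power of the frequency total from definition (1).

```python
from fractions import Fraction
from itertools import combinations, product


def subsets(alleles):
    """Yield every subset of `alleles` as a frozenset."""
    items = sorted(alleles)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def prob_explain(freq, U, E, x):
    """P_x(U|E) by enumeration: total probability of draws X with U <= X <= E."""
    U, E = frozenset(U), frozenset(E)
    total = Fraction(0)
    for draw in product(sorted(freq), repeat=2 * x):  # 2x independent alleles
        if U <= frozenset(draw) <= E:
            p = Fraction(1)
            for allele in draw:
                p *= freq[allele]
            total += p
    return total


def check_identity_3(freq, E, U, J, x):
    """Return (left side, right side) of identity (3) for one set J."""
    E, U, J = frozenset(E), frozenset(U), frozenset(J)
    T_J = sum((freq[a] for a in E - J), Fraction(0))
    lhs = T_J ** (2 * x)
    rhs = sum((prob_explain(freq, U - W, E - W, x)
               for W in subsets(U) if J <= W), Fraction(0))
    return lhs, rhs


# Hypothetical frequencies summing to 1.
freq = {"a": Fraction(1, 10), "b": Fraction(1, 5),
        "c": Fraction(3, 10), "d": Fraction(2, 5)}
for J in subsets({"a", "b"}):
    lhs, rhs = check_identity_3(freq, E={"a", "b", "c"}, U={"a", "b"}, J=J, x=2)
    assert lhs == rhs
```

Each draw whose alleles lie in $E \setminus J$ is counted under exactly one W, namely the set of alleles of U that it misses, which is why the two sides agree.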

Define

(4) $Q_m = \sum_{J \subseteq U,\ |J| = m} T_J^{2x}$

In this notation, (2) is expressed as

(5) $P_x(U|E) = Q_0 - Q_1 + Q_2 - \cdots$.

Summing (3) for fixed m over all sets $J \subseteq U$ of cardinality m, we obtain from (4)

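(6) $Q_m = \sum_{J \subseteq U,\ |J| = m} \ \sum_{J \subseteq W \subseteq U} P_x(U \setminus W \mid E \setminus W)$

(7) $= \sum_{k=m}^{|U|} \ \sum_{W \subseteq U,\ |W| = k} \ \sum_{J \subseteq W,\ |J| = m} P_x(U \setminus W \mid E \setminus W)$

(8) $= \sum_{k=m}^{|U|} \ \sum_{W \subseteq U,\ |W| = k} \#\{ J \subseteq W : |J| = m \} \; P_x(U \setminus W \mid E \setminus W)$

(9) $= \sum_{k=m}^{|U|} \binom{k}{m} \sum_{W \subseteq U,\ |W| = k} P_x(U \setminus W \mid E \setminus W)$.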

On the right hand side of line (6) each W occurs many times, once for each J of which it is a superset. The object is to count how many times. Classifying the W's according to their size k on line (7), we see on (8) that this count is the number of m-allele subsets of a k-set, which is exactly the definition of the binomial symbol $\binom{k}{m}$. Hence line (9).
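The counting step can be confirmed by enumeration on a small set. This is only a sanity check under an assumed toy set of four alleles, with a helper name invented for the purpose: for every W and every m it counts the m-element sets J contained in W and compares the count with the binomial coefficient.

```python
from itertools import combinations
from math import comb


def multiplicity(U, W, m):
    """Count the m-element subsets J of U that are contained in W."""
    return sum(1 for J in combinations(sorted(U), m) if set(J) <= set(W))


U = {"a", "b", "c", "d"}  # hypothetical unexplained alleles, n = 4
for k in range(len(U) + 1):
    for W in combinations(sorted(U), k):
        for m in range(len(U) + 1):
            # Each W of size k is hit once per m-subset of W: C(k, m) times.
            assert multiplicity(U, W, m) == comb(k, m)
```

Note that `comb(k, m)` is zero when m exceeds k, matching the fact that no J of size m fits inside a smaller W.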

To verify (5), now form the alternating sum over m, where the sum runs to n = |U|:

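$Q_0 - Q_1 + Q_2 - \cdots = \sum_{m=0}^{n} (-1)^m Q_m$

(10) $= \sum_{m=0}^{n} (-1)^m \sum_{k=m}^{n} \binom{k}{m} \sum_{W \subseteq U,\ |W| = k} P_x(U \setminus W \mid E \setminus W)$

(11) $= \sum_{k=0}^{n} \left[ \sum_{W \subseteq U,\ |W| = k} P_x(U \setminus W \mid E \setminus W) \right] \sum_{m=0}^{k} (-1)^m \binom{k}{m}$.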

Line (10) shows that the same set W may occur in several $Q_m$ terms. To compute the net contribution due to each W, it is natural to reverse the order of summation so that the classification is on W first and then on m, which is formula (11). To verify the transition from (10) to (11), note that the index sets of the double summations $\sum_{m=0}^{n} \sum_{k=m}^{n}$ and $\sum_{k=0}^{n} \sum_{m=0}^{k}$ range over the same pairs (k, m) -- namely the triangular array where $0 \le m \le k \le n$.

Hence the net number of times that a contribution from each set W is included and excluded is given by the last factor in (11). That factor is simply unity when k = 0, and when k > 0 it is even simpler, for by the binomial theorem $\sum_{m=0}^{k} (-1)^m \binom{k}{m} = (1-1)^k = 0$. So

$Q_0 - Q_1 + Q_2 - \cdots = \sum_{W \subseteq U,\ |W| = 0} P_x(U \setminus W \mid E \setminus W)$

$= P_x(U|E)$, Q.E.D.
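As a quick sanity check on the result, consider the simplest mixed case, with E = U = {a, b} and x = 1, that is, one unknown contributor who must account for both alleles. The symbols $p_a$ and $p_b$ for the two allele frequencies are introduced here only for this illustration. The four subsets J of U give frequency totals $p_a + p_b$, $p_b$, $p_a$ and 0, so the alternating sum becomes:

```latex
P_1(U \mid E) = (p_a + p_b)^2 - p_b^2 - p_a^2 + 0^2 = 2\,p_a\,p_b
```

This is the probability that the two independently drawn alleles of one person are a and b in either order, the only way a single person can explain both alleles.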

References

Hall M. Combinatorial Theory. New York: John Wiley & Sons, 1967.

Weir BS, Triggs CM, Starling L, Stowell LI, Walsh KAJ, Buckleton J. (1997) Interpreting DNA mixtures. J For Sci. 42(2):..-220