# Dawid & Mortera's paper "Coherent Analysis"

## Extract from the paper

The part of this paper that really interests me is § 8.3, "Use of Databases", in which they outline how to treat the database itself (the allele reference sample from the population) as part of the evidence.

Consider, for example, a simple model for a homogeneous population, with the (χi) initially exchangeable, having de Finetti representation ... with β(a, b) prior:

$$dF(p) \propto p^{a-1}(1-p)^{b-1}\,dp.$$

Suppose that the database δ is of size n, containing r instances of [the crime scene type]. There is also the finding on the suspect [having the crime scene type] so ...

[matching probability]=(r+1+a)/(n+1+a+b) .

For small a and b, this is approximately equivalent to adding the suspect to the database and using (approximately) relative frequency estimates. For large r and n, the effect of conditioning on the suspect becomes unimportant.
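The formula can be checked numerically. The following is a minimal sketch, not anything taken from the paper: the function names and the example counts (r = 3, n = 100, and a larger database) are assumptions chosen for illustration. It evaluates (r+1+a)/(n+1+a+b) exactly and compares it with the "add the suspect to the database" frequency (r+1)/(n+1) and with the raw frequency r/n.

```python
from fractions import Fraction


def matching_probability(r, n, a, b):
    """Evaluate (r + 1 + a) / (n + 1 + a + b) exactly.

    r, n: count of the crime scene type in the database, and database size.
    a, b: parameters of the beta prior (ints or Fractions keep the result exact).
    """
    return (Fraction(r) + 1 + a) / (Fraction(n) + 1 + a + b)


def add_suspect_frequency(r, n):
    """Relative frequency after adding the matching suspect to the database."""
    return Fraction(r + 1, n + 1)


if __name__ == "__main__":
    # Hypothetical database: 3 instances of the type among 100 profiles.
    r, n = 3, 100
    small = Fraction(1, 100)  # "small a and b" (illustrative value)
    print("beta prior, a=b=0.01:", float(matching_probability(r, n, small, small)))
    print("add suspect         :", float(add_suspect_frequency(r, n)))

    # Hypothetical large database: conditioning on the suspect barely matters.
    r, n = 3000, 100000
    print("beta prior, a=b=1   :", float(matching_probability(r, n, 1, 1)))
    print("raw frequency r/n   :", r / n)
```

Using `Fraction` keeps the arithmetic exact, so small differences between the estimates are not blurred by rounding.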

## Applications and comments on the β distribution

Given that

$$\beta(p; a, b) \propto p^{a-1}(1-p)^{b-1}$$

• If a=b=1 we have β(p;1,1) ∝ 1 (uniform distribution).

• Ewens gives
$f(x) \propto \theta\, x^{-1}(1-x)^{\theta-1}$, that is
β(x; a=0, b=θ)
for the prior probability distribution of allele frequencies under the infinite alleles model.

In 1999 I presented the combination of this with Dawid & Mortera's formula above as a tentative solution to the matching probability for a rare haplotype. Assuming the haplotype occurs r=0 times in the database, the matching probability for an innocent suspect is 1/(n+1+θ). There are various ways to estimate θ, such as one less than the reciprocal of the empirical pairwise matching rate (hence θ≈9000 for US Caucasians). However, while the idea (of adopting a β prior based on Ewens' result) is elegant, I couldn't validate the result (because of the idealized assumptions), so I chose not to recommend it in publication for actual court use.

• Brenner's law is Ewens' distribution with θ=1, hence a=0 and b=1. I use the symbol k for D&M's r, hence for STR matching probabilities we get (k+1)/(n+2) — close to but not identical with the intuitive (k+1)/(n+1) recommendation.

• Actually, for STRs it makes some kind of sense to take θ+1≈5, suggesting a matching probability of (k+1)/(n+5). However, note that typically k<n/10, so ±4 in the denominator is as insignificant as ±½ in the numerator.

• Balding proposes (k+2)/(n+4), which is consistent with a different model, namely that the set of allele frequencies is uniform in the sense of a Dirichlet distribution. This formula is reasonable. From my analysis of forensic STR data, the actual tendency toward rare alleles in nature is somewhat greater than predicted by the uniform Dirichlet, so probably k+2 is slightly too large; maybe k+1.5 would be closer on average.
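To see how close the rules above are in practice, here is a small sketch that evaluates them side by side. The function names and the example counts (k = 20 in a database of n = 200, and a rare haplotype with k = 0 in n = 1000) are hypothetical, chosen only for illustration; the value θ = 9000 is the estimate quoted above, recovered here from an assumed pairwise matching rate of 1/9001.

```python
from fractions import Fraction


def beta_rule(k, n, a, b):
    """Matching probability (k + 1 + a) / (n + 1 + a + b) under a beta(a, b) prior."""
    return (Fraction(k) + 1 + a) / (Fraction(n) + 1 + a + b)


def theta_from_match_rate(match_rate):
    """Estimate theta as one less than the reciprocal of the pairwise matching rate."""
    return 1 / match_rate - 1


def balding_rule(k, n):
    """Balding's (k + 2) / (n + 4), coded directly rather than via beta_rule."""
    return Fraction(k + 2, n + 4)


if __name__ == "__main__":
    # Rare haplotype (Ewens prior: a = 0, b = theta), never seen in the database.
    theta = theta_from_match_rate(Fraction(1, 9001))  # assumed matching rate
    n_haplotypes = 1000                               # hypothetical database size
    print("theta:", theta)
    print("rare haplotype:", beta_rule(0, n_haplotypes, 0, theta))

    # STR allele seen k times among n (hypothetical counts).
    k, n = 20, 200
    rules = {
        "(k+1)/(n+1) intuitive": beta_rule(k, n, 0, 0),
        "(k+1)/(n+2) theta = 1": beta_rule(k, n, 0, 1),
        "(k+1)/(n+5) theta = 4": beta_rule(k, n, 0, 4),
        "(k+2)/(n+4) Balding": balding_rule(k, n),
    }
    for name, value in rules.items():
        print(f"{name:24s} {float(value):.4f}")
```

With counts like these, the choice of denominator changes the result much less than the choice between k+1 and k+2 in the numerator.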

Return to home page of Charles H. Brenner 