Coherent Analysis

The part of this paper I find most interesting is § 8.3, "Use of Databases", in which they outline how to treat the database itself (the reference sample of alleles drawn from the population) as part of the evidence.

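Here β(p; a, b) ∝ p^(a−1)(1−p)^(b−1) denotes the beta prior on the population frequency p of the type. If a=b=1 we have β(p; 1, 1) ∝ 1 (the uniform distribution).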

Ewens gives f(x) ∝ θ x^(−1)(1−x)^(θ−1), that is β(x; a=0, b=θ), for the prior probability distribution of allele frequencies under the infinite alleles model.
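For reference, here is one way a β(a, b) prior on the frequency p of the suspect's type turns into a matching probability. This is a sketch assuming ordinary beta-binomial updating, with the suspect's own profile counted as one observation alongside the r occurrences among the n database profiles; it is not quoted from D&M's paper, but it reproduces the figures used in the following paragraphs. (With a = 0 the prior is improper, but the suspect's own observation makes the posterior proper.)

```latex
\beta(p; a, b) \propto p^{a-1}(1-p)^{b-1}

% Posterior after the suspect plus r of the n database profiles share the type:
p \mid \text{data} \sim \beta(p;\, a + r + 1,\; b + n - r)

% Chance that a further, innocent person also has the type (posterior mean):
\Pr(\text{match}) = \frac{a + r + 1}{a + b + n + 1}

% Ewens prior a = 0, b = \theta, with r = 0:
\Pr(\text{match}) = \frac{1}{n + 1 + \theta}
```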

In 1999 I presented the combination of this prior with Dawid & Mortera's formula above as a tentative solution to the matching probability for a rare haplotype. Assuming the haplotype occurs r=0 times in the database, the matching probability for an innocent suspect is 1/(n+1+θ). There are various ways to estimate θ; one is to take one less than the reciprocal of the empirical pairwise matching rate, since under the infinite alleles model two random haplotypes match with probability 1/(1+θ) (hence θ≈9000 for US Caucasians). However, while the idea of adopting a β prior based on Ewens' result is elegant, I couldn't validate the result because it rests on idealized assumptions, so I chose not to recommend it in publication for actual court use.
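The arithmetic in the previous paragraph can be sketched as below. The function names are hypothetical, and the inputs are illustrative assumptions rather than real data: a pairwise match rate of exactly 1/9001 (picked so that θ comes out near the quoted 9000) and a database of n = 5000 haplotypes.

```python
def theta_from_match_rate(pairwise_match_rate: float) -> float:
    """Estimate theta as one less than the reciprocal of the pairwise match rate.

    Under the infinite alleles model two random haplotypes match with
    probability 1/(1 + theta), so theta = 1/rate - 1.
    """
    return 1.0 / pairwise_match_rate - 1.0


def rare_match_probability(n: int, theta: float) -> float:
    """Matching probability 1/(n + 1 + theta) for a haplotype seen r = 0 times
    in a database of n, under a beta(a=0, b=theta) prior."""
    return 1.0 / (n + 1 + theta)


# Hypothetical inputs: match rate 1/9001 and a database of n = 5000.
theta = theta_from_match_rate(1 / 9001)
p = rare_match_probability(5000, theta)
print(f"theta ~ {theta:.0f}, match probability ~ 1/{1 / p:.0f}")
# prints: theta ~ 9000, match probability ~ 1/14001
```

With θ this large, the prior term dominates the denominator unless the database is far bigger than θ.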

Brenner's law is Ewens' distribution with θ=1, hence a=0 and b=1. Writing k for D&M's r, the STR matching probability becomes (k+1)/(n+2), close to but not identical with the intuitive (k+1)/(n+1) recommendation.
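A minimal sketch of the comparison, assuming the posterior-mean form (k + 1 + a)/(n + 1 + a + b) from ordinary beta-binomial updating (an assumption, not necessarily D&M's exact expression). `beta_match_probability` is a hypothetical helper, and k = 10, n = 100 are illustrative values. Exact fractions keep the small difference visible.

```python
from fractions import Fraction


def beta_match_probability(k: int, n: int, a: Fraction, b: Fraction) -> Fraction:
    """Probability that an innocent suspect matches, given the suspect's type
    was seen k times in a database of n and a beta(a, b) prior on its frequency.

    Assumes ordinary beta-binomial updating: counting the suspect plus the
    database, the posterior mean is (k + 1 + a) / (n + 1 + a + b).
    """
    return (k + 1 + a) / (n + 1 + a + b)


# Brenner's law as a = 0, b = 1, for a hypothetical database of n = 100
# in which the type appears k = 10 times.
k, n = 10, 100
brenner = beta_match_probability(k, n, Fraction(0), Fraction(1))
intuitive = Fraction(k + 1, n + 1)
print(f"Brenner {brenner} vs intuitive {intuitive}")
# prints: Brenner 11/102 vs intuitive 11/101
```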

Actually, for STRs it is somewhat reasonable to take θ+1≈5, which suggests a matching probability of (k+1)/(n+5). Note, however, that typically k≈n/10, so a change of ±4 in the denominator is as insignificant as a change of ±½ in the numerator.
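A quick check of the "±4 versus ±½" remark, as a sketch with a hypothetical n = 1000 and k = n/10. It measures the relative shift from the intuitive (k+1)/(n+1) caused by adding 4 to the denominator versus adding ½ to the numerator.

```python
from fractions import Fraction

# Hypothetical database size; k = n/10 follows the "typically k ~ n/10" remark.
n = 1000
k = n // 10

base = Fraction(k + 1, n + 1)                # intuitive (k+1)/(n+1)
theta4 = Fraction(k + 1, n + 5)              # theta + 1 = 5: (k+1)/(n+5)
half_up = Fraction(2 * k + 3, 2 * (n + 1))   # (k+1.5)/(n+1): numerator + 1/2

# Relative shift caused by each change, compared with the baseline.
shift_denominator = float(abs(theta4 - base) / base)
shift_numerator = float(abs(half_up - base) / base)
print(f"denominator +4: {shift_denominator:.2%}, numerator +1/2: {shift_numerator:.2%}")
# prints: denominator +4: 0.40%, numerator +1/2: 0.50%
```

Both shifts are well under one percent, which is the sense in which either adjustment is insignificant for k near n/10.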

Balding proposes (k+2)/(n+4), which is consistent with a different model, namely that the set of allele frequencies is uniform in the sense of a Dirichlet distribution. This formula is reasonable. From my analysis of forensic STR data, the actual tendency toward rare alleles in nature is somewhat greater than the uniform Dirichlet predicts, so k+2 is probably slightly too large; k+1.5 might be closer on average.
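To see where the choice of formula matters most, the sketch below tabulates the estimates mentioned in these notes for a type never seen in a hypothetical database of n = 200. `match_estimates` is a hypothetical helper, and the (k+1.5)/(n+4) row is an assumed reading of the "k+1.5" remark, since only the numerator change is stated.

```python
from fractions import Fraction


def match_estimates(k: int, n: int) -> dict:
    """Matching-probability formulas from these notes, for a type seen k times in n.

    The (k+1.5)/(n+4) row is a hypothetical reading of the "k+1.5" remark;
    keeping Balding's n+4 denominator is an assumption.
    """
    return {
        "intuitive (k+1)/(n+1)": Fraction(k + 1, n + 1),
        "Brenner (k+1)/(n+2)": Fraction(k + 1, n + 2),
        "theta+1=5 (k+1)/(n+5)": Fraction(k + 1, n + 5),
        "Balding (k+2)/(n+4)": Fraction(k + 2, n + 4),
        "variant (k+1.5)/(n+4)": Fraction(2 * k + 3, 2 * (n + 4)),
    }


# Hypothetical case: a type never seen (k = 0) in a database of n = 200.
for name, prob in match_estimates(0, 200).items():
    print(f"{name:24s} {str(prob):>7s}  (about 1 in {float(1 / prob):.0f})")
```

For k = 0 the Balding estimate (1/102) is roughly double the Ewens-style ones (about 1/201), so the numerator offset matters most for the rarest types; for k near n/10 the rows converge.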

Dawid & Mortera's paper "Coherent Analysis"

Extract from the paper

Applications and comments on the β distribution