Suppose you collect a sample of 11-locus Y-chromosome haplotypes
from 100 men and every haplotype in the sample is
different. We know that the various loci are linked, so it
would not be valid to estimate the population frequency of a
haplotype by assuming the "multiplication rule": most haplotypes
figure to be more common than the product of the frequencies of the
alleles at each locus.
Since multiplication is out, what is a good guess as to
the frequency of each haplotype? 1/100?
No, 1/100 is a crazy guess. If I guess that every haplotype has a
frequency of about 1/100, then I am guessing that almost all the men
in the population are one of these 100 types. That's possible, but if
it were so, what is the chance to obtain a database such as I have
with each haplotype seen only once? It is about 100!/100^100, or roughly 10^-42.
Do I really believe I am that lucky? The frequency of each
haplotype has to be something like 1/100^2 (that is, 1/10,000) in order to make it
reasonably likely that 100 random men will have no duplications.
I think it is reasonable to guess that any particular haplotype
is at least as rare as 1/5000 (and the sky's the limit).
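
The "am I that lucky?" argument can be checked numerically. The sketch below is a simplification, not part of the original argument: it assumes the population consists of `num_types` haplotypes that are all equally frequent (real haplotype frequencies are not equal), and the function name `prob_all_distinct` is made up for illustration. It computes the chance that 100 randomly sampled men show no duplicate haplotype.

```python
import math


def prob_all_distinct(sample_size: int, num_types: int) -> float:
    """Chance that sample_size random men all carry different haplotypes,
    ASSUMING num_types haplotypes that are all equally frequent."""
    if sample_size > num_types:
        return 0.0  # more men than types: a duplicate is certain
    # Product of (1 - k/num_types) for k = 0 .. sample_size-1, summed in logs
    # so that very small probabilities do not lose precision.
    log_p = sum(math.log1p(-k / num_types) for k in range(sample_size))
    return math.exp(log_p)


for num_types in (100, 5_000, 10_000):
    p = prob_all_distinct(100, num_types)
    print(f"frequency 1/{num_types}: P(no duplicates among 100) = {p:.3g}")
```

Under the equal-frequency assumption, a per-haplotype frequency of 1/100 makes an all-different sample of 100 astronomically unlikely, while frequencies of 1/5000 or 1/10,000 make it quite plausible.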
Suppose instead that you sample 100 men, determine 9-locus
Y-chromosomal haplotypes, and there are some duplicates. However,
70 haplotypes occur exactly once.
Pick any one of the unique haplotypes at random. What is the
probability that the next man you see has the same haplotype?
About ln(100/70)/99, or 1/278, is a reasonable answer in the sense of
First remark: This is an unpublished and, as far as I know, original result.
Please give a citation if you use it.
Second remark: Can you prove it? (I have a mildly implausible derivation,
but not a proof.)
Third remark (April, 2002): I can prove it.
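
The arithmetic behind the 1/278 figure is easy to verify. The sketch below simply evaluates ln(100/70)/99. Reading the 100 as the sample size, the 70 as the number of haplotypes seen exactly once, and the 99 as the sample size minus one is an assumption made here for illustration; the text gives only the worked numbers, and the function name `singleton_match_estimate` is hypothetical.

```python
import math


def singleton_match_estimate(sample_size: int, num_singletons: int) -> float:
    """Evaluate ln(sample_size / num_singletons) / (sample_size - 1).

    The general form is an ASSUMED reading of the text's ln(100/70)/99;
    only the 100-men, 70-singleton case is stated in the source."""
    return math.log(sample_size / num_singletons) / (sample_size - 1)


p = singleton_match_estimate(100, 70)
print(f"estimate = {p:.5f}, about 1 in {1 / p:.0f}")
```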