Haplotype frequency poser

Haplotype frequencies

The problem:

Suppose you collect a sample of 11-locus Y-chromosome haplotypes from 100 men and every haplotype in the sample is different. We know that the various loci are linked, so it would not be valid to estimate the population frequency of a haplotype by assuming the "multiplication rule" – most haplotypes figure to be more common than the product of the frequencies of the alleles at each locus.

Since multiplication is out, what is a good guess as to the frequency of each haplotype? 1/100?

No, 1/100 is a crazy guess. If I guess that every haplotype has a frequency of about 1/100, then I am guessing that almost all the men in the population are one of these 100 types. That's possible, but if it were so, what is the chance of obtaining a database such as I have, with each haplotype seen only once? It is 100!/100¹⁰⁰, or roughly 9 × 10⁻⁴³.

Do I really believe I am that lucky? The frequency of each haplotype has to be something like 1/100² in order to make it reasonably likely that 100 random men will have no duplications. In particular,

I think it is reasonable to guess that any particular haplotype is at least as rare as 1/5000 (and the sky's the limit).
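
The no-duplicates argument can be checked numerically. The sketch below assumes an idealized population of N equally frequent haplotypes, which is a simplification for illustration and not a claim about real Y-chromosome populations; the function name prob_all_distinct and the choice of N values are likewise illustrative.

```python
import math


def prob_all_distinct(sample_size: int, num_types: int) -> float:
    """Chance that sample_size randomly drawn men all carry different
    haplotypes, assuming num_types equally frequent haplotypes
    (an idealization chosen for this sketch)."""
    if sample_size > num_types:
        return 0.0  # more men than types: a duplicate is certain
    # Multiply (1 - 0/N)(1 - 1/N)...(1 - (n-1)/N), summing logs for stability.
    log_p = sum(math.log1p(-i / num_types) for i in range(sample_size))
    return math.exp(log_p)


for num_types in (100, 5000, 10000):
    print(num_types, prob_all_distinct(100, num_types))
```

With N = 100 the chance is vanishingly small (on the order of 10⁻⁴²), while with N = 5000 it comes out a bit over one in three, which is why a frequency near 1/5000 or rarer is a believable guess under this simplified model.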

A harder problem:
Suppose instead that you sample 100 men, determine 9-locus Y-chromosomal haplotypes, and there are some duplicates. However, 70 haplotypes occur exactly once.
Pick any one of the unique haplotypes at random. What is the probability that the next man you see has the same haplotype?

Answer:
About ln(100/70)/99, or 1/278, is a reasonable answer in the sense of maximum likelihood.
First remark: This is an unpublished, and as far as I know, original result. Please give a citation if you use it.
Second remark: Can you prove it? (I have a mildly implausible derivation, but not a proof.)
Third remark (April, 2002): I can prove it.
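
The arithmetic behind the 1/278 figure is easy to verify. In the sketch below, the function name singleton_match_estimate and the generalization to an arbitrary sample size and singleton count are assumptions made for illustration; the text above states only the case of 100 men with 70 singletons.

```python
import math


def singleton_match_estimate(sample_size: int, num_singletons: int) -> float:
    """Evaluate ln(n / s) / (n - 1).

    Hypothetical generalization of the expression ln(100/70)/99 quoted
    in the text; only the n = 100, s = 70 case comes from the source.
    """
    return math.log(sample_size / num_singletons) / (sample_size - 1)


estimate = singleton_match_estimate(100, 70)
print(estimate, 1 / estimate)
```

The reciprocal of the result rounds to 278, matching the answer given above.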

