- Popularities
- Brenner's Law
- "Infinite alleles" means that mutation is always to a new type; STR mutations are nearly always to an already existing type.
- STR mutation rates are about 1/350 per meiosis, not zero.
- Real populations grow and have immigration.
- Consequences of Brenner's Law
- Allele frequency spectrum
- Two ways to randomly sample allelic types
- Randomly choose an allelic type
- Randomly choose a chromosome

Allele size | count or popularity |
---|---|
8 | 18 |
9 | 16 |
10 | 28 |
11 | 21 |
12 | 14 |
13 | 2 |
14 | 1 |

The table above is a typical reference sample for some STR locus. It shows each allele size and the number (count) of observations of that allele. Let's just focus on the count or popularity column, and suppose we examine many such tables. What will be the most popular number to appear as a count?

m = multiplicity of p within database | # of databases with m types of popularity p=1 (singletons) | # of databases with m types of popularity p=2 (doubletons) | # of databases with m types of popularity p=3 (tripletons) | fraction of databases with m types of popularity p=1 | fraction of databases with m types of popularity p=2 | fraction of databases with m types of popularity p=3 |
---|---|---|---|---|---|---|
0 | 296 | 467 | 547 | 0.37 | 0.58 | 0.68 |
1 | 282 | 237 | 213 | 0.35 | 0.3 | 0.27 |
2 | 121 | 75 | 35 | 0.15 | 0.09 | 0.04 |
3 | 48 | 14 | 4 | 0.06 | 0.02 | 0.005 |
4 | 31 | 6 | 2 | 0.04 | 0.007 | 0.002 |
5 | 11 | 1 | | 0.01 | 0.001 | |
6 | 4 | 1 | | 0.005 | 0.001 | |
7 | 5 | | | 0.006 | | |
8 | 3 | | | 0.004 | | |
α_{p} = total p counts across 801 databases | α_{1}=930 | α_{2}=464 | α_{3}=303 | 1.0 | 1.0 | 1.0 |
pα_{p} = total chromosomes accounted for | 1·α_{1}=930 | 2·α_{2}=928 | 3·α_{3}=909 | ← equal under the Law | | |
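The totals in the last two rows can be recomputed from the body of the table. The following is a minimal sketch in Python; the dictionary and function names are illustrative choices, and the only data used are the database counts in the table.

```python
# Number of databases (out of 801) containing exactly m allelic types
# of popularity p, copied from the table; list index = m.
databases_with_m_types = {
    1: [296, 282, 121, 48, 31, 11, 4, 5, 3],  # singletons, m = 0..8
    2: [467, 237, 75, 14, 6, 1, 1],           # doubletons, m = 0..6
    3: [547, 213, 35, 4, 2],                  # tripletons, m = 0..4
}


def summarize(p, counts):
    """Summarize one popularity column of the table."""
    n_databases = sum(counts)
    # alpha_p: total number of allelic types of popularity p over all databases
    alpha_p = sum(m * n for m, n in enumerate(counts))
    return {
        "databases": n_databases,
        "alpha_p": alpha_p,
        "chromosomes": p * alpha_p,  # p * alpha_p, the quantity in Brenner's Law
        "fraction_with_any": 1 - counts[0] / n_databases,
        "mean_per_database": alpha_p / n_databases,
    }


summary = {p: summarize(p, c) for p, c in databases_with_m_types.items()}
for p, s in summary.items():
    print(p, s)
```

Each column sums to 801 databases, and the singleton column gives the fraction of databases with at least one singleton and the average number of singletons per database.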

I examined a large collection of mostly published STR reference "databases"
or population samples of moderate size.
I tabulated 801 of them each having
from 100 to 1000 chromosomes (observations). Singletons
(allelic types with a count of one) are by a large margin the most popular,
occurring 930 times in total among the 801 databases.
63% had one or more once-observed
allelic types or "singletons". On average there were 1.16 singletons per
database. I suggest the word popularity for the number of
times something has occurred. A *singleton* means an allelic
type of count or *popularity* p=1 in a database.
If we denote by α_{p}
the number of allelic types of popularity *p* found in the dataset, then
we can say that α_{1}=930 is the popularity of singletons, and that
singletons are very popular.
Obviously these 930 singletons represent 930 (fragments of) chromosomes.

Doubletons — types of count *p*=2 — had a total
popularity of α_{2}=464 among the 801 databases. Since each doubleton represents
two observations, in total they account for 2·α_{2}=928
chromosomes, nearly the same
as the singletons. And the 303 tripletons
represent a similar number, 3·α_{3}=909, of total observations.

All of which suggests Brenner's Law, the rule that

The number p·α_{p} of alleles represented by database popularity p is constant over p.

How well does it hold up? Look at the dotted line in the accompanying image. The Law is not highly accurate; let's call it a rule of thumb. It is moderately supported by the data shown, but it is also suggested by more than the data presented here. I did an earlier study based on RFLP markers, and they conform more closely. Most importantly, there is a theoretical underpinning. In fact I first investigated this distribution to compare STR markers with Ewens's sampling distribution for the ideal situation of "infinite alleles." Brenner's Law follows from Ewens's formula in the limit as the mutation rate goes to zero. Of course STRs violate all of the assumptions of the infinite alleles model with zero mutation:

1. "Infinite alleles" means that mutation is always to a new type; STR mutations are nearly always to an already existing type.
2. STR mutation rates are about 1/350 per meiosis, not zero.
3. Real populations grow and have immigration.

So we cannot expect accuracy. Of these points, the main reason the data doesn't conform to the Law is #1. The fact of convergent mutation for STRs is an influence towards common types. Point #2 compensates somewhat. A very high mutation rate, such as exists for Y-haplotypes, discourages common types.
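The limit of Ewens's formula mentioned above can be sketched as follows. This is a sketch based on the standard expression for the expected counts under Ewens's sampling formula, not a derivation taken from this page; the symbols n (number of chromosomes in the sample) and θ (the scaled mutation parameter) are notation introduced here for illustration.

```latex
% Expected number of allelic types of popularity p in a sample of n chromosomes
\mathbb{E}[\alpha_p]
  = \frac{\theta}{p}\,\frac{n!}{(n-p)!}\,\frac{\Gamma(\theta+n-p)}{\Gamma(\theta+n)},
  \qquad 1 \le p < n .

% As \theta \to 0 the ratio of gamma functions tends to (n-p-1)!/(n-1)!, so
p\,\mathbb{E}[\alpha_p] \;\approx\; \theta\,\frac{n}{n-p} \;\approx\; \theta
  \qquad \text{for } p \ll n .
```

To leading order in θ, then, p·α_{p} does not depend on p as long as p is small compared with the sample size, which is the constancy that the Law asserts.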

The general point is that nature strongly favors rare alleles.

Brenner's Law is an observation about the comparative prevalence of rare and of common forensic STR allelic variants.

It is an example of a
"frequency spectrum": the distribution of
frequencies that we can expect nature to deal to us. It says that for any given locus,
population, and small frequency range *f*±ε, the probability that
an allelic type with frequency in the range
*f*±ε exists is double the probability that an allelic type with
frequency in the range 2*f*±ε exists, and so on.
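Read as a density, the doubling statement amounts to a spectrum proportional to 1/f. The following is a sketch under that reading; φ is notation introduced here for the expected density of allelic types at population frequency f, so that the chance of a type in the window f±ε is roughly 2ε·φ(f) for small ε.

```latex
\phi(f) \;\propto\; \frac{1}{f}
\quad\Longrightarrow\quad
\frac{\phi(f)}{\phi(2f)} = 2, \qquad
\frac{\phi(f)}{\phi(kf)} = k \quad \text{for any } k > 0 \text{ with } kf \le 1 .
```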

A frequency spectrum is thus a prior probability distribution for allele frequencies, and it can be used to impute a match probability via Bayes' theorem.

One way to randomly sample allelic types is to list all the allelic types, for example those that are represented one or more times in a sample, then select at random from the list.

Under this sampling rule a singleton type in the sample has the same chance
to be chosen as a common type. Hence this can be thought of as sampling according
to the distribution of α_{p}. This is the sampling experiment
I have in mind when I say that the most likely allelic popularity is singletons,
and that rare alleles are prevalent.

Assuming Brenner's Law, this sampling will choose a singleton twice as frequently as a doubleton, thrice as frequently as a tripleton, and so on.

Another way to sample randomly is to choose a chromosome at random
from all those counted in compiling the population sample. Brenner's Law
predicts that if we sample by this method and then ask for the popularity *p*
of the allelic type thus obtained in its database, all choices for *p* are equally
likely.
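The two sampling rules can be compared numerically using the pooled counts α_{1}=930, α_{2}=464, α_{3}=303 from the 801 databases. The sketch below is illustrative only: it restricts attention to popularities 1 to 3 (an assumption made here because those are the only totals tabulated), and the function names are hypothetical.

```python
from fractions import Fraction

# Pooled totals from the 801 databases:
# alpha[p] = number of allelic types of popularity p.
# Assumption: only p = 1, 2, 3 are considered, so the probabilities
# below are conditional on the popularity being at most 3.
alpha = {1: 930, 2: 464, 3: 303}


def sample_by_type(alpha):
    """Probability of popularity p when an allelic type is chosen at random."""
    total_types = sum(alpha.values())
    return {p: Fraction(a, total_types) for p, a in alpha.items()}


def sample_by_chromosome(alpha):
    """Probability of popularity p when a chromosome is chosen at random.

    A type of popularity p contributes p chromosomes, so the weight is p * alpha[p].
    """
    total_chromosomes = sum(p * a for p, a in alpha.items())
    return {p: Fraction(p * a, total_chromosomes) for p, a in alpha.items()}


type_probs = sample_by_type(alpha)
chrom_probs = sample_by_chromosome(alpha)
for p in alpha:
    print(p, round(float(type_probs[p]), 3), round(float(chrom_probs[p]), 3))
```

Sampling by type picks a singleton about twice as often as a doubleton and about three times as often as a tripleton, while sampling by chromosome gives the three popularities nearly equal probability, as the Law predicts.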