
The Power of SNP's – Even Without Population Data

Poster presentation at the 10th Promega Symposium on Human Identification, Orlando, Florida, 29 September 1999
CH Brenner, Consulting in forensic mathematics,
1999: Department of Genetics, Univ of Leicester, UK
2000: Berkeley, California, USA

Feel free to link to this page, but please do not reproduce this material without permission of the author.

How many SNP's equal one STR? one RFLP?

Introduction

Single nucleotide polymorphisms (SNP's) have the potential to be just as discriminating as highly polymorphic loci such as VNTR-RFLP or MVR systems; you just need more of them. This is obviously true for forensic identification. It is also true for more complex problems such as deciding sibship or drawing inferences from mixed stains.

So there is a tradeoff possible:

a handful of highly polymorphic systems, or
a larger number of less-polymorphic systems.

What is the rate of tradeoff? How many SNP's per STR? per VNTR-RFLP?

Discussion

The rate of tradeoff is not simple.

The tradeoff is different for different kinds of casework – i.e. stain matching, paternity, mixed stain, kinship ...
The tradeoff is sometimes different for proving than for disproving.
The tradeoff depends somewhat on one's choice of criteria. (The "typical likelihood ratio" is not always a possible choice.)

INCIDENTAL COMMENT

SNP's can be valuable in casework even without population data.

Methods and ground rules

The method of analysis is exact calculation, using an idealized model of loci, to wit:

Every allele at a locus is assumed to be equally frequent.
A "k-locus" means a locus with k equally-frequent alleles, so

SNP	=2-locus (optimistically assuming heterozygosity=1/2)
STR	is roughly equivalent to a 4-locus or a 5-locus.
VNTR-RFLP	is modeled by 12<k<100 roughly. k=40 typical.

Likelihood ratios are, of course, used for the analysis. The "typical" likelihood ratio is defined as a geometric mean, i.e. an average taken in the multiplicative sense, rather than the arithmetic mean (which is not sensible here).
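As a minimal sketch of this definition, the geometric mean weights the logarithm of each possible likelihood ratio by its probability. The function names and the worked SNP example are illustrative assumptions, not taken from the poster.

```python
import math

def typical_lr(outcomes):
    """Geometric-mean likelihood ratio.

    outcomes: list of (probability, likelihood_ratio) pairs whose
    probabilities sum to 1.
    """
    return math.exp(sum(p * math.log(lr) for p, lr in outcomes))

def arithmetic_mean_lr(outcomes):
    """Ordinary probability-weighted mean, shown only for comparison."""
    return sum(p * lr for p, lr in outcomes)

# Illustrative stain-matching example for an idealized SNP (2-locus):
# genotypes AA, AB, BB occur with probability 1/4, 1/2, 1/4, and a match
# gives LR = 1 / (genotype probability).
snp = [(0.25, 4.0), (0.5, 2.0), (0.25, 4.0)]

print(typical_lr(snp))          # geometric mean, equal to 2**1.5
print(arithmetic_mean_lr(snp))  # arithmetic mean, equal to 3.0
```

Likelihood ratios from independent loci multiply, so their logarithms add; the geometric mean per locus therefore describes how the combined ratio typically grows as loci are added.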

FORENSIC TRADEOFF

From the graph (triangle heights) we have tradeoff rates

SNP :	STR :	VNTR-RFLP
1 :	2.6 :	6.4

for the stain matching ("forensic") problem. It takes 6.4 SNP's (or 6.4/2.6=2.5 STR's) to equal the power of one VNTR-RFLP.
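The forensic ratios can also be recovered directly from the idealized model. The sketch below is a reconstruction under stated assumptions: random mating, a match likelihood ratio equal to 1/(genotype frequency), and the geometric-mean definition of "typical". The function names are illustrative, not the author's.

```python
import math

def forensic_mean_log_lr(k):
    """Expected ln(LR) for a stain match at a k-locus.

    Assumes k equally frequent alleles and random mating:
      homozygote:   total probability 1/k,     LR = k**2
      heterozygote: total probability (k-1)/k, LR = k**2 / 2
    """
    p_hom = 1.0 / k
    p_het = (k - 1.0) / k
    return p_hom * math.log(k ** 2) + p_het * math.log(k ** 2 / 2.0)

def snps_per_locus(k):
    """How many 2-loci (SNP's) give the same expected ln(LR) as one k-locus."""
    return forensic_mean_log_lr(k) / forensic_mean_log_lr(2)

for name, k in [("SNP", 2), ("STR", 5), ("VNTR-RFLP", 40)]:
    print(name, round(snps_per_locus(k), 1))
```

Under these assumptions the sketch gives about 2.6 for k=5 and 6.4 for k=40, in line with the ratios quoted above.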

PATERNITY TRADEOFF

Paternity casework gives a different set of ratios:

SNP :	STR :	VNTR-RFLP
1 :	4 :	12

To replace three VNTR-RFLP's, you need more than 30 SNP's.

MORAL

Suppose a laboratory does both kinds of casework, at present using a battery of non-SNP markers. They consider switching to a battery of SNP's. They determine that the SNP battery will be equal to what they now get for forensic work. Then, they will be losing performance on their paternity work. Paternity is "harder" than forensic work for SNP's.
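The arithmetic behind this moral can be sketched with the two sets of ratios given above. The three-locus VNTR-RFLP battery is a hypothetical example, and the helper names are illustrative.

```python
# SNP's per locus, copied from the forensic and paternity ratios above.
SNPS_PER_LOCUS = {
    "forensic": {"STR": 2.6, "VNTR-RFLP": 6.4},
    "paternity": {"STR": 4.0, "VNTR-RFLP": 12.0},
}

def snps_needed(problem, locus_type, n_loci):
    """SNP's needed to match n_loci loci of the given type."""
    return SNPS_PER_LOCUS[problem][locus_type] * n_loci

def loci_equivalent(problem, locus_type, n_snps):
    """Number of loci of the given type that n_snps SNP's are worth."""
    return n_snps / SNPS_PER_LOCUS[problem][locus_type]

# Hypothetical lab currently running 3 VNTR-RFLP loci.
n_snps = snps_needed("forensic", "VNTR-RFLP", 3)          # sized for forensic parity
print(n_snps)                                             # about 19.2 SNP's
print(loci_equivalent("paternity", "VNTR-RFLP", n_snps))  # about 1.6 loci
print(snps_needed("paternity", "VNTR-RFLP", 3))           # 36 SNP's for paternity parity
```

A SNP battery sized to match three VNTR-RFLP loci in forensic work is worth only about 1.6 such loci in paternity work.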

SNP's per k-locus, various problems

Different Problem, Different Tradeoff

[Graph: # of SNP's equivalent to a k-locus vs. number of (equi-frequent) alleles per locus]

TRADEOFF CHART

From the graph we have tradeoff rates

	k=2 SNP :	k=5 STR :	k=40 VNTR-RFLP
mix2	1 :	6.5 :	27
mixed	1 :	7 :	23
no ma	1 :	3.5 :	14
trio	1 :	4 :	12
sib	1 :	3.3 :	11
forensic	1 :	2.6 :	6.4
non-sib	1 :	2.7 :	5.1

Absolute difficulty of various problems; loci needed for likelihood ratio=1000

Different question, different order

[Graph: expected # of loci needed for LR=1000 vs. number of (equi-frequent) alleles/locus]
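For the simple stain-matching problem only, the number of loci needed for LR=1000 can be sketched as follows. This assumes that "needed" means the number of loci at which the geometric-mean likelihood ratio reaches 1000, under random mating and equally frequent alleles. Whether the graph uses exactly this definition is an assumption, and the function names are illustrative.

```python
import math

def forensic_mean_log_lr(k):
    # Stain match at a locus with k equally frequent alleles, random mating:
    # a homozygote (probability 1/k) has LR k**2,
    # a heterozygote (probability (k-1)/k) has LR k**2 / 2.
    return (1.0 / k) * math.log(k ** 2) + ((k - 1.0) / k) * math.log(k ** 2 / 2.0)

def loci_for_target(k, target_lr=1000.0):
    # Number of k-loci at which the geometric-mean LR reaches target_lr.
    return math.log(target_lr) / forensic_mean_log_lr(k)

for k in (2, 5, 40):
    print(k, round(loci_for_target(k), 2))
```

On this definition, between six and seven SNP's, between two and three STR-like loci, or roughly one VNTR-RFLP-like locus reach a typical likelihood ratio of 1000 for stain matching.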

Summary – Power of SNP's

The tradeoff – amount of polymorphism vs number of loci – depends on several things, especially the type of problem (forensic, paternity, sibship, mixture).
Mixed stain problems, and to a lesser extent motherless paternity, are "relatively hard" for SNP's – a lot of SNP's per STR for equivalent performance. But no problem is impossible. Simple stain matching is not the "relatively easiest" problem for SNP's; disproving sibship ("non-sib") is easier.
The problems that are "relatively hard" are not necessarily the ones that are "hard" in the sense of requiring many loci. Paternity and sibling problems are "relatively easier" than mixed stain problems, but they are "harder" in the sense of requiring more loci for equivalent likelihood ratio.

Without Population Data

Why is a database of size N=0 big enough?

Because you can afford to have a lot of SNP's.

Suppose we have a panel of 100 SNP's for stain matching. Genotypes are AA, AB, and BB. Suppose a crime is committed in a possibly highly inbred population for which there are no population statistics.

Nonetheless, it is reasonable to hope (not assume!) that the people are not genetic clones, and that the allele frequencies will generally be in the 30-70% range since the loci were presumably screened for polymorphism in some population.

Moreover, we assume that the loci have been confirmed to be selectively neutral, and unlinked.

Picture a 100-locus genotype that matches between suspect and crime stain: a string of AA, AB, and BB calls, one per locus.

Ignore all but the heterozygous loci.

Every heterozygous locus contributes a likelihood ratio > 2 even in a substructured population.

If there are 24 heterozygous loci among the 100, the matching odds will be > 2²⁴ or > 10 million, which is practically definitive. Even if only 10 loci are heterozygous, the matching odds are certainly > 1000.

Database of size N=0 – some details

(1) Every heterozygous locus contributes a likelihood ratio > 2, even in a substructured population

[I originally discussed this idea in 1997 under the title "The littlest database".]

Proof: Let p and q=1-p be the allele frequencies. The proportion H of heterozygotes is 2pq if genes flow freely in the population, otherwise even less. So H ≤ 2pq = 2(0.5-(0.5-p))(0.5+(0.5-p)) = 2[(0.5)²-(0.5-p)²] ≤ 0.5, thus 1/H ≥ 2, and strictly greater unless p is exactly 0.5 with free gene flow. Q.E.D.
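The bound can be checked numerically. In the sketch below, substructure is modeled by a coefficient f that reduces heterozygosity to 2pq(1-f). That parametrization and the grid of values are assumptions made for illustration, not part of the original argument.

```python
def heterozygote_fraction(p, f=0.0):
    """Heterozygote proportion H = 2pq(1 - f).

    f is a hypothetical inbreeding/substructure coefficient in [0, 1];
    f = 0 corresponds to free gene flow (H = 2pq).
    """
    q = 1.0 - p
    return 2.0 * p * q * (1.0 - f)

# Largest H over a grid of allele frequencies and substructure levels.
worst = max(
    heterozygote_fraction(p / 100.0, f / 10.0)
    for p in range(1, 100)  # p = 0.01 .. 0.99
    for f in range(0, 10)   # f = 0.0 .. 0.9
)
print(worst, 1.0 / worst)  # largest H on the grid and the smallest 1/H
```

On this grid the largest H occurs at p=0.5 with f=0, so 1/H never falls below 2; any departure from equal allele frequencies or from free gene flow only raises it.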

(2) 24 heterozygous loci

If all loci have allele frequencies ¼ and ¾, in 99.9% of cases there will be >23 heterozygous loci out of 100. Even if the allele frequencies are 0.1 and 0.9, it is 99% to have at least 10 heterozygous loci.
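These percentages are binomial tail probabilities and can be recomputed with a short sketch. It assumes 100 independent loci, each heterozygous with the free-gene-flow probability H = 2p(1-p); the function name is illustrative.

```python
from math import comb

def prob_at_least(n_het, n_loci, p):
    """P(at least n_het heterozygous loci among n_loci independent SNP's).

    Assumes every locus has allele frequencies p and 1 - p and that
    heterozygosity takes the random-mating value H = 2p(1 - p).
    """
    h = 2.0 * p * (1.0 - p)
    return sum(
        comb(n_loci, i) * h ** i * (1.0 - h) ** (n_loci - i)
        for i in range(n_het, n_loci + 1)
    )

print(prob_at_least(24, 100, 0.25))  # allele frequencies 1/4 and 3/4
print(prob_at_least(10, 100, 0.10))  # allele frequencies 0.1 and 0.9
print(2 ** 24, 2 ** 10)              # matching-odds bounds for 24 and 10 loci
```

The last line gives the corresponding lower bounds on the matching odds: 2²⁴ = 16,777,216 and 2¹⁰ = 1,024.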


This work was partially supported by table legs and a chair.

forensic	simple stain matching
mixed	mixture of victim and suspect
mix2	mixture of suspect and an unknown
trio	true paternity situation: mother, child, and father
no ma	true paternity case with mother not tested
sib	full sibs present; distinguish them from half-sibs
non-sib	half-sibs present; distinguish them from full-sibs