# Careful Formulation of a Likelihood Ratio Statement based on Anecdotal Evidence

1. The problem
2. Analysis
3. Exercises
4. LR shortfall or requisite prior

## The problem

### Background

A DNA profile as evidence linking two things (e.g. a suspect to a crime scene) is well suited to formulation as a likelihood ratio (LR) because the DNA profile is
1. well defined: we know what the profile is and hence what a "match" means, and
2. nicely quantifiable: most importantly, we can compute with reasonable accuracy the probability of the random occurrence of a profile.

Other kinds of evidence are not so nice. Anecdotal (non-scientific) evidence in particular is likely to be deficient with respect to both of the above criteria. Still, I think it can sometimes be usefully treated in a Bayesian way. Inevitably we will have to guess the strength of the evidence partly through intuition, but there is still scope for exacting analysis: it is important to formulate accurately the questions to be answered, so that at least we apply our subjective intuition to the correct questions.

#### ProBusqueda

In our ProBusqueda work, we have many cases of young adults (YA) who were lost to their families in the chaos of military fighting in El Salvador, lost their birth identities (BI), were adopted, and now wish to reconnect with their birth families. Sometimes DNA is sufficient to make a confident link between YA and BI, but sometimes it falls short. There is always other connecting evidence of various kinds, and we would like to do the impossible and quantify it, in order to decide whether the non-DNA evidence is sufficient to make up the shortfall between the DNA evidence and a confident identification.

### (anecdotal) example

Family of birth name/missing child "Nestor" mentions neck scars from vaccination & insect bites.

## Analysis

### What is the evidence?

Is the evidence that
YA Carter and BI Nestor both have scars?

Let's be more precise. The evidence is that

YA Carter has scars and relatives report that the infant BI Nestor had scars.

More carefully, let's describe the evidence as EC & EN, where
• EC = YA Carter has scars of a certain description S.
• EN = BI Nestor's family describes scars of a certain description S' (similar, but certainly not identical, to S).

### What is the LR?

Then the LR = X/Y where

X = Pr(EC & EN | Carter is Nestor)
Y = Pr(EC & EN | Carter is not Nestor).

### Do the math

Note: EC doesn't depend on the relationship, so Pr(EC | Carter is Nestor) = Pr(EC | Carter is not Nestor) = Pr(EC). The way people describe Carter is not biased by his true history (especially if it is unknown).

Consider X. There are two ways to apply the identity Pr(F & G) = Pr(F)Pr(G|F). In this case I choose an adult-centric formulation by letting EC play the role of F:
X= Pr(EC & EN | Carter is Nestor)
= Pr(EC | Carter is Nestor) Pr(EN | Carter is Nestor & EC)
= Pr(EC) Pr(EN | Carter is Nestor & EC).
This is adult-centric in that the probability of the YA appearance, Pr(EC), is evaluated unconditionally, while the probability of the BI appearance is considered conditionally on the YA appearance.

And Y:
Y = Pr(EC & EN | Carter is not Nestor)
= Pr(EC | Carter is not Nestor) Pr(EN | Carter is not Nestor & EC)
= Pr(EC) Pr(EN | Carter is not Nestor & EC).

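Dividing X by Y, the common factor Pr(EC) cancels:
LR = X/Y = Pr(EN | Carter is Nestor & EC) / Pr(EN | Carter is not Nestor & EC).
So LR = Pr(EN | N*) / Pr(EN | ~N*), where
N* = Nestor is a childhood version of Carter, a person who bears scars S.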

Note on the identity Pr(F & G) = Pr(F)Pr(G|F): this is the mathematical rule for the probability of a conjunction in terms of the probabilities of the constituent events. Example: the probability that a person is F = over fifty and G = unemployed can be computed if you know Pr(F) = the proportion of people over fifty and Pr(G|F) = the proportion of unemployed among those over fifty. Now just multiply. Alternatively, it could be calculated from Pr(G) = the proportion of unemployed and Pr(F|G) = the proportion of over-fifties among those unemployed. Now multiply.
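The identity Pr(F & G) = Pr(F)Pr(G|F) can be checked numerically. The following is a minimal sketch; the population counts are hypothetical, invented only to illustrate that both factorizations give the same joint probability.

```python
from fractions import Fraction

# Hypothetical counts, invented purely for illustration (not real data).
population = 1000   # total number of people
over_fifty = 400    # people with property F (over fifty)
unemployed = 100    # people with property G (unemployed)
both = 50           # people with both F and G

pr_f = Fraction(over_fifty, population)     # Pr(F)
pr_g = Fraction(unemployed, population)     # Pr(G)
pr_g_given_f = Fraction(both, over_fifty)   # Pr(G|F)
pr_f_given_g = Fraction(both, unemployed)   # Pr(F|G)

joint_via_f = pr_f * pr_g_given_f           # Pr(F) Pr(G|F)
joint_via_g = pr_g * pr_f_given_g           # Pr(G) Pr(F|G)
joint_direct = Fraction(both, population)   # Pr(F & G) counted directly

print(joint_via_f, joint_via_g, joint_direct)
```

Either route yields the same value, which is why the choice between the adult-centric and the child-centric formulation is a matter of convenience rather than correctness.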

### Conclusion

Let's rephrase:
• Evidence EN = Relatives remember and report scars S' for the child Nestor.
• Hypothesis H1 = the child is the same person as the adult who now has scars S.
• Hypothesis H0 = the child is a random person.
• LR = Pr(EN | H1) / Pr(EN | H0),
 or in words: How many times more probable is EN when H1 is true, than when H0 is true?

#### Interpretation

The LR is thus a comparison of these two probabilities:
• Pr(EN | H1) = the probability that relatives of a person with scars S would report childhood scars S'
versus
• Pr(EN | H0) = the probability that relatives of some random infant would recall childhood scars S'.
No doubt the first probability is larger than the second one, hence LR>1. But unless the scars are fairly striking – obviously unusual – or the coincidence between the descriptions S and S' very strong, I would be reluctant to conclude that LR is very large.

#### An issue of bias

It's not clear, from the story I've presented, how the testimony from the relatives came about. It could be that the relatives spoke of the scars S' after being prompted in some way. That is, suppose that the actual evidence is a biased version EN' of EN:
Evidence EN' = Relatives remember and report scars S' for the child Nestor after being given a suggestion in the form of information or a picture of Carter.
It may be that the bias makes the denominator much larger, i.e. that Pr(EN' | H0) >> Pr(EN | H0). (Maybe the numerator too, but probably less so.) A very high percentage of children might be described as having scars if the question is asked in a leading form. The way the information is collected may have a big effect, and consequently the actual LR from the anecdote may not be very big, and not very helpful in concluding identity.
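The effect of a leading question on the LR can be sketched numerically. All four probabilities below are hypothetical placeholders, not estimates from any case; the helper name `likelihood_ratio` is likewise just for illustration. The point is only that inflating the denominator more than the numerator shrinks the LR.

```python
from fractions import Fraction


def likelihood_ratio(pr_given_h1: Fraction, pr_given_h0: Fraction) -> Fraction:
    """Return Pr(E | H1) / Pr(E | H0) for two probabilities in (0, 1]."""
    if not (0 < pr_given_h1 <= 1 and 0 < pr_given_h0 <= 1):
        raise ValueError("probabilities must lie in (0, 1]")
    return pr_given_h1 / pr_given_h0


# Hypothetical values for unprompted testimony EN (illustration only).
lr_unprompted = likelihood_ratio(Fraction(1, 2), Fraction(1, 100))

# Hypothetical values for prompted testimony EN': the numerator rises a
# little, the denominator rises a lot (illustration only).
lr_prompted = likelihood_ratio(Fraction(4, 5), Fraction(2, 5))

print(float(lr_unprompted), float(lr_prompted))
```

With these made-up inputs the prompted testimony carries far less weight than the unprompted testimony, even though the relatives are more likely to mention scars in both scenarios.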

## Exercises

### Age coincidence

As an exercise, consider the following data:
• EN: BI Nestor was born in 1977 and disappeared in 1982.
• EC: YA Carter believes that he was born about 1978 and was adopted in 1982.
Hypothesis H1 = the child is the same person as the adult who believes EC
Hypothesis H0 = the child is a random person

We have some survey data based on 698 missing children:

• 19 children aged 5 disappeared in 1982. (That's 1 in 36 of the missing.)
• 58 children aged 4–6 disappeared in 1982. (1 in 12 of the missing.)
• 114 children aged 4–6 disappeared in 1981-1983. (1 in 6 of the missing.)

If Carter is not Nestor, we can assume that his possible birth identities are represented by some part of the sample of 698. ("Represented" doesn't mean that Carter would be among the 698 in the sample, only that, since the sample represents the 30,000 or so missing children, the missing children with Carter's age and time of disappearance are proportionally represented in it.) Depending on how much vagueness about dates and ages we accept, from 19 to 114 of those 698 are consistent with his belief. Taking the coincidence of dates to be essentially certain if Carter is Nestor, the LR supporting identity based on this data is the reciprocal of that proportion, so

6 ≤ LR ≤ 36.
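The bounds can be reproduced from the survey counts. This is a minimal sketch that assumes the probability of the date coincidence is 1 if Carter is Nestor, so that the LR is simply the reciprocal of the proportion of the sample consistent with Carter's belief; the function name `age_lr` is hypothetical.

```python
from fractions import Fraction

SAMPLE_SIZE = 698  # missing children in the survey

# Number of sampled children consistent with Carter's belief,
# under progressively looser readings of the dates and ages.
consistent_counts = {
    "aged 5, disappeared 1982": 19,
    "aged 4-6, disappeared 1982": 58,
    "aged 4-6, disappeared 1981-1983": 114,
}


def age_lr(consistent: int, sample_size: int = SAMPLE_SIZE) -> Fraction:
    """LR = 1 / Pr(coincidence | H0), assuming Pr(coincidence | H1) = 1."""
    return Fraction(sample_size, consistent)


for label, count in consistent_counts.items():
    print(f"{label}: LR is about {float(age_lr(count)):.1f}")
```

The strictest reading gives the largest LR and the loosest reading the smallest, which is the range quoted above after rounding down.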

## LR shortfall or requisite prior

Let's consider the above analyses in this context:
• We estimate 30000 missing children altogether. Therefore we start with a "baseline prior probability" of 1/30000 that Carter=Nestor.
• We would like to maintain a standard of 99.9% (posterior) probability supporting an identity in order to assume it.
• Since 99.9% is approximately 1000:1 odds (precisely 999:1), 1/30000 probability is approximately 1:30000 odds, and
(posterior odds) = (prior odds) × LR
the necessary total evidence LRT that we require must satisfy
LRT = (posterior odds) ÷ (prior odds) ≥ 1000 ÷ (1/30000) = 30,000,000.

Considering the total evidence LRT to be composed of two factors, scientific (meaning DNA) and anecdotal (everything else), and taking LRDNA = 11,700 for this case, we have
LRT = LRDNA × LRother = 11,700 × LRother ≥ 30,000,000,
so we need LRother ≥ 30,000,000/11,700 ≈ 2564.
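The arithmetic for the required evidence can be checked with a short sketch. It uses the figures stated above (baseline prior odds of 1/30000, required posterior odds of 1000, LRDNA = 11,700); the variable names are only illustrative.

```python
from fractions import Fraction

prior_odds = Fraction(1, 30000)    # baseline prior odds that Carter = Nestor
posterior_odds_required = 1000     # about 99.9% posterior probability
lr_dna = 11700                     # LR from the DNA evidence

# (posterior odds) = (prior odds) x LR, so the total LR needed is:
lr_total_required = posterior_odds_required / prior_odds

# The DNA supplies part of it; the rest must come from other evidence.
lr_other_required = lr_total_required / lr_dna

print(lr_total_required, round(float(lr_other_required), 1))
```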

Thus to declare an identification at our 99.9% standard, we'll need to supplement the DNA evidence with "other" or "anecdotal" evidence of strength LRother ≥ 2564. There are two equivalent ways to think from here: LR shortfall thinking and requisite prior thinking.

### LR shortfall thinking

LRother ≥ 2564 means that the DNA evidence leaves us with an LR shortfall of 2564.

Can we do it? Suppose we figure

• LRother = LRsex × LRage × LRscar, where
• LRsex = 2 (see the note on LRsex below), since Nestor was male, as would be expected if he is Carter
• LRage = 12 (say), from the age and date discussion above
• LRscar represents the evidence from the coincidence about scarring.
Then to bring LRother ≥ 2564, we need to believe that

LRscar ≥ 2564 / (2 × 12) ≈ 107.

We could say that the LR shortfall before consideration of the scar is 107.

Is LRscar ≥ 107? That's believable, but it's not obvious! It might depend on the details.
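The shortfall calculation generalizes to any set of independent factors: divide the required LR by the product of the LRs already in hand. A minimal sketch, assuming the factors multiply as in the text (the function name `remaining_shortfall` is hypothetical):

```python
import math


def remaining_shortfall(lr_required: float, known_lrs: list[float]) -> float:
    """LR still needed after multiplying together the LRs already in hand."""
    return lr_required / math.prod(known_lrs)


lr_other_required = 2564   # shortfall left by the DNA evidence
lr_sex = 2                 # sex agreement
lr_age = 12                # age and date agreement (the "say" value above)

lr_scar_required = remaining_shortfall(lr_other_required, [lr_sex, lr_age])
print(round(lr_scar_required, 1))
```

Choosing a different value for LRage within the range from the age exercise would change the requirement on the scar evidence proportionally.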

#### LRsex — why it is 2

Evidence E = gender of BI is the same as gender of YA.
LR = Pr(E | BI = YA) / Pr(E | BI unrelated to YA) = 1/(proportion of the population with the gender of YA) = 1/½ (suppose) = 2.

### Requisite prior thinking

The "requisite prior" thinking means considering the "other" evidence before the DNA, and wrapping it into the prior odds.
1. First, consider how large the prior odds must be for the DNA evidence to be sufficient.

Our policy is to require (posterior probability) ≥ 99.9%, i.e. (posterior odds) ≥ 1000.

Since we know LRDNA = 11,700 and (posterior odds) = (prior odds) × LR, the requisite prior odds for identification given LRDNA = 11,700 are
(requisite prior odds) = (posterior odds) / LRDNA = 1000 / 11,700 = 1/11.7.

2. Starting from the baseline prior = 1/30,000, can we justify that requisite prior?

• sex: 1/30000 is the odds prior to considering the sex data, and LRsex = 2.
(odds posterior to sex) = (odds prior to sex) × LRsex = (1/30000) × 2 = 1/15000.
• age: 1/15000 is the odds posterior to sex, and prior to considering age. Say LRage = 12.
(odds posterior to age) = (odds prior to age) × LRage = (1/15000) × 12 = 1/1250.
• scar: 1/1250 is the odds prior to considering the scar.
(odds posterior to scar, prior to DNA) = (odds prior to scar) × LRscar.

We must have (odds posterior to scar, prior to DNA) ≥ (requisite prior odds) = 1/11.7, so
(odds prior to scar) × LRscar ≥ 1/11.7, i.e.
1/1250 × LRscar ≥ 1/11.7,
which entails
LRscar ≥ 1250/11.7 ≈ 107, as before.
That is, given our policy and our other assumptions, we need LRscar ≥ 107 to declare the identification.
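The requisite prior reasoning can be sketched as a sequential odds update. The figures are those stated in the text; the variable names are illustrative only, and the factors are assumed to be independent so that they multiply.

```python
from fractions import Fraction

baseline_prior_odds = Fraction(1, 30000)
posterior_odds_required = 1000
lr_dna = 11700

# Prior odds needed so that the DNA alone reaches the required posterior.
requisite_prior_odds = Fraction(posterior_odds_required, lr_dna)  # about 1/11.7

# Fold the non-DNA evidence into the prior, one factor at a time.
odds = baseline_prior_odds
for name, lr in [("sex", 2), ("age", 12)]:
    odds *= lr
    print(f"odds posterior to {name}: {odds}")

# Whatever gap remains must be closed by the scar evidence.
lr_scar_required = requisite_prior_odds / odds
print(f"LRscar needed: about {float(lr_scar_required):.1f}")
```

Because the update is pure multiplication, the order in which the factors are applied does not matter, which is why this route and the LR shortfall route agree.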
