Careful Formulation of a Likelihood Ratio Statement based on Anecdotal Evidence

Table of contents

  1. The problem
    1. Background
    2. (anecdotal) example
  2. Analysis
    1. What is the evidence?
    2. What is the LR?
    3. Do the math
    4. Conclusion
      1. Interpretation
      2. An issue of bias
  3. Exercises
    1. Age coincidence
  4. LR shortfall or requisite prior

  1. The problem
    1. Background
    A DNA profile as evidence linking two things (e.g. a suspect to a crime scene) is well suited to formulation of a likelihood ratio (LR) because the DNA profile is
      1. well defined — we know what the profile is and hence what a "match" means, and
      2. nicely quantifiable — most importantly we can compute with reasonable accuracy the probability of the random occurrence of a profile
      Other kinds of evidence are not so nice. Anecdotal (non-scientific) evidence in particular is likely to be deficient on both of the above criteria. Still, I think it can sometimes be usefully treated in a Bayesian way. Inevitably we will have to guess the strength of the evidence partly through intuition, but there is still scope for exacting analysis: it is important to formulate accurately the questions to be answered, so that at least we apply our subjective intuition to the correct questions.

        ProBusqueda

        In our ProBusqueda work, we have many cases of young adults who were lost to their families in the chaos of military fighting in El Salvador, lost their birth identities (BI), were adopted, and now as young adults (YA) wish to reconnect with their birth families. Sometimes DNA is sufficient to make a confident link between YA and BI, but sometimes it falls short. There is always other connecting evidence of various kinds, and we would like to do the impossible and quantify it, in order to decide whether the non-DNA evidence is sufficient to make up the shortfall between the DNA evidence and a confident identification.

    2. (anecdotal) example
      Young adopted adult "Carter" bears scars.
      The family of the missing child (birth name "Nestor") mentions neck scars from vaccination & insect bites.

  2. Analysis
    1. What is the evidence?
      Is the evidence that
      YA Carter and BI Nestor both have scars?

      Let's be more precise. The evidence is that

      YA Carter has scars and relatives report that the infant BI Nestor had scars.

      More carefully, let's describe the evidence as EC & EN, where
      EC = YA Carter has scars of a certain description S.
      EN = BI Nestor's family describes scars of a certain description S' (similar to, but certainly not identical to, S).

    2. What is the LR?
      Then the LR = X/Y where

      X = Pr(EC & EN | Carter is Nestor)
      Y = Pr(EC & EN | Carter is not Nestor).

    3. Do the math
      Note: EC doesn't depend on the relationship, so Pr(EC | Carter is Nestor) = Pr(EC | Carter is not Nestor) = Pr(EC). (The way people describe Carter is not biased by his true history, especially if that history is unknown.)

      Consider X. There are two ways to apply the identity Pr(F & G) = Pr(F)Pr(G|F). In this case I choose an adult-centric formulation by letting EC play the role of F:
      X= Pr(EC & EN | Carter is Nestor)
      = Pr(EC | Carter is Nestor) Pr(EN | Carter is Nestor & EC)
      = Pr(EC) Pr(EN | Carter is Nestor & EC).
      (adult-centric in that the probability of the YA appearance is evaluated unconditionally — Pr(EC) — and the probability of the BI appearance is considered conditionally on the YA appearance)

      And Y:
      Y = Pr(EC & EN | Carter is not Nestor)
      = Pr(EC | Carter is not Nestor) Pr(EN | Carter is not Nestor & EC)
      = Pr(EC) Pr(EN | Carter is not Nestor & EC).

      Dividing X by Y, the common factor Pr(EC) cancels, so LR = Pr(EN | N*) / Pr(EN | ~N*), where
      N* = Nestor is a childhood version of Carter, a person who bears scars S, and ~N* = he is not.
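
The cancellation can be checked numerically. The sketch below uses made-up probabilities (the values and the function name `likelihood_ratio` are illustrative assumptions, not estimates from any case) and shows that the guess for Pr(EC) has no effect on X/Y.

```python
from fractions import Fraction

def likelihood_ratio(pr_ec, pr_en_given_same, pr_en_given_diff):
    """Return X / Y using the adult-centric factorisation.

    X = Pr(EC) * Pr(EN | Carter is Nestor & EC)
    Y = Pr(EC) * Pr(EN | Carter is not Nestor & EC)
    """
    x = pr_ec * pr_en_given_same
    y = pr_ec * pr_en_given_diff
    return x / y

# Hypothetical inputs, chosen only to illustrate the algebra.
pr_en_same = Fraction(3, 10)
pr_en_diff = Fraction(1, 20)

# Two very different guesses for Pr(EC) give the same LR.
lr_a = likelihood_ratio(Fraction(1, 100), pr_en_same, pr_en_diff)
lr_b = likelihood_ratio(Fraction(1, 2), pr_en_same, pr_en_diff)
print(lr_a, lr_b)
```

Only the two conditional probabilities of EN need to be judged; how common scars S are among adults drops out of the ratio.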

      Note on "apply the identity": this is the mathematical rule for the probability of a conjunction in terms of the probabilities of the constituent events.

      Example: The probability that a person is F=over fifty and G=unemployed can be computed if you know

      • Pr(F) = proportion of people over fifty
      • Pr(G|F) = proportion of unemployed among those over fifty.
      Now just multiply.

      Alternatively, it could be calculated from

      • Pr(G) = proportion of unemployed
      • Pr(F|G) = proportion of over-fifties among those unemployed.
      Now multiply.
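
Both routes through the identity can be checked on a small table of counts. The head counts below are invented purely for illustration; the point is only that the two factorisations agree with the direct proportion.

```python
from fractions import Fraction

# Hypothetical head counts (assumed for illustration only).
counts = {
    ("over_fifty", "unemployed"): 30,
    ("over_fifty", "employed"): 270,
    ("fifty_or_under", "unemployed"): 70,
    ("fifty_or_under", "employed"): 630,
}
total = sum(counts.values())

n_f = sum(n for (age, _), n in counts.items() if age == "over_fifty")
n_g = sum(n for (_, job), n in counts.items() if job == "unemployed")
n_fg = counts[("over_fifty", "unemployed")]

pr_f = Fraction(n_f, total)          # Pr(F)
pr_g_given_f = Fraction(n_fg, n_f)   # Pr(G|F)
pr_g = Fraction(n_g, total)          # Pr(G)
pr_f_given_g = Fraction(n_fg, n_g)   # Pr(F|G)

via_f = pr_f * pr_g_given_f          # Pr(F) Pr(G|F)
via_g = pr_g * pr_f_given_g          # Pr(G) Pr(F|G)
direct = Fraction(n_fg, total)       # Pr(F & G) counted directly
print(via_f, via_g, direct)
```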

    4. Conclusion
      Let's rephrase:
      Evidence EN = Relatives remember and report scars S' for the child Nestor
      hypothesis H1 = the child is the same person as the adult who now has scars S
      hypothesis H0 = the child is a random person
      LR = Pr(EN | H1) / Pr(EN | H0).
      or in words: How many times more probable is EN when H1 is true, than when H0 is true?

      1. Interpretation
        The LR is thus a comparison of these two probabilities:
        Pr(EN | H1) = The probability that relatives of a person with scars S would report childhood scars S'
        versus
        Pr(EN | H0) = The probability that relatives of some random infant would recall childhood scars S'.
        No doubt the first probability is larger than the second one, hence LR>1. But unless the scars are fairly striking – obviously unusual – or the coincidence between the descriptions S and S' very strong, I would be reluctant to conclude that LR is very large.

      2. An issue of bias
        It's not clear, from the story I've presented, how the testimony from the relatives came about. It could be that the relatives spoke of the scars S' after being prompted in some way. That is, suppose that the actual evidence is a biased version EN' of EN:
        Evidence EN' = Relatives remember and report scars S' for the child Nestor after being given a suggestion in the form of information or a picture of Carter.
        The bias may make the denominator much larger, i.e. Pr(EN' | H0) >> Pr(EN | H0). (It may raise the numerator too, but probably less so.) A very high percentage of children might be described as having scars if the survey takes the form of a leading question. The way the information is collected may have a big effect, and consequently the actual LR from the anecdote may be modest and not very helpful in concluding identity.
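
To see how prompting can deflate the LR, here is a toy comparison. All four probabilities are hypothetical placeholders, not survey results; they only illustrate the case where the denominator rises more than the numerator.

```python
from fractions import Fraction

def lr(pr_given_h1, pr_given_h0):
    """Likelihood ratio Pr(evidence | H1) / Pr(evidence | H0)."""
    return pr_given_h1 / pr_given_h0

# All four probabilities below are hypothetical, for illustration only.
unprompted = lr(Fraction(30, 100), Fraction(2, 100))   # EN: spontaneous report
prompted = lr(Fraction(60, 100), Fraction(30, 100))    # EN': after a leading question
print(unprompted, prompted)
```

With these assumed numbers the numerator doubles under prompting, yet the LR falls because the denominator grows by a much larger factor.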

  3. Exercises
    1. Age coincidence
      As an exercise, consider the following data:
      Hypothesis H1 = the child is the same person as the adult who believes EC
      Hypothesis H0 = the child is a random person

      We have some survey data based on 698 missing children:

      If Carter is not Nestor, we can assume that his possible birth identities are represented by some part of the sample of 698. ("Represented" doesn't mean that Carter would be among the 698 in the sample, but only that, as the sample represents the 30,000 or so missing children, the missing children with Carter's age and time of disappearance are proportionally represented in it.) Depending on how much vagueness about dates and ages we accept, from 19 to 114 of those 698 are consistent with his belief. Taking the data to be certain under H1, the LR lies between 698/114 and 698/19, so the LR supporting identity based on this data is

      6 ≤ LR ≤ 36.
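
The bounds can be reproduced from the counts in the text. This sketch assumes the dates are consistent with probability 1 under H1 (the `pr_given_h1` default is that assumption); the function name `age_lr` is illustrative.

```python
sample_size = 698
consistent_strict = 19   # little vagueness accepted about dates and ages
consistent_loose = 114   # generous vagueness accepted

def age_lr(n_consistent, n_sample, pr_given_h1=1.0):
    """LR = Pr(data | H1) / Pr(data | H0), with Pr(data | H0) estimated
    as the fraction of the sample consistent with the adult's belief."""
    return pr_given_h1 / (n_consistent / n_sample)

lr_low = age_lr(consistent_loose, sample_size)
lr_high = age_lr(consistent_strict, sample_size)
print(round(lr_low, 1), round(lr_high, 1))  # 6.1 36.7
```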

  4. LR shortfall or requisite prior
    Let's consider the above analyses in this context: the baseline prior odds are 1/30,000, the DNA evidence gives LRDNA = 11,700, and our policy requires posterior odds ≥ 1000 (99.9%). Thus to declare an identification at our 99.9% standard, we'll need to supplement the DNA evidence with "other" or "anecdotal" evidence of strength LRother ≥ 2564. Here are two equivalent ways to think from here: LR shortfall thinking and requisite prior thinking.

    LR shortfall thinking
    LRother ≥ 2564 means that the DNA evidence leaves us with a LR shortfall of 2564.

    Can we do it? Suppose we figure

    • LRother = LRsex × LRage × LRscar, where
    • LRsex = 2 (see note below), since Nestor was male, as would be expected if he is Carter
    • LRage = 12 (say), from the age and date discussion above
    • LRscar represents the evidence from the coincidence about scarring.
    Then to bring LRother ≥ 2564, we need to believe that

    LRscar ≥ 2564 / (2 × 12) ≈ 107.

    We could say that the LR shortfall before consideration of the scar is 107.

    Is LRscar ≥ 107? That's believable, but it's not obvious! It might depend on the details.
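
The shortfall arithmetic can be sketched as follows, using the figures given in this section (prior odds 1/30,000, LRDNA = 11,700, required posterior odds 1000, LRsex = 2, LRage = 12). The variable names are illustrative.

```python
from fractions import Fraction

prior_odds = Fraction(1, 30000)     # baseline prior odds
lr_dna = 11700                      # DNA likelihood ratio
required_posterior_odds = 1000      # policy threshold (about 99.9%)

# posterior odds = prior odds * LRDNA * LRother, solved for LRother
lr_other_needed = required_posterior_odds / (prior_odds * lr_dna)

lr_sex, lr_age = 2, 12
# LRother = LRsex * LRage * LRscar, solved for LRscar
lr_scar_needed = lr_other_needed / (lr_sex * lr_age)

print(round(float(lr_other_needed)), round(float(lr_scar_needed)))  # 2564 107
```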


    (note)

    LRsex: why it is 2

    evidence E = gender of BI is same as gender of YA.
    LR = Pr(E | BI = YA) / Pr(E | BI unrelated to YA)
    = 1 / (proportion of the population with the gender of YA)
    = 1 / ½ (suppose)
    = 2.

    Requisite prior thinking
    "Requisite prior" thinking means considering the "other" evidence before the DNA and wrapping it into the prior odds.
    1. First, consider how large the prior odds must be for the DNA evidence to be sufficient.

      Our policy is to require (posterior probability) ≥ 99.9%, i.e. (posterior odds) ≥ 1000.

      Since we know LRDNA = 11,700, and because (posterior odds) = (prior odds) × LR, the requisite prior odds for identification given LRDNA = 11,700 are
      (requisite prior odds) = (posterior odds) / LRDNA
      (requisite prior odds) = 1000 / 11700
      (requisite prior odds) = 1/11.7

    2. Starting from the baseline prior = 1/30,000, can we justify that requisite prior?
      sex
      1/30000 is the odds prior to considering the sex data, and LRsex=2.
      (odds posterior to sex)=(odds prior to sex) × LRsex
      (odds posterior to sex)=(1/30000)×2=1/15000
      age
      1/15000 is the odds posterior to sex, and prior to considering age. Say LRage=12.
      (odds posterior to age)=(odds prior to age) × LRage
      (odds posterior to age)=(1/15000)×12=1/1250
      scar
      1/1250 is the odds prior to considering the scar.
      (odds posterior to scar, prior to DNA)=(odds prior to scar) × LRscar.
      We must have (odds posterior to scar, prior to DNA) ≥ (requisite prior) = 1/11.7, so
      (odds prior to scar) × LRscar ≥ 1/11.7, i.e.
      1/1250 × LRscar ≥ 1/11.7, which entails
      LRscar ≥ 1250/11.7 ≈ 107, as before.
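
The same steps can be traced with exact fractions. This is a minimal sketch using the figures in the text; the helper names `requisite_prior_odds` and `update` are illustrative. Note that a posterior probability of exactly 99.9% corresponds to odds of 999, which the text rounds to 1000.

```python
from fractions import Fraction

def requisite_prior_odds(posterior_odds, lr):
    """Prior odds needed so that prior odds * LR reaches the posterior odds."""
    return Fraction(posterior_odds) / Fraction(lr)

def update(odds, lr):
    """Bayes in odds form: posterior odds = prior odds * LR."""
    return odds * lr

# Exact odds for a 99.9% probability; the text uses the round figure 1000.
exact_threshold = Fraction(999, 1000) / (1 - Fraction(999, 1000))

needed = requisite_prior_odds(1000, 11700)   # 10/117, i.e. 1/11.7

odds = Fraction(1, 30000)    # baseline prior
odds = update(odds, 2)       # after sex: 1/15000
odds = update(odds, 12)      # after age: 1/1250

lr_scar_needed = needed / odds
print(needed, odds, round(float(lr_scar_needed)))  # 10/117 1/1250 107
```

The result agrees with the LR shortfall route, as it must, since both rearrange the same product of prior odds and likelihood ratios.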
