See also:
Forensic mathematics
3-banded patterns in paternity
Paternity with many hypotheses
Zygosity studies in DNA·VIEW

Likelihood Ratios

What are they good for?

Summarizing evidence. Any kind of evidence. To illustrate, here are some examples.

French People

I propose the following method to test if a person is French. Time a one hour interval, and count the number of French words the person speaks.

Obviously this is relevant – on the whole, French people speak more French than non-French. There are exceptions of course, such as sleeping French people (who speak little French) and Quebec people (who speak quite a lot).

Since there is no obvious mathematical model for how many French words either a French or a non-French person speaks, it will be a good idea to calibrate my test in a pragmatic way. Let's build a database by measuring a large number of people, of both types (French and non-French), for one hour each.

Now let's take a test subject, start the stopwatch, and suppose we count ten French words in the sampling hour. Of course, it might be informative to know what those words were, but let's not ask that, and just ask what conclusion follows from the limited view that there were ten French words.

Ten French words in an hour

Suppose that a check of the database shows that this result is a performance occasionally turned in by a French person – once per hundred hours – and less often by non-French people – only once out of 500 one-hour experiments. That is, speaking 10 French words in an hour is 5 times more characteristic of French, than of non-French.
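The arithmetic behind that "5 times" can be sketched in a few lines. This is only an illustration: the function name `likelihood_ratio` is made up for the example, and the two frequencies are the ones quoted above, not real data.

```python
from fractions import Fraction


def likelihood_ratio(p_event_given_h1, p_event_given_h2):
    """Ratio of the probabilities of the same event under two hypotheses."""
    return p_event_given_h1 / p_event_given_h2


# Frequencies quoted in the text: ten French words in an hour turns up
# once per 100 hours among French people, and once per 500 hours among
# non-French people.
p_given_french = Fraction(1, 100)
p_given_non_french = Fraction(1, 500)

lr = likelihood_ratio(p_given_french, p_given_non_french)
print(lr)  # 5
```

Exact fractions are used rather than floats so that the ratio comes out as exactly 5.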

Does this mean the person is 5:1 to be French? Possibly, but not necessarily. It depends on the context. If the person had been intentionally selected as 50:50 to be either French or not, then yes. If the person was randomly selected from the Paris phone book, then the person was 100:1 to be French before the experiment, and this number doesn't decrease to 5:1 because they speak French! It rises fivefold, to 500:1.
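The two contexts just described can be worked through as a small sketch. It assumes the usual odds form of Bayes' rule (posterior odds = prior odds × LR); the function names are hypothetical, and the prior odds are the ones given in the text.

```python
from fractions import Fraction


def posterior_odds(prior_odds, lr):
    """Update the odds in favour of a hypothesis by a likelihood ratio."""
    return prior_odds * lr


def odds_to_probability(odds):
    """Convert odds in favour (e.g. 5 for 5:1) to a probability."""
    return odds / (1 + odds)


lr = Fraction(5)  # ten French words in an hour, as in the text

# Context 1: subject deliberately selected as 50:50, i.e. prior odds 1:1.
even_context = posterior_odds(Fraction(1, 1), lr)

# Context 2: subject drawn from the Paris phone book, prior odds 100:1.
paris_context = posterior_odds(Fraction(100, 1), lr)

print(even_context, odds_to_probability(even_context))
print(paris_context, odds_to_probability(paris_context))
```

The same evidence (LR = 5) gives odds of 5:1 in the first context and 500:1 in the second; the LR summarizes the evidence, and the context supplies the starting point.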

Sometimes people try to insist on asking, "What if the context is completely unknown, or random, or there is no context? Then what?" The question has little meaning. There is a temptation to claim that "picking a person at random from the whole world" is a random context, but why should that be so? Doesn't a "random context" really mean picking a context at random from all contexts? But here are some contexts:

  1. Pick a person at random from all people in the world except Aaron Aardvark.
  2. Pick Talleyrand with a 70% probability, and otherwise pick Quisling.
  3. Pick the next person who walks into Grand Central Station.

The list is not only infinitely long, it is uncountably infinite. There is no way to "average" over it. Context is like preconceptions. You may sometimes think you don't have them, but surely you know that everyone else does. The fact is, you can't avoid them.

Exactly ten French words in an hour

Ok, now let's take a closer look at the data. Exactly ten French words in an hour is an unlikely result from anyone (most French people speak either more or less), but I'm sure you won't be surprised when I tell you that, according to the database, it is 5 times more characteristic of French, than of non-French.

What, you want to know the actual numbers? I'm disappointed; you seem to be thinking like a statistician.

What are they?

LR = likelihood[1] ratio = "the ratio of two probabilities of the same event under different hypotheses." If you can think of "evidence" as meaning information that might nudge your decision about some matter in one direction or the other, then the LR is the appropriate numerical summary of the evidence.
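In symbols, the definition might be written as follows. The labels are introduced here for illustration only: E stands for the event (the evidence), and H_1 and H_2 for the two hypotheses being compared.

```latex
\[
\mathrm{LR} \;=\; \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)},
\qquad
\frac{\Pr(H_1 \mid E)}{\Pr(H_2 \mid E)}
\;=\;
\mathrm{LR} \times \frac{\Pr(H_1)}{\Pr(H_2)} .
\]
```

The second relation is the odds form of Bayes' rule: the LR is the factor by which the evidence multiplies whatever the odds were beforehand.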

Testing for a disease

A classical example is testing for a disease. The subject tests positive for the disease, under a test that has a true positive rate of 60% (only 60% of the afflicted trigger a positive response for the disease) and a false positive rate of 1% (1% of healthy people trigger a positive response). Then the LR=60, meaning that the positive response is 60 times more characteristic of sick people than of healthy. Although that is strong evidence that the subject has the disease, obviously it does not imply any probability conclusion. If the disease is very rare, then most positives are in fact false positives. But it does increase the odds of being afflicted 60-fold from whatever they were before testing.
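A quick sketch shows how a strong LR and a rare disease combine. The two test rates are from the text; the prevalence of 1 in 10,000 is an assumption made up for this illustration, and the variable names are likewise hypothetical.

```python
from fractions import Fraction

# Test characteristics quoted in the text.
true_positive_rate = Fraction(60, 100)   # P(positive | sick)
false_positive_rate = Fraction(1, 100)   # P(positive | healthy)
lr = true_positive_rate / false_positive_rate

# ASSUMPTION (not from the text): the disease affects 1 person in 10,000.
prevalence = Fraction(1, 10_000)
prior_odds = prevalence / (1 - prevalence)   # 1 : 9999

# A positive result multiplies the odds of being afflicted by the LR.
posterior_odds = prior_odds * lr
posterior_prob = posterior_odds / (1 + posterior_odds)

print(f"LR = {lr}")
print(f"posterior odds = {posterior_odds}")
print(f"posterior probability = {float(posterior_prob):.4f}")
```

Under that assumed prevalence the odds do rise 60-fold, yet the probability that a positive subject is actually sick stays well under 1%, so most positives are false positives.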

1. Likelihood is a synonym for probability, except that we say "likelihood" when the emphasis[2] is on varying the hypotheses (or "conditional") under which the "event" is considered (as opposed to varying the event, or varying neither).

2. Ok, maybe more than just "emphasis". In statistics there's a technical definition of the word "likelihood" according to which it is not synonymous with probability, but rather is applied to a condition and means the probability, under that condition, of an unstated but assumed event. For example consider these two probabilities –

That usage flies in the face of grammar and seems guaranteed to confuse normal people and encourage "transposition of the conditional." But it's standard in statistics and probably useful if you're used to it. I think it was invented by Fisher in order to have a word for the second situation.


Return to home page of Charles H. Brenner