Closed vs. Open?
The general idea, I am sure, is that there is a dichotomy along these lines:
I disagree. This is a distinction without a difference. There are only differences of degree.
If L > t, the identity is established.
But was it really special? In effect we assumed that the final identifications were idiosyncratic or ad hoc. Compared to the paradigm we had in mind up to the end that is true, but what was really going on? Later, as KADAP we considered prospectively the end-game of WTC identifications. We did not expect ever to have the neat and complete lists as in the Swissair case, but would the "closed" paradigm kick in at some late stage in the ID's? When? To what extent?
To answer questions like that, it is necessary to have an explicit understanding of the so-called "closed" phenomenon.
prior probability p, such as p=1/(v+1) where
v+1=number of victims
and the likelihood ratio is interpreted in the context of p, then compared with some threshold posterior probability.
Expressed in terms of odds, the above amounts to
posterior odds P = L × (prior odds) = L/v,
which must exceed an odds threshold t_odds, perhaps t_odds = 1000:
L/v > t_odds, i.e. L > v × t_odds.
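The odds form of the rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the values v=5000 and t_odds=1000 are used purely as example inputs, not as a record of the thresholds actually applied.

```python
def posterior_odds(likelihood_ratio, v):
    """Posterior odds L/v, given prior probability 1/(v+1), i.e. prior odds 1/v."""
    return likelihood_ratio / v


def lr_threshold(v, t_odds=1000):
    """Smallest likelihood ratio scale t = v * t_odds implied by the odds threshold."""
    return v * t_odds


def is_identified(likelihood_ratio, v, t_odds=1000):
    """True when the posterior odds exceed the odds threshold t_odds."""
    return posterior_odds(likelihood_ratio, v) > t_odds


# Illustrative inputs only (assumed for the example)
example_threshold = lr_threshold(5000, t_odds=1000)
print(example_threshold)
print(is_identified(6_000_000, 5000), is_identified(4_000_000, 5000))
```

Written this way, it is visible that lowering v lowers the likelihood ratio needed for an identification, whatever the reason for lowering it.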
This more elaborate paradigm is what we adopted from the beginning for WTC. We considered v=10000 for the vague purpose of choosing t_odds (big enough to ensure 99% probability of no mis-identifications), and then worked with v=5000 for purposes of choosing t. For the first year v was conservatively kept at 5000.
At the KADAP meeting of September 9-10, 2002 the ID parameters were revisited. According to the meeting notes, v was reduced to v=3000 to reflect that the disaster was considered "closed" (meaning apparently that definition #1 was nearly satisfied). That seems to me a non sequitur. The reason to reduce v was simply that the number of victims was known to be fewer than 3000. Whether the number was known precisely, or the names known, or DNA references available, was irrelevant.
It could also have been recommended to always use
v=(number of remaining victims)-1.
Both. It is an extreme case of something that we could do all the time, if we thought to do so.
posterior odds = (likelihood ratio)(prior odds) = L/v = 200;
posterior probability = (posterior odds)/(1 + posterior odds) = 200/201 = 99.5%.
Equivalently, we could arrange the work in a table. For this purpose, instead of the ratio of two likelihoods it is attractive to consider the two likelihoods separately.
| (relative) prior odds | 1 | 1000 | |
| (relative) posterior odds | 200,000 | 1000 | total = 201,000 |
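The table arithmetic can be sketched as follows. The relative likelihoods (200,000 and 1) are an assumption: they are inferred by dividing the posterior row by the prior row, and the function name is hypothetical.

```python
def posterior_table(relative_priors, likelihoods):
    """Multiply each relative prior by its likelihood, then normalize by the total."""
    posteriors = [p * l for p, l in zip(relative_priors, likelihoods)]
    total = sum(posteriors)
    if total == 0:
        raise ValueError("no hypothesis is compatible with the data")
    probabilities = [x / total for x in posteriors]
    return posteriors, total, probabilities


# Two hypotheses: the suspected identity, and "someone else"
priors = [1, 1000]
likelihoods = [200_000, 1]  # assumed: posterior row divided by prior row

posteriors, total, probabilities = posterior_table(priors, likelihoods)
print(posteriors, total)
print(round(probabilities[0], 4))
```

The first normalized entry is 200,000/201,000, the same 200/201 obtained from the odds calculation.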
Imagine one hypothesis per missing person:
The corresponding likelihoods are
and for the relative priors, we assume that
Now consider the situation when all but the last few identifications have been made; only J, 1, and 2 are still to be identified. In a simple typical case, the situation would be:
| (relative) prior odds | 1 | 1 | 1 | 0 | ... | 0 | |
| (relative) posterior odds | 50 | 0 | 0 | 0 | ... | 0 | total = 50 |
To the extent that the intuitive concept of "closed system" means anything in particular, I think it is explicated by the last table. That is, it corresponds to taking advantage of the likelihoods that are 0 because the victim profile mismatches certain victims' references, rather than concentrating solely on the suspected identity. Thus, if "closed" means anything, it means being in a situation where the idea of Table 2 can be employed.
But it should be apparent that the computations of Table 2 are possible any time, not just in the end game (although the consequences would be less dramatic). That means that every situation is, to a greater or lesser degree, "closed". There is no actual distinction between "open" and "closed"; they are just matters of degree. That being the case, it would be hard to support a claim that whether the system is open or closed is a critical criterion. In particular there are no special "closed" methods of analysis; the mathematical tools of analysis are always the same and always follow a Bayesian paradigm with likelihood ratios interpreted in light of prior odds.
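As a rough sketch of the end-game table, the following compares the per-person computation with the simple L/v calculation that looks only at the suspected identity. The likelihoods 50, 0, 0 are inferred from the posterior row; the likelihoods for already-identified victims are placeholders (their prior is 0, so they do not matter); v=2 follows the suggestion of counting only the other remaining victims. All names are hypothetical.

```python
def posterior_probabilities(relative_priors, likelihoods):
    """Per-hypothesis posterior probabilities from relative priors and likelihoods."""
    weights = [p * l for p, l in zip(relative_priors, likelihoods)]
    total = sum(weights)
    if total == 0:
        raise ValueError("no hypothesis is compatible with the data")
    return [w / total for w in weights]


# Columns: J, 1, 2 (still missing), then two already-identified victims
priors = [1, 1, 1, 0, 0]
likelihoods = [50, 0, 0, 1, 1]  # last two are placeholders; prior 0 cancels them

closed = posterior_probabilities(priors, likelihoods)

# Same evidence, but ignoring the exclusions of victims 1 and 2
L, v = 50, 2
simple_odds = L / v
simple_probability = simple_odds / (1 + simple_odds)

print(closed[0], round(simple_probability, 4))
```

The same function applies unchanged at any stage of the work; with more victims still missing there are simply more nonzero priors, and fewer of them are cancelled by zero likelihoods.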
Easier said than done. My sense, based on experience trying to
understand more deeply several cases where the likelihood ratio for
some identification was modest, is that they are quite knotty to
analyze. With indirect references there is always mutation to
consider, and if there are direct references but the likelihood ratio
is still small, the quality of the data is inevitably poor. Each of the
hypothetical 1159-30 eliminations would require manual inspection, and
maybe a lot of them would be problematic. So no doubt some ore is left
in the ground, but it is hard to extract. Some software aids could be
helpful, but there would still be a lot of manual tedium.
The ultimately correct mathematical approach might in fact be to
formulate identification hypotheses that consider ALL the victim
profiles and ALL the reference data simultaneously. The number of
hypotheses would then be truly enormous ((v+1) factorial), for a typical
hypothesis would be like:
body a is victim #3, and body b is victim #23, and ...
In the extreme case that all the definition clauses hold, it may be, if
the reference information is straightforward (mostly direct references),
that one and only one of the (v+1)! compound hypotheses
"fits" the data, i.e. all the likelihoods but one are zero (provided
one's model excludes mutation and other uncertainties about the data).
I think this is the situation that Jack had in mind in persisting that
the "closed" paradigm has real meaning, and that the draft "Lessons
Learned" document alludes to where it says that in an "open" system,
in distinction to a "closed" one, identifications are "statistical"
(i.e. probabilistic). If "closed" alludes only to this very particular
circumstance, then I would concede that there is a fairly sharp distinction
between closed and open. But clearly, it is a circumstance that has no
relevance to WTC.
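The compound-hypothesis idea described above can be illustrated on a toy scale. This is a minimal sketch, not a practical method: it assumes equal priors over all assignments, independence of the bodies given an assignment, and a made-up 3-by-3 likelihood matrix in which every mismatch has likelihood exactly zero (no mutation or data error). Brute-force enumeration is hopeless beyond a handful of victims, since the count grows as a factorial.

```python
from itertools import permutations
from math import factorial


def compound_posteriors(likelihood):
    """Posterior over whole assignments of bodies to victims.

    likelihood[b][k] is the likelihood of body b's profile if body b is victim k.
    Returns only the assignments with nonzero likelihood, normalized to sum to 1.
    """
    n = len(likelihood)
    weights = {}
    for assignment in permutations(range(n)):
        w = 1.0
        for body, victim in enumerate(assignment):
            w *= likelihood[body][victim]
        if w > 0:
            weights[assignment] = w
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no compound hypothesis fits the data")
    return {a: w / total for a, w in weights.items()}


# Hypothetical likelihoods: each body matches exactly one victim's references
toy_likelihood = [
    [1e6, 0.0, 0.0],
    [0.0, 2e5, 0.0],
    [0.0, 0.0, 50.0],
]

result = compound_posteriors(toy_likelihood)
print(factorial(len(toy_likelihood)), "compound hypotheses considered")
print(result)
```

In this toy case a single assignment survives with probability 1, even though one of its individual likelihoods is only 50; replacing any zero with a small positive number brings the probabilistic character back.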