"Closed" vs. "Open" Disaster: Useful distinction or misunderstanding?

definitions?

There is a complete list of the missing (like a flight manifest).
As many bodies have been found as the number of victims. ("Victim" and "missing" are regarded here as synonyms.)
There is DNA for every body.
There is a DNA reference for every victim.

The general idea, I am sure, is that there is a dichotomy along these lines:

normal disaster identification is through DNA similarity between victim and reference samples that is prohibitively unlikely by coincidence
in the "closed" situation when you come to the last or last few identifications you have a new tool – elimination.

I disagree. This is a distinction without a difference. There are only differences of degree.

The Swissair 111 example

Normal identification protocol

L>t

The last few identifications

L>t

But was it really special? In effect we assumed that the final identifications were idiosyncratic or ad hoc. Compared to the paradigm we had in mind up to the end, that is true, but what was really going on? Later, as KADAP, we considered prospectively the end-game of WTC identifications. We did not expect ever to have the neat and complete lists as in the Swissair case, but would the "closed" paradigm kick in at some late stage in the IDs? When? To what extent?

To answer questions like that, it is necessary to have an explicit understanding of the so-called "closed" phenomenon.

Deciding the threshold

Prior odds

L>1 million

prior probability p, such as p=1/(v+1) where
v+1=number of victims

and the likelihood ratio is interpreted in the context of p, then compared with some threshold posterior probability.

Expressed in terms of odds, the above amounts to

prior odds=1/v
posterior odds P=L · (prior odds)=L/v,
which must exceed an odds threshold t_odds, perhaps t_odds=1000:
L/v>t_odds.
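
The rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, t_odds=1000 is the example value mentioned above, and v=5000 and the likelihood ratios in the example calls are illustrative inputs.

```python
def posterior_odds(likelihood_ratio, v):
    """Posterior odds L/v, given prior odds 1/v (one candidate among v+1 victims)."""
    return likelihood_ratio / v


def passes_threshold(likelihood_ratio, v, t_odds=1000.0):
    """True when the posterior odds L/v exceed the odds threshold t_odds."""
    return posterior_odds(likelihood_ratio, v) > t_odds


def required_likelihood_ratio(v, t_odds=1000.0):
    """Value that L must exceed for the rule L/v > t_odds to be satisfied."""
    return v * t_odds


# Illustrative inputs (assumptions, not case data).
print(required_likelihood_ratio(5000))     # bound on L implied by v=5000
print(passes_threshold(6_000_000, 5000))   # L/v = 1200
print(passes_threshold(4_000_000, 5000))   # L/v = 800
```

Writing the rule as a bound on L makes it plain that the threshold on the likelihood ratio moves in direct proportion to v.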

This more elaborate paradigm is what we adopted from the beginning for WTC. We considered v=10000 for the vague purpose of choosing t_odds (big enough to ensure 99% probability of no mis-identifications), and then worked with v=5000 for purposes of choosing t. For the first year v was conservatively kept at 5000.

At the KADAP meeting of September 9-10, 2002 the ID parameters were revisited. According to the meeting notes, v was reduced to v=3000 to reflect that the disaster was considered "closed" (meaning apparently that definition #1 was nearly satisfied). That seems to me a non sequitur. The reason to reduce v was simply that the number of victims was known to be <3000. Whether the number was known precisely, or the names known, or DNA references available, was irrelevant.

It could also have been recommended to always use

v=(number of remaining victims)-1.
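
A small sketch of what that recommendation would imply, assuming the same rule L/v>t_odds with the example value t_odds=1000. The function name and the sample counts of remaining victims are made up for illustration.

```python
def required_lr_for_remaining(remaining_victims, t_odds=1000.0):
    """Value L must exceed when v = (number of remaining victims) - 1."""
    v = remaining_victims - 1
    return t_odds * v


# Hypothetical counts of still-unidentified victims.
for remaining in (3000, 1000, 100, 10, 2, 1):
    print(remaining, required_lr_for_remaining(remaining))
# With a single remaining victim v is 0, so any positive L satisfies the rule.
```

The required likelihood ratio shrinks steadily as identifications accumulate; nothing discontinuous happens at the moment a disaster is declared "closed".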

The end-game

Both. It is an extreme case of something that we could do all the time, if we thought to do so.

Bayes' theorem

Two hypotheses – odds formulation

v=1000

L=200,000

posterior odds = (likelihood ratio)·(prior odds)= L/v = 200;
posterior probability = (posterior odds)/(1+posterior odds) = 200/201 = 99.5%.

Table 1
hypothesis	identity=J	someone else
(relative) likelihood	200,000	1
(relative) prior odds	1	1000
(relative) posterior odds	200,000	1000	total=201,000
normalized (probability)	99.5%	0.5%
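
The arithmetic of Table 1 can be reproduced with a short Python sketch. The variable and function names are hypothetical; the numbers are those of the table.

```python
def normalize(relative_values):
    """Scale relative posterior odds so that they sum to 1."""
    total = sum(relative_values)
    return [value / total for value in relative_values]


# Column order: identity=J, someone else.
likelihoods = [200_000, 1]
priors = [1, 1000]

# Relative posterior odds are likelihood times prior, column by column.
posteriors = [l * p for l, p in zip(likelihoods, priors)]
probabilities = normalize(posteriors)

print(posteriors, sum(posteriors))
print([f"{100 * p:.1f}%" for p in probabilities])
```

Only the ratios between columns matter, which is why the priors can be written as 1 and 1000 rather than as probabilities.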

Multiple hypotheses

Imagine one hypothesis per missing person:

H_J – body is person J
H₁ – body is person 1
H₂ – body is person 2
...
H_v – body is person v.

The corresponding likelihoods are

L_J=200,000
L_i=0 if the victim type is incompatible with the references for i
L_i=1 if there is no reference for i
L_i might have other values in special cases, for example if i is related to J,

and for the relative priors, we assume that

prior=1 if i is missing
prior=0 if i has already been identified.

Now consider the situation when all but the last few identifications have been made – only J, 1, and 2 are still to be identified. In a simple typical case, the situation would be:

Table 2
hypothesis	H_J	H₁	H₂	H₃	H_...	H_v
(relative) likelihood	50	0	0	0	...	0
(relative) prior odds	1	1	1	0	...	0
(relative) posterior odds	50	0	0	0	...	0	total=50
normalized (probability)	100%	0	0	0	...	0

Thus, the identification of J is certain even though L_J=50 is very modest evidence.
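
The same normalization over many hypotheses can be sketched in Python. The function name and dictionary keys are hypothetical. The first call uses the numbers of Table 2. The second is an added variation, not a case from the text, in which person 2 has no reference, so that L₂=1 under the rule stated earlier.

```python
def posterior_probabilities(likelihoods, priors):
    """Normalize likelihood times prior over all hypotheses."""
    unnormalized = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Table 2: only J, 1 and 2 are still missing; the victim profile
# is incompatible with the references for 1 and 2.
likelihoods = {"H_J": 50, "H_1": 0, "H_2": 0, "H_3": 0}
priors = {"H_J": 1, "H_1": 1, "H_2": 1, "H_3": 0}
table2 = posterior_probabilities(likelihoods, priors)
print(table2)

# Variation (assumption for illustration): no reference exists for person 2.
variant = posterior_probabilities({**likelihoods, "H_2": 1}, priors)
print(variant)
```

In the variation the posterior for J is 50/51 rather than 1, which shows that the certainty in Table 2 comes from the eliminations, not from the size of L_J.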

Conclusion

To the extent that the intuitive concept of "closed system" means anything in particular, I think it is explicated by the last table. That is, it corresponds to taking advantage of the likelihoods that are 0 because the victim profile mismatches certain victims' references, rather than concentrating solely on the suspected identity. Thus, if "closed" means anything, it means being in a situation where the idea of Table 2 can be employed.

But it should be apparent that the computations of Table 2 are possible any time, not just in the end game (although the consequences would be less dramatic). That means that every situation is, to a greater or lesser degree, "closed". There is no actual distinction between "open" and "closed"; they are just matters of degree. That being the case, it would be hard to support a claim that whether the system is open or closed is a critical criterion. In particular there are no special "closed" methods of analysis; the mathematical tools of analysis are always the same and always follow a Bayesian paradigm with likelihood ratios interpreted in light of prior odds.
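
The "matter of degree" can be made concrete with a sketch under simplifying assumptions that are mine, not the text's: every other missing person who cannot be excluded has relative likelihood 1 and the same relative prior as J, and every excluded person contributes 0. The function name and the sample counts are hypothetical.

```python
def posterior_for_candidate(lr_candidate, not_excluded_others):
    """Posterior probability for J when each non-excluded alternative has weight 1."""
    return lr_candidate / (lr_candidate + not_excluded_others)


# Modest evidence, L_J = 50, with fewer and fewer alternatives left standing.
for others in (1000, 100, 30, 2, 0):
    print(others, round(posterior_for_candidate(50, others), 3))
```

The posterior rises smoothly as alternatives are eliminated and reaches certainty only in the limiting case of Table 2, where no alternative survives.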

Epilogue – WTC identifications

WTC and Table 2

v=2749-1591-1=1159

v=3000

Easier said than done. My sense, based on experience trying to understand more deeply several cases where the likelihood ratio for some identification was modest, is that they are quite knotty to analyze. With indirect references there is always mutation to consider, and if there are direct references but the likelihood ratio is still small, the quality of the data is inevitably poor. Each of the hypothetical 1159-30 eliminations would require manual inspection, and maybe a lot of them would be problematic. So no doubt some ore is left in the ground, but it is hard to extract. Some software aids could be helpful, but there would still be a lot of manual tedium.

Ultimate analysis

The ultimately correct mathematical approach might in fact be to formulate identification hypotheses that consider ALL the victim profiles and ALL the reference data simultaneously. The number of hypotheses would then be truly enormous (v+1 factorial), for a typical hypothesis would be like:

body a is victim #3, and body b is victim #23, and ...
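
To give a feeling for how enormous, here is a rough Python sketch that counts the decimal digits of (v+1)! without forming the number itself. The function name is hypothetical, and 2749 is used only as an illustrative size, borrowed from the figure quoted earlier in this epilogue.

```python
import math


def factorial_digits(n):
    """Number of decimal digits of n!, computed from the log-gamma function."""
    log10_factorial = math.lgamma(n + 1) / math.log(10)
    return math.floor(log10_factorial) + 1


# Illustrative sizes only.
for n in (10, 100, 2749):
    print(n, factorial_digits(n))
```

For n=2749 the count of joint hypotheses has more than 8,000 digits, so enumerating them is out of the question and any practical method has to exploit structure such as the zero likelihoods of Table 2.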

Final remark

all

definition clauses

(v+1)!
