WTC DNA identification – prospectus
Analysis of screening; Powerpoint presentation
Tsunami victim identification considerations
Forensic mathematics home page

World Trade Center Disaster Identification Diary

Thursday, September 13 – I managed to collect my email while visiting London on the last leg of a several-week tour of Europe. Nancy and I had learned of the Trade Towers attack on Tuesday, from the television in our hotel in Haarlem. It was a bad end to a good afternoon. Haarlem, a historically significant and easy-going neighbor of Amsterdam (a reputedly easy-going place in its own right), was the home of the riveting 17th-century portraitist Frans Hals and is today the home of the Frans Hals Museum. So we – two people with little understanding, appreciation, or interest in art – had spent a fascinating hour or two perusing those Hals works that a small museum's limited budget can afford.
Hals' Willem Coymans

Back at the hotel I had switched on the TV in time to see some of the live action and numerous replays of now-familiar events, accompanied by surprisingly little voice-over aside from a short period during which a smug BBC commentator analyzed the context and future significance of the goings-on until mercifully he was given the hook.

My potential involvement

Wednesday morning we were slightly affected by the attack, in that we waded through security confusion when we flew out of Schiphol (Amsterdam) for London. Moreover, in considering the talk that I was scheduled to present in London, I hit on the idea of discussing the prospects for WTC identifications.* But not until I read an email message from Dr. Howard Baum of the Forensic Biology (i.e. DNA) section of the Office of the Chief Medical Examiner (OCME) of New York did the possibility of my personal involvement with the disaster occur to me. The OCME has been using the Kinship module of DNA•VIEW for several years, and the module is already well established as a tool for disaster body identification. I shouldn't have needed Howard to remind me. His message was brief: "We need help coping with the mass disaster in New York City."

Howard's immediate thought was the application of the Kinship program, but I thought I had even more to offer. There was going to be a lot of genetic data to manipulate, whose exact nature couldn't be predicted in advance. As one who has worked with computers since 1959, earned a doctorate in mathematics, and done dozens of practical or research projects involving DNA-relationship ideas and computations, I figure I am uniquely prepared to perform whatever manipulations and analysis might be necessary to wring information from the data.

Swissair identification paradigm

From earlier experience in disaster identification with the Swissair 111 crash, I assumed that making the WTC identifications from relatives would require a "screening" step; further, I could extrapolate that the larger scale of the WTC problem would bring new complexities (see the WTC prospectus and WTC Powerpoint links above).

Some of the victims of the Swissair crash needed to be identified indirectly, by comparison with living (or dead) relatives. A two-step paradigm emerged:

  1. Screening – a rough but quick scan (<<1 second/comparison) to select likely candidate matches between victim and reference family samples by comparing each individual reference with each victim sample (millions of screening comparisons in the WTC case);
  2. Testing – a candidate family-to-victim correspondence is evaluated by using the Kinship program to make the accurate computation (minutes/comparison) which considers the family as a group (hopefully only a few thousand WTC relationships will require testing and confirmation).
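To make the screening step concrete, here is a minimal Python sketch of an all-pairs screen. It assumes the simplest possible model: each profile is a dict from locus name to an unordered pair of alleles, the relationship screened for is parent/child, loci are independent, genotypes follow Hardy-Weinberg proportions, and mutation is ignored. The names (`locus_lr`, `pair_lr`, `screen`), the two loci, and the allele frequencies are hypothetical illustrations, not DNA•VIEW's Kinship code or real WTC data. Each comparison is just a product of single-locus likelihood ratios, which is cheap enough to repeat millions of times; the surviving pairs would then go on to the slower, family-based testing step.

```python
from itertools import product

# Hypothetical allele frequencies per locus (illustrative numbers, not real data).
FREQS = {
    "D3S1358": {"14": 0.10, "15": 0.25, "16": 0.25, "17": 0.20, "18": 0.20},
    "vWA":     {"15": 0.10, "16": 0.20, "17": 0.30, "18": 0.25, "19": 0.15},
}

def transmit_prob(parent, allele):
    """Probability that a parent with genotype `parent` passes `allele` on (Mendelian, no mutation)."""
    return parent.count(allele) / 2

def locus_lr(child, parent, freqs):
    """Single-locus LR: P(child genotype | parent) / P(child genotype | unrelated)."""
    c1, c2 = child
    if c1 == c2:  # homozygous child: LR = t(c) / f(c)
        return transmit_prob(parent, c1) / freqs[c1]
    # heterozygous child: LR = t(c1) / 2f(c1) + t(c2) / 2f(c2)
    return (transmit_prob(parent, c1) / (2 * freqs[c1])
            + transmit_prob(parent, c2) / (2 * freqs[c2]))

def pair_lr(victim, reference, freqs_by_locus):
    """Multiply the per-locus LRs over the loci typed in both profiles."""
    lr = 1.0
    for locus in victim.keys() & reference.keys():
        lr *= locus_lr(victim[locus], reference[locus], freqs_by_locus[locus])
    return lr

def screen(victims, references, freqs_by_locus, threshold):
    """Compare every reference with every victim; keep pairs with LR >= threshold, best first."""
    hits = []
    for (vid, v), (rid, r) in product(victims.items(), references.items()):
        lr = pair_lr(v, r, freqs_by_locus)
        if lr >= threshold:
            hits.append((rid, vid, lr))
    return sorted(hits, key=lambda h: -h[2])
```

Because the parent/child likelihood ratio is symmetric in the two people, it does not matter whether the victim or the reference is the parent. In this sketch an allele mismatch at any locus drives the ratio to zero; a practical screen would allow for mutation rather than exclude outright.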

Benoit LeClair, then of the RCMP DNA group, had shown how the screening could be done in that case with a mere Excel application, based on sorting the DNA profiles such that similar profiles would be sorted into neighboring positions. I considered his idea as a starting point. But for the much larger dataset that the WTC disaster might produce, I soon became convinced that more firepower and algorithmic sophistication would be necessary.
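One plausible reading of the sort-and-compare-neighbors idea is sketched below in Python. The original spreadsheet is not described in detail here, so the function names and the shared-allele score are hypothetical. Profiles are flattened into sortable keys, sorted, and each profile is compared only with its successor. That costs one O(n log n) sort plus n comparisons, rather than the n² comparisons of an all-pairs scan.

```python
def profile_key(profile, loci):
    """Flatten a profile into a sortable tuple: the sorted allele pair at each locus, in a fixed locus order."""
    return tuple(tuple(sorted(profile[locus])) for locus in loci)

def shared_alleles(a, b):
    """Count alleles shared locus by locus (0, 1 or 2 per locus), treating each pair as a multiset."""
    total = 0
    for ga, gb in zip(a, b):
        pool = list(gb)
        for allele in ga:
            if allele in pool:
                pool.remove(allele)
                total += 1
    return total

def neighbor_screen(profiles, loci, min_shared):
    """Sort profiles by key, then compare each only with its immediate successor."""
    keyed = sorted((profile_key(p, loci), pid) for pid, p in profiles.items())
    hits = []
    for (k1, id1), (k2, id2) in zip(keyed, keyed[1:]):
        n = shared_alleles(k1, k2)
        if n >= min_shared:
            hits.append((id1, id2, n))
    return hits
```

Identical or near-identical profiles (for instance, two fragments of one victim, or a victim and a personal-effects sample) land next to each other. A parent and child share only one allele per locus, however, so nothing guarantees that a true relative sorts adjacent, a limitation that matters more as the list grows.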

The false-positive problem

The main difficulty I foresaw was that as the two lists (the victim list and the family reference list) grow, so does the incidence of "false positives." If both lists are small and some person C in the reference list has a brother who died, and some profile V in the victim list looks like a brother of C, it probably is. However, if the victim list has thousands of profiles, then for any given reference person C there will be dozens of victim profiles that coincidentally resemble C just as much as a typical true brother does. The expected number of false positives for each reference person is proportional to the size of the victim list.
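The proportionality can be made precise with a standard argument based on Markov's inequality, sketched here under the assumptions that the likelihood-ratio model is correct and that each unrelated victim profile is an independent draw from the population. Write N for the number of victim profiles, t for the LR threshold at which a victim counts as a hit for reference person C, and g for a victim genotype. The expected LR of an unrelated profile is exactly 1, so at most a fraction 1/t of unrelated profiles can reach the threshold:

```latex
\mathbb{E}[\mathrm{LR} \mid \text{unrelated}]
  = \sum_{g} \frac{P(g \mid \text{related})}{P(g \mid \text{unrelated})}\, P(g \mid \text{unrelated})
  = \sum_{g} P(g \mid \text{related}) = 1,
\qquad
P(\mathrm{LR} \ge t \mid \text{unrelated}) \le \frac{1}{t},
\qquad
\mathbb{E}[\#\,\text{false positives for } C] \le \frac{N}{t}.
```

For a fixed threshold t the bound grows linearly in N, which is the sense in which false positives scale with the victim list.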

Therefore I was sure that a simple sorting program like the one that had been adequate for the Swissair identifications would not be very useful for the WTC disaster. Specifically, in my London talk on September 13, I hypothesized –

  1. 20,000 dead
  2. 5,000 recovered
  3. 4,000 positively identified via e.g. toothbrush DNA (LR=10^12 per ID), dental, or other ID
  4. 500 with 2 close relatives (parent or child)
  5. LR>10^8 per ID (LR = likelihood ratio)
  6. 300 with 1 close relative
  7. LR=1000 per ID
  8. 100 with combination of sisters and uncles and cousins and aunts
  9. LR inadequate unless all data taken into account

Of course the first several of my estimates have proven to be quite far off. However, #4-8 are in the ballpark. The implication of estimate #7 is that for every 1000 victims, there will be about one who coincidentally resembles any given reference person to the same extent as a true child does. Thus, using individual parents as references to fish victims out of the rubble would produce more false leads than true ones. On the other hand, #5 implies that if a more sophisticated trolling operation is used, in which two reference parents are simultaneously compared with each victim to accomplish a sort of triangulation, then the number of false hits will be very small. Point #9 is correct.
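The arithmetic behind #5 and #7 can be checked with the standard bound that an unrelated profile reaches a likelihood ratio of at least t with probability at most 1/t, so a reference person screened against N victims expects at most N/t coincidental hits. The minimal Python sketch below plugs in the hypothesized 5,000 recovered victims (#2). The function and constant names are hypothetical, and the figures are the September 2001 guesses from the list above, not outcomes.

```python
def expected_false_hits(n_victims, lr):
    """Upper bound on expected coincidental hits for one reference: n_victims * (1 / lr)."""
    return n_victims / lr

N_RECOVERED = 5_000       # hypothesis #2
LR_ONE_PARENT = 1_000     # hypothesis #7: one close relative as reference
LR_TWO_PARENTS = 1e8      # hypothesis #5: two close relatives (a lower bound on the LR)

one_parent = expected_false_hits(N_RECOVERED, LR_ONE_PARENT)
two_parents = expected_false_hits(N_RECOVERED, LR_TWO_PARENTS)
print(f"one parent:  up to {one_parent:g} false hits per reference")
print(f"two parents: up to {two_parents:g} false hits per reference")
```

With one parent as the reference, up to about five coincidental hits compete with what is typically a single true child; with two parents, the expected number of false hits is negligible.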

Work begins

Toward the end of September I outlined in a fax to Howard what I felt I could contribute, and was gratified at his prompt acceptance of my offer. He added a new task as well. Several different computer programs had been offered to do the "screening" part of sorting out the identifications; would I evaluate these offerings? Of course, although admittedly I already expected to write my own. As I later remarked to Howard's chief, Bob Shaler (head of the Forensic Biology section), I thought I could write the program from scratch faster than I could examine and evaluate the other possibilities. Bob diplomatically accepted my remark, but of course held his ground.

The upshot was, on October 2, the first of several trips to NY. Dr. Shaler organized a one-day meeting, a "summit of genetics experts" (Wall Street Journal), to discuss various problems and possible approaches for sorting through the inevitable masses of data. Five laboratories that were expected to do parts of the DNA analysis were represented: the city Office of the Chief Medical Examiner, the NY State Police, Bode Technologies, Myriad, and Celera. Coincidentally, Myriad now included my old colleague Benoit, formerly of the RCMP. The FBI and I were also present to discuss software, as were Howard Cash and others from a company called GeneCodes that has a contract with the OCME to provide software. Finally, there were a few people from the NIJ (National Institute of Justice, another arm of the DOJ). Following introductory explanations by Bob Shaler in the morning, several of us presented our ideas about making the necessary victim-to-reference identifications. The afternoon was given over mostly to discussion, which was of course rather general and oriented toward planning. From time to time, though, people inevitably succumbed to the temptation to discuss details even when it was obvious that there was neither time nor yet sufficient information to make a detailed discussion productive. When this happened, Chief Inspector Dale of the State Police patiently suggested, each time as if it were the first, that it would be appropriate to make general plans. I enjoyed that.

At some juncture, concerned that the plans might be steering toward an unnecessary and ponderous software project, I made a remark to the same effect as the one I mentioned above: once I got my hands on the data, I would quite quickly be able to produce the tentative identifications by myself. At this Howard Cash piped up, "Surely, Charles, even your work can stand a second opinion." I told him he had a fair point.

WTC Kinship and Data Analysis Planning Panel (KADAP)

A few weeks later, on October 19-21, the NIJ convened a distinguished panel of about 25 people with various kinds of expertise in Albany, NY. The group included several people I knew well from the forensic DNA community, and others, including a few I had heard of, whom I had the pleasure of meeting for the first time.

The three-day meeting ranged over a variety of topics. The one topic originally mentioned to me was the same task that Bob Shaler had already given me: choose which screening program to use. To that end I put together a Powerpoint presentation explaining the difficulties and pitfalls that I foresaw and, by now, had computed.

In assessing the candidate screening programs, I had in mind several design requirements: