Kinship Analysis

The Symbolic Kinship Program, a part of DNA·VIEW, analyzes questions of relatedness from genetic data. It handles a rather general class of problems, of which the ordinary paternity trio is the prototype. Examples include:
motherless case:
Is this man the father of this child, based on genetic types from just the two of them?
incest case:
Do the genetic types indicate that two people are doubly related?
sibling problems:
Are two given people full siblings? half-siblings? unrelated?
inheritance problem:
Are two people related as claimed?
twin problem:
Are two siblings (whose parents may not be tested) identical twins?
corpse identification:
Is this corpse the same person who was reported missing by some family?

Symbolic Kinship Program

The inspiration for the Symbolic Kinship Program was an earlier program developed by students of Hummel that gives numerical answers to such problems. The novelty of the Symbolic Kinship Program is that it produces explicit algebraic formulas. Once the formula is obtained, a numeric answer can be computed quickly and trivially, so the formula is at least as good as a number. In addition, it offers advantages of its own, such as verifiability, insight, and modelling.

In principle the formula may be arbitrarily complicated and the time to derive it arbitrarily long depending on the complexity of the problem. However, the satisfying and lucky fact is that the formula complexity grows only slowly with the complexity of the problem. All practical problems that have arisen require only seconds on an ordinary desktop computer, and even more fanciful problems take only minutes.

The Prototype Kinship Program

The original version of the Symbolic Kinship Program made a computation for a single genetic system. For example, to compute a likelihood ratio comparing whether a given body, Foundindumpster, is the missing child of Mary and the missing father of Kimberley, one would type in the pedigree definitions:
     Foundindumpster/Unfound : Mary + Joe
     Kimberley               : Wife + Foundindumpster/Unfound
and symbolic genotype specifications such as
     Foundindumpster pq
     Mary            qr
     Kimberley       q
The answer, derived instantly by the program, is 1 / (2q + 4qq).
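
A formula like this can be checked independently by brute force. The sketch below is not how the Symbolic Kinship Program works (it derives the expression algebraically). It simply sums, with exact fractions, over every possible genotype of the untyped people: Joe, Wife and, under the alternative, Unfound. It assumes Hardy-Weinberg proportions for unrelated people, Mendelian inheritance without mutation, and that Kimberley's "q" denotes a homozygote. The allele frequencies, the catch-all allele `x`, and all function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical allele frequencies; "x" lumps together every other allele.
freq = {"p": Fraction(1, 5), "q": Fraction(1, 10),
        "r": Fraction(3, 10), "x": Fraction(2, 5)}

# Ordered genotypes: (maternal allele, paternal allele).
ORDERED = list(product(freq, repeat=2))

def consistent(observed):
    """Ordered genotypes matching an observed unordered genotype such as 'pq'."""
    return [g for g in ORDERED if sorted(g) == sorted(observed)]

def founder(g):
    """Probability of an ordered genotype in an unrelated person."""
    return freq[g[0]] * freq[g[1]]

def child(c, mother, father):
    """Probability that the two parents transmit the ordered genotype c."""
    return Fraction(mother.count(c[0]), 2) * Fraction(father.count(c[1]), 2)

MARY, FOUND, KIM = consistent("qr"), consistent("pq"), consistent("qq")

def likelihood(found_is_missing):
    """Probability of the three observed genotypes under one hypothesis."""
    total = Fraction(0)
    for mary, joe, wife, found, kim in product(MARY, ORDERED, ORDERED, FOUND, KIM):
        base = founder(mary) * founder(joe) * founder(wife)
        if found_is_missing:
            # Foundindumpster is the child of Mary + Joe and the father of Kimberley.
            total += base * child(found, mary, joe) * child(kim, wife, found)
        else:
            # Foundindumpster is unrelated; an untyped Unfound fills the pedigree slot.
            for unfound in ORDERED:
                total += (base * founder(found)
                          * child(unfound, mary, joe) * child(kim, wife, unfound))
    return total

q = freq["q"]
lr = likelihood(True) / likelihood(False)
print(lr, float(lr))
assert lr == 1 / (2 * q + 4 * q * q)  # agrees with the symbolic answer
```

With q = 1/10 the ratio comes out to exactly 25/6, about 4.17, and it does not depend on the values chosen for p and r, as the formula predicts.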

The prototype kinship program then offers to evaluate the expression for user-specified values of the gene frequencies. For example, if q = 0.1, the likelihood ratio from this locus is 4.17, favoring the proposition that Foundindumpster, rather than some untyped and as yet Unfound body, is the missing person.
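
As a small illustration of that evaluation step, the formula quoted above can be tabulated for a few allele frequencies. The function name and the extra values of q are illustrative choices, not part of the program.

```python
def likelihood_ratio(q):
    """Evaluate 1 / (2q + 4qq), the single-locus formula from the example."""
    return 1 / (2 * q + 4 * q * q)

# q = 0.1 is the value used in the text; the others are arbitrary.
for q in (0.05, 0.1, 0.2):
    print(f"q = {q:<5} LR = {likelihood_ratio(q):.2f}")
```

The rarer the shared allele q, the larger the likelihood ratio, which is the kind of insight a formula gives and a bare number does not.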

After a while I added a mechanism that allows the user to accumulate the answers across several loci and save the work. The resultant product seemed a useful toy. It is included as part of both DNA·VIEW and the PATER package. But it is a stand-alone tool in the sense that it doesn't interact with any of the other mechanisms of either suite of programs.
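
A minimal sketch of what accumulating across loci might look like, assuming the loci are independent so that the per-locus likelihood ratios multiply. Only the first value (4.17) comes from the example above; the locus names and the other numbers are hypothetical.

```python
import math

# Hypothetical per-locus likelihood ratios; only 4.17 is taken from the text.
locus_lrs = {"locus 1": 4.17, "locus 2": 2.5, "locus 3": 0.8}

# Under the independence assumption the combined ratio is the product.
combined = math.prod(locus_lrs.values())

# Summing logarithms gives the same result and avoids overflow with many loci.
log10_combined = sum(math.log10(lr) for lr in locus_lrs.values())

print(f"combined LR = {combined:.2f} (log10 = {log10_combined:.3f})")
```

A locus with a ratio below 1, such as the third one here, counts against the proposition and pulls the product down.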

The Reconstruction Report

I did, though, include a separate facility that would prepare a layout of the available genetic data, as preparation for the user about to perform a Kinship analysis. Assuming the DNA types are available in DNA·VIEW, the "Reconstruction report" produces a tableau that shows a set of consistent symbolic genotypes for each person (such as pq, qr, q above), and the frequencies for each allele. That saves a bit of time for the analyst.
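
The idea of such a tableau can be sketched as follows. This is not the actual report format of DNA·VIEW. The allele numbers, the frequencies, and the rule of handing out letters in order of first appearance are all assumptions made for illustration.

```python
def symbolic_genotypes(typed):
    """Map each person's alleles to letters p, q, r, ... in order of first appearance."""
    letters = iter("pqrstuvwxyz")
    symbol_of = {}
    tableau = {}
    for person, alleles in typed.items():
        for allele in alleles:
            if allele not in symbol_of:
                symbol_of[allele] = next(letters)
        # A homozygote is written with a single letter, like Kimberley's "q".
        tableau[person] = "".join(sorted({symbol_of[a] for a in alleles}))
    return tableau, symbol_of

# Hypothetical allele calls at one locus and hypothetical allele frequencies.
typed = {"Foundindumpster": (14, 16), "Mary": (16, 18), "Kimberley": (16, 16)}
allele_freq = {14: 0.20, 16: 0.10, 18: 0.30}

tableau, symbol_of = symbolic_genotypes(typed)
for person, genotype in tableau.items():
    print(f"{person:<16} {genotype}")
for allele, symbol in symbol_of.items():
    print(f"{symbol} = allele {allele}, frequency {allele_freq[allele]}")
```

With these inputs the three people come out as pq, qr and q, matching the symbolic genotypes typed by hand in the earlier example.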

The incest example mentioned above is a more involved example of the use of the Kinship program.

"Immigrator" – the automatic version of Kinship

The Kinship program attracted far more interest and usage than I expected. Eventually the ability to solve heretofore intractable problems in an hour or two (counting analysis time) apparently succumbed to some variant of Parkinson's law ("Work expands to fill the available time"), and on the urging and commission principally of the Forensic Medicine Institute in Copenhagen, I produced a vastly more efficient version of the tool by integrating it with other facilities of DNA·VIEW. (detailed example of its use)

The principal stroke was to feed genotype and frequency information directly to the kinship module. The user of the "Immigration program" or "Immigrator" is therefore freed from deciding the genotype patterns (the program does that) and indeed from any locus-by-locus interaction. An hour of analysis is reduced to a minute.

Of course that may do no good – there is still Parkinson's law. But it surely does no harm.
