Mitochondrial DNA Statistical Reporting

The statistical weight of a sequence match will be presented in the final report by stating the number of times a particular sequence is present in the current EMPOP database. The total number of sequences in the database will be reported to provide a context for that number.

A statement of statistical calculations can be prepared in the event that it is needed, as follows:

The statistical calculations are based on the equations in Holland and Parsons (1999), "Mitochondrial DNA sequence analysis: Validation and use for forensic casework," *Forensic Science Review*, page 32.

A. For cases where the casework mtDNA type has never been observed in the database (“zero proportion”), the calculation is:

Confidence limit from zero proportion = 1 - a^{1/N}, where

a = 0.05 for a 95% confidence level *or*

a = 0.01 for a 99% confidence level (more conservative)

N = number of samples in the database

With the current EMPOP (1) database for North Americans, N = 11,180.
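The expression above can be motivated by the standard binomial argument for zero observations. This is a sketch only and is not quoted from the cited paper. It assumes the database is an independent random sample, and the symbol f (the true population frequency) is introduced here purely for illustration. If the type had frequency f, the probability of drawing N samples without observing it would be (1 - f)^N. The upper limit is the value of f at which that probability falls to a:

```latex
(1 - f)^{N} = a
\quad\Longrightarrow\quad
1 - f = a^{1/N}
\quad\Longrightarrow\quad
f = 1 - a^{1/N}
```

For large N this is close to -ln(a)/N, which is roughly 3/N when a = 0.05.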

Where a = 0.05, the 95% confidence limit is 0.03%. This is sometimes referred to as the 95% upper bound frequency and is interpreted as follows: there is a 5% chance that the true frequency in the population exceeds 0.03%. The confidence limit allows for the inherent uncertainty in the true population frequency, because not everyone can be typed. The larger the database, the smaller this number becomes for novel types. By extension, 99.97% of the population can be excluded as contributors of the questioned sample.

Where a = 0.01, the 99% confidence limit is 0.04%. This is interpreted as follows: there is a 1% chance that the true frequency in the population exceeds 0.04%.
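The zero-proportion calculation in section A can be sketched in a few lines of Python. This is an illustrative sketch rather than a validated laboratory tool; the function name `zero_proportion_upper_bound` and the constant `N_DATABASE` are hypothetical names chosen for this example.

```python
def zero_proportion_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Upper confidence limit, 1 - alpha**(1/n), for a type never seen in n samples.

    n     -- number of samples in the database (N in the text)
    alpha -- 0.05 for a 95% confidence level, 0.01 for 99% (a in the text)
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 1.0 - alpha ** (1.0 / n)


# Hypothetical usage with the database size quoted in the text.
N_DATABASE = 11_180
for alpha in (0.05, 0.01):
    limit = zero_proportion_upper_bound(N_DATABASE, alpha)
    print(f"{1 - alpha:.0%} upper bound frequency: {limit:.4%}")
```

Keeping the database size as a parameter means the figure can be recomputed whenever the database is updated.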

B. For cases where the casework mtDNA type has previously been observed in the current database, the calculation is:

Confidence interval = p ± x{sqrt[p(1-p)/N]} where

p = number of observations of the type in the database / N

x = 1.96 for a 95% confidence interval *or*

x = 2.58 for a 99% confidence interval (more conservative)

N = number of samples in the database

With the current EMPOP (1) database for North Americans, N = 11,180.

For example, where there have previously been two observations of the type, p = 2/11,180 and the 95% confidence interval is 0 to 0.04% (the calculated lower limit is negative, so it is reported as 0). The upper bound of this range is the upper bound frequency. This is interpreted as follows: there is a 5% chance that the true frequency of this previously observed type in the population exceeds 0.04%. The confidence interval allows for the inherent uncertainty in the true population frequency, because not everyone can be typed. The larger the database becomes, the narrower the confidence interval becomes. By extension, 99.96% of the population can be excluded as contributors of the questioned sample.

The 99% confidence interval is 0 to 0.05%. This is interpreted as follows: there is a 1% chance that the true frequency in the population exceeds 0.05%.
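The section B calculation for a previously observed type can be sketched the same way. The function name `observed_type_interval` and the constant `N_DATABASE` are hypothetical, and truncating the limits to the range 0 to 1 is an assumption made here so that a negative lower limit is reported as 0. The multiplier is passed in as a parameter rather than fixed.

```python
import math
from typing import Tuple


def observed_type_interval(count: int, n: int, x: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation interval p +/- x*sqrt(p(1-p)/n) for an observed type.

    count -- number of times the type was observed in the database
    n     -- number of samples in the database (N in the text)
    x     -- normal multiplier, e.g. 1.96 for a 95% interval
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0 <= count <= n:
        raise ValueError("count must lie between 0 and n")
    p = count / n
    half_width = x * math.sqrt(p * (1.0 - p) / n)
    # Assumption: frequencies are truncated to [0, 1] before reporting.
    lower = max(0.0, p - half_width)
    upper = min(1.0, p + half_width)
    return lower, upper


# Hypothetical usage: two prior observations in a database of 11,180 samples.
N_DATABASE = 11_180
# 2.58 is the standard normal multiplier for a two-sided 99% interval (2.576 rounded).
for label, x in (("95%", 1.96), ("99%", 2.58)):
    lower, upper = observed_type_interval(2, N_DATABASE, x)
    print(f"{label} confidence interval: {lower:.4%} to {upper:.4%}")
```

The second value returned is the upper bound frequency described above.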

References:

^{1}EMPOP—A Forensic MtDNA Database. Walther Parson, Arne Dür, 2007, **Forensic Science International: Genetics**, Vol. 1, No. 2, Pages 88-92.