NUCLEOTIDE DIVERSITY

Nucleotide diversity is a measure of polymorphism in a sample of gene sequences: a summary statistic that represents the pattern of molecular diversity within a sample of gene copies. It is related to diversity measures in other biological fields (e.g., diversity metrics in ecology) and is analogous to the expected heterozygosity (= gene diversity) for a sample of allelic states at a single locus. The concept was introduced by Nei and Li (1979).

Their formula for nucleotide diversity is:

$$\Pi = \sum_{ij} x_i x_j \pi_{ij},$$

where $\pi_{ij}$ is the proportion of nucleotides that differ between the $i$th and $j$th types of DNA sequences, and $x_i$ and $x_j$ are the respective frequencies of these sequences. The proportion $\pi_{ij}$ can be adjusted using a variety of statistical models of DNA sequence evolution (e.g., Jukes-Cantor, Kimura 2-parameter, general time reversible).
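As an illustration of such an adjustment, the sketch below computes the raw proportion of differing sites and then applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The function names `p_distance` and `jukes_cantor` are our own, and the sequences are made-up, pre-aligned toy data; this is not code from the TreeGenes tools.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of sites at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy example (hypothetical data): 2 of 10 aligned sites differ.
p = p_distance("ACGTACGTAC", "ACGTACGTTT")
d = jukes_cantor(p)
print(f"uncorrected p = {p:.3f}, Jukes-Cantor d = {d:.4f}")
```

The corrected distance is always at least as large as the raw proportion, because the model allows for multiple substitutions at the same site.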

The summation is taken over all distinct pairs i and j without repetition. That is:

$$\Pi = \sum_{ij} x_i x_j \pi_{ij} = \sum_{i=1}^{n} \sum_{j=1}^{i} x_i x_j \pi_{ij},$$

where n is the number of sequences in the sample.
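The sum can be sketched in a few lines of Python. This is a minimal illustration that follows the formula as written above: it groups identical sequences into types, estimates each frequency as count / n, and visits every distinct pair of types once. The function name `nucleotide_diversity` and the toy sequences are assumptions for the example. Terms with i = j are skipped because a sequence type does not differ from itself. Summing over every ordered pair (i, j) instead would count each pair twice and double the result.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(sequences):
    """Sum of x_i * x_j * pi_ij over distinct pairs of sequence types."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # Frequency of each distinct sequence type in the sample.
    freqs = {hap: count / n for hap, count in Counter(sequences).items()}

    total = 0.0
    for hap_i, hap_j in combinations(freqs, 2):  # each unordered pair once
        diffs = sum(a != b for a, b in zip(hap_i, hap_j))
        pi_ij = diffs / len(hap_i)  # proportion of differing nucleotides
        total += freqs[hap_i] * freqs[hap_j] * pi_ij
    return total


# Toy sample (hypothetical data): three types with frequencies 0.5, 0.25, 0.25.
sample = ["AAAA", "AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(sample))
```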

Tajima (1983) and Nei (1987) scaled this measure to the length of gene sequences being considered (L).

The formula now becomes:

$$\Pi_n = \frac{\sum_{i=1}^{n} \sum_{j=1}^{i} x_i x_j \pi_{ij}}{L}.$$
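A small sketch of the length-scaled form follows. It assumes that the pairwise term is supplied as a count of nucleotide differences between types i and j, so that dividing by L gives a per-site value; this reading, the function name `scaled_diversity`, and the toy numbers are our assumptions rather than something stated in the text.

```python
def scaled_diversity(freqs, diff_counts, seq_length):
    """Lower-triangle sum of x_i * x_j * d_ij, divided by the sequence length L."""
    if seq_length < 1:
        raise ValueError("sequence length must be positive")
    n = len(freqs)
    total = 0.0
    for i in range(n):
        for j in range(i + 1):  # j runs up to i; the i == j term is zero
            total += freqs[i] * freqs[j] * diff_counts[i][j]
    return total / seq_length


# Toy input (hypothetical): three sequence types of length 4.
freqs = [0.5, 0.25, 0.25]
diff_counts = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]
print(scaled_diversity(freqs, diff_counts, seq_length=4))
```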

Note: $\pi_{ij}$ is often written as $d_{ij}$, so that nucleotide diversity itself is denoted $\pi$ or $\pi_n$.

Tajima (1983) and Nei (1987) also provided an estimator for the total variance (sampling and stochastic).

It is given by the following formula:

$$\operatorname{var}(\pi_n) = \frac{n+1}{3(n-1)L}\,\pi_n + \frac{2(n^2+n+3)}{9n(n-1)}\,\pi_n^{2},$$

where $\pi_n$ is the point estimate of nucleotide diversity, $n$ is the sample size (i.e., number of gene sequences sampled), and $L$ is the length of the gene sequences.
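The variance formula is straightforward to evaluate numerically. The sketch below is a direct transcription of it; the function name `nucleotide_diversity_variance` and the input values are illustrative assumptions, not data from any real sample.

```python
import math


def nucleotide_diversity_variance(pi_n, n, seq_length):
    """Total variance of pi_n for n sequences of length seq_length."""
    if n < 2:
        raise ValueError("need at least two sequences")
    if seq_length < 1:
        raise ValueError("sequence length must be positive")
    # Term that is linear in pi_n.
    linear_term = (n + 1) / (3 * (n - 1) * seq_length) * pi_n
    # Term that is quadratic in pi_n.
    quadratic_term = 2 * (n**2 + n + 3) / (9 * n * (n - 1)) * pi_n**2
    return linear_term + quadratic_term


# Hypothetical inputs: pi_n = 0.01 from 10 sequences of 1000 bp each.
variance = nucleotide_diversity_variance(0.01, n=10, seq_length=1000)
std_dev = math.sqrt(variance)
print(f"var = {variance:.3e}, sd = {std_dev:.4f}")
```

Taking the square root gives a standard deviation on the same scale as the point estimate, which makes the two easier to compare.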

REFERENCE

NEI, M. 1987. Molecular Evolutionary Genetics. Columbia University Press, New York, NY, USA.

