GENE DIVERSITY

Gene diversity is a measure of the expected heterozygosity in a sample of gene copies collected at a single locus. It is a summary statistic used to represent patterns of molecular diversity within that sample. Typically, the gene copies are scored as allelic states such as allozymes or fragment sizes (e.g., RFLPs, AFLPs, microsatellites). The expected heterozygosity is calculated under the assumption that the sample of gene copies was drawn from a population at Hardy-Weinberg equilibrium (HWE).

Under HWE, the expected heterozygosity for a sample of n diploid individuals (i.e., 2n gene copies) represented by k alleles (= haplotypes) at a single locus is:

$$H_e = \frac{2n}{2n-1}\left(1 - \sum_{i=1}^{k} p_i^2\right), \qquad \text{(Equation 1)}$$

where p_i is the frequency for the ith of k alleles. Since diversity is defined by both the number of things sampled (i.e., alleles) and their evenness, it should be clear why large values of H_e represent very diverse samples.

Note that in a population with selfing, the factor 2n/(2n-1) becomes n/(n-1).
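A minimal Python sketch of Equation 1 follows. It assumes that n counts diploid individuals (so 2n gene copies are sampled) and that allele frequencies have already been estimated. The function name `gene_diversity` and the `selfing` flag are illustrative and do not come from any published software.

```python
from typing import Sequence


def gene_diversity(freqs: Sequence[float], n: int, selfing: bool = False) -> float:
    """Expected heterozygosity at one locus (Equation 1).

    freqs   -- allele frequencies p_i, expected to sum to 1
    n       -- number of sampled individuals (assumed diploid)
    selfing -- if True, use the n/(n-1) correction instead of 2n/(2n-1)
    """
    if abs(sum(freqs) - 1.0) > 1e-8:
        raise ValueError("allele frequencies must sum to 1")
    m = n if selfing else 2 * n  # size used in the bias correction
    if m < 2:
        raise ValueError("sample is too small for the correction factor")
    homozygosity = sum(p * p for p in freqs)  # sum of squared frequencies
    return m / (m - 1) * (1.0 - homozygosity)


# Two equally frequent alleles in 10 individuals: 20/19 * 0.5, about 0.526
print(gene_diversity([0.5, 0.5], n=10))
# Same frequencies with selfing: 10/9 * 0.5, about 0.556
print(gene_diversity([0.5, 0.5], n=10, selfing=True))
```

A locus fixed for a single allele gives H_e = 0, and for a fixed number of alleles the value is largest when all alleles are equally frequent, which matches the "number and evenness" reading of diversity.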

The sampling variance of this measure was formulated by Nei (1987) and is given by the following formula:

$$\operatorname{var}(H_e) = \frac{2}{2n(2n-1)}\left\{2(2n-2)\left[\sum_{i=1}^{k} p_i^3 - \left(\sum_{i=1}^{k} p_i^2\right)^2\right] + \sum_{i=1}^{k} p_i^2 - \left(\sum_{i=1}^{k} p_i^2\right)^2\right\}. \qquad \text{(Equation 2)}$$

Note that in a population with selfing the expression above becomes:

$$\operatorname{var}(H_e) = \frac{2}{n(n-1)}\left\{2(n-2)\left[\sum_{i=1}^{k} p_i^3 - \left(\sum_{i=1}^{k} p_i^2\right)^2\right] + \sum_{i=1}^{k} p_i^2 - \left(\sum_{i=1}^{k} p_i^2\right)^2\right\}.$$
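The sampling variance can be sketched in the same way. This example assumes the form given by Nei (1987), in which the factor 2(2n-2) multiplies only the first bracketed difference, and it again treats n as the number of diploid individuals. `gene_diversity_variance` is a hypothetical helper name.

```python
from typing import Sequence


def gene_diversity_variance(freqs: Sequence[float], n: int, selfing: bool = False) -> float:
    """Sampling variance of single-locus gene diversity (Equation 2).

    Assumes the factor 2(m - 2) applies only to the first bracketed term,
    where m = 2n for outcrossing samples and m = n with selfing.
    """
    m = n if selfing else 2 * n
    if m < 2:
        raise ValueError("sample is too small")
    s2 = sum(p ** 2 for p in freqs)  # sum of squared frequencies
    s3 = sum(p ** 3 for p in freqs)  # sum of cubed frequencies
    return 2.0 / (m * (m - 1)) * (2 * (m - 2) * (s3 - s2 ** 2) + s2 - s2 ** 2)


# Two equally frequent alleles in 10 individuals: (2/380) * 0.25, about 0.0013
print(gene_diversity_variance([0.5, 0.5], n=10))
```

For a locus fixed for one allele both sums equal 1, so the variance is 0, as expected when the sample holds no variation.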

Gene diversity over all loci can be measured with a slight modification to the formulae above. In this instance, the summary statistic is referred to as the average gene diversity. This is given by the following formula:

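$$\operatorname{avg}(H_e) = \frac{1}{r}\sum_{l=1}^{r} \frac{2n_l}{2n_l-1}\left(1 - \sum_{i=1}^{k_l} p_{il}^2\right), \qquad \text{(Equation 3)}$$

where r is the number of sampled loci, n_l is the sample size at locus l (counted so that 2n_l gene copies are sampled, as in Equation 1), and p_{il} is the frequency of the ith of k_l alleles at locus l.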
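Averaging over loci (Equation 3) might look like the sketch below. The input layout, a list of (allele frequencies, n_l) pairs with one pair per locus, is an assumption made for illustration, as is the function name. Sample sizes are taken to be counts of diploid individuals.

```python
from typing import List, Sequence, Tuple

Locus = Tuple[Sequence[float], int]  # (allele frequencies, n_l) for one locus


def average_gene_diversity(loci: List[Locus]) -> float:
    """Mean of the single-locus gene diversities (Equation 3)."""
    if not loci:
        raise ValueError("at least one locus is required")
    per_locus = []
    for freqs, n in loci:
        m = 2 * n  # gene copies sampled at this locus
        per_locus.append(m / (m - 1) * (1.0 - sum(p * p for p in freqs)))
    return sum(per_locus) / len(per_locus)


# One polymorphic and one monomorphic locus: (10/19 + 0) / 2, about 0.263
print(average_gene_diversity([([0.5, 0.5], 10), ([1.0], 10)]))
```

Because each locus carries its own n_l, loci with missing data can be included without discarding individuals.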

The sampling variance of this estimator is given by:

$$\operatorname{var}(\operatorname{avg}(H_e)) = \frac{1}{r-1}\sum_{l=1}^{r}\left[\frac{2n_l}{2n_l-1}\left(1 - \sum_{i=1}^{k_l} p_{il}^2\right) - \operatorname{avg}(H_e)\right]^2. \qquad \text{(Equation 4)}$$

Assuming that loci are independent, this variance can also be decomposed into two additive components, which Nei and Roychoudhury (1974) labeled the interlocus and intralocus variances. The variance given above for gene diversity at a single locus (Equation 2) represents the intralocus variance. For r sampled loci, the total intralocus variance is simply the average of these single-locus variances across all r loci. If we subtract this estimate from the sampling variance associated with the average gene diversity (Equation 4), we get an estimate of the interlocus variance. This is the sampling variance associated with the “variability” in estimates among loci. This quantity can further be used to test expectations derived from the neutral theory of molecular evolution.
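The decomposition described above can be sketched as follows, assuming independent loci, sample sizes counted in diploid individuals, and the Nei (1987) form of the single-locus variance. All function names and the input layout (one (allele frequencies, n_l) pair per locus) are illustrative.

```python
from statistics import mean, variance


def locus_diversity(freqs, n):
    """Single-locus gene diversity (Equation 1)."""
    m = 2 * n
    return m / (m - 1) * (1.0 - sum(p ** 2 for p in freqs))


def locus_variance(freqs, n):
    """Single-locus (intralocus) sampling variance (Equation 2)."""
    m = 2 * n
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    return 2.0 / (m * (m - 1)) * (2 * (m - 2) * (s3 - s2 ** 2) + s2 - s2 ** 2)


def decompose_variance(loci):
    """Return (total, intralocus, interlocus) variance components.

    total      -- variance of per-locus estimates, r - 1 denominator (Equation 4)
    intralocus -- mean of the single-locus variances
    interlocus -- total minus intralocus
    """
    h = [locus_diversity(f, n) for f, n in loci]
    total = variance(h)  # sample variance; needs at least two loci
    intra = mean([locus_variance(f, n) for f, n in loci])
    return total, intra, total - intra


loci = [([0.5, 0.5], 10), ([0.7, 0.3], 10), ([1.0], 10)]
total, intra, inter = decompose_variance(loci)
print(total, intra, inter)
```

Since the interlocus component is obtained as a difference between two estimates, it can come out negative when few loci are sampled. The sketch returns the raw difference rather than truncating it at zero.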

REFERENCE

NEI, M. 1987. Molecular Evolutionary Genetics. Columbia University Press, New York, NY, USA.

