GENE DIVERSITY
Gene diversity is a measure of the expected heterozygosity in a sample of gene copies collected at a single locus. It is a summary statistic used to represent patterns of molecular diversity within a sample of gene copies. Typically, the gene copies are scored as allelic states such as allozymes or fragment sizes (e.g., RFLPs, AFLPs, microsatellites). The expected heterozygosity is calculated under the assumption that the sample of gene copies was drawn from a population at Hardy-Weinberg equilibrium (HWE).
Under HWE, the expected heterozygosity for n gene copies represented by k alleles (= haplotypes) sampled at a single locus is:
$$\hat{H} = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^2\right), \qquad \text{(Equation 1)}$$
where $p_i$ is the frequency of the $i$th of $k$ alleles. Since diversity is defined by both the number of things sampled (i.e., alleles) and their evenness, it should be clear why large values of $\hat{H}$ represent very diverse samples.
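Note that in a completely selfing population, individuals rather than gene copies are the independent sampling units, so for $N$ sampled individuals with allele frequencies $p_i$ the estimate becomes (Nei 1987)
$$\hat{H} = \frac{N}{N-1}\left(1 - \sum_{i=1}^{k} p_i^2\right).$$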
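As an illustration, the single-locus estimate can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual unbiased form, n/(n-1) times one minus the sum of squared allele frequencies. The function name `gene_diversity` and the sample of allele labels are hypothetical and are not part of the tutorial.

```python
from collections import Counter


def gene_diversity(gene_copies):
    """Unbiased expected heterozygosity for one locus.

    gene_copies: a sequence of allele labels, one entry per sampled gene copy.
    Assumes the form n/(n-1) * (1 - sum of p_i^2).
    """
    n = len(gene_copies)
    if n < 2:
        raise ValueError("need at least two gene copies")
    counts = Counter(gene_copies)
    # Sum of squared allele frequencies (expected homozygosity under HWE).
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


# Hypothetical sample: 8 gene copies, 3 alleles at frequencies 0.25, 0.25, 0.5.
sample = ["A", "A", "B", "B", "C", "C", "C", "C"]
print(gene_diversity(sample))
```

A sample fixed for one allele gives 0, and the value rises as alleles become more numerous and more evenly represented.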
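The sampling variance of this measure was formulated by Nei (1987). In terms of the allele frequencies $p_i$ and the number of gene copies $n$, it is given by the following formula:
$$V(\hat{H}) = \frac{2}{n(n-1)}\left\{2(n-2)\left[\sum_{i=1}^{k} p_i^3 - \left(\sum_{i=1}^{k} p_i^2\right)^2\right] + \sum_{i=1}^{k} p_i^2 - \left(\sum_{i=1}^{k} p_i^2\right)^2\right\}. \qquad \text{(Equation 2)}$$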
Note that in a completely selfing population the expression above applies with the number of gene copies replaced by the number of individuals sampled.
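The sampling variance can be sketched the same way. The block below assumes the commonly quoted form of Nei's (1987) single-locus variance, written in terms of the number of gene copies n and the sums of squared and cubed allele frequencies. The function name `gene_diversity_variance` and the sample are hypothetical.

```python
from collections import Counter


def gene_diversity_variance(gene_copies):
    """Sampling variance of single-locus gene diversity.

    Assumed form (after Nei 1987), with n gene copies:
    2/(n(n-1)) * { 2(n-2)[sum p^3 - (sum p^2)^2] + sum p^2 - (sum p^2)^2 }
    """
    n = len(gene_copies)
    if n < 2:
        raise ValueError("need at least two gene copies")
    freqs = [c / n for c in Counter(gene_copies).values()]
    s2 = sum(p ** 2 for p in freqs)  # sum of squared frequencies
    s3 = sum(p ** 3 for p in freqs)  # sum of cubed frequencies
    return 2.0 / (n * (n - 1)) * (2 * (n - 2) * (s3 - s2 ** 2) + s2 - s2 ** 2)


# Hypothetical sample: 8 gene copies, 3 alleles at frequencies 0.25, 0.25, 0.5.
sample = ["A", "A", "B", "B", "C", "C", "C", "C"]
print(gene_diversity_variance(sample))
```

For a sample fixed for a single allele both sums equal 1, so the variance is 0.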
Gene diversity over all loci can be measured with a slight modification to the formulae above. In this instance, the summary statistic is referred to as the average gene diversity. This is given by the following formula:
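$$\bar{H} = \frac{1}{r}\sum_{j=1}^{r}\frac{n_j}{n_j-1}\left(1 - \sum_{i=1}^{k_j} p_{ij}^2\right), \qquad \text{(Equation 3)}$$
where $r$ is the number of sampled loci, $n_j$ is the sample size (i.e., number of gene copies sampled) at the $j$th locus, and $p_{ij}$ is the frequency of the $i$th of $k_j$ alleles at that locus.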
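The sampling variance of this estimator is given by:
$$V(\bar{H}) = \frac{1}{r(r-1)}\sum_{j=1}^{r}\left(\hat{H}_j - \bar{H}\right)^2, \qquad \text{(Equation 4)}$$
where $\hat{H}_j$ is the gene diversity at the $j$th locus and $\bar{H}$ is the average gene diversity.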
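The multilocus calculation can be sketched as follows. This is a sketch under two stated assumptions: the average gene diversity is the unweighted mean of the single-locus estimates, and its sampling variance is the sum of squared deviations of those estimates from their mean, divided by r(r-1). All function names and the three example loci are hypothetical.

```python
from collections import Counter


def gene_diversity(gene_copies):
    """Unbiased single-locus gene diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(gene_copies)
    if n < 2:
        raise ValueError("need at least two gene copies")
    sum_p2 = sum((c / n) ** 2 for c in Counter(gene_copies).values())
    return n / (n - 1) * (1.0 - sum_p2)


def average_gene_diversity(loci):
    """Return (mean gene diversity, sampling variance of that mean).

    loci: a list of samples, one sequence of allele labels per locus.
    The variance is assumed to be sum((H_j - mean)^2) / (r * (r - 1)).
    """
    per_locus = [gene_diversity(copies) for copies in loci]
    r = len(per_locus)
    if r < 2:
        raise ValueError("need at least two loci to estimate the variance")
    mean_h = sum(per_locus) / r
    var_mean = sum((h - mean_h) ** 2 for h in per_locus) / (r * (r - 1))
    return mean_h, var_mean


# Hypothetical data: three loci, eight gene copies each.
loci = [
    ["A", "A", "B", "B", "C", "C", "C", "C"],  # three alleles
    ["A"] * 4 + ["B"] * 4,                     # two alleles, even
    ["A"] * 8,                                 # monomorphic
]
mean_h, var_mean = average_gene_diversity(loci)
print(mean_h, var_mean)
```

Sample sizes may differ among loci, since each single-locus estimate uses its own number of gene copies.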
This variance can also be decomposed into two additive components, assuming that loci are independent. These two components were labeled by Nei and Roychoudhury (1974) as the interlocus and intralocus variances. The variance given above for gene diversity at a single locus (Equation 2) represents the intralocus variance. For r sampled loci, the total intralocus variance is the average of the single-locus variances across all r loci. If we subtract this estimate from the estimate of the sampling variance associated with the average gene diversity (Equation 4), we get an estimate of the interlocus variance. This is the sampling variance associated with the variability in estimates among loci. This quantity can further be used to test expectations derived from the neutral theory of molecular evolution.
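The subtraction described above can be sketched as a small helper. It takes the two ingredients as inputs rather than recomputing them: a total variance estimate and the list of single-locus (intralocus) variances. The function name `interlocus_variance` and the numbers are hypothetical, chosen only to show the arithmetic.

```python
def interlocus_variance(total_variance, intralocus_variances):
    """Split a total variance into intralocus and interlocus components.

    total_variance: the variance estimate to be decomposed.
    intralocus_variances: one single-locus sampling variance per locus.
    Returns (mean intralocus variance, interlocus variance).
    """
    if not intralocus_variances:
        raise ValueError("need at least one single-locus variance")
    # Total intralocus variance is the average across loci.
    mean_intra = sum(intralocus_variances) / len(intralocus_variances)
    # Interlocus variance is what remains after removing that average.
    return mean_intra, total_variance - mean_intra


# Hypothetical values for three loci.
mean_intra, inter = interlocus_variance(0.05, [0.01, 0.02, 0.015])
print(mean_intra, inter)
```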
REFERENCE
NEI, M. 1987. Molecular Evolutionary Genetics. Columbia University Press, New York, NY, USA.

