dist.genet {ade4}    R Documentation

Genetic distances from gene frequencies

Description

Computes one of five measures of genetic distance between populations from gene frequencies observed at several loci.

Usage

dist.genet(genet, method = 1, diag = FALSE, upper = FALSE)

Arguments

genet     a list of class genet
method    an integer between 1 and 5 selecting the distance measure. See Details
diag      a logical value indicating whether the diagonal of the distance matrix should be printed by print.dist
upper     a logical value indicating whether the upper triangle of the distance matrix should be printed by print.dist

Details

Let A be a table of allelic frequencies with t populations (rows) and m alleles (columns).
Let \nu be the number of loci. Locus j has m(j) alleles, so m=\sum_{j=1}^{\nu} m(j)

For row i and allele k of locus j, denote by a_{ij}^k (1 \leq i \leq t, 1 \leq j \leq \nu, 1 \leq k \leq m(j)) the corresponding value of the initial table.

a_{ij}^+=\sum_{k=1}^{m(j)}a_{ij}^k and p_{ij}^k=\frac{a_{ij}^k}{a_{ij}^+}

Let P be the table with general term p_{ij}^k. Then
p_{ij}^+=\sum_{k=1}^{m(j)}p_{ij}^k=1, p_{i+}^+=\sum_{j=1}^{\nu}p_{ij}^+=\nu, p_{++}^+=\sum_{i=1}^{t}p_{i+}^+=t\nu
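
As an illustration of these definitions, here is a minimal base R sketch that turns a table of allele counts into the frequency table P. The count matrix `A`, the `locus` factor and the population names are hypothetical values made up for the example; this is not the code used inside ade4.

```r
# Hypothetical allele counts: 3 populations (rows), 2 loci with 2 and 3 alleles.
A <- matrix(c(6, 4,  2, 3, 5,
              1, 9,  5, 5, 0,
              8, 2,  4, 4, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("pop1", "pop2", "pop3"),
                            c("L1.a", "L1.b", "L2.a", "L2.b", "L2.c")))

# locus[k] names the locus that column k of A belongs to (an assumption
# of this sketch; a genet object stores this information differently).
locus <- factor(c("L1", "L1", "L2", "L2", "L2"))

# a_{ij}^+ : totals per population within each locus (t x nu matrix).
# rowsum() sums rows by group, so the allele columns are transposed first.
a_plus <- t(rowsum(t(A), group = locus))

# p_{ij}^k = a_{ij}^k / a_{ij}^+ : divide each column by its locus total.
P <- A / a_plus[, as.character(locus)]

rowSums(P)  # p_{i+}^+ : every population sums to nu = 2
sum(P)      # p_{++}^+ : t * nu = 6
```

The two checks at the end correspond to the marginal sums p_{i+}^+=\nu and p_{++}^+=t\nu given above.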

The method argument selects which distance between populations is computed from the frequencies p_{ij}^k. In the formulas, a and b denote two populations (rows of P).

1. Nei's distance:
D_1(a,b)=- \ln\left(\frac{\sum_{j=1}^{\nu} \sum_{k=1}^{m(j)} p_{aj}^k p_{bj}^k}{\sqrt{\sum_{j=1}^{\nu} \sum_{k=1}^{m(j)} {(p_{aj}^k)}^2}\sqrt{\sum_{j=1}^{\nu} \sum_{k=1}^{m(j)} {(p_{bj}^k)}^2}}\right)

2. Angular distance or Edwards' distance:
D_2(a,b)=\sqrt{1-\frac{1}{\nu} \sum_{j=1}^{\nu} \sum_{k=1}^{m(j)} \sqrt{p_{aj}^k p_{bj}^k}}

3. Coancestry coefficient or Reynolds' distance:
D_3(a,b)=\sqrt{\frac{\sum_{j=1}^{\nu} \sum_{k=1}^{m(j)}{(p_{aj}^k - p_{bj}^k)}^2}{2 \sum_{j=1}^{\nu} (1- \sum_{k=1}^{m(j)}p_{aj}^k p_{bj}^k)}}

4. Classical Euclidean distance or Rogers' distance:
D_4(a,b)=\frac{1}{\nu} \sum_{j=1}^{\nu} \sqrt{\frac{1}{2} \sum_{k=1}^{m(j)}{(p_{aj}^k - p_{bj}^k)}^2}

5. Absolute genetic distance or Prevosti's distance:
D_5(a,b)=\frac{1}{2{\nu}} \sum_{j=1}^{\nu} \sum_{k=1}^{m(j)} |p_{aj}^k - p_{bj}^k|
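
The five formulas above can be sketched directly in base R. The code below is an illustrative transcription of the formulas, not the implementation used by dist.genet. The frequency table `P`, the `locus` factor and the helper names `genet_dist_pair` and `genet_dist` are hypothetical and exist only for this example.

```r
# Hypothetical frequency table: 3 populations, 2 loci (2 and 3 alleles).
# Within each locus, the frequencies of a population sum to 1.
P <- matrix(c(0.6, 0.4,  0.2, 0.3, 0.5,
              0.1, 0.9,  0.5, 0.5, 0.0,
              0.8, 0.2,  0.4, 0.4, 0.2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("pop1", "pop2", "pop3"), NULL))
locus <- factor(c("L1", "L1", "L2", "L2", "L2"))  # locus of each column

# Distance between two frequency vectors p and q for method 1 to 5.
genet_dist_pair <- function(p, q, locus, method) {
  nu <- nlevels(locus)  # number of loci
  switch(method,
    # 1. Nei
    -log(sum(p * q) / (sqrt(sum(p^2)) * sqrt(sum(q^2)))),
    # 2. Edwards (max() guards against tiny negative rounding errors)
    sqrt(max(0, 1 - sum(sqrt(p * q)) / nu)),
    # 3. Reynolds
    sqrt(sum((p - q)^2) / (2 * sum(1 - tapply(p * q, locus, sum)))),
    # 4. Rogers
    sum(sqrt(tapply((p - q)^2, locus, sum) / 2)) / nu,
    # 5. Prevosti
    sum(abs(p - q)) / (2 * nu)
  )
}

# Pairwise distances between all rows of P, returned as a dist object.
genet_dist <- function(P, locus, method) {
  n_pop <- nrow(P)
  D <- matrix(0, n_pop, n_pop, dimnames = list(rownames(P), rownames(P)))
  for (a in seq_len(n_pop - 1)) {
    for (b in (a + 1):n_pop) {
      D[a, b] <- D[b, a] <- genet_dist_pair(P[a, ], P[b, ], locus, method)
    }
  }
  as.dist(D)
}

# One distance matrix per method.
all_dist <- lapply(1:5, function(m) genet_dist(P, locus, m))
all_dist[[5]]
```

For pop1 and pop2, the absolute differences sum to 2 over the two loci, so Prevosti's distance is 2 / (2 * 2) = 0.5.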

Value

Returns a distance matrix of class dist between the populations (the rows of the frequency table).
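
A short sketch of how an object of class dist can be handled in base R, including the effect of diag and upper when printing. To stay independent of ade4, the object is rebuilt by hand from the method 5 values shown in the worked example of this page; the variable names are illustrative only.

```r
# Rebuild a 4 x 4 distance matrix from the lower-triangle values
# (method 5, populations casi, cast, dome, musc).
labs <- c("casi", "cast", "dome", "musc")
m <- matrix(0, 4, 4, dimnames = list(labs, labs))
m[lower.tri(m)] <- c(0.3547475, 0.1805556, 0.7347222,  # casi vs cast, dome, musc
                     0.3888889, 0.5104798,             # cast vs dome, musc
                     0.7416667)                        # dome vs musc
d <- as.dist(m)  # as.dist() reads the lower triangle

# diag and upper only change how the object is printed, not its content.
print(d)
print(d, diag = TRUE, upper = TRUE)

# Convert to a full symmetric matrix to look up a single pair.
full <- as.matrix(d)
full["dome", "musc"]
```

Storing only the lower triangle is what makes the default printout triangular; as.matrix gives the full table when individual entries are needed.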

Author(s)

Daniel Chessel
Anne B Dufour dufour@biomserv.univ-lyon1.fr

References

For more information about the distances:

Distance 1:
Nei, M. (1972) Genetic distance between populations. American Naturalist, 106, 283–292.
Nei, M. (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics, 89, 583–590.
Avise, J. C. (1994) Molecular markers, natural history and evolution. Chapman & Hall, London.

Distance 2:
Edwards, A.W.F. (1971) Distances between populations on the basis of gene frequencies. Biometrics, 27, 873–881.
Cavalli-Sforza, L.L. and Edwards, A.W.F. (1967) Phylogenetic analysis: models and estimation procedures. Evolution, 21, 550–570.
Hartl, D.L. and Clark, A.G. (1989) Principles of population genetics. Sinauer Associates, Sunderland, Massachusetts (p. 303).

Distance 3:
Reynolds, J. B., B. S. Weir, and C. C. Cockerham. (1983) Estimation of the coancestry coefficient: basis for a short-term genetic distance. Genetics, 105, 767–779.

Distance 4:
Rogers, J.S. (1972) Measures of genetic similarity and genetic distances. Studies in Genetics, Univ. Texas Publ., 7213, 145–153.
Avise, J. C. (1994) Molecular markers, natural history and evolution. Chapman & Hall, London.

Distance 5:
Prevosti A. (1974) La distancia genética entre poblaciones. Miscellanea Alcobé, 68, 109–118.
Prevosti A., Ocaña J. and Alonso G. (1975) Distances between populations of Drosophila subobscura, based on chromosome arrangements frequencies. Theoretical and Applied Genetics, 45, 231–241.

For some useful explanations:
Sanchez-Mazas A. (2003) Cours de Génétique Moléculaire des Populations. Cours VIII Distances génétiques - Représentation des populations.
http://anthro.unige.ch/GMDP/Alicia/GMDP_dist.htm

Examples

data(casitas)
casi.genet <- char2genet(casitas,
    as.factor(rep(c("dome", "cast", "musc", "casi"), c(24,11,9,30))))
ldist <- lapply(1:5, function(method) dist.genet(casi.genet,method))
ldist
unlist(lapply(ldist, is.euclid))
kdist(ldist)
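
The is.euclid call in the example tests whether a distance matrix can be represented exactly by points in a Euclidean space. A common criterion for this (Gower's) is that the doubly centered matrix of squared distances has no negative eigenvalue. The sketch below illustrates that criterion in base R; the function name `is_euclid_sketch` and the tolerance are assumptions of this example, and the code is not taken from ade4.

```r
# Sketch of a Euclidean check based on Gower's criterion.
# d: an object of class dist; tol: relative tolerance (an assumed value).
is_euclid_sketch <- function(d, tol = 1e-07) {
  D2 <- as.matrix(d)^2                  # squared distances
  n <- nrow(D2)
  J <- diag(n) - matrix(1 / n, n, n)    # centering matrix
  G <- -0.5 * J %*% D2 %*% J            # doubly centered matrix
  ev <- eigen(G, symmetric = TRUE, only.values = TRUE)$values
  # Euclidean if the smallest eigenvalue is not negative beyond rounding error.
  min(ev) > -tol * max(abs(ev))
}

# Distances between three points in the plane are Euclidean by construction.
pts <- matrix(c(0, 0,  3, 0,  0, 4), ncol = 2, byrow = TRUE)
is_euclid_sketch(dist(pts))

# Distances 1, 3, 1 violate the triangle inequality, so no such points exist.
bad <- as.dist(matrix(c(0, 1, 3,
                        1, 0, 1,
                        3, 1, 0), nrow = 3))
is_euclid_sketch(bad)
```

The first call returns TRUE and the second FALSE, which mirrors the kind of TRUE/FALSE answer given by is.euclid for each of the five distances.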

Worked out examples


> library(ade4)
> ### Name: dist.genet
> ### Title: Genetic distances from gene frequencies
> ### Aliases: dist.genet
> ### Keywords: multivariate
> 
> ### ** Examples
> 
> data(casitas)
> casi.genet <- char2genet(casitas,
+     as.factor(rep(c("dome", "cast", "musc", "casi"), c(24,11,9,30))))
> ldist <- lapply(1:5, function(method) dist.genet(casi.genet,method))
> ldist
[[1]]
          casi      cast      dome
cast 0.2863338                    
dome 0.1003991 0.3450556          
musc 1.2185291 0.5636602 1.2472719

[[2]]
          casi      cast      dome
cast 0.4378511                    
dome 0.3257588 0.5135286          
musc 0.7565759 0.6318505 0.7946422

[[3]]
          casi      cast      dome
cast 0.6596192                    
dome 0.6035993 0.7872298          
musc 0.8571692 0.7891692 0.9215769

[[4]]
          casi      cast      dome
cast 0.3413694                    
dome 0.1760324 0.3827603          
musc 0.7016184 0.4739970 0.7222744

[[5]]
          casi      cast      dome
cast 0.3547475                    
dome 0.1805556 0.3888889          
musc 0.7347222 0.5104798 0.7416667

> unlist(lapply(ldist, is.euclid))
[1] FALSE  TRUE  TRUE  TRUE  TRUE
> kdist(ldist)
List of distances matrices
call: kdist(ldist)
class: kdist
number of distances: 5
size: 4
labels:
    P1     P2     P3     P4 
"casi" "cast" "dome" "musc" 
X1: non euclidean distance
X2: euclidean distance
X3: euclidean distance
X4: euclidean distance
X5: euclidean distance