genet {ade4}

R Documentation

A class of data: tables of populations and alleles

Description

There are multiple formats of genetic data. The ade4 functions that handle genetic data use the class genet. An object of class genet is a list containing at least one data frame whose rows are groups of individuals (populations) and whose columns are alleles, grouped into blocks by locus. The data frame contains allelic frequencies expressed as a percentage.
The function char2genet reads tables that cross diploid individuals, arranged by groups (populations), with polymorphic loci. Data frames containing only character strings are transformed into tables of allelic frequencies of class genet. In the input, a row is an individual, a variable is a locus and a value is a character string, for example '012028' for a heterozygote carrying alleles 012 and 028, '020020' for a homozygote carrying two copies of allele 020 and '000000' for an unclassified locus (missing data).
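The string coding described above can be illustrated with a minimal base R sketch. The helper `split_genotype` is hypothetical and is not part of ade4; it only shows how a 6-character value breaks down into two 3-character allele codes, and it assumes that '000000' is the only missing-data code.

```r
# Hypothetical helper (not an ade4 function): split 6-character genotype
# strings into their two 3-character allele codes.
split_genotype <- function(x) {
  x <- trimws(x)                       # drop surrounding blanks, if any
  stopifnot(all(nchar(x) == 6))        # every value must have 6 characters
  a1 <- substr(x, 1, 3)                # first allele
  a2 <- substr(x, 4, 6)                # second allele
  missing <- x == "000000"             # unclassified locus (missing data)
  a1[missing] <- NA
  a2[missing] <- NA
  data.frame(allele1 = a1, allele2 = a2, stringsAsFactors = FALSE)
}

geno <- c("012028", "020020", "000000")
res <- split_genotype(geno)
print(res)

# A genotype is homozygous when both allele codes are equal.
homozygous <- res$allele1 == res$allele2
print(homozygous)
```

The real char2genet works on a whole data frame and a population factor; this sketch only covers the decoding of a single column of values.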
The function count2genet reads data frames containing allelic counts by population, with allelic forms grouped by locus.
The function freq2genet reads data frames containing allelic frequencies by population, with allelic forms grouped by locus.
In these two cases, use as variable names character strings of the form xx.yyy, where xx is the name of the locus and yyy is the name of an allelic form at this locus. Because analyses of this kind of data need compact labels, these functions store the names of the populations, the loci and the allelic forms in vectors and recode them in a simple way, with P for population, L for locus and 1,..., m for the alleles.
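The recoding of xx.yyy variable names into compact labels can be sketched in base R as follows. This is an illustration only, not the code used by count2genet or freq2genet: the column names are invented, splitting at the last dot is an assumption, and the zero-padded form (L01.1, L01.2, ...) is taken from the labels listed in the Value section.

```r
# Invented column names following the "locus.allele" convention.
col_names <- c("Pgm.a", "Pgm.b", "Est.fast", "Est.slow", "Est.null")

# Split each name at its last dot (assumption of this sketch).
locus  <- sub("\\.[^.]*$", "", col_names)
allele <- sub("^.*\\.", "", col_names)

# Sort the variables alphabetically by locus, then by allelic form.
ord    <- order(locus, allele)
locus  <- locus[ord]
allele <- allele[ord]

# Compact labels: L01, L02, ... for loci and 1, ..., m for alleles within a locus.
loc_code  <- sprintf("L%02d", match(locus, unique(locus)))
all_index <- ave(seq_along(locus), locus, FUN = seq_along)
all_code  <- paste(loc_code, all_index, sep = ".")

# Number of allelic forms per locus (the idea behind loc.blocks).
loc_blocks <- as.vector(table(loc_code))

# Correspondence between original names and compact labels.
print(data.frame(original = col_names[ord], code = all_code))
print(loc_blocks)
```

Keeping the original names in a lookup table, as in the last lines, is what allows compact labels on plots while the full names remain available.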

Usage

char2genet(X, pop, complete)
count2genet(PopAllCount)
freq2genet(PopAllFreq)

Arguments

`X`	a data frame of character strings (individuals in rows, loci in columns); each value is either '000000' or a string of 6 characters coding two alleles
`pop`	a factor whose length equals the number of rows of `X`, classifying the individuals by population
`complete`	a logical value indicating whether the complete individual typing is returned (components `comp` and `comp.pop`); FALSE by default
`PopAllCount`	a data frame containing integers: the occurrences of each allelic form (column) in each population (row)
`PopAllFreq`	a data frame containing values between 0 and 1: the frequencies of each allelic form (column) in each population (row)

Details

Because many formats for genetic data appear in the literature, a list of class genet contains at least a table of allelic frequencies and an attribute loc.blocks. The populations (rows) and the variables (columns) are sorted in alphabetical order. In the component comp, each individual is recoded, for each locus with m alleles, as a vector of length m: 0,...,1,...,1,...,0 for a heterozygote and 0,...,2,...,0 for a homozygote.
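The recoding of one individual at one locus into a vector of length m can be sketched as below. The helper `encode_genotype` is hypothetical and is not the ade4 implementation; it assumes that the allele codes of the locus are already known and sorted, and it does not handle missing data.

```r
# Hypothetical helper (not an ade4 function): count the copies of each
# allele carried by one diploid individual at one locus.
encode_genotype <- function(a1, a2, alleles) {
  stopifnot(all(c(a1, a2) %in% alleles))
  tabulate(match(c(a1, a2), alleles), nbins = length(alleles))
}

alleles <- c("012", "020", "028")                 # m = 3 alleles at this locus
het <- encode_genotype("012", "028", alleles)     # heterozygote
hom <- encode_genotype("020", "020", alleles)     # homozygote
print(het)
print(hom)

# Allele frequencies in a group are the summed counts divided by the
# total number of gene copies (two per diploid individual).
comp <- rbind(het, hom)
freq <- colSums(comp) / sum(comp)
print(freq)
```

Each row of such a table sums to 2 within a locus, which is why the summed counts divided by the number of gene copies give the allelic frequencies of the group.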

Value

char2genet returns a list of class genet with:

`$tab`	a table of frequencies with populations (rows) and alleles (columns)
`$center`	the global frequency of each allelic form, calculated over all the individuals classified at each locus
`$pop.names`	a vector containing the names of populations present in the data re-coded P01, P02, ...
`$all.names`	a vector containing the names of the alleles present in the data re-coded L01.1, L01.2, ...
`$loc.blocks`	a vector containing the number of alleles per locus
`$loc.fac`	a factor assigning the alleles to loci
`$loc.names`	a vector containing the names of loci present in the data re-coded L01, ..., L99
`$pop.loc`	a data frame containing the number of genes used to calculate the frequencies, by population and locus
`$comp`	the complete individual typing, coded as in 02000 or 01001, if the option `complete` is TRUE
`$comp.pop`	a factor indicating the population if the option `complete` is TRUE

count2genet and freq2genet return a list of class genet that does not contain the component pop.loc or the components tied to the option complete (comp and comp.pop).

Author(s)

Daniel Chessel

Examples

data(casitas)
casitas[24,]
casitas.pop <- as.factor(rep(c("dome", "cast", "musc", "casi"), c(24,11,9,30)))
casi.genet <- char2genet(casitas, casitas.pop, complete=TRUE)
names(casi.genet$tab) 
casi.genet$tab[,1:8] 
casi.genet$pop.names
casi.genet$loc.names
casi.genet$all.names
casi.genet$loc.blocks # number of allelic forms by loci
casi.genet$loc.fac # factor classifying the allelic forms by locus
casi.genet$pop.loc # table populations loci
names(casi.genet$comp)
casi.genet$comp[1:4,]
casi.genet$comp.pop
casi.genet$center
apply(casi.genet$tab,2,mean)
casi.genet$pop.loc[,"L15"]
casi.genet$tab[, c("L15.1","L15.2")]
class(casi.genet)
casitas.coa <- dudi.coa(casi.genet$comp, scannf = FALSE)
s.class(casitas.coa$li,casi.genet$comp.pop)

[Package ade4 version 1.7-4 Index]