This vignette demonstrates how to use the gpcp package to perform
genomic prediction of cross performance using genotype and
phenotype data. This method processes data in several steps, including
loading the necessary software, converting genotype data, processing
phenotype data, fitting mixed models, and predicting cross performance
based on weighted marker effects.
The package is particularly useful for users working with polyploid
species, and it integrates with the sommer, AGHmatrix, and snpStats
packages for efficient model fitting and genomic analysis.
If you haven’t installed the gpcp package yet, you can
do so by following these steps:
# Install devtools if you don't have it
install.packages("devtools")
# Install BiocManager, which is needed to install VariantAnnotation and snpStats
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install VariantAnnotation and snpStats
BiocManager::install("VariantAnnotation")
BiocManager::install("snpStats")
# Install gpcp from GitHub
devtools::install_github("cmn92/gpcp")
The main function in this package is runGPCP(), which
predicts the performance of genomic crosses. To run this function,
you’ll need two main input files:
1. A phenotype file, which is typically a CSV file containing the phenotypic data.
2. A genotype file, which can be in VCF or HapMap format.
Let’s walk through a simple example to predict cross performance using the provided phenotype and genotype data.
Before running runGPCP(), load the phenotype data from a
CSV file and specify the genotype file path.
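A minimal sketch of this step, assuming the phenotype table is read with read.csv() and the genotype file is supplied as a path. The variable names phenotypeFile and genotypeFile, the temporary CSV, its values, and the VCF path are illustrative assumptions, not files shipped with the package.

```r
# Write a tiny example phenotype file to a temporary location (made-up values)
pheno_path <- tempfile(fileext = ".csv")
example_pheno <- data.frame(
  Accession = c("G1", "G1", "G2", "G2"),  # genotype IDs
  LOC       = c("L1", "L2", "L1", "L2"),  # location (fixed effect)
  REP       = c(1, 1, 1, 1),              # replicate (fixed effect)
  YIELD     = c(10.2, 11.5, 9.8, 10.9),   # trait 1
  DMC       = c(31.0, 30.2, 33.4, 32.8)   # trait 2
)
write.csv(example_pheno, pheno_path, row.names = FALSE)

# Load the phenotype data from the CSV file
phenotypeFile <- read.csv(pheno_path, stringsAsFactors = FALSE)

# Path to the genotype file (hypothetical; point this at your own VCF or HapMap file)
genotypeFile <- "path/to/genotypes.vcf"

# Inspect the columns that were read
str(phenotypeFile)
```

In practice, replace pheno_path with the location of your own phenotype CSV.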
You will need to specify several inputs such as the genotypes column, traits to predict, and other variables such as weights, fixed effects, and ploidy.
# Define inputs
genotypes <- "Accession" # Column name for genotype IDs in phenotype data
traits <- c("YIELD", "DMC") # Traits to predict
weights <- c(3, 1) # Weights for each trait
userFixed <- c("LOC", "REP") # Fixed effects
Ploidy <- 2 # Ploidy level
NCrosses <- 150 # Number of crosses to predict
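Before fitting any model, it can help to confirm that these inputs agree with the phenotype table. The helper below, check_gpcp_inputs(), is a hypothetical convenience function and is not part of gpcp; the small data frame stands in for real phenotype data.

```r
# Hypothetical helper (not part of gpcp): sanity-check the user inputs
check_gpcp_inputs <- function(pheno, genotypes, traits, weights, userFixed) {
  # Every referenced column must exist in the phenotype data
  needed <- c(genotypes, traits, userFixed)
  missing_cols <- setdiff(needed, names(pheno))
  if (length(missing_cols) > 0) {
    stop("Columns missing from phenotype data: ",
         paste(missing_cols, collapse = ", "))
  }
  # One weight per trait
  if (length(traits) != length(weights)) {
    stop("Provide exactly one weight per trait")
  }
  # Trait columns must hold numeric measurements
  non_numeric <- traits[!vapply(pheno[traits], is.numeric, logical(1))]
  if (length(non_numeric) > 0) {
    stop("Trait columns must be numeric: ",
         paste(non_numeric, collapse = ", "))
  }
  invisible(TRUE)
}

# Stand-in phenotype data (made-up values)
pheno <- data.frame(
  Accession = c("G1", "G2"),
  LOC       = c("L1", "L1"),
  REP       = c(1, 1),
  YIELD     = c(10.2, 9.8),
  DMC       = c(31.0, 33.4)
)

check_gpcp_inputs(pheno,
                  genotypes = "Accession",
                  traits    = c("YIELD", "DMC"),
                  weights   = c(3, 1),
                  userFixed = c("LOC", "REP"))
```

The function stops with an informative message on the first problem it finds and returns TRUE invisibly otherwise.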
Now that we have the necessary inputs, we can run the
runGPCP() function to predict cross performance.
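The call might look like the sketch below. The argument names mirror the input variables defined earlier and are assumptions, as are the stand-in phenotype values and the genotype path; consult ?runGPCP for the actual signature and for the form in which traits and fixed effects are expected. The call is guarded so that the snippet does nothing when the package or the genotype file is unavailable.

```r
# Stand-in inputs; replace with your own data (values and path are hypothetical)
phenotypeFile <- data.frame(
  Accession = c("G1", "G2"),
  LOC       = c("L1", "L1"),
  REP       = c(1, 1),
  YIELD     = c(10.2, 9.8),
  DMC       = c(31.0, 33.4)
)
genotypeFile <- "path/to/genotypes.vcf"

# Assumed argument names: verify against ?runGPCP before use
gpcp_args <- list(
  phenotypeFile = phenotypeFile,
  genotypeFile  = genotypeFile,
  genotypes     = "Accession",
  traits        = c("YIELD", "DMC"),
  weights       = c(3, 1),
  userFixed     = c("LOC", "REP"),
  Ploidy        = 2,
  NCrosses      = 150
)

# Only attempt the call when gpcp is installed and the genotype file exists
finalcrosses <- NULL
if (requireNamespace("gpcp", quietly = TRUE) && file.exists(genotypeFile)) {
  finalcrosses <- do.call(gpcp::runGPCP, gpcp_args)
}
```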
The output of the runGPCP() function is a data frame
that contains the predicted cross performance. You can view the top
predicted crosses like this:
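For instance, assuming the result is stored in a data frame named finalcrosses (a hypothetical name) with the columns described in this vignette, the highest-merit crosses can be listed as follows. The rows are made-up values used only to keep the snippet self-contained.

```r
# Mock output with the documented columns (values are invented)
finalcrosses <- data.frame(
  Parent1 = c("G1", "G1", "G2", "G3"),
  Parent2 = c("G2", "G3", "G3", "G4"),
  CrossPredictedMerit = c(0.42, 1.10, -0.35, 0.87),
  stringsAsFactors = FALSE
)

# Sort by predicted merit, best first, and show the top three crosses
ranked <- finalcrosses[order(finalcrosses$CrossPredictedMerit,
                             decreasing = TRUE), ]
head(ranked, n = 3)
```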
The resulting data frame contains the following columns:
- Parent1: The first parent of the cross.
- Parent2: The second parent of the cross.
- CrossPredictedMerit: The predicted merit of the cross.
- P1Sex and P2Sex: Optional. If sex information is provided, the sexes of the parents are included.
The runGPCP() function performs the following steps internally:
1. Read the genotype and phenotype data: The genotype file is converted into a matrix of allele counts, and the phenotype data is standardized.
2. Fit mixed models: The sommer package is used to fit mixed models based on user-defined fixed and random effects.
3. Predict cross performance: Marker effects are calculated and weighted to predict the performance of crosses, and the best crosses are identified.
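The third step can be illustrated with a simplified, additive-only sketch in base R. It does not use sommer and is not the package's implementation: marker effects are simulated rather than estimated from a mixed model, the trait weights are applied to the per-trait marker effects to form an index, and each cross is scored by the expected progeny allele dosage (the mean of the two parents) multiplied by the index effects. Any non-additive effects are ignored, and all object names and values are illustrative.

```r
set.seed(1)
n_ind <- 6   # number of candidate parents
n_mrk <- 20  # number of markers

# Allele-count matrix: 0, 1 or 2 copies of the alternate allele (diploid)
M <- matrix(sample(0:2, n_ind * n_mrk, replace = TRUE),
            nrow = n_ind,
            dimnames = list(paste0("G", seq_len(n_ind)),
                            paste0("m", seq_len(n_mrk))))

# Simulated additive marker effects for two traits
# (in a real analysis these would come from the fitted mixed models)
effects <- cbind(YIELD = rnorm(n_mrk), DMC = rnorm(n_mrk))

# Trait weights, combined into one index effect per marker
weights <- c(YIELD = 3, DMC = 1)
index_effects <- as.vector(effects %*% weights)

# All unique parent pairs
pairs <- t(combn(rownames(M), 2))

# Expected progeny dosage is the mean of the parental dosages;
# predicted merit is that dosage times the index marker effects
merit <- apply(pairs, 1, function(p) {
  sum((M[p[1], ] + M[p[2], ]) / 2 * index_effects)
})

crosses <- data.frame(Parent1 = pairs[, 1],
                      Parent2 = pairs[, 2],
                      CrossPredictedMerit = merit,
                      stringsAsFactors = FALSE)

# Rank crosses from best to worst and keep the top three
crosses <- crosses[order(crosses$CrossPredictedMerit, decreasing = TRUE), ]
head(crosses, n = 3)
```

Under this additive simplification, the merit of a cross equals the average of the two parents' index values, so the ranking is driven entirely by parental merit.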
The methodology behind the gpcp package is based on the following references:
- Xiang, J., et al. (2016). “Mixed Model Methods for Genomic Prediction.” Nature Genetics.
- Batista, L., et al. (2021). “Genetic Prediction and Relationship Matrices.” Theoretical and Applied Genetics.
The gpcp package provides a flexible and efficient
framework for predicting genomic cross performance in both diploid and
polyploid species. With its ability to handle multiple traits, fixed
effects, and random effects, this package is ideal for breeders and
geneticists looking to maximize cross potential using genomic data.