% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/PCA_biplot.R
\name{PCA_biplot}
\alias{PCA_biplot}
\title{The PCA biplot with loadings}
\usage{
PCA_biplot(datap, lowt = FALSE)
}
\arguments{
\item{datap}{The data set}

\item{lowt}{Logical. Whether lower values of the trait are preferred
(default \code{FALSE}). For example, higher values are preferred for grain
yield (\code{lowt = FALSE}), whereas lower values are preferred for plant
height (\code{lowt = TRUE}).}
}
\value{
A list of data frames.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#stable}{\figure{lifecycle-stable.svg}{options: alt='[Stable]'}}}{\strong{[Stable]}}
\itemize{
\item \code{PCA_biplot()} creates the PCA (Principal Component
Analysis) biplot with loadings for the new index \code{rYWAASB},
which supports the simultaneous selection of genotypes by trait and WAASB index.
It displays the ranked indices \code{rYWAASB}, \code{rWAASB} and \code{rWAASBY}
(prefix r: ranked) together in one biplot to better differentiate genotypes.
In the PCA biplots, variables are colored by their contributions
(contrib) and by their quality of representation (cos2).
}
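
To make these two quantities concrete, the sketch below computes them for a standardized PCA of the built-in \code{USArrests} data (illustrative data, not output of \code{PCA_biplot()}). It assumes the definitions used by FactoMineR and factoextra, which may or may not match how \code{PCA_biplot()} obtains them: a variable's coordinate on a component is its correlation with that component, cos2 is the squared coordinate, and contrib rescales cos2 so that each component's contributions sum to 100.
\preformatted{
# Standardized PCA of a built-in data set (illustrative data only)
pc <- prcomp(USArrests, scale. = TRUE)
# Variable coordinates: correlation of each variable with each component,
# i.e. each loading column multiplied by that component's standard deviation
coord <- sweep(pc$rotation, 2, pc$sdev, "*")
# cos2: quality of representation of each variable on each component
cos2 <- coord^2
# contrib: share (in percent) of each variable in each component
contrib <- sweep(cos2, 2, colSums(cos2), "/") * 100
round(cos2[, 1:2], 3)
round(contrib[, 1:2], 1)
}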
}
\details{
\emph{PCA} is an unsupervised machine learning method and a dimension
reduction technique.
It simplifies large data sets by extracting
a smaller set of new variables (the principal components) that preserves
significant patterns and trends (1).
According to Johnson and Wichern (2007), PCA explains
the variance-covariance structure of a set of variables
\loadmathjax
\mjseqn{X_1, X_2, \ldots, X_p} through a few linear
combinations of these variables. Its two general
objectives are data reduction and interpretation.
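
A minimal illustration of data reduction, assuming base R's \code{prcomp()} and the built-in \code{USArrests} data (neither is part of this function): four correlated variables are summarized by two component scores, and the share of total variance they retain is reported.
\preformatted{
X <- USArrests                      # 50 states x 4 variables
pc <- prcomp(X, scale. = TRUE)      # PCA on standardized variables
scores <- pc$x[, 1:2]               # keep only the first two components
dim(scores)                         # 50 rows, 2 columns
# Share of total variance retained by the first two components
sum(pc$sdev[1:2]^2) / sum(pc$sdev^2)
}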

\emph{Biplot and PCA}:
The biplot is a method for visually representing
both the rows and the columns of a data table. It
approximates the table by the product of two matrices with two columns
each, so that rows and columns can be displayed together in the same plane.
The computations behind a biplot typically rely on an eigendecomposition
(or, equivalently, a singular value decomposition), as in PCA. The biplot
is commonly constructed from mean-centered and scaled
data (2). To scale a variable \mjseqn{x}, the data can be transformed as follows:
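
The rank-two approximation behind a biplot can be sketched with a singular value decomposition of the centered and scaled data. This is a generic illustration using base R's \code{svd()} and \code{biplot()} on \code{USArrests}. Placing the singular values on the row markers is only one of several scaling conventions and is not necessarily the one used by \code{PCA_biplot()}.
\preformatted{
Z <- scale(USArrests)               # mean-centered, unit-variance columns
s <- svd(Z)                         # Z = U D V'
# Row markers (states): first two columns of U, scaled by the singular values
G <- s$u[, 1:2] * rep(s$d[1:2], each = nrow(Z))
# Column markers (variables): first two columns of V
H <- s$v[, 1:2]
# Rank-two approximation of Z: G H'
Z2 <- tcrossprod(G, H)
# Proportion of the total sum of squares captured by the plane
sum(s$d[1:2]^2) / sum(s$d^2)
# Base R draws a PCA biplot of the same data (with its own marker scaling)
biplot(prcomp(USArrests, scale. = TRUE))
}
Because \code{Z2} is the best rank-two least-squares approximation of \code{Z} (Eckart-Young theorem), the printed proportion indicates how faithfully the plane represents the table.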
\mjsdeqn{z = \frac{x-\bar{x}}{s(x)}}
where \mjseqn{s(x)} denotes the sample standard deviation of \mjseqn{x}, calculated as:
\mjsdeqn{s(x) = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}}
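
The scaling formula can be checked in base R. \code{sd()} uses the \mjseqn{n-1} denominator given above, and \code{scale()} divides by the same quantity. The variable chosen here is only for illustration.
\preformatted{
x <- USArrests$Assault
# Sample standard deviation with the n - 1 denominator
s_x <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
z <- (x - mean(x)) / s_x            # z = (x - mean) / s(x)
all.equal(s_x, sd(x))               # TRUE
all.equal(z, as.vector(scale(x)))   # TRUE
c(mean = mean(z), sd = sd(z))       # approximately 0 and 1
}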

\emph{Algebra of PCA}:
Following Johnson and Wichern (3), let the random vector
\mjseqn{\mathbf{X}' = [X_1, X_2, \ldots, X_p]} have the
covariance matrix \mjseqn{\mathbf{\Sigma}} with eigenvalues
\mjseqn{\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0}.

Consider the linear combinations
\mjsdeqn{Y_1 = \mathbf{a}'_1\mathbf{X} = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p}
\mjsdeqn{Y_2 = \mathbf{a}'_2\mathbf{X} = a_{21}X_1 + a_{22}X_2 + \cdots + a_{2p}X_p}
\mjsdeqn{\vdots}
\mjsdeqn{Y_p = \mathbf{a}'_p\mathbf{X} = a_{p1}X_1 + a_{p2}X_2 + \cdots + a_{pp}X_p}

Then
\mjsdeqn{\mathrm{Var}(Y_i) = \mathbf{a}'_i\mathbf{\Sigma}\mathbf{a}_i, \quad i = 1, 2, \ldots, p}
\mjsdeqn{\mathrm{Cov}(Y_i, Y_k) = \mathbf{a}'_i\mathbf{\Sigma}\mathbf{a}_k, \quad i, k = 1, 2, \ldots, p}

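The principal components are those uncorrelated
linear combinations
\mjseqn{Y_1, Y_2, \ldots, Y_p} whose variances are as large as possible,
subject to the normalization \mjseqn{\mathbf{a}'_i\mathbf{a}_i = 1}
(without it, any variance could be made arbitrarily large by rescaling \mjseqn{\mathbf{a}_i}).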
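
These formulas also hold exactly for sample quantities, with the sample covariance matrix in place of \mjseqn{\mathbf{\Sigma}}. The sketch below checks them numerically. The coefficient vectors \code{a1} and \code{a2} are arbitrary, hypothetical choices, and \code{USArrests} is illustrative data.
\preformatted{
X <- as.matrix(USArrests)
S <- cov(X)                               # sample estimate of Sigma
a1 <- c(1, 0.5, -1, 2)                    # hypothetical coefficient vectors
a2 <- c(0, 1, 1, -0.5)
# crossprod(A, B) computes t(A) times B, so crossprod(t(X), a) is X times a
Y1 <- as.vector(crossprod(t(X), a1))      # Y1 = a1'X for every row of X
Y2 <- as.vector(crossprod(t(X), a2))
# Var(Y1) = a1' Sigma a1 and Cov(Y1, Y2) = a1' Sigma a2
all.equal(var(Y1), drop(crossprod(a1, crossprod(S, a1))))
all.equal(cov(Y1, Y2), drop(crossprod(a1, crossprod(S, a2))))
}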

Let \mjseqn{\mathbf{\Sigma}} be the covariance matrix associated with the random vector
\mjseqn{\mathbf{X}' = [X_1, X_2, \ldots, X_p]}, and let
\mjseqn{\mathbf{\Sigma}} have the eigenvalue-eigenvector pairs
\mjseqn{(\lambda_1, \mathbf{e}_1), (\lambda_2, \mathbf{e}_2), \ldots, (\lambda_p, \mathbf{e}_p)},
with normalized eigenvectors (\mjseqn{\mathbf{e}'_i\mathbf{e}_i = 1}) and, as above,
\mjseqn{\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0}.

Then the \mjseqn{i}th principal component is given by
\mjsdeqn{Y_i = \mathbf{e}'_i\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \quad i = 1, 2, \ldots, p}
with
\mjsdeqn{\mathrm{Var}(Y_i) = \mathbf{e}'_i\mathbf{\Sigma}\mathbf{e}_i = \lambda_i, \quad i = 1, 2, \ldots, p}
\mjsdeqn{\mathrm{Cov}(Y_i, Y_k) = \mathbf{e}'_i\mathbf{\Sigma}\mathbf{e}_k = 0, \quad i \neq k}
Moreover,
\mjsdeqn{\sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} = \sum_{i=1}^p \mathrm{Var}(X_i) = \lambda_1 + \lambda_2 + \cdots + \lambda_p = \sum_{i=1}^p \mathrm{Var}(Y_i)}

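In other words, the total population variance,
\mjseqn{\sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp}}, equals the sum of the eigenvalues,
\mjseqn{\lambda_1 + \lambda_2 + \cdots + \lambda_p}, so the principal components
redistribute the total variance without changing it.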
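
A numerical check of these results with \code{eigen()} on a sample covariance matrix (illustrative \code{USArrests} data): the component scores have variances equal to the eigenvalues, they are mutually uncorrelated, and the trace of the covariance matrix equals the sum of the eigenvalues.
\preformatted{
X  <- as.matrix(USArrests)
S  <- cov(X)                                   # sample covariance matrix
ev <- eigen(S, symmetric = TRUE)               # eigenvalues in decreasing order
E  <- ev$vectors                               # column i is the eigenvector e_i
Xc <- scale(X, center = TRUE, scale = FALSE)   # centered data
Y  <- crossprod(t(Xc), E)                      # column i holds the scores Y_i = e_i'X
# Var(Y_i) = lambda_i on the diagonal, Cov(Y_i, Y_k) = 0 off the diagonal
all.equal(cov(Y), diag(ev$values), check.attributes = FALSE)
# Total variance: sigma_11 + ... + sigma_pp = lambda_1 + ... + lambda_p
all.equal(sum(diag(S)), sum(ev$values))
}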

Two further quantities are important in PCA:
\enumerate{
\item The proportion of total variance due to (explained by)
the \mjseqn{k}th principal component:
\mjsdeqn{\frac{\lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}, \quad k = 1, 2, \ldots, p}
\item The correlation coefficients between the components \mjseqn{Y_i}
and the variables \mjseqn{X_k}, given by
\mjsdeqn{\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \quad i, k = 1, 2, \ldots, p}
}
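
Both quantities follow directly from the eigendecomposition. The sketch below (base R only, illustrative data) computes them and compares the correlation formula with \code{cor()} applied to the component scores.
\preformatted{
X  <- as.matrix(USArrests)
S  <- cov(X)
ev <- eigen(S, symmetric = TRUE)
lambda <- ev$values
E  <- ev$vectors                    # E[k, i] is e_ik
# Proportion of total variance explained by each component
prop <- lambda / sum(lambda)
round(prop, 4)
# rho[k, i] = e_ik * sqrt(lambda_i) / sqrt(sigma_kk): variable k vs component i
rho <- E * rep(sqrt(lambda), each = nrow(E)) / sqrt(diag(S))
# Same values from the sample correlation of X with the component scores
Y  <- crossprod(t(scale(X, scale = FALSE)), E)
all.equal(rho, cor(X, Y), check.attributes = FALSE)
}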

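PCA can be performed on either the covariance matrix or the correlation
matrix. Using the correlation matrix is equivalent to applying PCA to
standardized variables and is generally advisable when the variables are
measured on different scales. In general, the data should be centered before PCA.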
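
The effect of this choice can be seen with base R's \code{prcomp()}. With \code{scale. = FALSE} it decomposes the covariance matrix, and with \code{scale. = TRUE} it is equivalent to decomposing the correlation matrix. A sketch on illustrative data:
\preformatted{
X <- as.matrix(USArrests)
# prcomp() centers the data by default (center = TRUE)
pc_cov <- prcomp(X, scale. = FALSE)   # PCA of the covariance matrix
pc_cor <- prcomp(X, scale. = TRUE)    # PCA of the correlation matrix
# Squared component standard deviations are the eigenvalues of each matrix
all.equal(pc_cov$sdev^2, eigen(cov(X))$values)
all.equal(pc_cor$sdev^2, eigen(cor(X))$values)
# In raw units the high-variance variable (Assault) dominates PC1
round(pc_cov$rotation[, 1], 3)
round(pc_cor$rotation[, 1], 3)
}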
}
\examples{
# Case 1: for maize dataset, grain yield
\donttest{
data(maize)
PCA_biplot(maize) # or: PCA_biplot(maize, lowt = FALSE)
}
# Case 2: for days to maturity (dm) trait of chickpea
\donttest{
data(dm)
PCA_biplot(dm, lowt = TRUE)
}
}
\references{
(1) \url{https://builtin.com}

(2) \url{https://pca4ds.github.io/biplot-and-pca.html}

(3) Johnson, R.A. and Wichern, D.W. 2007. Applied
Multivariate Statistical Analysis. Pearson Prentice
Hall. 773 p.
}
\author{
{
Ali Arminian \href{mailto:abeyran@gmail.com}{abeyran@gmail.com}
}
}
