At the traditional Archetypal Analysis (AA) [1] we have a data frame \(df\) which is a matrix \(Y\) with dimension \(n \times d\), with n the number of observations (rows) and d the number of variables (columns) or the dimension of the relevant Linear Space \(R^d\).

The output of our AA algorithm which has been implemented at the R
Package “archetypal” computes the matrices \(A\), \(B\)
such that: \[
Y\sim\,A\,B\,Y
\] or if we want to be more strict our \(kappas\times\,d\) matrix of archetypes
\(B\,Y\) is such that next squared
*Frobenius* norm is minimum \[
SSE=\|Y-A\,B\,Y \|^2 = minimum
\] \(A\) (\(n\times\,kappas\)) is a are row stochastic
matrix and \(B\) (\(kappas\times\,n\) ) is a column stochastic
one.

We also define the Variance Explained as next: \[ varexpl=\frac{SST-SSE}{SST} \] with SST being the total sum of squares for all elements of matrix \(Y\).

It is a suitable modification of \(PCHA\) algorithm, see [2], [3], which now uses the data frames without transposing them and it has full control to all external and internal parameters of it.

But there are many disadvantages at this route:

- The number of archetypes (“\(kappas\)”) is unknown
- The choice of \(kappas\) by using
the SSE plot is a very questionable process:
- The elbow or knee point usually is located at the center of the SSE-kappas curve
- If you test more kappas, then it increases

*But what really stands for the concept of archetypes?*

We know that archetypes lie on the outer bound of the Convex Hull that covers all our data points.

Now arise next questions:

- How many points in the vector space \(R^d\) are needed in order to create a
minimal Convex Hull for them which can be called “Archetypal Spanning
Convex Hull”?
- The answer is \(kappas = d + 1\)

- If we take the above as the number of archetypes then:
- What is the percentage of our data points that are covered by that hull?
- Are we satisfied with a value of 1%?
- For social sciences data sets we observe an almost ellipsoidal distribution
- That means we need many archetypes for an acceptable data coverage
- What should we do in order to increase that data coverage?

The solution to all the above problems is *Geometrical Archetypal
Analysis (GAA)*, see also the pdf version [4].

Given the data frame \(df\) and its correspondent matrix \(Y\in R^d\) we follow next steps:

- We find the minimum \(Y_{j(1)}\) and maximum \(Y_{j(\nu)}\) values for all variables \(Y_j,j=1,...,d\)
- We take the Cartesian Product: \[ \{Y_{1(1)},Y_{1(\nu)}\}\times\{Y_{2(1)},Y_{2(\nu)}\}\times\ldots\times\{Y_{d(1)},Y_{d(\nu)}\} \]
- We define the \(2^d\) vectors as
the
*Archetypal Grid*, for example if \(d=3\) we find the 8 rows of next matrix: \[ \begin{matrix}Y1,min&Y2,min&Y3,min \\Y1,max&Y2,min&Y3,min \\Y1,min&Y2,max&Y3,min \\Y1,max&Y2,max&Y3,min \\Y1,min&Y2,min&Y3,max \\Y1,max&Y2,min&Y3,max \\Y1,min&Y2,max&Y3,max \\Y1,max&Y2,max&Y3,max \\\end{matrix} \] - We create a new data frame \(dg\)
with the
*Archetypal Grid*as the first rows and the initial data frame “df” as the rest ones - We fix the \(BY\) matrix of archetypes to be the first \(2^d\) rows of \(dg\)
- We iteratively compute the \(A\)-matrix of the \(Y\sim\,A\,B\,Y\,\) decomposition
- We collect the \(\{2^{d}+1,\ldots\,n\}\) rows of that \(A\)-matrix: these are the compositions of all initial data rows as convex combinations of Grid Archetypes

Since we are sure that our archetypes cover the 100% of the data points, then we don’t care about their number so much, since what actually matters is the compositions for each point.

We create a set of random 3D data points and then we compute the
*Grid Archetypal Analysis*.

```
library(GeomArchetypal)
# Create random data
set.seed(20140519)
=matrix(runif(300) , nrow = 100, ncol=3)
dfcolnames(df)=c("x","y","z")
# Grid Archetypal
=grid_archetypal(df, diag_less = 1e-6, niter = 50, verbose = FALSE)
gaasummary(gaa)
```

```
## Number of observations / data rows n = 108
## (The first 8 rows are the Grid Archetypes)
## Dimension of data variables d = 3
## Number of Grid Archetypes k = 8
##
## Grid Archetypes:
##
## x y z
## 1 0.04107952 0.00131741 0.01868682
## 2 0.98128784 0.00131741 0.01868682
## 3 0.04107952 0.99690733 0.01868682
## 4 0.98128784 0.99690733 0.01868682
## 5 0.04107952 0.00131741 0.99566805
## 6 0.98128784 0.00131741 0.99566805
## 7 0.04107952 0.99690733 0.99566805
## 8 0.98128784 0.99690733 0.99566805
##
## Run details:
##
## SSE VarianceExplained Convergence Iterations ElapsedTime
## 1 5.951504e-06 0.9999999 TRUE 39 0.01
```

If we plot the data points and the archetypes we observe that the
*Archetypal Spanning Convex Hull* covers all our points, a fact
that can be checked by using the function
*points_inside_convex_hull()*:

`plot(gaa)`

`points_inside_convex_hull(df, gaa$grid)`

`## [1] 100`

For the same data set as before we want to set as archetypes the Closer to the Grid Archetypes just because we’d like our archetypes to belong to the data set.

```
# Closer Grid Archetypal
=closer_grid_archetypal(df,diag_less = 1e-3, niter = 70, verbose = FALSE)
cgasummary(cga)
```

```
## Number of observations / data rows n = 100
## Dimension of data variables d = 3
## Number of Closer Grid Archetypes k = 8
## The rows that formed the Closer Grid Archetypes are:
## 52,69,18,87,6,14,39,5
##
## The Closer Grid Archetypes are:
##
## x y z
## 1 0.13352227 0.15353714 0.07159086
## 2 0.75890322 0.09126450 0.01868682
## 3 0.07234262 0.79526378 0.06748076
## 4 0.88561538 0.94945905 0.10888000
## 5 0.31213541 0.12031532 0.87912934
## 6 0.96420042 0.07130253 0.77504836
## 7 0.24115414 0.88179278 0.90609350
## 8 0.89296529 0.95993797 0.94319426
##
## The Run details are:
##
## SSE VarianceExplained Convergence Iterations ElapsedTime
## 1 0.3367711 0.9965373 TRUE 26 0
```

`plot(cga)`

What is the percentage of data coverage for the *Closer Grid
Archetypes*?

```
=points_inside_convex_hull(df,cga$aa$BY)
pcprint(pc)
```

`## [1] 59`

The is a relatively high percentage compared to the usually very low values of less than 10%, especially if we seek for a small number of archetypes.

If we decide that we’ll take a specific set of rows as archetypes,
then we can use the function *fast-archetypal()* to compute the
\(A\)-matrix of convex coefficients for
the rest of data points.

It is the generic function that is used from both
*grid_archetypal()* and *closer_grid_archetypal()* for
their main computations.

Here we can use the rows we have found just before as the rows for the archetypes.

```
# Fast Archetypal
=fast_archetypal(df, irows = cga$grid_rows, diag_less = 1e-6,
faniter = 100, verbose = FALSE)
summary(fa)
```

```
## Number of observations / data rows n = 100
## Dimension of data variables d = 3
## Number of archetypes k = 8
##
## Archetypes:
##
## x y z
## 1 0.13352227 0.15353714 0.07159086
## 2 0.75890322 0.09126450 0.01868682
## 3 0.07234262 0.79526378 0.06748076
## 4 0.88561538 0.94945905 0.10888000
## 5 0.31213541 0.12031532 0.87912934
## 6 0.96420042 0.07130253 0.77504836
## 7 0.24115414 0.88179278 0.90609350
## 8 0.89296529 0.95993797 0.94319426
##
## Run details:
##
## SSE VarianceExplained Convergence Iterations ElapsedTime
## 1 0.3305205 0.9966016 TRUE 57 0
##
## Call:
## fast_archetypal(df = df, irows = cga$grid_rows, diag_less = 1e-06,
## niter = 100, verbose = FALSE)
```

`plot(fa)`

```
par(oldpar)
options(oldoptions)
```

As it was expected the results are totally identical with the previously found ones.

[1] Cutler, A., & Breiman, L. (1994). Archetypal Analysis. Technometrics, 36(4), 338–347. https://doi.org/10.1080/00401706.1994.10485840

[2] M Morup and LK Hansen, “Archetypal analysis for machine learning and data mining”, Neurocomputing (Elsevier, 2012). https://doi.org/10.1016/j.neucom.2011.06.033.

[3] Source: https://mortenmorup.dk/?page_id=2 , last accessed 2024-03-09

[4] Christopoulos, DT, (2024) “Geometrical Archetypal Analysis: One Step Beyond the Current View”, http://dx.doi.org/10.13140/RG.2.2.14030.88642 .