Things I learned during my first weeks of working with R and the ade4
package to perform multiple correspondence analysis.
The items are in no particular order.
a) There are many packages that perform correspondence analysis (CA) and
multiple correspondence analysis (MCA), not just one. Some share the same
features and some have unique ones. R comes with a number of built-in
packages, and it is also possible to download and add more. The ade4 package
is one of those add-on packages that you need to download and install
yourself. For me, ade4 is the best package for my analysis needs.
b) Ade4 is developed and maintained in the spirit of the French approach to
correspondence analysis in the tradition started by Benzécri (1973) and
described in English by, for example, Greenacre (1984).
c) You may of course export coordinates and import them into Microsoft
Excel. These xy coordinates are easy to display in an Excel graph. However,
Excel by itself does not let you plot dots with a label for each dot, e.g.
point xy1 with the name “AA”, point xy2 with the name “BB” and so on.
To do this, you need to download and install ChartLabeler, a free add-on for
Excel. Read more here:
http://www.microsoft.com/office/community/sv-se/default.mspx?dg=microsoft.public.excel.charting&tid=aaf4910f-bddb-43f8-b1ad-6b9237dddcea&cat=en&lang=en&cr=&sloc=sv-se&m=1&p=1
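The export step can be sketched roughly like this. It is only a minimal example: the coordinates and the file name "coordinates.txt" are made up for illustration, and with a real analysis you would pass a coordinate table from your own result instead. Writing the row names as the first column keeps each point's label next to its xy values when the file is opened in Excel.

```r
# Illustrative (made-up) coordinates; the row names act as the point labels.
coords <- data.frame(Axis1 = c(0.5, -0.25),
                     Axis2 = c(-0.1, 0.3),
                     row.names = c("AA", "BB"))

# Hypothetical output file, placed in R's temporary directory.
out_file <- file.path(tempdir(), "coordinates.txt")

# Tab-separated text opens directly in Excel.
# col.names = NA adds an empty header cell above the row-name column.
write.table(coords, file = out_file, sep = "\t", quote = FALSE, col.names = NA)
```

The resulting file has one row per point, with the label in the first column and the two coordinates after it.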
d) The mca tutorial in SPSS is also available in pdf format here:
http://www.csc.um.edu.mt/courses/spss/manuals/SPSS%20Categories%2013.0.pdf
e) The figures that you export from R can be opened and edited with, for
example, Adobe Illustrator. If you want to edit individual dots or labels,
do this:
1) Make sure you export the figure in PostScript format (.ps).
2) Open the file in Illustrator.
3) Right-click on any dot and select “Ungroup”.
4) Right-click again on any dot and select “Release Compound Path”.
It is now possible to edit every item in the figure (labels, numbers, lines
et cetera).
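For the export itself, one way to get a .ps file is base R's postscript() device. This is a minimal sketch: the file name "figure.ps" and the two plotted points are placeholders, and any plotting command can go between opening and closing the device.

```r
# Hypothetical output file, placed in R's temporary directory.
ps_file <- file.path(tempdir(), "figure.ps")

postscript(ps_file)            # open a PostScript graphics device
# Placeholder figure: two labelled points on two axes.
plot(c(0.5, -0.25), c(-0.1, 0.3), type = "n", xlab = "Axis 1", ylab = "Axis 2")
text(c(0.5, -0.25), c(-0.1, 0.3), labels = c("AA", "BB"))
invisible(dev.off())           # close the device so the file is written
```

The file is not complete until dev.off() has been called.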
f) You can use the command “area.plot” to make a map:
data(elec88) # French map-data from 1988.
par(mfrow = c(2,2)) # “par” is used to set graphical parameters.
area.plot(elec88$area, cpoint = 1)
# “cpoint”: a character size for drawing the polygon vertices.
# “cpoint” is used as par("cex")*cpoint.
g) This document describes the ade4 data files (the original data that is
used in the examples): “D. Chessel - Biométrie et Biologie Evolutive -
Université Lyon1. Data de la librairie ade4.” One example is the data file
“banque”, which is a table of qualitative variables. This data is used in
the multiple correspondence analysis examples. It has 810 rows (individuals)
and 21 columns (age group, social group, gender etc).
h) Here is an example with the banque-data (multiple corr. analysis):
data(banque)
banque.acm <- dudi.acm(banque, scannf = FALSE, nf = 3)
# “scannf = FALSE”: the eigenvalues bar plot should not be displayed.
# “nf”: if scannf is FALSE, an integer indicating the number of kept axes.
apply(banque.acm$cr, 2, mean)
banque.acm$eig[1:banque.acm$nf] # the same thing
# “The same thing” refers to the previous command: the column means of the
# correlation ratios (cr) equal the eigenvalues of the kept axes.
s.arrow(banque.acm$c1,clab=0.75)
# ‘clab’ (or ‘clabel’): Character size for the labels.
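The “same thing” comparison in the example above can also be checked numerically instead of by eye. The helper below is only a sketch and its name, cr_matches_eig, is my own invention rather than part of ade4. It assumes the result object has the components cr, eig and nf used in the example, and the ade4 part runs only if the package is installed.

```r
# Hypothetical helper: TRUE when the column means of the correlation ratios
# equal the eigenvalues of the kept axes (up to numerical tolerance).
cr_matches_eig <- function(acm) {
  kept <- seq_len(acm$nf)
  isTRUE(all.equal(unname(colMeans(acm$cr)), acm$eig[kept]))
}

# Try it on the banque data, but only when ade4 is available.
if (requireNamespace("ade4", quietly = TRUE)) {
  library(ade4)
  data(banque)
  banque.acm <- dudi.acm(banque, scannf = FALSE, nf = 3)
  print(cr_matches_eig(banque.acm))
}
```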
i) For multiple correspondence analysis in ade4, input data do not have to
be 0/1 data. Data can also be in the form of frequency table data.
j) Input data (to R) can be tab or space separated.
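A small sketch of why both layouts work: with its default separator, read.table() splits fields on any white space, so tabs and spaces are treated alike. The two tiny tables below are made up for illustration and are passed in through the text argument instead of being read from a file.

```r
# The same made-up table, once tab separated and once space separated.
tab_text   <- "gender\tagegroup\nF\tyoung\nM\told\n"
space_text <- "gender agegroup\nF young\nM old\n"

# The default sep = "" means "any white space", so no sep argument is needed.
from_tab   <- read.table(text = tab_text, header = TRUE, colClasses = "factor")
from_space <- read.table(text = space_text, header = TRUE, colClasses = "factor")

same_result <- identical(from_tab, from_space)
```

One caveat with the default separator is that a value containing a space would be split into two fields, so such files are safer read with sep = "\t".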
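k) Object names and R: it is recommended that you avoid the underscore and
the hyphen (“kents_object” or “kents-object”). Use “kents.object” instead.
R reads a hyphen as a minus sign, and older versions of R treated the
underscore as an assignment operator.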
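Base R's make.names() gives a quick way to see which of these names R accepts as written. This is a small sketch using the three names from the note above. In recent R versions the underscore passes the check and the hyphen does not, while the dotted form is the one that is safe everywhere.

```r
candidate_names <- c("kents_object", "kents-object", "kents.object")

# make.names() returns a name unchanged only if R accepts it as written;
# otherwise it replaces the offending characters with a dot.
suggested <- make.names(candidate_names)
accepted  <- suggested == candidate_names

data.frame(name = candidate_names, accepted = accepted, suggested = suggested)
```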
l) I use these commands, when performing a multiple correspondence analysis:
# Load the package called ade4 (it will perform the multiple corr. analysis).
library(ade4)
getwd() # Get working directory (to see where you are).
# Make sure your data file is in this directory. In this example,
# “tabdata.txt”.
dat <- read.table("tabdata.txt", header = TRUE, colClasses = "factor")
ls() # Shows what objects are active, e.g. “dat”.
attributes(dat) # Row and variable names.
str(dat) # More information about “dat”, e.g. number of rows and variables.
kentsacm <- dudi.acm(dat) # Performing a multiple corr. analysis.
# The results from the analysis are stored as an object called kentsacm, in
# this example.
# As above, but with the default arguments written out:
kentsacm <- dudi.acm(dat, row.w = rep(1, nrow(dat)), scannf = TRUE, nf = 2)
# Show kentsacm and its values, e.g. eigenvalues (eig) and weights
# (cw & lw).
kentsacm
kentsacm$eig # Show the subgroup “eig”, which is the eigenvalues.
kentsacm$lw # Subgroup “lw”, row weights.
kentsacm$cw # Subgroup “cw”, column weights.
# Subgroup “li”, row coordinates (i.e. coordinates for each row point).
kentsacm$li
kentsacm$l1 # Subgroup “l1”, row normed scores.
kentsacm$co # Subgroup “co”, column coordinates (each column plot).
kentsacm$c1 # Subgroup “c1”, column normed scores.
s.arrow(kentsacm$co,clab=0.75) # plots the column coordinates as arrows.
s.arrow(kentsacm$c1,clab=0.75) # plots the column normed scores as arrows.
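A natural next step after looking at the eigenvalues is to express them as shares of the total inertia. The function below is a minimal sketch: inertia_share is a name made up for this example, not an ade4 function, and the eigenvalues in the demonstration are invented only to show the shape of the output.

```r
# Hypothetical helper: share of the total inertia carried by each axis.
inertia_share <- function(eig) {
  share <- eig / sum(eig)
  data.frame(axis = seq_along(eig),
             eigenvalue = eig,
             percent = round(100 * share, 1),
             cumulative = round(100 * cumsum(share), 1))
}

# Illustrative (made-up) eigenvalues, not results from a real analysis.
example_eig <- c(0.4, 0.3, 0.2, 0.1)
inertia_share(example_eig)
```

With the analysis above, the call would be inertia_share(kentsacm$eig).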
None of this is new to the ade4-list members, I know. I just wanted to
make some contribution to the list, and not just ask for help, help and help
all the time.
Hopefully it gives some insight into how a complete beginner tries to learn R
and ade4.
//Kent Löfgren, Umeå University
Dept. of Educational Measurement, Umea University
Address: Umea Univ., BVM, SE-90187 Umea
SWEDEN
Telephone: +46 (0)90-786 95 26
Mobile: +46 (0)70-333 80 46
E-mail: kent.lofgren@edmeas.umu.se
URL: www.umu.se/edmeas/bologna/index_eng.html