Things I learned during first weeks

From: Kent (
Date: Wed Nov 30 2005 - 17:46:03 MET


    Things I learned during my first weeks of working with R and the ADE4
    package to perform multiple correspondence analysis.

    The items are in no particular order.


    a) There are many packages that perform correspondence analysis (CA) and
    multiple correspondence analysis (MCA), not just one. Some share the same
    features and some have unique ones. R comes with a number of built-in
    packages, and it is also possible to download and add more. Ade4 is one of
    those add-on packages that you need to download and install yourself. For
    me, Ade4 is the best package for my analysis needs.
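
As a minimal sketch of the download-and-install step, assuming an internet connection and that the CRAN mirror named below is reachable (the mirror URL is my choice, any mirror should do):

```r
# Install ade4 from CRAN only if it is not already available, then load it.
if (!requireNamespace("ade4", quietly = TRUE)) {
  install.packages("ade4", repos = "https://cloud.r-project.org")
}
library(ade4)

# Confirm that the package is now attached to the search path.
"package:ade4" %in% search()
```

Installing is needed only once per R installation; library(ade4) is needed in every new R session.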


    b) Ade4 is developed and maintained in the spirit of the French approach to
    correspondence analysis in the tradition started by Benzécri (1973) and
    described in English by, for example, Greenacre (1984).


    c) You may of course export coordinates and import them into Microsoft
    Excel. These xy coordinates are easy to display in an Excel graph. However,
    it is not possible to plot dots with labels for each dot, e.g. point xy1
    with the name “AA”, point xy2 with the name “BB” and so on.

    To do this, you need to download and install ChartLabeler, a free add-on for
    Excel. Read more here:


    d) The MCA tutorial in SPSS is also available in PDF format here:


    e) The figures that you export from R can be opened and edited with, for
    example, Adobe’s Illustrator programme. If you want to edit individual dots
    or labels, do this: 1) make sure you export the figure in PostScript format
    (.ps); 2) open the file in Illustrator; 3) right-click on any dot and
    select “Ungroup”; 4) right-click again on any dot and select “Release
    Compound Path”. It is now possible to edit every item in the figure (labels,
    numbers, lines et cetera).
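
The PostScript export itself can be sketched in base R roughly as follows. The file name, the three points and the labels "AA", "BB", "CC" are made up for illustration, and the file is written to a temporary directory:

```r
# Write a small labelled scatter plot to a PostScript (.ps) file.
ps.file <- file.path(tempdir(), "figure.ps")   # hypothetical file name

postscript(ps.file, width = 6, height = 6,
           horizontal = FALSE, onefile = FALSE, paper = "special")

x <- c(1, 2, 3)
y <- c(2, 1, 3)
plot(x, y, pch = 19, xlim = c(0, 4), ylim = c(0, 4))   # the dots
text(x, y, labels = c("AA", "BB", "CC"), pos = 3)      # one label per dot

dev.off()   # close the device; the file is complete only after this call

file.exists(ps.file)
```

Any plotting command, including the ade4 plots, can go between postscript() and dev.off().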


    f) You can use the command “area.plot” to make a map:

    library(ade4) # area.plot and elec88 come with the ade4 package.

    data(elec88) # French map-data from 1988.

    par(mfrow = c(2,2)) # “par” is used to set graphical parameters.

    area.plot(elec88$area, cpoint = 1)

    # “cpoint”: a character size for drawing the polygon vertices.

    # The size actually used is par("cex") * cpoint.


    g) This document describes the ADE4 data files (the original data that is
    used in the examples): “D. Chessel - Biométrie et Biologie Evolutive -
    Université Lyon1. Data de la librairie ade4.” One example is the data file
    “banque”, a table of qualitative variables that is used in the multiple
    correspondence analysis examples. It has 810 rows (individuals) and 21
    columns (age group, social group, gender etc).
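
A quick way to check this description from inside R, as a sketch that assumes ade4 is already installed:

```r
library(ade4)
data(banque)                      # load the example data set

dim(banque)                       # number of rows and columns
all(sapply(banque, is.factor))    # are all variables qualitative (factors)?
sapply(banque, nlevels)           # number of categories per variable
```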


    h) Here is an example with the banque-data (multiple corr. analysis):


    data(banque)

    banque.acm <- dudi.acm(banque, scannf = FALSE, nf = 3)

    # “scannf = FALSE”: the eigenvalues bar plot should not be displayed.

    # “nf”: if scannf is FALSE, an integer indicating the number of kept axes.

    apply(banque.acm$cr, 2, mean)

    banque.acm$eig[1:banque.acm$nf] # the same thing

    # “The same thing” means the same values as the previous command: the mean
    # of the correlation ratios (cr) on each axis equals that axis’s eigenvalue.


    # ‘clab’ (or ‘clabel’): Character size for the labels.
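
One way to see what “the same thing” refers to in the banque example is to compare the two results directly. This is only a sketch, assuming ade4 is installed; the object names mean.cr and kept.eig are mine:

```r
library(ade4)
data(banque)
banque.acm <- dudi.acm(banque, scannf = FALSE, nf = 3)

# Mean correlation ratio of the variables on each kept axis.
mean.cr <- apply(banque.acm$cr, 2, mean)

# Eigenvalues of the kept axes.
kept.eig <- banque.acm$eig[1:banque.acm$nf]

# Compare the two, ignoring names and tiny rounding differences.
all.equal(unname(mean.cr), kept.eig)
```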


    i) For multiple correspondence analysis in ADE4, input data do not have to
    be 0/1 data. Data can also be in the form of frequency table data.
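
As far as I understand it, dudi.acm takes a data frame of factors, and the 0/1 (indicator) form of such a table can be inspected with acm.disjonctif. A small sketch with a made-up data frame called answers (its names and values are invented for illustration):

```r
library(ade4)

# Hypothetical answers from four individuals, stored as factors.
answers <- data.frame(
  gender = factor(c("f", "m", "f", "m")),
  age    = factor(c("young", "old", "old", "young"))
)

# The same information as a 0/1 table: one column per category.
indicator <- acm.disjonctif(answers)
indicator

# Each row has exactly one 1 per original variable.
rowSums(indicator)
```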


    j) Input data (to R) can be tab or space separated.
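
A small sketch of this with read.table, whose default separator is any white space, so tabs and spaces are read alike. The two temporary files and their contents are made up for the example:

```r
# Write the same tiny table twice: once tab separated, once space separated.
tab.file   <- tempfile(fileext = ".txt")
space.file <- tempfile(fileext = ".txt")
writeLines(c("gender\tage", "f\tyoung", "m\told"), tab.file)
writeLines(c("gender age", "f young", "m old"), space.file)

# The default sep = "" splits on any white space (tabs or spaces).
from.tab   <- read.table(tab.file, header = TRUE, colClasses = "factor")
from.space <- read.table(space.file, header = TRUE, colClasses = "factor")

identical(from.tab, from.space)
```

If a value itself contains a space, the default will split it; in that case sep = "\t" on a tab-separated file is the safer choice.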


    k) File names and R: it is recommended that you avoid underscores and
    hyphens (“kents_object” or “kents-object”). Use “kents.object” instead.


    l) I use these commands, when performing a multiple correspondence analysis:

    library(ade4) # Load the package called ade4 (it will perform the multiple
    # corr. analysis).

    getwd() # Get working directory (to see where you are).

    # Make sure your data file is in this directory. In this example, the file
    # is called tabdata.txt.

    dat <- read.table("tabdata.txt", header = TRUE, colClasses = "factor")

    ls() # Shows what objects are active, e.g. “dat”.

    attributes(dat) # Row and variable names.

    str(dat) # More information about “dat”, i.e. number of rows and variables.

    kentsacm<- dudi.acm(dat) # Performing a multiple corr. analysis.

    # The results from the analysis are stored as an object called kentsacm, in
    # this example.

    kentsacm <- dudi.acm(dat, row.w = rep(1, nrow(dat)), scannf = TRUE, nf = 2)

    # As above, with the default arguments written out.

    kentsacm # Show kentsacm and its values, e.g. eigenvalues (eig) and weights
    # (cw & lw).

    kentsacm$eig # Show the subgroup “eig”, which is the eigenvalues.

    kentsacm$lw # Subgroup “lw”, row weights.

    kentsacm$cw # Subgroup “cw”, column weights.

    kentsacm$li # Subgroup “li”, row coordinates (i.e. coordinates for each row).

    kentsacm$l1 # Subgroup “l1”, row normed scores.

    kentsacm$co # Subgroup “co”, column coordinates (one point per category).

    kentsacm$c1 # Subgroup “c1”, column normed scores.

    s.arrow(kentsacm$co,clab=0.75) # plots the dimensions and points.

    s.arrow(kentsacm$c1,clab=0.75) # plots the dimensions and points.
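
When looking at the eigenvalues, it can also help to see how large a share of the total each axis has. A rough sketch, using the banque data as a stand-in because tabdata.txt is not available here; the object names acm and inertia.share are mine:

```r
library(ade4)
data(banque)
acm <- dudi.acm(banque, scannf = FALSE, nf = 2)

# Each eigenvalue as a share of the sum of all eigenvalues.
inertia.share <- acm$eig / sum(acm$eig)

round(100 * inertia.share[1:acm$nf], 1)   # per cent for the kept axes
cumsum(inertia.share)[1:acm$nf]           # cumulative share
```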


    The facts are nothing new to the ade4-list members, I know. I just wanted to
    make some contribution to the list, and not just ask for help, help and help
    all the time.

    Hopefully it gives some insight into how a complete beginner tries to learn
    R and ADE4.

    //Kent Löfgren, Umeå University


    Dept. of Educational Measurement, Umea University

    Address: Umea Univ., BVM, SE-90187 Umea


    Telephone: +46 (0)90-786 95 26

    Mobile: +46 (0)70-333 80 46



