<-
to give a data (or other) object its values. The -> operator, which points the other way, can also be used, in which case the assignment runs from left to right. A very common mistake (due to conventions that use the = sign for both comparison and assignment) is to mix them up in R.
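As a rough illustration of the operators mentioned above, the following sketch uses made-up object names (a, b, d, same, differ) that do not appear elsewhere in this text:

```r
# Leftward assignment: the value on the right goes into the name on the left
a <- 2

# Rightward assignment: the value on the left goes into the name on the right
3 -> b

# A single = at the top level also assigns; a double == compares
d = 2
same <- (a == d)    # both hold the value 2
differ <- (a == b)  # 2 compared with 3
```

Writing a = d where a == d was intended silently overwrites a instead of testing it, which is the mix-up to watch for.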
> x<-2
assigns the value 2 to x.
> y<-c(1,2,3,4,5)
assigns the vector of values shown to y. Note here that you must use the c ("combine") function. However, once you have assigned the value of y, you may then assign its value to other data objects:
> z<-y
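A short sketch of what these assignments do may help. It assumes nothing beyond base R; the name w and the extra values 6 and 7 are made up for the example:

```r
# c() combines its arguments into a single vector
y <- c(1, 2, 3, 4, 5)

# Assigning y to z copies the values; z is not a link back to y
z <- y
y[1] <- 100   # change the first element of y only

# c() also flattens vectors given as arguments into one longer vector
w <- c(z, 6, 7)
```

After the change to y, z still holds the original five values, and w has seven elements.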
The cryptically named c will also combine character strings
> names<-c("Abe","Bob","Con")
into vectors or
The underscore character '_' also acted as an assignment operator until R v1.8.0. This was a real bummer if you used underscores in place of spaces in naming objects. Fortunately, it has been officially evicted from the pantheon of operators, but may still bedevil users of earlier versions.
The foreign package will import many different data formats.
One of the most straightforward ways to retrieve data is through plain text. Almost all applications used for handling data will export data as a delimited file in ASCII text, and this gives us a rough and ready way to get the vast majority of data into R.
First, export the data, usually using a command like Save As... and selecting ASCII text, CSV or just text.
Some spreadsheets export numeric fields with embedded spaces. These are usually read in as factors, which is often not what you want. Stripping out any embedded spaces with:
tr -d '\40' < old.dat > new.dat
will usually fix things up. Text editors may also be used if they have a search and replace facility, by searching for spaces and replacing them with nothing.
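The same clean-up can also be sketched inside R itself. This is only an illustration, assuming the file is small enough to read into memory. The temporary files stand in for old.dat and new.dat, and the sample values are made up. Like the tr command, it removes every space, including any inside text fields.

```r
# Stand-in for old.dat: a small file whose numeric fields contain spaces
old_file <- tempfile(fileext = ".dat")
new_file <- tempfile(fileext = ".dat")
writeLines(c("id,amount", "1,1 250", "2,3 075"), old_file)

# Read every line, delete all space characters, and write the result back out
raw_lines <- readLines(old_file)
clean_lines <- gsub(" ", "", raw_lines, fixed = TRUE)
writeLines(clean_lines, new_file)

# With the spaces gone, the amount column imports as numeric
cleaned <- read.table(new_file, header = TRUE, sep = ",")
```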
You may have a choice of delimiters. Suppose you have a file named infert.dat that looks like this
education,age,parity,induced,case,spontaneous,stratum,pooled.stratum
0-5yrs,26,6,1,1,2,1,3
0-5yrs,42,1,1,1,0,2,1
0-5yrs,39,6,2,1,0,3,4
...
and want to import it.
> infert<-read.table("/home/jim/infert.dat",header=T,sep=",")
What read.table does is try to read data from the file named as the first argument. If header is specified as T (True), the first line will be read as the column names for the data frame. header defaults to F (False). If we had used something like TAB for a delimiter, sep would have been defined as a C-style escape sequence, sep="\t".
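To make the header and sep arguments concrete, here is a small sketch using a TAB-delimited file. The file is a temporary one created just for the example, holding only the first three columns of two of the sample rows, and the names tab_file, tabbed and no_header are made up:

```r
# Write a small TAB-delimited file to a temporary location
tab_file <- tempfile(fileext = ".dat")
writeLines(c("education\tage\tparity",
             "0-5yrs\t26\t6",
             "0-5yrs\t42\t1"), tab_file)

# header = TRUE takes column names from the first line;
# sep = "\t" is the C-style escape for a TAB character
tabbed <- read.table(tab_file, header = TRUE, sep = "\t")

# Without header = TRUE the first line is treated as data here,
# so the columns get the default names V1, V2, V3
no_header <- read.table(tab_file, sep = "\t")
```

Comparing names(tabbed) with names(no_header) shows the effect of the header argument.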
write.table() performs the opposite transformation, writing out an R data frame object into a rectangular data file. There are other output options like write() to write out a matrix to a data file, and the functions in the foreign package that let you write out data in proprietary formats.
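A minimal round trip with write.table() might look like the sketch below. The data frame, its age values and the temporary file are made up for the illustration; only base R is assumed.

```r
# A small made-up data frame to write out
people <- data.frame(name = c("Abe", "Bob", "Con"), age = c(31, 45, 27))

# Write a comma-delimited file; row.names = FALSE omits the row-name column
out_file <- tempfile(fileext = ".dat")
write.table(people, out_file, sep = ",", row.names = FALSE)

# Reading it back with matching arguments recovers the same values
back <- read.table(out_file, header = TRUE, sep = ",")
```

Depending on the R version, the name column may come back as a factor or as character strings, so compare with as.character(back$name) if the type matters.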
Using scan() to import the file will use less memory. scan isn't as easy to use, and you have to enter the column names separately.
> infert<-data.frame(scan("/home/jim/infert.dat",list("",0,0,0,0,0,0,0),sep=",",skip=1))
Read 128 lines
> names(infert)<-c("education","age","parity","induced","case","spontaneous","stratum","pooled.stratum")
Note how the assignment operator was used to assign the names to the data frame.
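The scan() approach can be tried on a small stand-in file. This sketch writes the header and the three sample rows shown earlier to a temporary file. Because that sample is comma-delimited, it passes sep = "," to scan(), which is an assumption about the real file. The names dat_file, fields and small are made up.

```r
# Stand-in for infert.dat: header line plus the three sample rows
dat_file <- tempfile(fileext = ".dat")
writeLines(c("education,age,parity,induced,case,spontaneous,stratum,pooled.stratum",
             "0-5yrs,26,6,1,1,2,1,3",
             "0-5yrs,42,1,1,1,0,2,1",
             "0-5yrs,39,6,2,1,0,3,4"), dat_file)

# what = list("", 0, ...) declares one character field and seven numeric fields;
# skip = 1 jumps over the header line, sep = "," splits on commas
fields <- scan(dat_file, what = list("", 0, 0, 0, 0, 0, 0, 0),
               sep = ",", skip = 1, quiet = TRUE)
small <- data.frame(fields)

# Column names have to be supplied separately
names(small) <- c("education", "age", "parity", "induced", "case",
                  "spontaneous", "stratum", "pooled.stratum")
```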
Going beyond scan(), there are methods to store your data in a database table and access the table using the appropriate interface. This enables the user to access huge amounts of data by processing it only in bits.
For more information, see An Introduction to R: Reading data from files,
and the documentation from the