Reading data from plain text files
R reads data frames from plain text files (containing data in a
tabular form) using the function read.table(). See
help(read.table) for details. Important options:
- file: file to read from
- header = TRUE/FALSE: whether columns have names
- sep: the column separator, typically TAB ("\t"), comma, or white space
Things to keep in mind:
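- in R versions before 4.0.0, character columns are automatically converted to factors; from R 4.0.0 onward they stay character unless stringsAsFactors = TRUE is given
- the na.strings argument specifies which strings to interpret as missing values (by default "NA")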
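The options above can be tried out on a small file. The following is a minimal sketch: the file contents, the column names and the use of -99 as a missing-value code are made up for illustration and are not part of the original notes.

```r
# Write a small TAB-separated file to a temporary location (illustrative data).
path <- tempfile(fileext = ".txt")
writeLines(c("id\tgroup\tscore",
             "1\ta\t4.2",
             "2\tb\tNA",
             "3\ta\t-99"),
           path)

# header = TRUE: the first line holds the column names.
# sep = "\t": columns are TAB-separated.
# na.strings: both "NA" and "-99" are read as missing values.
# stringsAsFactors = TRUE: request factor conversion explicitly,
# so the result does not depend on the R version.
dat <- read.table(path, header = TRUE, sep = "\t",
                  na.strings = c("NA", "-99"),
                  stringsAsFactors = TRUE)

str(dat)
```

Calling str() on the result is a quick way to check that each column was read with the intended type.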
Another way of reading in tabular data, sometimes speedier, is to
use scan().
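As a rough sketch of how scan() can read tabular data, the example below reads a two-column file by passing a template list as the what argument. The file contents and the column names id and score are assumptions chosen for illustration.

```r
# Write a small white-space separated file without a header (illustrative data).
path <- tempfile(fileext = ".txt")
writeLines(c("1 4.2", "2 5.1", "3 3.9"), path)

# 'what' is a template list giving the name and type of each column.
cols <- scan(path, what = list(id = integer(), score = numeric()), quiet = TRUE)

# scan() returns a list of vectors; convert it to a data frame if needed.
dat <- as.data.frame(cols)
```

Unlike read.table(), scan() has to be told the column types in advance, which is part of why it can be faster.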
Reading data from an Excel spreadsheet
Spreadsheets are often the most convenient way to enter and edit tabular data. To read data from Excel spreadsheets, the safest way is to save the data as a delimited text file first, and then read it using read.table().
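A sketch of the second step, assuming the spreadsheet was saved as a comma-separated (CSV) file. The temporary file below stands in for such an export, and its contents are invented for illustration. read.csv() is a wrapper around read.table() with defaults suited to this format.

```r
# Stand-in for a CSV file exported from a spreadsheet (illustrative data).
path <- tempfile(fileext = ".csv")
writeLines(c("subject,dose,response",
             "s1,10,0.52",
             "s2,20,",
             "s3,30,0.91"),
           path)

# Read with read.table(), stating the header and separator explicitly.
dat <- read.table(path, header = TRUE, sep = ",")

# read.csv() uses header = TRUE and sep = "," by default.
dat2 <- read.csv(path)

# The empty cell in a numeric column is read as a missing value.
is.na(dat$response)
```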
Exporting data
The simplest way to export data so that it can be read by
other software (such as Excel or SAS) is to write it out to a file
using write.table(). Its syntax is similar to that of
read.table(). The following options are useful:
- file: file to write to
- sep: column separator
- row.names, col.names = TRUE/FALSE: whether to save row and column names
To explore what these options can achieve, you can print the output to the R console instead of writing it to a file (to do this, just omit the file argument):
data(thuesen, package = "ISwR")
write.table(thuesen)
write.table(thuesen, row.names = FALSE)
write.table(thuesen, row.names = FALSE, sep = "\t")
write.table(thuesen, row.names = FALSE, sep = ",")
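To check that an exported file can be read back unchanged, a round trip through a temporary file is a useful test. This is a minimal sketch using a small made-up data frame rather than thuesen, so that it runs without the ISwR package.

```r
# A small data frame with one missing value (illustrative data).
dat <- data.frame(id = 1:3, score = c(4.2, NA, 3.9))

# Export as comma-separated text, with column names but no row names.
path <- tempfile(fileext = ".csv")
write.table(dat, file = path, sep = ",", row.names = FALSE, col.names = TRUE)

# Inspect the raw lines that were written.
readLines(path)

# Read the file back using matching header and sep settings.
back <- read.table(path, header = TRUE, sep = ",")

# TRUE if the round trip preserved the data.
all.equal(dat, back)
```

The header and sep settings used for reading must match those used for writing; a mismatch is a common cause of data being read into a single column.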
The R data editor
There's a spreadsheet-like data editor for the Windows GUI version
of R, but it's not very sophisticated. To use it, do
fix(thuesen)
or, to leave thuesen unmodified and save the edited data to another variable,
thu2 <- edit(thuesen)
Data from 'foreign' software
Most data analysis software has its own data format. The
foreign package has tools to read from a few of the
formats most commonly encountered. As biostatisticians, we may expect
to encounter data from SAS, typically in the XPORT format. Such files
can be read using the read.xport function.
R Data files
R has its own format to save datasets (or any other R object, like functions). Once you have read in data, you might consider saving the data set in this form and reading it in again the next time you work with it. This can be useful for two reasons:
- read.table can be slow on large data sets, since it needs to do a lot of consistency checking. R data format files can be read in much faster.
- If you have done some non-trivial manipulation of the data set after you have read it in (e.g., changed some of the variables to factors with meaningful labels), all these changes will be retained in the R data file.
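One way to write and read such files is with save() and load(). The sketch below uses a made-up data frame and invented factor labels to show that manipulations survive the round trip.

```r
# A small data frame (illustrative data).
dat <- data.frame(id = 1:3, group = c("a", "b", "a"))

# A non-trivial manipulation: recode 'group' as a factor with meaningful labels.
dat$group <- factor(dat$group, levels = c("a", "b"),
                    labels = c("control", "treated"))

# Save the object to an R data file.
path <- tempfile(fileext = ".RData")
save(dat, file = path)

# Remove the object, then restore it; load() recreates it under its original name.
rm(dat)
load(path)

# The factor labels were retained.
levels(dat$group)
```

For a single object, saveRDS() and readRDS() are an alternative that lets you choose the variable name when reading the data back.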