Importing data into R

Reading data from plain text files

R reads data frames from plain text files (containing data in a tabular form) using the function read.table(). See help(read.table) for details. Important options:
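As a rough illustration, here is a minimal sketch of a typical read.table() call. The file and its contents are made up for the example, and the arguments shown (header, sep, na.strings, stringsAsFactors) are only a selection; help(read.table) describes the rest.

```r
# Create a small tab-delimited file in a temporary location (made-up data).
path <- tempfile(fileext = ".txt")
writeLines(c("id\tgroup\tscore",
             "1\tA\t4.2",
             "2\tB\tNA",
             "3\tA\t5.1"), path)

# header = TRUE            : the first line holds the column names
# sep = "\t"               : fields are separated by tabs
# na.strings = "NA"        : text to be treated as a missing value
# stringsAsFactors = FALSE : keep text columns as character vectors
dat <- read.table(path, header = TRUE, sep = "\t",
                  na.strings = "NA", stringsAsFactors = FALSE)

str(dat)  # check that each column was read with the intended type
```

Checking the result with str() right after reading is a cheap way to catch a wrong separator or a numeric column that was read as text.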

Things to keep in mind:

Another way of reading in tabular data, sometimes speedier, is to use scan().
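A minimal sketch of the scan() approach, assuming a made-up file with two whitespace-separated columns and no header line. The what argument gives one template per column, and scan() returns a list of vectors that can then be turned into a data frame.

```r
# Create a small whitespace-delimited file without a header (made-up data).
path <- tempfile(fileext = ".txt")
writeLines(c("1 4.2",
             "2 3.9",
             "3 5.1"), path)

# 'what' declares the type of each column in order; the names become
# the names of the list components that scan() returns.
cols <- scan(path, what = list(id = integer(), score = numeric()),
             quiet = TRUE)

# Convert the list of equal-length vectors into a data frame.
dat <- as.data.frame(cols)
```

Because the column types are declared up front, scan() does not have to guess them, which is one reason it can be quicker on large files.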

Reading data from an Excel spreadsheet

Spreadsheets are often the most convenient way to enter and edit tabular data. To read data from Excel spreadsheets, the safest way is to save the data as a delimited text file first, and then read it using read.table().
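As a sketch of the second step, suppose the spreadsheet has been saved from Excel as a comma-separated (CSV) file. The file below is a made-up stand-in for such an export, written from R so the example is self-contained.

```r
# Stand-in for a file saved from Excel in CSV format (made-up contents).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("subject,dose,response",
             "1,10,0.52",
             "2,20,",          # an empty cell in the spreadsheet
             "3,30,0.71"), csv_path)

# Read it as comma-delimited text with a header line.
dat <- read.table(csv_path, header = TRUE, sep = ",")

# Empty cells in numeric columns are read as NA.
dat
```

read.csv() is a wrapper around read.table() with defaults suited to this kind of file, so read.csv(csv_path) would be a shorter alternative here.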

Exporting data

The simplest way to export data so that it can be read by other software (Excel, SAS, etc.) is to write it to a file using write.table(). Its syntax is similar to that of read.table(). The following options are useful:

To explore what these options do, you can print the output to the R console instead of writing it to a file (to do this, just omit the file argument):

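# thuesen comes from the ISwR package, which must be installed
data(thuesen, package = "ISwR")
write.table(thuesen)                                  # defaults: space-separated, with row names
write.table(thuesen, row.names = FALSE)               # drop the row names
write.table(thuesen, row.names = FALSE, sep = "\t")   # tab-delimited
write.table(thuesen, row.names = FALSE, sep = ",")    # comma-delimited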
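To write to an actual file, supply the file argument. The following is a minimal round-trip sketch using a small made-up data frame (so that it runs without the ISwR package): the data are written as a comma-delimited file and read back in to confirm that nothing was lost.

```r
# A small made-up data frame.
df <- data.frame(id = 1:3,
                 group = c("a", "b", "a"),
                 value = c(1.5, 2.25, 3))

# Write a comma-delimited file without row names or quotes.
out <- tempfile(fileext = ".csv")
write.table(df, file = out, sep = ",", row.names = FALSE, quote = FALSE)

# Look at the raw text that other software would see.
readLines(out)

# Read the file back in to check the round trip.
back <- read.table(out, header = TRUE, sep = ",", stringsAsFactors = FALSE)
```

Setting quote = FALSE is convenient for simple data, but it should be left at its default if text fields may themselves contain the separator.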

The R data editor

There's a spreadsheet-like data editor for the Windows GUI version of R, but it's not very sophisticated. To edit thuesen in place, use fix(thuesen). To leave thuesen unmodified and save the edited data in another variable, use thu2 <- edit(thuesen).

Data from "foreign" software

Most data analysis software has its own data format. The foreign package has tools to read a few of the most commonly encountered formats. As biostatisticians, we may expect to encounter data from SAS, typically in the XPORT format. Such files can be read using the read.xport() function.
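A minimal sketch of reading an XPORT file follows. No XPORT file ships with R, so the file name mydata.xpt is a placeholder, and read_xport_if_present() is a hypothetical helper written for this example (it is not part of the foreign package); it simply avoids an error when the file is missing.

```r
library(foreign)

# Hypothetical helper: read an XPORT file if it exists, else return NULL.
read_xport_if_present <- function(path) {
  if (!file.exists(path)) {
    message("No file found at ", path)
    return(NULL)
  }
  # lookup.xport() describes the datasets and variables in the file
  # without reading the data itself.
  str(lookup.xport(path))
  # read.xport() returns a data frame, or a named list of data frames
  # if the file holds more than one dataset.
  read.xport(path)
}

sas_data <- read_xport_if_present("mydata.xpt")  # placeholder file name
```

Inspecting the file with lookup.xport() first is useful because a single XPORT file may hold several datasets.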

R Data files

R has its own format for saving datasets (or any other R object, such as functions). Once you have read in data, you might consider saving the data set in this format and reading it in again the next time you work with it. This can be useful for two reasons:
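A minimal sketch of saving and restoring a data set in R's own format, using a made-up data frame and temporary files. save() and load() store objects together with their names; saveRDS() and readRDS() are shown as an alternative, on the assumption that a reasonably recent version of R is in use.

```r
# A small made-up data frame.
df <- data.frame(id = 1:3, value = c(1.5, 2.25, 3))

# save() writes one or more named objects to a file.
rda <- tempfile(fileext = ".RData")
save(df, file = rda)

# Remove the object, then restore it; load() recreates it under
# its original name (df).
rm(df)
load(rda)

# saveRDS()/readRDS() store a single object without its name, so the
# result can be assigned to any variable.
rds <- tempfile(fileext = ".rds")
saveRDS(df, rds)
df2 <- readRDS(rds)

identical(df, df2)  # compare the restored copies
```

Since load() silently overwrites any existing object of the same name, readRDS() can be the safer choice when only one data set is involved.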

Last modified: Thu Jan 21 11:47:36 PST 2010