class: center, middle

# Basic usage of R

## Introductory Computer Programming

### Deepayan Sarkar
---

# R is a full programming language

* Variables

* Functions

* Control flow structures

    * For loops, while loops
    * If-then-else (branching)

--

* Distinguishing features

    * Focus on _vectors_ and _vectorized operations_
    * Treatment of _functions_ on par with other object types

$$ \newcommand{\sub}{_} $$
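
A minimal sketch of the two distinguishing features listed above, using only base R. The names `v`, `square` and `f` are illustrative and do not appear elsewhere in these slides.

```r
v <- c(1, 4, 9, 16)

## Vectorized operations: the function is applied to every element at once
sqrt(v)
v + 1

## Functions are ordinary objects: they can be stored in variables,
## copied under another name, and passed as arguments to other functions
square <- function(u) u^2
f <- square
f(3)
sapply(v, square)
```

Here `sqrt(v)` needs no explicit loop, and `sapply()` receives the function `square` as an argument just like any other value.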
---

# R is easily extensible

* Most standard data analysis methods are already implemented

--

* Can be extended by writing add-on packages

--

* Thousands of add-on packages are available

---

# Major concepts we will discuss

* Variables (in the context of programming)

--

* Data structures needed for data analysis

--

* Functions (set of instructions for performing a procedure)

---

layout: true

# Variables

* Variables are symbols that may be associated with different values

* Expressions are evaluated sequentially

* Computations involving variables are done using their current value

---

```r
sqrt(x)
```

```
Error: object 'x' not found
```

---

```r
x <- 10 # assignment
sqrt(x)
```

```
[1] 3.162278
```

```r
x <- -1
sqrt(x)
```

```
Warning in sqrt(x): NaNs produced
```

```
[1] NaN
```

```r
x <- -1+0i
sqrt(x)
```

```
[1] 0+1i
```

---

layout: false

# Data structures for data analysis

* Vectors

* Matrices

* Data frames (a spreadsheet-like data set)

* Lists (general collection of objects)

---

# Atomic vectors

* _Indexed_ collection of _homogeneous_ scalars, can be

    * Numeric / Integer / Complex
    * Character
    * Logical (`TRUE` / `FALSE`)

--

    * Categorical (factor) - later

--

* Missing values are allowed, indicated as `NA`

--

* Elements are indexed starting from 1

--

* $i$th element of vector `x` can be extracted using `x[[i]]`

* There are also more sophisticated forms of (vector) indexing

---

# Atomic vectors: examples

```r
month.name # built-in
```

```
 [1] "January"   "February"  "March"     "April"     "May"       "June"     
 [7] "July"      "August"    "September" "October"   "November"  "December" 
```

--

```r
x <- rnorm(10)
x
```

```
 [1]  0.7002083 -1.7591685  0.2500991 -0.2338317 -1.8224801  2.0710403
 [7]  0.6204777 -0.2136382 -1.0761660  0.4715285
```

--

```r
x[[3]] # third element of x
```

```
[1] 0.2500991
```

---

# Atomic vectors: examples

```r
str(x) # useful function
```

```
 num [1:10] 0.7 -1.759 0.25 -0.234 -1.822 ...
```

```r
str(month.name)
```

```
 chr [1:12] "January" "February" "March" "April" "May" "June" "July" ...
```

---

# Creating atomic vectors

* Constructor functions

```r
numeric(10)
```

```
 [1] 0 0 0 0 0 0 0 0 0 0
```

```r
logical(5)
```

```
[1] FALSE FALSE FALSE FALSE FALSE
```

```r
character(5)
```

```
[1] "" "" "" "" ""
```

---

# Scalars are also vectors

* "Scalars" are just vectors of length 1

```r
str(numeric(2))
```

```
 num [1:2] 0 0
```

```r
str(numeric(1))
```

```
 num 0
```

```r
str(0)
```

```
 num 0
```

---

# Vectors can have zero length

* Vectors can have length zero

```r
numeric(0)
```

```
numeric(0)
```

```r
logical(0)
```

```
logical(0)
```

```r
length(character(0))
```

```
[1] 0
```

--

```r
length(NULL)
```

```
[1] 0
```

---

# Combining vectors using `c()`

* Vectors can also be created by combining smaller vectors

* For example, vectors `x` and `y` can be combined using `c(x, y)`

```r
c(1:5, numeric(3))
```

```
[1] 1 2 3 4 5 0 0 0
```

--

* Any number of vectors can be combined

* This is a common way to build up a vector using scalars

```r
c(2, 4, 6, 9, 11)
```

```
[1]  2  4  6  9 11
```

---

# Combining vectors of different types

* Atomic vectors of different types cannot be combined without conversion

* Attempting to do so converts all elements to one of the types

```r
c(1:5, c(TRUE, FALSE))
```

```
[1] 1 2 3 4 5 1 0
```

```r
c(1:5, month.name[[1]])
```

```
[1] "1"       "2"       "3"       "4"       "5"       "January"
```

--

```r
c(1:5, c(TRUE, FALSE), month.name[[1]])
```

```
[1] "1"       "2"       "3"       "4"       "5"       "TRUE"    "FALSE"  
[8] "January"
```

```r
c(c(1:5, TRUE, FALSE), month.name[[1]])
```

```
[1] "1"       "2"       "3"       "4"       "5"       "1"       "0"      
[8] "January"
```

---

# Our first dataset

| year| Canada| France|  India| Zimbabwe|
|----:|------:|------:|------:|--------:|
| 1952| 68.750| 67.410| 37.373|   48.451|
| 1957| 69.960| 68.930| 40.249|   50.469|
| 1962| 71.300| 70.510| 43.605|   52.358|
| 1967| 72.130| 71.550| 47.193|   53.995|
| 1972| 72.880| 72.380| 50.651|   55.635|
| 1977| 74.210| 73.830| 54.208|   57.674|
| 1982| 75.760| 74.890| 56.596|   60.363|
| 1987| 76.860| 76.340| 58.553|   62.351|
| 1992| 77.950| 77.460| 60.223|   60.377|
| 1997| 78.610| 78.640| 61.765|   46.809|
| 2002| 79.770| 79.590| 62.879|   39.989|
| 2007| 80.653| 80.657| 64.698|   43.487|

---

layout: true

# Life Expectancy in France

---

```r
year <- c(1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002, 2007)
year
```

```
 [1] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007
```

--

```r
lexp_france <- c(67.41, 68.93, 70.51, 71.55, 72.38, 73.83, 74.89, 76.34, 77.46, 78.64, 79.59, 80.657)
lexp_france
```

```
 [1] 67.410 68.930 70.510 71.550 72.380 73.830 74.890 76.340 77.460 78.640
[11] 79.590 80.657
```

---

```r
year <- seq(1952, 2007, by = 5)
year
```

```
 [1] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007
```

---

```r
plot(year, lexp_france, pch = 16)
```

---

* What next?

* Has the increase over the years been consistent?

* How can we calculate the successive differences and plot them?

---

* Option 1: straightforward (but not very efficient)

```r
lexp_france[[2]] - lexp_france[[1]]
```

```
[1] 1.52
```

---

```r
c(lexp_france[[2]] - lexp_france[[1]],
  lexp_france[[3]] - lexp_france[[2]],
  lexp_france[[4]] - lexp_france[[3]],
  lexp_france[[5]] - lexp_france[[4]],
  lexp_france[[6]] - lexp_france[[5]],
  lexp_france[[7]] - lexp_france[[6]],
  lexp_france[[8]] - lexp_france[[7]],
  lexp_france[[9]] - lexp_france[[8]],
  lexp_france[[10]] - lexp_france[[9]],
  lexp_france[[11]] - lexp_france[[10]],
  lexp_france[[12]] - lexp_france[[11]])
```

```
 [1] 1.520 1.580 1.040 0.830 1.450 1.060 1.450 1.120 1.180 0.950 1.067
```

---

* Option 2: Loop

```r
d <- numeric(0)
```

--

```r
for (i in 1:11)
{
    d <- c(d, lexp_france[[i+1]] - lexp_france[[i]])
}
d
```

```
 [1] 1.520 1.580 1.040 0.830 1.450 1.060 1.450 1.120 1.180 0.950 1.067
```

---

* Option 3 (Using something we have not learned yet)

```r
lexp_france[-1] - lexp_france[-12]
```

```
 [1] 1.520 1.580 1.040 0.830 1.450 1.060 1.450 1.120 1.180 0.950 1.067
```

--

```r
diff(lexp_france)
```

```
 [1] 1.520 1.580 1.040 0.830 1.450 1.060 1.450 1.120 1.180 0.950 1.067
```

---

```r
d <- diff(lexp_france)
median(d)
```

```
[1] 1.12
```

```r
mean(d)
```

```
[1] 1.204273
```

---

```r
plot(d, pch = 16, type = "o", ylab = "difference", xlab = "period")
```

---

layout: false

# Types of vector indexing

* Indexing refers to extracting subsets of data

* R supports several kinds of indexing:

    * Indexing with a vector of positive integers
    * Indexing with a vector of negative integers
    * Indexing with a logical vector
    * Indexing with a vector of names

---

# The empty index

* A vector indexing operation has the form `x[index]`

--

* The most basic form is an empty index, which selects all elements

```r
month.name[]
```

```
 [1] "January"   "February"  "March"     "April"     "May"       "June"     
 [7] "July"      "August"    "September" "October"   "November"  "December" 
```

---

# Indexing with an integer vector

* For integer indexing, `index` is an integer vector

```r
month.name[c(2, 4, 6, 9, 11)]
```

```
[1] "February"  "April"     "June"      "September" "November" 
```

--

* Elements can be repeated

```r
month.name[c(2, 2, 6, 4, 6, 11)]
```

```
[1] "February" "February" "June"     "April"    "June"     "November"
```

---

# Indexing with an integer vector

* "Out-of-bounds" indexing gives `NA` (missing)

```r
month.name[13]
```

```
[1] NA
```

--

```r
seq(1, by = 2, length.out = 8)
```

```
[1]  1  3  5  7  9 11 13 15
```

--

```r
month.name[seq(1, by = 2, length.out = 8)]
```

```
[1] "January"   "March"     "May"       "July"      "September" "November" 
[7] NA          NA         
```

---

# Indexing with an integer vector

* Indexing with a scalar (vector of length 1) also works:

```r
month.name[2]
```

```
[1] "February"
```

--

* This is usually the same as `x[[index]]`

```r
month.name[[2]]
```

```
[1] "February"
```

--

* However, the two differ in behaviour when the index is out of bounds

```r
month.name[15]
```

```
[1] NA
```

```r
month.name[[15]]
```

```
Error in month.name[[15]]: subscript out of bounds
```

---

# Indexing with a vector of negative integers

* Negative
integers omit the specified entries

```r
month.name[-2]
```

```
 [1] "January"   "March"     "April"     "May"       "June"      "July"     
 [7] "August"    "September" "October"   "November"  "December" 
```

```r
month.name[-c(2, 4, 6, 9, 11)]
```

```
[1] "January"  "March"    "May"      "July"     "August"   "October"  "December"
```

--

* Cannot be mixed with positive integers

```r
month.name[c(2, -3)]
```

```
Error in month.name[c(2, -3)]: only 0's may be mixed with negative subscripts
```

---

# Indexing with 0

* Zero has a special meaning - it doesn't select anything

```r
month.name[0]
```

```
character(0)
```

```r
month.name[integer(0)] ## zero-length index: selects nothing (unlike the empty index)
```

```
character(0)
```

```r
month.name[c(1, 2, 0, 11, 12)]
```

```
[1] "January"  "February" "November" "December"
```

```r
month.name[-c(1, 2, 0, 11, 12)]
```

```
[1] "March"     "April"     "May"       "June"      "July"      "August"   
[7] "September" "October"  
```

---

# Indexing with a logical vector

* Indexing by logical vector: select `TRUE` elements

```r
month.name[c(TRUE, FALSE, FALSE)] # index recycled
```

```
[1] "January" "April"   "July"    "October"
```

---

# Indexing with a logical vector

* Indexing by logical vector: select `TRUE` elements

```r
i <- substring(month.name, 1, 1) == "J"
i
```

```
 [1]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
```

--

```r
month.name[i]
```

```
[1] "January" "June"    "July"   
```

---

# Indexing with a logical vector

* Typical use: extract subset satisfying a certain condition (also called "filtering")

```r
(x <- rnorm(20))
```

```
 [1]  1.28577026  0.28127947  0.63668173 -1.01986299  1.15912598  0.85372801
 [7]  0.05039263  0.90714843  0.55081014  1.34409804  0.69843949  0.66927954
[13] -0.18120674  0.41814937 -1.69696802  0.23723260 -1.02732736 -0.78413752
[19] -0.98025140 -0.02905033
```

```r
x > 0
```

```
 [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
```

```r
x[x > 0]
```

```
 [1] 1.28577026 0.28127947 0.63668173 1.15912598 0.85372801 0.05039263
 [7] 0.90714843
0.55081014 1.34409804 0.69843949 0.66927954 0.41814937
[13] 0.23723260
```

```r
mean(x[x > 0])
```

```
[1] 0.6993951
```

---

# Converting a logical index vector to integer

* Logical indexing may be replaced by integer indexing using `which()`

```r
i
```

```
 [1]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
```

```r
which(i)
```

```
[1] 1 6 7
```

--

```r
month.name[ which(i) ]
```

```
[1] "January" "June"    "July"   
```

--

```r
month.name[ -which(i) ] # same as month.name[ !i ]
```

```
[1] "February"  "March"     "April"     "May"       "August"    "September"
[7] "October"   "November"  "December" 
```

---

# Converting a logical index vector to integer

* But be careful about zero-length indices

```r
which(substring(month.name, 1, 1) == "B")
```

```
integer(0)
```

```r
month.name[ which( substring(month.name, 1, 1) == "B") ]
```

```
character(0)
```

```r
-which(substring(month.name, 1, 1) == "B")
```

```
integer(0)
```

```r
month.name[ -which( substring(month.name, 1, 1) == "B") ]
```

```
character(0)
```

* The last result is empty rather than all 12 months, because `-integer(0)` is still a zero-length index

---

layout: true

# Indexing with a vector of names

---

- Vectors can optionally have names, one for each element

- These are usually informative labels

--

- Example: quantiles of a Normal random sample

```r
x <- rnorm(100)
qx <- quantile(x)
qx
```

```
          0%          25%          50%          75%         100% 
-1.994815922 -0.806412043 -0.001141318  0.721337582  3.328321693 
```

```r
names(qx)
```

```
[1] "0%"   "25%"  "50%"  "75%"  "100%"
```

--

```r
names(x) # no names
```

```
NULL
```

---

- When present, names may be used to identify elements

- Indexing with names works in the same way as positive integers

- Instead of position, the corresponding named element is selected

```r
qx[["50%"]] ## extracting a single element using scalar indexing
```

```
[1] -0.001141318
```

```r
qx["50%"] ## extracting a single element with vector indexing
```

```
         50% 
-0.001141318 
```

```r
qx[c("25%", "75%")] ## extracting multiple elements
```

```
       25%        75% 
-0.8064120  0.7213376 
```

---

* Inter-quartile range

```r
diff(qx[c("25%", "75%")])
```

```
    75% 
1.52775 
```

--

```r
IQR(x)
```

```
[1] 1.52775
```

---

- Unmatched names are treated like out-of-bounds indices

```r
qx[["95%"]]
```

```
Error in qx[["95%"]]: subscript out of bounds
```

```r
qx["95%"]
```

```
<NA> 
  NA 
```

---

layout: false

# Lists

* Lists are vectors with arbitrary types of components

--

* Individual elements can be extracted using `x[[i]]`

* Vector indexing by `x[i]` also works in the usual way

???

Both scalar and vector indexing work for lists as well.

--

* A list may or may not have names

* Lists with names have a special type of extraction operator: `$`

--

* In practice, most lists _will_ have names

---

# Example: Two-sample $t$-test

* Given data:

    * $X\sub{1}, X\sub{2}, \dotsc, X\sub{m}$ from a population with mean $\mu\sub{X}$, standard deviation $\sigma\sub{X}$

    * $Y\sub{1}, Y\sub{2}, \dotsc, Y\sub{n}$ from a population with mean $\mu\sub{Y}$, standard deviation $\sigma\sub{Y}$

* Null hypothesis $H\sub{0}: \mu\sub{X} = \mu\sub{Y}$ against alternative $H\sub{1}: \mu\sub{X} \neq \mu\sub{Y}$

--

* Homoscedastic test

$$
\frac{\bar{X} - \bar{Y}}{\sqrt{S^2\sub{p} \left( \frac{1}{m} + \frac{1}{n} \right)}}
$$

* Here $S^2\sub{p}$ is the pooled variance estimate

$$
S^2\sub{p} = \frac{\sum\sub{i=1}^m (X\sub{i} - \bar{X})^2 + \sum\sub{j=1}^n (Y\sub{j} - \bar{Y})^2}{m + n - 2}
$$

* Null distribution is $t$ with $m + n - 2$ degrees of freedom

---

# Work out in R

.scrollable500[

```r
## start with given data, type in rest
x <- c(10.25, 10.06, 10.0, 10.78, 10.56, 10.08, 10.72, 10.56, 10.66)
y <- c(10.93, 10.73, 10.2, 10.72, 10.68, 10.86, 10.32, 10.18, 10.77, 10.29)
m <- length(x)
n <- length(y)
xbar <- sum(x) / m # alt mean(x)
ybar <- sum(y) / n
x - xbar
```

```
[1] -0.1577778 -0.3477778 -0.4077778  0.3722222  0.1522222 -0.3277778  0.3122222
[8]  0.1522222  0.2522222
```

```r
sum((x - xbar)^2)
```

```
[1] 0.7655556
```

```r
sum((x - xbar)^2) + sum((y - ybar)^2)
```

```
[1] 1.509316
```

```r
Sp2 <- (sum((x - xbar)^2) + sum((y - ybar)^2)) / (m + n - 2) # careful about parentheses!
Sp2
```

```
[1] 0.08878327
```

```r
T <- (xbar - ybar) / sqrt(Sp2 * (1/m + 1/n)) # note: this masks T, the shorthand for TRUE
DegOfFreedom <- m + n - 2
## P-value? Need something new: pt() for t-distribution CDF
pval <- 2 * pt(abs(T), df = DegOfFreedom, lower.tail = FALSE)
T
```

```
[1] -1.170312
```

```r
DegOfFreedom
```

```
[1] 17
```

```r
pval
```

```
[1] 0.2580209
```

]

---

layout: true

# How are lists relevant here?

---

* _Solution_ consists of many bits and pieces of information

* Lists can encapsulate diverse pieces of information into a single object

--

* Creating lists is very simple, using the `list()` function

* Any objects can be added as components, usually with a name

--

* Example

```r
result <- list(means = c(xbar, ybar), sizes = c(m, n),
               statistic = T, alternative = "both-sided",
               pooled.variance = Sp2,
               p.value = pval)
```

---

layout: true

# Lists: printing and inspecting

---

```r
result
```

```
$means
[1] 10.40778 10.56800

$sizes
[1]  9 10

$statistic
[1] -1.170312

$alternative
[1] "both-sided"

$pooled.variance
[1] 0.08878327

$p.value
[1] 0.2580209
```

---

```r
str(result)
```

```
List of 6
 $ means          : num [1:2] 10.4 10.6
 $ sizes          : int [1:2] 9 10
 $ statistic      : num -1.17
 $ alternative    : chr "both-sided"
 $ pooled.variance: num 0.0888
 $ p.value        : num 0.258
```

---

layout: false

# Lists: extracting elements

* Scalar indexing by `x[[index]]`

```r
result[[2]]
```

```
[1]  9 10
```

--

```r
result[["statistic"]]
```

```
[1] -1.170312
```

--

* Extracting element by name using `x$name`

```r
result$statistic
```

```
[1] -1.170312
```

---

layout: true

# Lists as containers

---

* Obviously, R has a built-in function to do two-sample $t$-tests

```r
tt <- t.test(x, y, var.equal = TRUE)
```

--

* Lists are commonly used to return such analysis results

```r
str(tt, give.attr = FALSE)
```

```
List of 10
 $ statistic  : Named num -1.17
 $ parameter  : Named num 17
 $ p.value    : num 0.258
 $ conf.int   : num [1:2] -0.449 0.129
 $ estimate   : Named num [1:2] 10.4 10.6
 $ null.value : Named num 0
 $ stderr     : num 0.137
 $ alternative: chr "two.sided"
 $ method     : chr " Two Sample t-test"
 $ data.name  : chr "x and y"
```

---

```r
tt$p.value
```

```
[1] 0.2580209
```

```r
tt[["statistic"]]
```

```
        t 
-1.170312 
```

--

* Compare with

```r
str(result)
```

```
List of 6
 $ means          : num [1:2] 10.4 10.6
 $ sizes          : int [1:2] 9 10
 $ statistic      : num -1.17
 $ alternative    : chr "both-sided"
 $ pooled.variance: num 0.0888
 $ p.value        : num 0.258
```

---

layout: true

# Important difference: display

---

```r
result
```

```
$means
[1] 10.40778 10.56800

$sizes
[1]  9 10

$statistic
[1] -1.170312

$alternative
[1] "both-sided"

$pooled.variance
[1] 0.08878327

$p.value
[1] 0.2580209
```

---

```r
tt
```

```

	Two Sample t-test

data:  x and y
t = -1.1703, df = 17, p-value = 0.258
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4490677  0.1286233
sample estimates:
mean of x mean of y 
 10.40778  10.56800 
```

--

* Why?

* Will discuss soon

---

layout: false

# Data Frames

* R analog of a spreadsheet

* Needed for handling real-life datasets

--

* Rows: usually some kind of _object_ or _unit_ that we are measuring

* Columns: measurement of a particular attribute

---

# Example

.scrollable500[

|Manufacturer |Model |Type | Min.Price| Price| Max.Price| MPG.city| MPG.highway|AirBags |DriveTrain |Cylinders | EngineSize| Horsepower| RPM| Rev.per.mile|Man.trans.avail | Fuel.tank.capacity| Passengers| Length| Wheelbase| Width| Turn.circle| Rear.seat.room| Luggage.room| Weight|Origin |Make | |:-------------|:--------------|:-------|---------:|-----:|---------:|--------:|-----------:|:------------------|:----------|:---------|----------:|----------:|----:|------------:|:---------------|------------------:|----------:|------:|---------:|-----:|-----------:|--------------:|------------:|------:|:-------|:------------------------| |Acura |Integra |Small | 12.9| 15.9| 18.8| 25| 31|None |Front |4 | 1.8| 140| 6300| 2890|Yes | 13.2| 5| 177| 102| 68| 37| 26.5| 11| 2705|non-USA |Acura Integra | |Acura |Legend |Midsize | 29.2| 33.9| 38.7| 18| 25|Driver & Passenger |Front |6 | 3.2| 200| 5500| 2335|Yes | 18.0| 5| 195| 115| 71| 
38| 30.0| 15| 3560|non-USA |Acura Legend | |Audi |90 |Compact | 25.9| 29.1| 32.3| 20| 26|Driver only |Front |6 | 2.8| 172| 5500| 2280|Yes | 16.9| 5| 180| 102| 67| 37| 28.0| 14| 3375|non-USA |Audi 90 | |Audi |100 |Midsize | 30.8| 37.7| 44.6| 19| 26|Driver & Passenger |Front |6 | 2.8| 172| 5500| 2535|Yes | 21.1| 6| 193| 106| 70| 37| 31.0| 17| 3405|non-USA |Audi 100 | |BMW |535i |Midsize | 23.7| 30.0| 36.2| 22| 30|Driver only |Rear |4 | 3.5| 208| 5700| 2545|Yes | 21.1| 4| 186| 109| 69| 39| 27.0| 13| 3640|non-USA |BMW 535i | |Buick |Century |Midsize | 14.2| 15.7| 17.3| 22| 31|Driver only |Front |4 | 2.2| 110| 5200| 2565|No | 16.4| 6| 189| 105| 69| 41| 28.0| 16| 2880|USA |Buick Century | |Buick |LeSabre |Large | 19.9| 20.8| 21.7| 19| 28|Driver only |Front |6 | 3.8| 170| 4800| 1570|No | 18.0| 6| 200| 111| 74| 42| 30.5| 17| 3470|USA |Buick LeSabre | |Buick |Roadmaster |Large | 22.6| 23.7| 24.9| 16| 25|Driver only |Rear |6 | 5.7| 180| 4000| 1320|No | 23.0| 6| 216| 116| 78| 45| 30.5| 21| 4105|USA |Buick Roadmaster | |Buick |Riviera |Midsize | 26.3| 26.3| 26.3| 19| 27|Driver only |Front |6 | 3.8| 170| 4800| 1690|No | 18.8| 5| 198| 108| 73| 41| 26.5| 14| 3495|USA |Buick Riviera | |Cadillac |DeVille |Large | 33.0| 34.7| 36.3| 16| 25|Driver only |Front |8 | 4.9| 200| 4100| 1510|No | 18.0| 6| 206| 114| 73| 43| 35.0| 18| 3620|USA |Cadillac DeVille | |Cadillac |Seville |Midsize | 37.5| 40.1| 42.7| 16| 25|Driver & Passenger |Front |8 | 4.6| 295| 6000| 1985|No | 20.0| 5| 204| 111| 74| 44| 31.0| 14| 3935|USA |Cadillac Seville | |Chevrolet |Cavalier |Compact | 8.5| 13.4| 18.3| 25| 36|None |Front |4 | 2.2| 110| 5200| 2380|Yes | 15.2| 5| 182| 101| 66| 38| 25.0| 13| 2490|USA |Chevrolet Cavalier | |Chevrolet |Corsica |Compact | 11.4| 11.4| 11.4| 25| 34|Driver only |Front |4 | 2.2| 110| 5200| 2665|Yes | 15.6| 5| 184| 103| 68| 39| 26.0| 14| 2785|USA |Chevrolet Corsica | |Chevrolet |Camaro |Sporty | 13.4| 15.1| 16.8| 19| 28|Driver & Passenger |Rear |6 | 3.4| 160| 4600| 1805|Yes | 15.5| 4| 
193| 101| 74| 43| 25.0| 13| 3240|USA |Chevrolet Camaro | |Chevrolet |Lumina |Midsize | 13.4| 15.9| 18.4| 21| 29|None |Front |4 | 2.2| 110| 5200| 2595|No | 16.5| 6| 198| 108| 71| 40| 28.5| 16| 3195|USA |Chevrolet Lumina | |Chevrolet |Lumina_APV |Van | 14.7| 16.3| 18.0| 18| 23|None |Front |6 | 3.8| 170| 4800| 1690|No | 20.0| 7| 178| 110| 74| 44| 30.5| NA| 3715|USA |Chevrolet Lumina_APV | |Chevrolet |Astro |Van | 14.7| 16.6| 18.6| 15| 20|None |4WD |6 | 4.3| 165| 4000| 1790|No | 27.0| 8| 194| 111| 78| 42| 33.5| NA| 4025|USA |Chevrolet Astro | |Chevrolet |Caprice |Large | 18.0| 18.8| 19.6| 17| 26|Driver only |Rear |8 | 5.0| 170| 4200| 1350|No | 23.0| 6| 214| 116| 77| 42| 29.5| 20| 3910|USA |Chevrolet Caprice | |Chevrolet |Corvette |Sporty | 34.6| 38.0| 41.5| 17| 25|Driver only |Rear |8 | 5.7| 300| 5000| 1450|Yes | 20.0| 2| 179| 96| 74| 43| NA| NA| 3380|USA |Chevrolet Corvette | |Chrylser |Concorde |Large | 18.4| 18.4| 18.4| 20| 28|Driver & Passenger |Front |6 | 3.3| 153| 5300| 1990|No | 18.0| 6| 203| 113| 74| 40| 31.0| 15| 3515|USA |Chrylser Concorde | |Chrysler |LeBaron |Compact | 14.5| 15.8| 17.1| 23| 28|Driver & Passenger |Front |4 | 3.0| 141| 5000| 2090|No | 16.0| 6| 183| 104| 68| 41| 30.5| 14| 3085|USA |Chrysler LeBaron | |Chrysler |Imperial |Large | 29.5| 29.5| 29.5| 20| 26|Driver only |Front |6 | 3.3| 147| 4800| 1785|No | 16.0| 6| 203| 110| 69| 44| 36.0| 17| 3570|USA |Chrysler Imperial | |Dodge |Colt |Small | 7.9| 9.2| 10.6| 29| 33|None |Front |4 | 1.5| 92| 6000| 3285|Yes | 13.2| 5| 174| 98| 66| 32| 26.5| 11| 2270|USA |Dodge Colt | |Dodge |Shadow |Small | 8.4| 11.3| 14.2| 23| 29|Driver only |Front |4 | 2.2| 93| 4800| 2595|Yes | 14.0| 5| 172| 97| 67| 38| 26.5| 13| 2670|USA |Dodge Shadow | |Dodge |Spirit |Compact | 11.9| 13.3| 14.7| 22| 27|Driver only |Front |4 | 2.5| 100| 4800| 2535|Yes | 16.0| 6| 181| 104| 68| 39| 30.5| 14| 2970|USA |Dodge Spirit | |Dodge |Caravan |Van | 13.6| 19.0| 24.4| 17| 21|Driver only |4WD |6 | 3.0| 142| 5000| 1970|No | 20.0| 7| 175| 112| 
72| 42| 26.5| NA| 3705|USA |Dodge Caravan | |Dodge |Dynasty |Midsize | 14.8| 15.6| 16.4| 21| 27|Driver only |Front |4 | 2.5| 100| 4800| 2465|No | 16.0| 6| 192| 105| 69| 42| 30.5| 16| 3080|USA |Dodge Dynasty | |Dodge |Stealth |Sporty | 18.5| 25.8| 33.1| 18| 24|Driver only |4WD |6 | 3.0| 300| 6000| 2120|Yes | 19.8| 4| 180| 97| 72| 40| 20.0| 11| 3805|USA |Dodge Stealth | |Eagle |Summit |Small | 7.9| 12.2| 16.5| 29| 33|None |Front |4 | 1.5| 92| 6000| 2505|Yes | 13.2| 5| 174| 98| 66| 36| 26.5| 11| 2295|USA |Eagle Summit | |Eagle |Vision |Large | 17.5| 19.3| 21.2| 20| 28|Driver & Passenger |Front |6 | 3.5| 214| 5800| 1980|No | 18.0| 6| 202| 113| 74| 40| 30.0| 15| 3490|USA |Eagle Vision | |Ford |Festiva |Small | 6.9| 7.4| 7.9| 31| 33|None |Front |4 | 1.3| 63| 5000| 3150|Yes | 10.0| 4| 141| 90| 63| 33| 26.0| 12| 1845|USA |Ford Festiva | |Ford |Escort |Small | 8.4| 10.1| 11.9| 23| 30|None |Front |4 | 1.8| 127| 6500| 2410|Yes | 13.2| 5| 171| 98| 67| 36| 28.0| 12| 2530|USA |Ford Escort | |Ford |Tempo |Compact | 10.4| 11.3| 12.2| 22| 27|None |Front |4 | 2.3| 96| 4200| 2805|Yes | 15.9| 5| 177| 100| 68| 39| 27.5| 13| 2690|USA |Ford Tempo | |Ford |Mustang |Sporty | 10.8| 15.9| 21.0| 22| 29|Driver only |Rear |4 | 2.3| 105| 4600| 2285|Yes | 15.4| 4| 180| 101| 68| 40| 24.0| 12| 2850|USA |Ford Mustang | |Ford |Probe |Sporty | 12.8| 14.0| 15.2| 24| 30|Driver only |Front |4 | 2.0| 115| 5500| 2340|Yes | 15.5| 4| 179| 103| 70| 38| 23.0| 18| 2710|USA |Ford Probe | |Ford |Aerostar |Van | 14.5| 19.9| 25.3| 15| 20|Driver only |4WD |6 | 3.0| 145| 4800| 2080|Yes | 21.0| 7| 176| 119| 72| 45| 30.0| NA| 3735|USA |Ford Aerostar | |Ford |Taurus |Midsize | 15.6| 20.2| 24.8| 21| 30|Driver only |Front |6 | 3.0| 140| 4800| 1885|No | 16.0| 5| 192| 106| 71| 40| 27.5| 18| 3325|USA |Ford Taurus | |Ford |Crown_Victoria |Large | 20.1| 20.9| 21.7| 18| 26|Driver only |Rear |8 | 4.6| 190| 4200| 1415|No | 20.0| 6| 212| 114| 78| 43| 30.0| 21| 3950|USA |Ford Crown_Victoria | |Geo |Metro |Small | 6.7| 8.4| 10.0| 
46| 50|None |Front |3 | 1.0| 55| 5700| 3755|Yes | 10.6| 4| 151| 93| 63| 34| 27.5| 10| 1695|non-USA |Geo Metro | |Geo |Storm |Sporty | 11.5| 12.5| 13.5| 30| 36|Driver only |Front |4 | 1.6| 90| 5400| 3250|Yes | 12.4| 4| 164| 97| 67| 37| 24.5| 11| 2475|non-USA |Geo Storm | |Honda |Prelude |Sporty | 17.0| 19.8| 22.7| 24| 31|Driver & Passenger |Front |4 | 2.3| 160| 5800| 2855|Yes | 15.9| 4| 175| 100| 70| 39| 23.5| 8| 2865|non-USA |Honda Prelude | |Honda |Civic |Small | 8.4| 12.1| 15.8| 42| 46|Driver only |Front |4 | 1.5| 102| 5900| 2650|Yes | 11.9| 4| 173| 103| 67| 36| 28.0| 12| 2350|non-USA |Honda Civic | |Honda |Accord |Compact | 13.8| 17.5| 21.2| 24| 31|Driver & Passenger |Front |4 | 2.2| 140| 5600| 2610|Yes | 17.0| 4| 185| 107| 67| 41| 28.0| 14| 3040|non-USA |Honda Accord | |Hyundai |Excel |Small | 6.8| 8.0| 9.2| 29| 33|None |Front |4 | 1.5| 81| 5500| 2710|Yes | 11.9| 5| 168| 94| 63| 35| 26.0| 11| 2345|non-USA |Hyundai Excel | |Hyundai |Elantra |Small | 9.0| 10.0| 11.0| 22| 29|None |Front |4 | 1.8| 124| 6000| 2745|Yes | 13.7| 5| 172| 98| 66| 36| 28.0| 12| 2620|non-USA |Hyundai Elantra | |Hyundai |Scoupe |Sporty | 9.1| 10.0| 11.0| 26| 34|None |Front |4 | 1.5| 92| 5550| 2540|Yes | 11.9| 4| 166| 94| 64| 34| 23.5| 9| 2285|non-USA |Hyundai Scoupe | |Hyundai |Sonata |Midsize | 12.4| 13.9| 15.3| 20| 27|None |Front |4 | 2.0| 128| 6000| 2335|Yes | 17.2| 5| 184| 104| 69| 41| 31.0| 14| 2885|non-USA |Hyundai Sonata | |Infiniti |Q45 |Midsize | 45.4| 47.9| 50.4| 17| 22|Driver only |Rear |8 | 4.5| 278| 6000| 1955|No | 22.5| 5| 200| 113| 72| 42| 29.0| 15| 4000|non-USA |Infiniti Q45 | |Lexus |ES300 |Midsize | 27.5| 28.0| 28.4| 18| 24|Driver only |Front |6 | 3.0| 185| 5200| 2325|Yes | 18.5| 5| 188| 103| 70| 40| 27.5| 14| 3510|non-USA |Lexus ES300 | |Lexus |SC300 |Midsize | 34.7| 35.2| 35.6| 18| 23|Driver & Passenger |Rear |6 | 3.0| 225| 6000| 2510|Yes | 20.6| 4| 191| 106| 71| 39| 25.0| 9| 3515|non-USA |Lexus SC300 | |Lincoln |Continental |Midsize | 33.3| 34.3| 35.3| 17| 26|Driver & 
Passenger |Front |6 | 3.8| 160| 4400| 1835|No | 18.4| 6| 205| 109| 73| 42| 30.0| 19| 3695|USA |Lincoln Continental | |Lincoln |Town_Car |Large | 34.4| 36.1| 37.8| 18| 26|Driver & Passenger |Rear |8 | 4.6| 210| 4600| 1840|No | 20.0| 6| 219| 117| 77| 45| 31.5| 22| 4055|USA |Lincoln Town_Car | |Mazda |323 |Small | 7.4| 8.3| 9.1| 29| 37|None |Front |4 | 1.6| 82| 5000| 2370|Yes | 13.2| 4| 164| 97| 66| 34| 27.0| 16| 2325|non-USA |Mazda 323 | |Mazda |Protege |Small | 10.9| 11.6| 12.3| 28| 36|None |Front |4 | 1.8| 103| 5500| 2220|Yes | 14.5| 5| 172| 98| 66| 36| 26.5| 13| 2440|non-USA |Mazda Protege | |Mazda |626 |Compact | 14.3| 16.5| 18.7| 26| 34|Driver only |Front |4 | 2.5| 164| 5600| 2505|Yes | 15.5| 5| 184| 103| 69| 40| 29.5| 14| 2970|non-USA |Mazda 626 | |Mazda |MPV |Van | 16.6| 19.1| 21.7| 18| 24|None |4WD |6 | 3.0| 155| 5000| 2240|No | 19.6| 7| 190| 110| 72| 39| 27.5| NA| 3735|non-USA |Mazda MPV | |Mazda |RX-7 |Sporty | 32.5| 32.5| 32.5| 17| 25|Driver only |Rear |rotary | 1.3| 255| 6500| 2325|Yes | 20.0| 2| 169| 96| 69| 37| NA| NA| 2895|non-USA |Mazda RX-7 | |Mercedes-Benz |190E |Compact | 29.0| 31.9| 34.9| 20| 29|Driver only |Rear |4 | 2.3| 130| 5100| 2425|Yes | 14.5| 5| 175| 105| 67| 34| 26.0| 12| 2920|non-USA |Mercedes-Benz 190E | |Mercedes-Benz |300E |Midsize | 43.8| 61.9| 80.0| 19| 25|Driver & Passenger |Rear |6 | 3.2| 217| 5500| 2220|No | 18.5| 5| 187| 110| 69| 37| 27.0| 15| 3525|non-USA |Mercedes-Benz 300E | |Mercury |Capri |Sporty | 13.3| 14.1| 15.0| 23| 26|Driver only |Front |4 | 1.6| 100| 5750| 2475|Yes | 11.1| 4| 166| 95| 65| 36| 19.0| 6| 2450|USA |Mercury Capri | |Mercury |Cougar |Midsize | 14.9| 14.9| 14.9| 19| 26|None |Rear |6 | 3.8| 140| 3800| 1730|No | 18.0| 5| 199| 113| 73| 38| 28.0| 15| 3610|USA |Mercury Cougar | |Mitsubishi |Mirage |Small | 7.7| 10.3| 12.9| 29| 33|None |Front |4 | 1.5| 92| 6000| 2505|Yes | 13.2| 5| 172| 98| 67| 36| 26.0| 11| 2295|non-USA |Mitsubishi Mirage | |Mitsubishi |Diamante |Midsize | 22.4| 26.1| 29.9| 18| 24|Driver only 
|Front |6 | 3.0| 202| 6000| 2210|No | 19.0| 5| 190| 107| 70| 43| 27.5| 14| 3730|non-USA |Mitsubishi Diamante | |Nissan |Sentra |Small | 8.7| 11.8| 14.9| 29| 33|Driver only |Front |4 | 1.6| 110| 6000| 2435|Yes | 13.2| 5| 170| 96| 66| 33| 26.0| 12| 2545|non-USA |Nissan Sentra | |Nissan |Altima |Compact | 13.0| 15.7| 18.3| 24| 30|Driver only |Front |4 | 2.4| 150| 5600| 2130|Yes | 15.9| 5| 181| 103| 67| 40| 28.5| 14| 3050|non-USA |Nissan Altima | |Nissan |Quest |Van | 16.7| 19.1| 21.5| 17| 23|None |Front |6 | 3.0| 151| 4800| 2065|No | 20.0| 7| 190| 112| 74| 41| 27.0| NA| 4100|non-USA |Nissan Quest | |Nissan |Maxima |Midsize | 21.0| 21.5| 22.0| 21| 26|Driver only |Front |6 | 3.0| 160| 5200| 2045|No | 18.5| 5| 188| 104| 69| 41| 28.5| 14| 3200|non-USA |Nissan Maxima | |Oldsmobile |Achieva |Compact | 13.0| 13.5| 14.0| 24| 31|None |Front |4 | 2.3| 155| 6000| 2380|No | 15.2| 5| 188| 103| 67| 39| 28.0| 14| 2910|USA |Oldsmobile Achieva | |Oldsmobile |Cutlass_Ciera |Midsize | 14.2| 16.3| 18.4| 23| 31|Driver only |Front |4 | 2.2| 110| 5200| 2565|No | 16.5| 5| 190| 105| 70| 42| 28.0| 16| 2890|USA |Oldsmobile Cutlass_Ciera | |Oldsmobile |Silhouette |Van | 19.5| 19.5| 19.5| 18| 23|None |Front |6 | 3.8| 170| 4800| 1690|No | 20.0| 7| 194| 110| 74| 44| 30.5| NA| 3715|USA |Oldsmobile Silhouette | |Oldsmobile |Eighty-Eight |Large | 19.5| 20.7| 21.9| 19| 28|Driver only |Front |6 | 3.8| 170| 4800| 1570|No | 18.0| 6| 201| 111| 74| 42| 31.5| 17| 3470|USA |Oldsmobile Eighty-Eight | |Plymouth |Laser |Sporty | 11.4| 14.4| 17.4| 23| 30|None |4WD |4 | 1.8| 92| 5000| 2360|Yes | 15.9| 4| 173| 97| 67| 39| 24.5| 8| 2640|USA |Plymouth Laser | |Pontiac |LeMans |Small | 8.2| 9.0| 9.9| 31| 41|None |Front |4 | 1.6| 74| 5600| 3130|Yes | 13.2| 4| 177| 99| 66| 35| 25.5| 17| 2350|USA |Pontiac LeMans | |Pontiac |Sunbird |Compact | 9.4| 11.1| 12.8| 23| 31|None |Front |4 | 2.0| 110| 5200| 2665|Yes | 15.2| 5| 181| 101| 66| 39| 25.0| 13| 2575|USA |Pontiac Sunbird | |Pontiac |Firebird |Sporty | 14.0| 17.7| 21.4| 
19| 28|Driver & Passenger |Rear |6 | 3.4| 160| 4600| 1805|Yes | 15.5| 4| 196| 101| 75| 43| 25.0| 13| 3240|USA |Pontiac Firebird | |Pontiac |Grand_Prix |Midsize | 15.4| 18.5| 21.6| 19| 27|None |Front |6 | 3.4| 200| 5000| 1890|Yes | 16.5| 5| 195| 108| 72| 41| 28.5| 16| 3450|USA |Pontiac Grand_Prix | |Pontiac |Bonneville |Large | 19.4| 24.4| 29.4| 19| 28|Driver & Passenger |Front |6 | 3.8| 170| 4800| 1565|No | 18.0| 6| 177| 111| 74| 43| 30.5| 18| 3495|USA |Pontiac Bonneville | |Saab |900 |Compact | 20.3| 28.7| 37.1| 20| 26|Driver only |Front |4 | 2.1| 140| 6000| 2910|Yes | 18.0| 5| 184| 99| 67| 37| 26.5| 14| 2775|non-USA |Saab 900 | |Saturn |SL |Small | 9.2| 11.1| 12.9| 28| 38|Driver only |Front |4 | 1.9| 85| 5000| 2145|Yes | 12.8| 5| 176| 102| 68| 40| 26.5| 12| 2495|USA |Saturn SL | |Subaru |Justy |Small | 7.3| 8.4| 9.5| 33| 37|None |4WD |3 | 1.2| 73| 5600| 2875|Yes | 9.2| 4| 146| 90| 60| 32| 23.5| 10| 2045|non-USA |Subaru Justy | |Subaru |Loyale |Small | 10.5| 10.9| 11.3| 25| 30|None |4WD |4 | 1.8| 90| 5200| 3375|Yes | 15.9| 5| 175| 97| 65| 35| 27.5| 15| 2490|non-USA |Subaru Loyale | |Subaru |Legacy |Compact | 16.3| 19.5| 22.7| 23| 30|Driver only |4WD |4 | 2.2| 130| 5600| 2330|Yes | 15.9| 5| 179| 102| 67| 37| 27.0| 14| 3085|non-USA |Subaru Legacy | |Suzuki |Swift |Small | 7.3| 8.6| 10.0| 39| 43|None |Front |3 | 1.3| 70| 6000| 3360|Yes | 10.6| 4| 161| 93| 63| 34| 27.5| 10| 1965|non-USA |Suzuki Swift | |Toyota |Tercel |Small | 7.8| 9.8| 11.8| 32| 37|Driver only |Front |4 | 1.5| 82| 5200| 3505|Yes | 11.9| 5| 162| 94| 65| 36| 24.0| 11| 2055|non-USA |Toyota Tercel | |Toyota |Celica |Sporty | 14.2| 18.4| 22.6| 25| 32|Driver only |Front |4 | 2.2| 135| 5400| 2405|Yes | 15.9| 4| 174| 99| 69| 39| 23.0| 13| 2950|non-USA |Toyota Celica | |Toyota |Camry |Midsize | 15.2| 18.2| 21.2| 22| 29|Driver only |Front |4 | 2.2| 130| 5400| 2340|Yes | 18.5| 5| 188| 103| 70| 38| 28.5| 15| 3030|non-USA |Toyota Camry | |Toyota |Previa |Van | 18.9| 22.7| 26.6| 18| 22|Driver only |4WD |4 | 2.4| 
138| 5000| 2515|Yes | 19.8| 7| 187| 113| 71| 41| 35.0| NA| 3785|non-USA |Toyota Previa | |Volkswagen |Fox |Small | 8.7| 9.1| 9.5| 25| 33|None |Front |4 | 1.8| 81| 5500| 2550|Yes | 12.4| 4| 163| 93| 63| 34| 26.0| 10| 2240|non-USA |Volkswagen Fox | |Volkswagen |Eurovan |Van | 16.6| 19.7| 22.7| 17| 21|None |Front |5 | 2.5| 109| 4500| 2915|Yes | 21.1| 7| 187| 115| 72| 38| 34.0| NA| 3960|non-USA |Volkswagen Eurovan | |Volkswagen |Passat |Compact | 17.6| 20.0| 22.4| 21| 30|None |Front |4 | 2.0| 134| 5800| 2685|Yes | 18.5| 5| 180| 103| 67| 35| 31.5| 14| 2985|non-USA |Volkswagen Passat | |Volkswagen |Corrado |Sporty | 22.9| 23.3| 23.7| 18| 25|None |Front |6 | 2.8| 178| 5800| 2385|Yes | 18.5| 4| 159| 97| 66| 36| 26.0| 15| 2810|non-USA |Volkswagen Corrado | |Volvo |240 |Compact | 21.8| 22.7| 23.5| 21| 28|Driver only |Rear |4 | 2.3| 114| 5400| 2215|Yes | 15.8| 5| 190| 104| 67| 37| 29.5| 14| 2985|non-USA |Volvo 240 | |Volvo |850 |Midsize | 24.8| 26.7| 28.5| 20| 28|Driver & Passenger |Front |5 | 2.4| 168| 6200| 2310|Yes | 19.3| 5| 184| 105| 69| 38| 30.0| 15| 3245|non-USA |Volvo 850 |
]

???

In principle, each row must have a value for each column, although in practice the value can be unobserved or missing. In addition, all the values within a column must measure the _same_ attribute, so the values should all be of the same _type_. If these defining characteristics hold, then the dataset can be represented by a __data frame__ in R.
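
As a small illustration of these defining characteristics, the sketch below builds a data frame by hand. The object name `toy` is hypothetical, and the values are copied from rows 14 to 16 of the table above for four of its columns; it is only meant to show that columns may differ in type, that each column is homogeneous, and that a missing value is stored as `NA`.

```r
# Hypothetical toy data frame; values copied by hand from rows 14-16 of Cars93
toy <- data.frame(
    Make = c("Chevrolet Camaro", "Chevrolet Lumina", "Chevrolet Lumina_APV"),
    MPG.city = c(19L, 21L, 18L),                     # integer column
    Luggage.room = c(13L, 16L, NA),                  # NA: value not available for the van
    Man.trans.avail = factor(c("Yes", "No", "No"))   # categorical column
)

str(toy)                 # one line per column, showing its type
sapply(toy, class)       # each column has a single class
complete.cases(toy)      # FALSE for rows containing a missing value
mean(toy$Luggage.room, na.rm = TRUE)   # summaries must say how to treat NA
```

Every column has the same length (three), but the columns are of different types, which is what distinguishes a data frame from a matrix.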
--- layout: true # Data Frames --- * Rectangular (matrix-like) structure -- * Each column is (usually) an atomic vector * Different columns can be of different types -- * Every column must have the same length * Every column must have a name --- * Most built-in data sets in R are data frames .scrollable400[ ```r data(Cars93, package = "MASS") Cars93 ``` ``` Manufacturer Model Type Min.Price Price Max.Price MPG.city 1 Acura Integra Small 12.9 15.9 18.8 25 2 Acura Legend Midsize 29.2 33.9 38.7 18 3 Audi 90 Compact 25.9 29.1 32.3 20 4 Audi 100 Midsize 30.8 37.7 44.6 19 5 BMW 535i Midsize 23.7 30.0 36.2 22 6 Buick Century Midsize 14.2 15.7 17.3 22 7 Buick LeSabre Large 19.9 20.8 21.7 19 8 Buick Roadmaster Large 22.6 23.7 24.9 16 9 Buick Riviera Midsize 26.3 26.3 26.3 19 10 Cadillac DeVille Large 33.0 34.7 36.3 16 11 Cadillac Seville Midsize 37.5 40.1 42.7 16 12 Chevrolet Cavalier Compact 8.5 13.4 18.3 25 13 Chevrolet Corsica Compact 11.4 11.4 11.4 25 14 Chevrolet Camaro Sporty 13.4 15.1 16.8 19 15 Chevrolet Lumina Midsize 13.4 15.9 18.4 21 16 Chevrolet Lumina_APV Van 14.7 16.3 18.0 18 17 Chevrolet Astro Van 14.7 16.6 18.6 15 18 Chevrolet Caprice Large 18.0 18.8 19.6 17 19 Chevrolet Corvette Sporty 34.6 38.0 41.5 17 20 Chrylser Concorde Large 18.4 18.4 18.4 20 21 Chrysler LeBaron Compact 14.5 15.8 17.1 23 22 Chrysler Imperial Large 29.5 29.5 29.5 20 23 Dodge Colt Small 7.9 9.2 10.6 29 24 Dodge Shadow Small 8.4 11.3 14.2 23 25 Dodge Spirit Compact 11.9 13.3 14.7 22 26 Dodge Caravan Van 13.6 19.0 24.4 17 27 Dodge Dynasty Midsize 14.8 15.6 16.4 21 28 Dodge Stealth Sporty 18.5 25.8 33.1 18 29 Eagle Summit Small 7.9 12.2 16.5 29 30 Eagle Vision Large 17.5 19.3 21.2 20 31 Ford Festiva Small 6.9 7.4 7.9 31 32 Ford Escort Small 8.4 10.1 11.9 23 33 Ford Tempo Compact 10.4 11.3 12.2 22 34 Ford Mustang Sporty 10.8 15.9 21.0 22 35 Ford Probe Sporty 12.8 14.0 15.2 24 36 Ford Aerostar Van 14.5 19.9 25.3 15 37 Ford Taurus Midsize 15.6 20.2 24.8 21 38 Ford Crown_Victoria Large 20.1 
20.9 21.7 18 39 Geo Metro Small 6.7 8.4 10.0 46 40 Geo Storm Sporty 11.5 12.5 13.5 30 41 Honda Prelude Sporty 17.0 19.8 22.7 24 42 Honda Civic Small 8.4 12.1 15.8 42 43 Honda Accord Compact 13.8 17.5 21.2 24 44 Hyundai Excel Small 6.8 8.0 9.2 29 45 Hyundai Elantra Small 9.0 10.0 11.0 22 46 Hyundai Scoupe Sporty 9.1 10.0 11.0 26 47 Hyundai Sonata Midsize 12.4 13.9 15.3 20 48 Infiniti Q45 Midsize 45.4 47.9 50.4 17 49 Lexus ES300 Midsize 27.5 28.0 28.4 18 50 Lexus SC300 Midsize 34.7 35.2 35.6 18 51 Lincoln Continental Midsize 33.3 34.3 35.3 17 52 Lincoln Town_Car Large 34.4 36.1 37.8 18 53 Mazda 323 Small 7.4 8.3 9.1 29 54 Mazda Protege Small 10.9 11.6 12.3 28 55 Mazda 626 Compact 14.3 16.5 18.7 26 56 Mazda MPV Van 16.6 19.1 21.7 18 57 Mazda RX-7 Sporty 32.5 32.5 32.5 17 58 Mercedes-Benz 190E Compact 29.0 31.9 34.9 20 59 Mercedes-Benz 300E Midsize 43.8 61.9 80.0 19 60 Mercury Capri Sporty 13.3 14.1 15.0 23 61 Mercury Cougar Midsize 14.9 14.9 14.9 19 62 Mitsubishi Mirage Small 7.7 10.3 12.9 29 63 Mitsubishi Diamante Midsize 22.4 26.1 29.9 18 64 Nissan Sentra Small 8.7 11.8 14.9 29 65 Nissan Altima Compact 13.0 15.7 18.3 24 66 Nissan Quest Van 16.7 19.1 21.5 17 67 Nissan Maxima Midsize 21.0 21.5 22.0 21 68 Oldsmobile Achieva Compact 13.0 13.5 14.0 24 69 Oldsmobile Cutlass_Ciera Midsize 14.2 16.3 18.4 23 70 Oldsmobile Silhouette Van 19.5 19.5 19.5 18 71 Oldsmobile Eighty-Eight Large 19.5 20.7 21.9 19 72 Plymouth Laser Sporty 11.4 14.4 17.4 23 73 Pontiac LeMans Small 8.2 9.0 9.9 31 74 Pontiac Sunbird Compact 9.4 11.1 12.8 23 75 Pontiac Firebird Sporty 14.0 17.7 21.4 19 76 Pontiac Grand_Prix Midsize 15.4 18.5 21.6 19 77 Pontiac Bonneville Large 19.4 24.4 29.4 19 78 Saab 900 Compact 20.3 28.7 37.1 20 79 Saturn SL Small 9.2 11.1 12.9 28 80 Subaru Justy Small 7.3 8.4 9.5 33 81 Subaru Loyale Small 10.5 10.9 11.3 25 82 Subaru Legacy Compact 16.3 19.5 22.7 23 83 Suzuki Swift Small 7.3 8.6 10.0 39 84 Toyota Tercel Small 7.8 9.8 11.8 32 85 Toyota Celica Sporty 14.2 18.4 22.6 25 86 
Toyota Camry Midsize 15.2 18.2 21.2 22 87 Toyota Previa Van 18.9 22.7 26.6 18 88 Volkswagen Fox Small 8.7 9.1 9.5 25 89 Volkswagen Eurovan Van 16.6 19.7 22.7 17 90 Volkswagen Passat Compact 17.6 20.0 22.4 21 91 Volkswagen Corrado Sporty 22.9 23.3 23.7 18 92 Volvo 240 Compact 21.8 22.7 23.5 21 93 Volvo 850 Midsize 24.8 26.7 28.5 20 MPG.highway AirBags DriveTrain Cylinders EngineSize Horsepower 1 31 None Front 4 1.8 140 2 25 Driver & Passenger Front 6 3.2 200 3 26 Driver only Front 6 2.8 172 4 26 Driver & Passenger Front 6 2.8 172 5 30 Driver only Rear 4 3.5 208 6 31 Driver only Front 4 2.2 110 7 28 Driver only Front 6 3.8 170 8 25 Driver only Rear 6 5.7 180 9 27 Driver only Front 6 3.8 170 10 25 Driver only Front 8 4.9 200 11 25 Driver & Passenger Front 8 4.6 295 12 36 None Front 4 2.2 110 13 34 Driver only Front 4 2.2 110 14 28 Driver & Passenger Rear 6 3.4 160 15 29 None Front 4 2.2 110 16 23 None Front 6 3.8 170 17 20 None 4WD 6 4.3 165 18 26 Driver only Rear 8 5.0 170 19 25 Driver only Rear 8 5.7 300 20 28 Driver & Passenger Front 6 3.3 153 21 28 Driver & Passenger Front 4 3.0 141 22 26 Driver only Front 6 3.3 147 23 33 None Front 4 1.5 92 24 29 Driver only Front 4 2.2 93 25 27 Driver only Front 4 2.5 100 26 21 Driver only 4WD 6 3.0 142 27 27 Driver only Front 4 2.5 100 28 24 Driver only 4WD 6 3.0 300 29 33 None Front 4 1.5 92 30 28 Driver & Passenger Front 6 3.5 214 31 33 None Front 4 1.3 63 32 30 None Front 4 1.8 127 33 27 None Front 4 2.3 96 34 29 Driver only Rear 4 2.3 105 35 30 Driver only Front 4 2.0 115 36 20 Driver only 4WD 6 3.0 145 37 30 Driver only Front 6 3.0 140 38 26 Driver only Rear 8 4.6 190 39 50 None Front 3 1.0 55 40 36 Driver only Front 4 1.6 90 41 31 Driver & Passenger Front 4 2.3 160 42 46 Driver only Front 4 1.5 102 43 31 Driver & Passenger Front 4 2.2 140 44 33 None Front 4 1.5 81 45 29 None Front 4 1.8 124 46 34 None Front 4 1.5 92 47 27 None Front 4 2.0 128 48 22 Driver only Rear 8 4.5 278 49 24 Driver only Front 6 3.0 185 50 23 Driver 
& Passenger Rear 6 3.0 225 51 26 Driver & Passenger Front 6 3.8 160 52 26 Driver & Passenger Rear 8 4.6 210 53 37 None Front 4 1.6 82 54 36 None Front 4 1.8 103 55 34 Driver only Front 4 2.5 164 56 24 None 4WD 6 3.0 155 57 25 Driver only Rear rotary 1.3 255 58 29 Driver only Rear 4 2.3 130 59 25 Driver & Passenger Rear 6 3.2 217 60 26 Driver only Front 4 1.6 100 61 26 None Rear 6 3.8 140 62 33 None Front 4 1.5 92 63 24 Driver only Front 6 3.0 202 64 33 Driver only Front 4 1.6 110 65 30 Driver only Front 4 2.4 150 66 23 None Front 6 3.0 151 67 26 Driver only Front 6 3.0 160 68 31 None Front 4 2.3 155 69 31 Driver only Front 4 2.2 110 70 23 None Front 6 3.8 170 71 28 Driver only Front 6 3.8 170 72 30 None 4WD 4 1.8 92 73 41 None Front 4 1.6 74 74 31 None Front 4 2.0 110 75 28 Driver & Passenger Rear 6 3.4 160 76 27 None Front 6 3.4 200 77 28 Driver & Passenger Front 6 3.8 170 78 26 Driver only Front 4 2.1 140 79 38 Driver only Front 4 1.9 85 80 37 None 4WD 3 1.2 73 81 30 None 4WD 4 1.8 90 82 30 Driver only 4WD 4 2.2 130 83 43 None Front 3 1.3 70 84 37 Driver only Front 4 1.5 82 85 32 Driver only Front 4 2.2 135 86 29 Driver only Front 4 2.2 130 87 22 Driver only 4WD 4 2.4 138 88 33 None Front 4 1.8 81 89 21 None Front 5 2.5 109 90 30 None Front 4 2.0 134 91 25 None Front 6 2.8 178 92 28 Driver only Rear 4 2.3 114 93 28 Driver & Passenger Front 5 2.4 168 RPM Rev.per.mile Man.trans.avail Fuel.tank.capacity Passengers Length 1 6300 2890 Yes 13.2 5 177 2 5500 2335 Yes 18.0 5 195 3 5500 2280 Yes 16.9 5 180 4 5500 2535 Yes 21.1 6 193 5 5700 2545 Yes 21.1 4 186 6 5200 2565 No 16.4 6 189 7 4800 1570 No 18.0 6 200 8 4000 1320 No 23.0 6 216 9 4800 1690 No 18.8 5 198 10 4100 1510 No 18.0 6 206 11 6000 1985 No 20.0 5 204 12 5200 2380 Yes 15.2 5 182 13 5200 2665 Yes 15.6 5 184 14 4600 1805 Yes 15.5 4 193 15 5200 2595 No 16.5 6 198 16 4800 1690 No 20.0 7 178 17 4000 1790 No 27.0 8 194 18 4200 1350 No 23.0 6 214 19 5000 1450 Yes 20.0 2 179 20 5300 1990 No 18.0 6 203 21 5000 2090 No 
16.0 6 183 22 4800 1785 No 16.0 6 203 23 6000 3285 Yes 13.2 5 174 24 4800 2595 Yes 14.0 5 172 25 4800 2535 Yes 16.0 6 181 26 5000 1970 No 20.0 7 175 27 4800 2465 No 16.0 6 192 28 6000 2120 Yes 19.8 4 180 29 6000 2505 Yes 13.2 5 174 30 5800 1980 No 18.0 6 202 31 5000 3150 Yes 10.0 4 141 32 6500 2410 Yes 13.2 5 171 33 4200 2805 Yes 15.9 5 177 34 4600 2285 Yes 15.4 4 180 35 5500 2340 Yes 15.5 4 179 36 4800 2080 Yes 21.0 7 176 37 4800 1885 No 16.0 5 192 38 4200 1415 No 20.0 6 212 39 5700 3755 Yes 10.6 4 151 40 5400 3250 Yes 12.4 4 164 41 5800 2855 Yes 15.9 4 175 42 5900 2650 Yes 11.9 4 173 43 5600 2610 Yes 17.0 4 185 44 5500 2710 Yes 11.9 5 168 45 6000 2745 Yes 13.7 5 172 46 5550 2540 Yes 11.9 4 166 47 6000 2335 Yes 17.2 5 184 48 6000 1955 No 22.5 5 200 49 5200 2325 Yes 18.5 5 188 50 6000 2510 Yes 20.6 4 191 51 4400 1835 No 18.4 6 205 52 4600 1840 No 20.0 6 219 53 5000 2370 Yes 13.2 4 164 54 5500 2220 Yes 14.5 5 172 55 5600 2505 Yes 15.5 5 184 56 5000 2240 No 19.6 7 190 57 6500 2325 Yes 20.0 2 169 58 5100 2425 Yes 14.5 5 175 59 5500 2220 No 18.5 5 187 60 5750 2475 Yes 11.1 4 166 61 3800 1730 No 18.0 5 199 62 6000 2505 Yes 13.2 5 172 63 6000 2210 No 19.0 5 190 64 6000 2435 Yes 13.2 5 170 65 5600 2130 Yes 15.9 5 181 66 4800 2065 No 20.0 7 190 67 5200 2045 No 18.5 5 188 68 6000 2380 No 15.2 5 188 69 5200 2565 No 16.5 5 190 70 4800 1690 No 20.0 7 194 71 4800 1570 No 18.0 6 201 72 5000 2360 Yes 15.9 4 173 73 5600 3130 Yes 13.2 4 177 74 5200 2665 Yes 15.2 5 181 75 4600 1805 Yes 15.5 4 196 76 5000 1890 Yes 16.5 5 195 77 4800 1565 No 18.0 6 177 78 6000 2910 Yes 18.0 5 184 79 5000 2145 Yes 12.8 5 176 80 5600 2875 Yes 9.2 4 146 81 5200 3375 Yes 15.9 5 175 82 5600 2330 Yes 15.9 5 179 83 6000 3360 Yes 10.6 4 161 84 5200 3505 Yes 11.9 5 162 85 5400 2405 Yes 15.9 4 174 86 5400 2340 Yes 18.5 5 188 87 5000 2515 Yes 19.8 7 187 88 5500 2550 Yes 12.4 4 163 89 4500 2915 Yes 21.1 7 187 90 5800 2685 Yes 18.5 5 180 91 5800 2385 Yes 18.5 4 159 92 5400 2215 Yes 15.8 5 190 93 6200 2310 Yes 19.3 
5 184 Wheelbase Width Turn.circle Rear.seat.room Luggage.room Weight Origin 1 102 68 37 26.5 11 2705 non-USA 2 115 71 38 30.0 15 3560 non-USA 3 102 67 37 28.0 14 3375 non-USA 4 106 70 37 31.0 17 3405 non-USA 5 109 69 39 27.0 13 3640 non-USA 6 105 69 41 28.0 16 2880 USA 7 111 74 42 30.5 17 3470 USA 8 116 78 45 30.5 21 4105 USA 9 108 73 41 26.5 14 3495 USA 10 114 73 43 35.0 18 3620 USA 11 111 74 44 31.0 14 3935 USA 12 101 66 38 25.0 13 2490 USA 13 103 68 39 26.0 14 2785 USA 14 101 74 43 25.0 13 3240 USA 15 108 71 40 28.5 16 3195 USA 16 110 74 44 30.5 NA 3715 USA 17 111 78 42 33.5 NA 4025 USA 18 116 77 42 29.5 20 3910 USA 19 96 74 43 NA NA 3380 USA 20 113 74 40 31.0 15 3515 USA 21 104 68 41 30.5 14 3085 USA 22 110 69 44 36.0 17 3570 USA 23 98 66 32 26.5 11 2270 USA 24 97 67 38 26.5 13 2670 USA 25 104 68 39 30.5 14 2970 USA 26 112 72 42 26.5 NA 3705 USA 27 105 69 42 30.5 16 3080 USA 28 97 72 40 20.0 11 3805 USA 29 98 66 36 26.5 11 2295 USA 30 113 74 40 30.0 15 3490 USA 31 90 63 33 26.0 12 1845 USA 32 98 67 36 28.0 12 2530 USA 33 100 68 39 27.5 13 2690 USA 34 101 68 40 24.0 12 2850 USA 35 103 70 38 23.0 18 2710 USA 36 119 72 45 30.0 NA 3735 USA 37 106 71 40 27.5 18 3325 USA 38 114 78 43 30.0 21 3950 USA 39 93 63 34 27.5 10 1695 non-USA 40 97 67 37 24.5 11 2475 non-USA 41 100 70 39 23.5 8 2865 non-USA 42 103 67 36 28.0 12 2350 non-USA 43 107 67 41 28.0 14 3040 non-USA 44 94 63 35 26.0 11 2345 non-USA 45 98 66 36 28.0 12 2620 non-USA 46 94 64 34 23.5 9 2285 non-USA 47 104 69 41 31.0 14 2885 non-USA 48 113 72 42 29.0 15 4000 non-USA 49 103 70 40 27.5 14 3510 non-USA 50 106 71 39 25.0 9 3515 non-USA 51 109 73 42 30.0 19 3695 USA 52 117 77 45 31.5 22 4055 USA 53 97 66 34 27.0 16 2325 non-USA 54 98 66 36 26.5 13 2440 non-USA 55 103 69 40 29.5 14 2970 non-USA 56 110 72 39 27.5 NA 3735 non-USA 57 96 69 37 NA NA 2895 non-USA 58 105 67 34 26.0 12 2920 non-USA 59 110 69 37 27.0 15 3525 non-USA 60 95 65 36 19.0 6 2450 USA 61 113 73 38 28.0 15 3610 USA 62 98 67 36 26.0 11 2295 
non-USA 63 107 70 43 27.5 14 3730 non-USA 64 96 66 33 26.0 12 2545 non-USA 65 103 67 40 28.5 14 3050 non-USA 66 112 74 41 27.0 NA 4100 non-USA 67 104 69 41 28.5 14 3200 non-USA 68 103 67 39 28.0 14 2910 USA 69 105 70 42 28.0 16 2890 USA 70 110 74 44 30.5 NA 3715 USA 71 111 74 42 31.5 17 3470 USA 72 97 67 39 24.5 8 2640 USA 73 99 66 35 25.5 17 2350 USA 74 101 66 39 25.0 13 2575 USA 75 101 75 43 25.0 13 3240 USA 76 108 72 41 28.5 16 3450 USA 77 111 74 43 30.5 18 3495 USA 78 99 67 37 26.5 14 2775 non-USA 79 102 68 40 26.5 12 2495 USA 80 90 60 32 23.5 10 2045 non-USA 81 97 65 35 27.5 15 2490 non-USA 82 102 67 37 27.0 14 3085 non-USA 83 93 63 34 27.5 10 1965 non-USA 84 94 65 36 24.0 11 2055 non-USA 85 99 69 39 23.0 13 2950 non-USA 86 103 70 38 28.5 15 3030 non-USA 87 113 71 41 35.0 NA 3785 non-USA 88 93 63 34 26.0 10 2240 non-USA 89 115 72 38 34.0 NA 3960 non-USA 90 103 67 35 31.5 14 2985 non-USA 91 97 66 36 26.0 15 2810 non-USA 92 104 67 37 29.5 14 2985 non-USA 93 105 69 38 30.0 15 3245 non-USA Make 1 Acura Integra 2 Acura Legend 3 Audi 90 4 Audi 100 5 BMW 535i 6 Buick Century 7 Buick LeSabre 8 Buick Roadmaster 9 Buick Riviera 10 Cadillac DeVille 11 Cadillac Seville 12 Chevrolet Cavalier 13 Chevrolet Corsica 14 Chevrolet Camaro 15 Chevrolet Lumina 16 Chevrolet Lumina_APV 17 Chevrolet Astro 18 Chevrolet Caprice 19 Chevrolet Corvette 20 Chrylser Concorde 21 Chrysler LeBaron 22 Chrysler Imperial 23 Dodge Colt 24 Dodge Shadow 25 Dodge Spirit 26 Dodge Caravan 27 Dodge Dynasty 28 Dodge Stealth 29 Eagle Summit 30 Eagle Vision 31 Ford Festiva 32 Ford Escort 33 Ford Tempo 34 Ford Mustang 35 Ford Probe 36 Ford Aerostar 37 Ford Taurus 38 Ford Crown_Victoria 39 Geo Metro 40 Geo Storm 41 Honda Prelude 42 Honda Civic 43 Honda Accord 44 Hyundai Excel 45 Hyundai Elantra 46 Hyundai Scoupe 47 Hyundai Sonata 48 Infiniti Q45 49 Lexus ES300 50 Lexus SC300 51 Lincoln Continental 52 Lincoln Town_Car 53 Mazda 323 54 Mazda Protege 55 Mazda 626 56 Mazda MPV 57 Mazda RX-7 58 Mercedes-Benz 190E 
59 Mercedes-Benz 300E 60 Mercury Capri 61 Mercury Cougar 62 Mitsubishi Mirage 63 Mitsubishi Diamante 64 Nissan Sentra 65 Nissan Altima 66 Nissan Quest 67 Nissan Maxima 68 Oldsmobile Achieva 69 Oldsmobile Cutlass_Ciera 70 Oldsmobile Silhouette 71 Oldsmobile Eighty-Eight 72 Plymouth Laser 73 Pontiac LeMans 74 Pontiac Sunbird 75 Pontiac Firebird 76 Pontiac Grand_Prix 77 Pontiac Bonneville 78 Saab 900 79 Saturn SL 80 Subaru Justy 81 Subaru Loyale 82 Subaru Legacy 83 Suzuki Swift 84 Toyota Tercel 85 Toyota Celica 86 Toyota Camry 87 Toyota Previa 88 Volkswagen Fox 89 Volkswagen Eurovan 90 Volkswagen Passat 91 Volkswagen Corrado 92 Volvo 240 93 Volvo 850 ``` ] --- * Data frames are internally stored as lists (with constraints) .scrollable400[ ```r str(Cars93) ``` ``` 'data.frame': 93 obs. of 27 variables: $ Manufacturer : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ... $ Model : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ... $ Type : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ... $ Min.Price : num 12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ... $ Price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ... $ Max.Price : num 18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ... $ MPG.city : int 25 18 20 19 22 22 19 16 19 16 ... $ MPG.highway : int 31 25 26 26 30 31 28 25 27 25 ... $ AirBags : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ... $ DriveTrain : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ... $ Cylinders : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ... $ EngineSize : num 1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ... $ Horsepower : int 140 200 172 172 208 110 170 180 170 200 ... $ RPM : int 6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ... $ Rev.per.mile : int 2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ... $ Man.trans.avail : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ... 
$ Fuel.tank.capacity: num 13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ... $ Passengers : int 5 5 5 6 4 6 6 6 5 6 ... $ Length : int 177 195 180 193 186 189 200 216 198 206 ... $ Wheelbase : int 102 115 102 106 109 105 111 116 108 114 ... $ Width : int 68 71 67 70 69 69 74 78 73 73 ... $ Turn.circle : int 37 38 37 37 39 41 42 45 41 43 ... $ Rear.seat.room : num 26.5 30 28 31 27 28 30.5 30.5 26.5 35 ... $ Luggage.room : int 11 15 14 17 13 16 17 21 14 18 ... $ Weight : int 2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ... $ Origin : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ... $ Make : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ... ``` ] --- * List-like behaviour: Columns can be extracted like a list ```r Cars93$MPG.city ``` ``` [1] 25 18 20 19 22 22 19 16 19 16 16 25 25 19 21 18 15 17 17 20 23 20 29 23 22 [26] 17 21 18 29 20 31 23 22 22 24 15 21 18 46 30 24 42 24 29 22 26 20 17 18 18 [51] 17 18 29 28 26 18 17 20 19 23 19 29 18 29 24 17 21 24 23 18 19 23 31 23 19 [76] 19 19 20 28 33 25 23 39 32 25 22 18 25 17 21 18 21 20 ``` * Vector indexing extracts multiple columns ```r head(Cars93[c(1, 4, 7)]) ``` ``` Manufacturer Min.Price MPG.city 1 Acura 12.9 25 2 Acura 29.2 18 3 Audi 25.9 20 4 Audi 30.8 19 5 BMW 23.7 22 6 Buick 14.2 22 ``` --- ```r carsub <- Cars93[c("Make", "MPG.city", "Weight", "Length", "EngineSize", "Man.trans.avail")] str(carsub) ``` ``` 'data.frame': 93 obs. of 6 variables: $ Make : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ... $ MPG.city : int 25 18 20 19 22 22 19 16 19 16 ... $ Weight : int 2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ... $ Length : int 177 195 180 193 186 189 200 216 198 206 ... $ EngineSize : num 1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ... $ Man.trans.avail: Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ... 
```

---

* Two-dimensional indexing

```r
carsub[1:6, ]
```

```
           Make MPG.city Weight Length EngineSize Man.trans.avail
1 Acura Integra       25   2705    177        1.8             Yes
2  Acura Legend       18   3560    195        3.2             Yes
3       Audi 90       20   3375    180        2.8             Yes
4      Audi 100       19   3405    193        2.8             Yes
5      BMW 535i       22   3640    186        3.5             Yes
6 Buick Century       22   2880    189        2.2              No
```

```r
carsub[1:6, c(1, 4, 6)]
```

```
           Make Length Man.trans.avail
1 Acura Integra    177             Yes
2  Acura Legend    195             Yes
3       Audi 90    180             Yes
4      Audi 100    193             Yes
5      BMW 535i    186             Yes
6 Buick Century    189              No
```

---

* Two-dimensional indexing

```r
nrow(carsub)
```

```
[1] 93
```

```r
carsub[sample(nrow(carsub), 6), ]
```

```
                Make MPG.city Weight Length EngineSize Man.trans.avail
20 Chrylser Concorde       20   3515    203        3.3              No
27     Dodge Dynasty       21   3080    192        2.5              No
83      Suzuki Swift       39   1965    161        1.3             Yes
22 Chrysler Imperial       20   3570    203        3.3              No
21  Chrysler LeBaron       23   3085    183        3.0              No
54     Mazda Protege       28   2440    172        1.8             Yes
```

---

* Two-dimensional indexing

```r
carsub[sample(nrow(carsub), 6), c("MPG.city", "Weight", "Length")]
```

```
   MPG.city Weight Length
49       18   3510    188
56       18   3735    190
38       18   3950    212
4        19   3405    193
71       19   3470    201
57       17   2895    169
```

---

layout: true

# Data import

---

* Statistical data are usually structured like a spreadsheet (Excel, CSV)

???

Real-life data analysis starts with data. Statistical data are usually in the form of a spreadsheet, and there are standard file formats that are used to store such data, such as Excel files or CSV files.
--

* Typical approach: read data from spreadsheet file into data frame

--

* Easiest route:

    * R itself cannot read Excel files directly
    * Save as CSV file from Excel / LibreOffice
    * Read with `read.csv()` or `read.table()` (more flexible)

--

* Alternative: Use "Import Dataset" tool in RStudio

--

* Data frames can be exported as a spreadsheet file using `write.csv()` or `write.table()`

---

layout: false

# Example: data export

```r
data(Cars93, package = "MASS")
carsub <- Cars93[c("Make", "MPG.city", "Weight", "Length",
                   "EngineSize", "Man.trans.avail")]
```

```r
str(carsub)
```

```
'data.frame':   93 obs. of  6 variables:
 $ Make           : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...
 $ MPG.city       : int  25 18 20 19 22 22 19 16 19 16 ...
 $ Weight         : int  2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
 $ Length         : int  177 195 180 193 186 189 200 216 198 206 ...
 $ EngineSize     : num  1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
 $ Man.trans.avail: Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
```

```r
write.csv(carsub, file = "cars.csv")
```

???

Do this in RStudio

---

# Summary

* These are the most basic data input / output functions

* There are many other specialized functions

    * Low-level utilities: `scan()`, `readLines()`, `readChar()`, `readBin()`
    * Various packages provide import / export to formats used by other software
    * R has its own "serialization" format using `save()` and `load()`

---

# Demo

* Simple data analysis examples:
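
One possible example, offered as a minimal sketch rather than as the demo given in the lecture: export a small data frame with `write.csv()`, read it back with `read.csv()`, and compute a simple grouped summary. The object names `carsub6`, `csvfile`, and `cars_in` are hypothetical, and the data are the first six rows of `carsub` typed in by hand so that the code runs without the MASS package.

```r
# Hypothetical toy data: first six rows of carsub (four columns), typed by hand
carsub6 <- data.frame(
    Make = c("Acura Integra", "Acura Legend", "Audi 90",
             "Audi 100", "BMW 535i", "Buick Century"),
    MPG.city = c(25L, 18L, 20L, 19L, 22L, 22L),
    Weight = c(2705L, 3560L, 3375L, 3405L, 3640L, 2880L),
    Man.trans.avail = c("Yes", "Yes", "Yes", "Yes", "Yes", "No")
)

# Write to a temporary file so that nothing in the working directory is overwritten
csvfile <- tempfile(fileext = ".csv")
write.csv(carsub6, file = csvfile, row.names = FALSE)  # do not store row names as a column

# Read the file back into a new data frame and inspect its structure
cars_in <- read.csv(csvfile)
str(cars_in)

# Simple analysis: mean city mileage by availability of manual transmission
tapply(cars_in$MPG.city, cars_in$Man.trans.avail, mean)

unlink(csvfile)  # remove the temporary file
```

By default `write.csv()` also writes the row names as an unnamed first column, as in the export example earlier; `row.names = FALSE` is used here so that the file read back has the same columns as the original data frame.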
---

class: center, middle

# Questions?