R Language Fundamentals

# R Language Fundamentals

## Introductory Computer Programming

### Deepayan Sarkar

---

# How does R work as a language?

* We have already seen R being used in practice

* We will now try to describe some important ideas more formally

* Not that important if we only use built-in tools, but...

* This will help us to use the help system more efficiently

* It will also be useful when using R as a programming language

<div>
$$
\newcommand{\sub}{_}
$$
</div>

---

# Objects and Expressions

---

* The first important concept: objects

* A simple way to think about objects:

> _anything that __can__ be assigned to a variable is an object,
> whether or not it actually is_

---

* Objects are created by evaluating _expressions_

```r
sin(sort(4 * pi * runif(10)))
```

```
 [1]  0.02051191 -0.90356712 -0.45827622 -0.40821717  0.97407891  0.87362266
 [7] -0.92540534 -0.90390825 -0.84693594 -0.72401314
```

---

* We can assign such "objects" to variables

```r
x <- sin(sort(4 * pi * runif(10)))
print(x)
```

```
 [1]  0.9997611 -0.9059077 -0.8039393  0.9776684  0.9992587  0.7709054
 [7]  0.2058093 -0.8572744 -0.9733658 -0.9453520
```

---

* Or we could pass them as arguments, e.g., to the `plot()` function

```r
plot(sin(sort(4 * pi * runif(10))))
```

![plot of chunk unnamed-chunk-3](figures/rfundamentals-unnamed-chunk-3-1.svg)

---

* But this is itself an expression, so it should produce an object

```r
p <- plot(sin(sort(4 * pi * runif(10))))
```

* This is not a very interesting object, but is still an object

```r
print(p)
```

```
NULL
```

* Anything already defined in R (including functions and data) are objects.

---

# Mode and Class

* All objects have two fundamental properties: mode and class

---

```r
mode(x)
```

```
[1] "numeric"
```

```r
class(x)
```

```
[1] "numeric"
```

---

```r
mode(p)
```

```
[1] "NULL"
```

```r
class(p)
```

```
[1] "NULL"
```

---

```r
mode(sin)
```

```
[1] "function"
```

```r
class(sin)
```

```
[1] "function"
```

---

* The mode essentially indicates how the object is stored

* The class indicates how the object should be _interpreted_

---

* These are sometimes the same, but often they are not

```r
cars <- read.csv("cars.csv")
mode(cars)
```

```
[1] "list"
```

```r
class(cars)
```

```
[1] "data.frame"
```

---

* These are sometimes the same, but often they are not

```r
fm <- lm(MPG.city ~ Weight, cars)
mode(fm)
```

```
[1] "list"
```

```r
class(fm)
```

```
[1] "lm"
```

---

# Expression Blocks

---

* Evaluating an expression results in an object

* This object may be thought of as the "value" of the expression

* R also allows expression _blocks_

* multiple expressions enclosed within _braces_

```r
{
    expr1
    expr2
    ...
}
```

---

* Alternative calculation of binomial coefficient $n \choose r$

```r
n <- 1500
r <- 2
ans <- {
    sln <- sum(log(1:n))
    slr <- sum(log(1:r))
    slnr <- sum(log(1:(n-r)))
    log.ncr <- sln - slr - slnr
    exp(log.ncr)
}
```

* What happens when this block is evaluated?

---

* Each individual expressions inside are evaluated one by one (sequentially)

* All variable assignments are done (including all intermediate variables)

* We can now see their values

```r
slnr
```

```
[1] 9459.78
```

```r
log.ncr
```

```
[1] 13.93263
```

---

* More importantly, the entire block can _also_ be treated as a single expression

* Its value on evaluation will be the value of the _last expression_ evaluated

```r
ans
```

```
[1] 1124250
```

```r
exp(log.ncr)  # should be same as ans
```

```
[1] 1124250
```

---

# Functions as Expression Blocks

---

* Expression blocks are one of the building blocks of _functions_

* Specifically, the _body_ of a function is an expression block

* The other building block of a function are its _arguments_

* Example:

```r
nchooser <-
    function(n, r)
    {
        sln <- sum(log(1:n))
        slr <- sum(log(1:r))
        slnr <- sum(log(1:(n-r)))
        log.ncr <- sln - slr - slnr
        exp(log.ncr)
    }
```

* Here, the body is the same expression block

* The main difference is that arguments are available as variables

---

* So now we can do

```r
nchooser(20, 5)
```

```
[1] 15504
```

```r
nchooser(50, 10)
```

```
[1] 10272278170
```

* In a fresh R session, the intermediate variables will _not_ be available

* This is because the function body is not evaluated in the _global environment_

* We will discuss these details later

---

* Notice that defining a function looks like an _assignment_

```r
nchooser <- function(n, r)
{
    ...
}
```

* This is exactly what is happening

* The right hand side (starting with `function`) is an expression

* Evaluating it creates a `"function"` object which is assigned to the
  variable `nchooser`

---

# Function Arguments

---

* Functions can have zero, one or more arguments.

* All arguments must have names

* Arguments may be compulsory or optional

* Optional arguments usually have a "default" value

* When a function is called, its arguments may or may not be named
	
* Named arguments are matched by name first

* Remaining _unnamed_ arguments are matched by position

---

```r
str(lm) # see arguments of a function
```

```
function (formula, data, subset, weights, na.action, method = "qr", model = TRUE, 
    x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, 
    offset, ...)  
```

* These two calls are equivalent

```r
fm1 <- lm(MPG.city ~ Weight, cars, (Manual == "No"))
fm2 <- lm(MPG.city ~ Weight, data = cars, method = "qr", subset = (Manual == "No"))
```

* Compare coefficients

```r
coef(fm1)
```

```
 (Intercept)       Weight 
39.173472133 -0.005670325 
```

```r
coef(fm2)
```

```
 (Intercept)       Weight 
39.173472133 -0.005670325 
```

---

# Attributes

---

* Functions usually have a useful "return" value

* The return value of `lm()` is a list

```r
str(fm2)
```

```
List of 12
 $ coefficients : Named num [1:2] 39.17347 -0.00567
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "Weight"
 $ residuals    : Named num [1:32] -0.843 -0.497 0.103 -0.356 -2.647 ...
  ..- attr(*, "names")= chr [1:32] "6" "7" "8" "9" ...
 $ effects      : Named num [1:32] -107.127 11.269 0.349 -0.224 -2.491 ...
  ..- attr(*, "names")= chr [1:32] "(Intercept)" "Weight" "" "" ...
 $ rank         : int 2
 $ fitted.values: Named num [1:32] 22.8 19.5 15.9 19.4 18.6 ...
  ..- attr(*, "names")= chr [1:32] "6" "7" "8" "9" ...
 $ assign       : int [1:2] 0 1
 $ qr           :List of 5
  ..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:32] "6" "7" "8" "9" ...
  .. .. ..$ : chr [1:2] "(Intercept)" "Weight"
  .. ..- attr(*, "assign")= int [1:2] 0 1
  ..$ qraux: num [1:2] 1.18 1
  ..$ pivot: int [1:2] 1 2
  ..$ tol  : num 1e-07
  ..$ rank : int 2
  ..- attr(*, "class")= chr "qr"
 $ df.residual  : int 30
 $ xlevels      : Named list()
 $ call         : language lm(formula = MPG.city ~ Weight, data = cars, subset = (Manual == "No"),      method = "qr")
 $ terms        :Classes 'terms', 'formula'  language MPG.city ~ Weight
  .. ..- attr(*, "variables")= language list(MPG.city, Weight)
  .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2] "MPG.city" "Weight"
  .. .. .. ..$ : chr "Weight"
  .. ..- attr(*, "term.labels")= chr "Weight"
  .. ..- attr(*, "order")= int 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(MPG.city, Weight)
  .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:2] "MPG.city" "Weight"
 $ model        :'data.frame':	32 obs. of  2 variables:
  ..$ MPG.city: int [1:32] 22 19 16 19 16 16 21 18 15 17 ...
  ..$ Weight  : int [1:32] 2880 3470 4105 3495 3620 3935 3195 3715 4025 3910 ...
  ..- attr(*, "terms")=Classes 'terms', 'formula'  language MPG.city ~ Weight
  .. .. ..- attr(*, "variables")= language list(MPG.city, Weight)
  .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2] "MPG.city" "Weight"
  .. .. .. .. ..$ : chr "Weight"
  .. .. ..- attr(*, "term.labels")= chr "Weight"
  .. .. ..- attr(*, "order")= int 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. ..- attr(*, "predvars")= language list(MPG.city, Weight)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:2] "MPG.city" "Weight"
 - attr(*, "class")= chr "lm"
```

]

---

* The output has many references to the term `attr` which stands for _attribute_.

* The attribute at the very end (class) is special

* It happens to be same as the class property we have seen before

```r
class(fm2)
```

```
[1] "lm"
```

* To understand their link, we need to first understand attributes

---

* Example: indexing by names

```r
fm2$coefficients["Weight"]
```

```
      Weight 
-0.005670325 
```

* Names associated with a vector can be obtained using `names()`

```r
names(fm2$coefficients)
```

```
[1] "(Intercept)" "Weight"     
```

* These are actually stored with the object as an _attribute_ called
"names"

```r
attr(fm2$coefficients, "names")
```

```
[1] "(Intercept)" "Weight"     
```

---

* What exactly is an attribute?

* Arbitrary R objects that can be attached to any other object

* Typically used for programming convenience, normally not seen by users

* However, some attributes are special, like "names" and "class"

* `fm2` itself is a named list, so it also has a "names" attribute

* All attributes of an object can be extracted (as a list) using the
`attributes()` function

```r
attributes(fm2)
```

```
$names
 [1] "coefficients"  "residuals"     "effects"       "rank"         
 [5] "fitted.values" "assign"        "qr"            "df.residual"  
 [9] "xlevels"       "call"          "terms"         "model"

$class
[1] "lm"
```

---

* Another example: table of coefficients reported by `summary()`

```r
s <- coefficients(summary(fm2))
s
```

```
                Estimate   Std. Error   t value     Pr(>|t|)
(Intercept) 39.173472133 1.7077310021  22.93890 1.410797e-20
Weight      -0.005670325 0.0004762214 -11.90691 6.777213e-13
```

```r
attributes(s) # matrix with dimensions
```

```
$dim
[1] 2 4

$dimnames
$dimnames[[1]]
[1] "(Intercept)" "Weight"

$dimnames[[2]]
[1] "Estimate"   "Std. Error" "t value"    "Pr(>|t|)"  
```

---

* These can also be extracted using specialized functions

```r
dim(s)
```

```
[1] 2 4
```

```r
dimnames(s) # list
```

```
[[1]]
[1] "(Intercept)" "Weight"

[[2]]
[1] "Estimate"   "Std. Error" "t value"    "Pr(>|t|)"  
```

* There are specific functions to obtain row names and column names

```r
colnames(s)
```

```
[1] "Estimate"   "Std. Error" "t value"    "Pr(>|t|)"  
```

```r
rownames(s)
```

```
[1] "(Intercept)" "Weight"     
```

---

* But these are some special cases

* In genereal, attributes need to be extracted using the `attr()` function

```r
attr(s, "dimnames")
```

```
[[1]]
[1] "(Intercept)" "Weight"

[[2]]
[1] "Estimate"   "Std. Error" "t value"    "Pr(>|t|)"  
```

* Most of this is not particularly relevant for regular use

* However, the _class_ attribute is one we should be familiar with

---

# Class, Generic Functions and Methods

---

* Compare output when we print `fm1` and `fm2` (essentially same calls to `lm()`

```r
fm1
```

```

Call:
lm(formula = MPG.city ~ Weight, data = cars, subset = (Manual == 
    "No"))

Coefficients:
(Intercept)       Weight  
   39.17347     -0.00567  
```

```r
fm2
```

```

Call:
lm(formula = MPG.city ~ Weight, data = cars, subset = (Manual == 
    "No"), method = "qr")

Coefficients:
(Intercept)       Weight  
   39.17347     -0.00567  
```

---

We also extracted a table of coefficients calculated by SUMMARY

```r
s <- coefficients(summary(fm2))
s
```

```
                Estimate   Std. Error   t value     Pr(>|t|)
(Intercept) 39.173472133 1.7077310021  22.93890 1.410797e-20
Weight      -0.005670325 0.0004762214 -11.90691 6.777213e-13
```

---

* `fm2` has a class attribute but `s` does not

```r
attr(fm2, "class")
```

```
[1] "lm"
```

```r
attr(s, "class")
```

```
NULL
```

* But both of them have a "class", as reported by the `class()`
  function

```r
class(fm2)
```

```
[1] "lm"
```

```r
class(s)    # 2-element vector
```

```
[1] "matrix" "array" 
```

---

* All objects have an _implicit_ class, even if they have no class _attribute_

* The explicit class can be removed (by setting it to `NULL`)

```r
class(fm2) <- NULL  # NOT recommended!
attr(fm2, "class")
```

```
NULL
```

```r
class(fm2)
```

```
[1] "list"
```

---

* Why does it matter?

* Compare the result of printing `fm1` and `fm2`

```r
fm1
```

```

Call:
lm(formula = MPG.city ~ Weight, data = cars, subset = (Manual == 
    "No"))

Coefficients:
(Intercept)       Weight  
   39.17347     -0.00567  
```

```r
fm2
```

```
$coefficients
 (Intercept)       Weight 
39.173472133 -0.005670325

$residuals
           6            7            8            9           10           11 
-0.842936303 -0.497444588  0.103211750 -0.355686464 -2.646895847 -0.860743490 
          15           16           17           18           20           21 
-0.056783947 -0.108214977 -1.350414246 -0.002501614  0.757720034  1.319480310 
          22           26           27           30           37           38 
 1.069587906 -1.164918227 -0.708871315  0.615961911  0.680358296  1.224311384 
          48           51           52           56           59           61 
 0.507827631 -1.221621476  1.819695502  0.005191521 -0.185576716  0.296400904 
          63           66           67           68           69           70 
-0.023160103  1.074860125 -0.028432322  1.327173445  0.213766946 -0.108214977 
          71           77 
-0.497444588 -0.355686464

$effects
  (Intercept)        Weight                                           
-107.12667735   11.26947660    0.34891398   -0.22352858   -2.49147068 
                                                                      
  -0.64668477    0.01953245    0.06489332   -1.11960308    0.20690365 
                                                                      
   0.89360068    1.37532150    1.21570616   -0.99367131   -0.65396082 
                                                                      
   0.74718910    0.78087267    1.44116218    0.73398534   -1.05223594 
                                                                      
   2.05609082    0.18202259   -0.04783469    0.44996469    0.15274027 
                                                                      
   1.31963166    0.04881477    1.35044044    0.23331117    0.06489332 
                            
  -0.36994016   -0.22352858

$rank
[1] 2

$fitted.values
       6        7        8        9       10       11       15       16 
22.84294 19.49744 15.89679 19.35569 18.64690 16.86074 21.05678 18.10821 
      17       18       20       21       22       26       27       30 
16.35041 17.00250 19.24228 21.68052 18.93041 18.16492 21.70887 19.38404 
      37       38       48       51       52       56       59       61 
20.31964 16.77569 16.49217 18.22162 16.18030 17.99481 19.18558 18.70360 
      63       66       67       68       69       70       71       77 
18.02316 15.92514 21.02843 22.67283 22.78623 18.10821 19.49744 19.35569

$assign
[1] 0 1

$qr
$qr
   (Intercept)        Weight
6   -5.6568542 -2.018790e+04
7    0.1767767 -1.987448e+03
8    0.1767767  3.218775e-01
9    0.1767767  1.495123e-02
10   0.1767767  7.784595e-02
11   0.1767767  2.363407e-01
15   0.1767767 -1.359961e-01
16   0.1767767  1.256459e-01
17   0.1767767  2.816249e-01
18   0.1767767  2.237617e-01
20   0.1767767  2.501438e-02
21   0.1767767 -1.913435e-01
22   0.1767767  5.268806e-02
26   0.1767767  1.206144e-01
27   0.1767767 -1.938593e-01
30   0.1767767  1.243544e-02
37   0.1767767 -7.058560e-02
38   0.1767767  2.438880e-01
48   0.1767767  2.690459e-01
51   0.1767767  1.155828e-01
52   0.1767767  2.967196e-01
56   0.1767767  1.357091e-01
59   0.1767767  3.004596e-02
61   0.1767767  7.281437e-02
63   0.1767767  1.331933e-01
66   0.1767767  3.193617e-01
67   0.1767767 -1.334803e-01
68   0.1767767 -2.793961e-01
69   0.1767767 -2.894592e-01
70   0.1767767  1.256459e-01
71   0.1767767  2.372281e-03
77   0.1767767  1.495123e-02
attr(,"assign")
[1] 0 1

$qraux
[1] 1.176777 1.002372

$pivot
[1] 1 2

$tol
[1] 1e-07

$rank
[1] 2

attr(,"class")
[1] "qr"

$df.residual
[1] 30

$xlevels
named list()

$call
lm(formula = MPG.city ~ Weight, data = cars, subset = (Manual == 
    "No"), method = "qr")

$terms
MPG.city ~ Weight
attr(,"variables")
list(MPG.city, Weight)
attr(,"factors")
         Weight
MPG.city      0
Weight        1
attr(,"term.labels")
[1] "Weight"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: R_GlobalEnv>
attr(,"predvars")
list(MPG.city, Weight)
attr(,"dataClasses")
 MPG.city    Weight 
"numeric" "numeric"

$model
   MPG.city Weight
6        22   2880
7        19   3470
8        16   4105
9        19   3495
10       16   3620
11       16   3935
15       21   3195
16       18   3715
17       15   4025
18       17   3910
20       20   3515
21       23   3085
22       20   3570
26       17   3705
27       21   3080
30       20   3490
37       21   3325
38       18   3950
48       17   4000
51       17   3695
52       18   4055
56       18   3735
59       19   3525
61       19   3610
63       18   3730
66       17   4100
67       21   3200
68       24   2910
69       23   2890
70       18   3715
71       19   3470
77       19   3495
```

]

---

* Or their summary outputs

```r
summary(fm1)
```

```

Call:
lm(formula = MPG.city ~ Weight, data = cars, subset = (Manual == 
    "No"))

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6469 -0.4974 -0.0258  0.6321  1.8197

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 39.1734721  1.7077310   22.94  < 2e-16 ***
Weight      -0.0056703  0.0004762  -11.91 6.78e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9465 on 30 degrees of freedom
Multiple R-squared:  0.8254,	Adjusted R-squared:  0.8195 
F-statistic: 141.8 on 1 and 30 DF,  p-value: 6.777e-13
```

```r
summary(fm2)
```

```
              Length Class      Mode   
coefficients   2     -none-     numeric
residuals     32     -none-     numeric
effects       32     -none-     numeric
rank           1     -none-     numeric
fitted.values 32     -none-     numeric
assign         2     -none-     numeric
qr             5     qr         list   
df.residual    1     -none-     numeric
xlevels        0     -none-     list   
call           5     -none-     call   
terms          3     terms      call   
model          2     data.frame list   
```

]

---

* _Class_ is mainly used to implement a form of _object-oriented_ programming in R

* The design is based on the concept of _generic_ functions and _methods_

* A generic function is a special kind of R function

* Intended to perform a _general_ task (printing, plotting, summarizing, etc.)

* For example, to summarize an object, we should call the `summary()` function

* This should work for a vector, a data frame, or a model fit

---

* These calls to `summary()` produce completely different types of output

```r
summary(fm1)    # class "lm"
```

```

Call:
lm(formula = MPG.city ~ Weight, data = cars, subset = (Manual == 
    "No"))

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6469 -0.4974 -0.0258  0.6321  1.8197

Residual standard error: 0.9465 on 30 degrees of freedom
Multiple R-squared:  0.8254,	Adjusted R-squared:  0.8195 
F-statistic: 141.8 on 1 and 30 DF,  p-value: 6.777e-13
```

---

* These calls to `summary()` produce completely different types of output

```r
summary(cars) # class "data.frame"
```

```
     Make              MPG.city         Weight         Length     
 Length:93          Min.   :15.00   Min.   :1695   Min.   :141.0  
 Class :character   1st Qu.:18.00   1st Qu.:2620   1st Qu.:174.0  
 Mode  :character   Median :21.00   Median :3040   Median :183.0  
                    Mean   :22.37   Mean   :3073   Mean   :183.2  
                    3rd Qu.:25.00   3rd Qu.:3525   3rd Qu.:192.0  
                    Max.   :46.00   Max.   :4105   Max.   :219.0  
  Engine.Size       Manual         
 Min.   :1.000   Length:93         
 1st Qu.:1.800   Class :character  
 Median :2.400   Mode  :character  
 Mean   :2.668                     
 3rd Qu.:3.300                     
 Max.   :5.700                     
```

---

* These customized outputs are created using _methods_

* Methods are specific implementations of a generic function customized input type

* Generic function (such as `summary()` chooses appropriate method based on "class"

---

* The lookup mechanism is conceptually very simple

* All available methods for the generic function `summary()` can be listed by

```r
methods(summary)
```

```
 [1] summary.aov                         summary.aovlist*                   
 [3] summary.aspell*                     summary.check_packages_in_dir*     
 [5] summary.connection                  summary.data.frame                 
 [7] summary.Date                        summary.default                    
 [9] summary.ecdf*                       summary.factor                     
[11] summary.glm                         summary.infl*                      
[13] summary.lm                          summary.loess*                     
[15] summary.manova                      summary.matrix                     
[17] summary.mlm*                        summary.nls*                       
[19] summary.packageStatus*              summary.POSIXct                    
[21] summary.POSIXlt                     summary.ppr*                       
[23] summary.prcomp*                     summary.princomp*                  
[25] summary.proc_time                   summary.rlang_error*               
[27] summary.rlang_message*              summary.rlang_trace*               
[29] summary.rlang_warning*              summary.rlang:::list_of_conditions*
[31] summary.srcfile                     summary.srcref                     
[33] summary.stepfun                     summary.stl*                       
[35] summary.table                       summary.tukeysmooth*               
[37] summary.vctrs_sclr*                 summary.vctrs_vctr*                
[39] summary.warnings                   
see '?methods' for accessing help and source code
```

---

* We can also list all methods listed for the `"lm"` class

```r
methods(class = "lm")
```

```
 [1] add1           alias          anova          case.names     coerce        
 [6] confint        cooks.distance deviance       dfbeta         dfbetas       
[11] drop1          dummy.coef     effects        extractAIC     family        
[16] formula        hatvalues      influence      initialize     kappa         
[21] labels         logLik         model.frame    model.matrix   nobs          
[26] plot           predict        print          proj           qr            
[31] residuals      rstandard      rstudent       show           simulate      
[36] slotsFromS3    summary        variable.names vcov          
see '?methods' for accessing help and source code
```

* `fm2` no longer has class `"lm"`, so `summary(fm2)` will not call `summary.lm()`

* Instead, `summary.default)` is used (because there is no `summary.list()` )

---

# The Help System in R

* R has an extensive collection of functions

* Help system essential to make use of them

* Documentation can be accessed using the `help()` function, or the `?` operator

* RStudio has a nice searchable interface

* Examples:

```r
?pnorm
?seq
?summary
?summary.lm
```

---

# Help for Generic Functions and Methods

* To get help for the generic function `summary()`, type `?summary`

* To get help for the `summary()` method for "matrix" objects, type `?summary.matrix`

* To get help for the `summary()` method for "lm" objects, type `?summary.lm`

* The first two happen to be the same help page, but the third is different

* This style of object-oriented programming is called S3 (short for "S version 3")

* There are other styles which we will not discuss now

---

# Replacement Functions

---

* Recall that we removed the class attribute of `fm2` using

```r
class(fm2) <- NULL
```

* We can similarly restore it using

```r
class(fm2) <- "lm"
```

* This expression looks like an assignment, but the LHS is a function call (not a variable)!

* This is a feature of R that is not usually found in other languages

---

* Why are replacement functions needed?

* Consider the following function

$$
f(x) = 
\begin{cases}
x & \text{if } x > 0 \cr
0 & \text{otherwise}
\end{cases}
$$

* This function is useful in many contexts

* In the context of neural networks, it is called the _rectified
  linear unit_ or _ReLU_ function

---

* Suppose we want to implement this in Python and R

```python
def ReLU(u):
    for i in range(len(u)):
        if u[i] < 0:
            u[i] = 0
```

```r
ReLU <- function(u) {
    for (i in seq_len(length(u))) {
        if (u[i] < 0)
            u[i] <- 0
    }
}
```

---

* Python: simulate `x` and make a copy called `y`

```python
from numpy import *
x = random.normal(size = 10)
y = x
x
```

```
array([ 0.1596582 ,  0.28510536,  2.61657224, -1.28915735, -0.32343814,
       -1.20063915, -0.99459842, -0.63582348,  0.43606861,  0.58840014])
```

```python
y
```

```
array([ 0.1596582 ,  0.28510536,  2.61657224, -1.28915735, -0.32343814,
       -1.20063915, -0.99459842, -0.63582348,  0.43606861,  0.58840014])
```

---

* Python: Call `ReLU(x)` and inspect `x` and `y`

```python
ReLU(y)
```

```python
x
```

```
array([0.1596582 , 0.28510536, 2.61657224, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.43606861, 0.58840014])
```

```python
y
```

```
array([0.1596582 , 0.28510536, 2.61657224, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.43606861, 0.58840014])
```

---

* R: simulate `x` and make a copy called `y`

```r
x <- rnorm(10)
y <- x
x
```

```
 [1] -0.05495269 -0.61783158 -1.77035032  0.07690820 -1.04666927 -0.50438528
 [7] -0.85297871 -1.24811182 -1.16928925 -0.04325258
```

```r
y
```

```
 [1] -0.05495269 -0.61783158 -1.77035032  0.07690820 -1.04666927 -0.50438528
 [7] -0.85297871 -1.24811182 -1.16928925 -0.04325258
```

---

* R: Call `ReLU(y)` and inspect `x` and `y`

```r
ReLU(y)
```

```r
x
```

```
 [1] -0.05495269 -0.61783158 -1.77035032  0.07690820 -1.04666927 -0.50438528
 [7] -0.85297871 -1.24811182 -1.16928925 -0.04325258
```

```r
y
```

```
 [1] -0.05495269 -0.61783158 -1.77035032  0.07690820 -1.04666927 -0.50438528
 [7] -0.85297871 -1.24811182 -1.16928925 -0.04325258
```

---

* R: Try again --- return modified value and save

```r
ReLU <- function(u) {
    for (i in seq_len(length(u))) {
        if (u[i] < 0)
            u[i] <- 0
    }
    u
}
z <- ReLU(y)
x
```

```
 [1] -0.05495269 -0.61783158 -1.77035032  0.07690820 -1.04666927 -0.50438528
 [7] -0.85297871 -1.24811182 -1.16928925 -0.04325258
```

```r
y
```

```
 [1] -0.05495269 -0.61783158 -1.77035032  0.07690820 -1.04666927 -0.50438528
 [7] -0.85297871 -1.24811182 -1.16928925 -0.04325258
```

```r
z
```

```
 [1] 0.0000000 0.0000000 0.0000000 0.0769082 0.0000000 0.0000000 0.0000000
 [8] 0.0000000 0.0000000 0.0000000
```

---

* Which behaviour is more natural?

* Regardless of what you think, this is a _defining_ feature of R

* One consequence of this is that we need an unusual approach to modify an existing object

* Replacement functions serve this purpose, and are very natural once you get used to them

---

# Lazy Evaluation

---

* Example: function to choose and return one of two arguments with specified probability

```r
choose1 <- function(a, b, p = 0.5) {
    u <- runif(1)
    if (u < p) a else b
}
```

* Could be used to assign treatments to subjects in a randomized trial

```r
choose1("control", "treatment")
```

```
[1] "treatment"
```

```r
choose1("control", "treatment")
```

```
[1] "treatment"
```

```r
choose1("control", "treatment")
```

```
[1] "control"
```

---

* How does this function work?

```r
choose1 <- function(a, b, p = 0.5) {
    u <- runif(1)
    if (u < p) a else b
}
```

* We might expect the following:

* Step 1: R evaluates the values of arguments `a`, `b`, `p`

* Step 2: Function `choose1()` is called with those values

* But this is _not_ how R actually works

* Function arguments are _not_ evaluated until they are used

---

* This feature allows for some interesting behaviour in R

* Consider

```r
choose1(sqrt(2), sqrt(-2), p = 0)
```

```
Warning in sqrt(-2): NaNs produced
```

```
[1] NaN
```

```r
choose1(sqrt(2), sqrt(-2), p = 1)
```

```
[1] 1.414214
```

* This feature of R is known as _lazy evaluation_

---

* Compare with behaviour in Python

```python
def choose1(a, b, p = 0.5):
    u = random.uniform()
    if u < p: return a
    else: return b
```

```python
import warnings
warnings.simplefilter("always")
choose1(sqrt(2), sqrt(-2), p = 0)
```

```
nan

<string>:1: RuntimeWarning: invalid value encountered in sqrt
```

```python
choose1(sqrt(2), sqrt(-2), p = 1)
```

```
1.4142135623730951

<string>:1: RuntimeWarning: invalid value encountered in sqrt
```

---

# How are expressions evaluated?

---

* To understand why this is important, consider how expressions are
evaluated

* Expressions entered at the R prompt are immediately evaluated

```r
a < b
```

```
Error in eval(expr, envir, enclos): object 'a' not found
```

* Fails because `a` and `b` are not defined

---

* It is possible to store an expression without evaluating it

* There are two common ways to do this; for example

```r
e1 <- quote(a < b)
e2 <- expression(a < b)
```

* These objects are known as QUOTED expressions.

---

* We can explore the nature of these objects in the usual ways

```r
class(e1)
```

```
[1] "call"
```

```r
length(e1)
```

```
[1] 3
```

```r
str(e1)
```

```
 language a < b
```

---

* We can explore the nature of these objects in the usual ways

```r
class(e2)
```

```
[1] "expression"
```

```r
length(e2)
```

```
[1] 1
```

```r
str(e2)
```

```
  expression(a < b)
```

---

* Actually, an "expression" is just a vector that can contain multiple quoted expressions

* For example, compare

```r
str(e1)
```

```
 language a < b
```

```r
str(e2[[1]])
```

```
 language a < b
```

---

* But why is this useful?

* Expressions can be evaluated using the `eval()` function

```r
rm(x)
e <- quote(sqrt(x))
eval(e)
```

```
Error in eval(e): object 'x' not found
```

```r
x <- 10
eval(e)
```

```
[1] 3.162278
```

---

* More importantly, `eval()` has an (optional) `envir` argument

* This can be used to provide values of variables in the expression

```r
e1
```

```
a < b
```

```r
eval(e1)
```

```
Error in eval(e1): object 'a' not found
```

```r
d <- list(a = rnorm(10), b = rexp(10))
eval(e1, envir = d)
```

```
 [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
```

---

* Lazy evaluation allows a special trick

* Any function can _quote_ an argument without evaluating it

* This is done using the `substitute()` function

```r
getExpression <- function(x) {
    substitute(x)
}
```

```r
getExpression(sqrt(10))
```

```
sqrt(10)
```

```r
getExpression(a + b)
```

```
a + b
```

---

* The following function attaches its "argument" as an attribute of the return value

```r
withCall <- function(e)
{
    ans <- eval(e)
    attr(ans, "call") <- substitute(e)
    ans
}
```

```r
withCall(sqrt(10))
```

```
[1] 3.162278
attr(,"call")
sqrt(10)
```

```r
withCall(rnorm(5) < rexp(5))
```

```
[1]  TRUE  TRUE  TRUE  TRUE FALSE
attr(,"call")
rnorm(5) < rexp(5)
```

---

* This is how the `plot()` function can construct nice axis labels

```r
x <- seq(-10, 10, length.out = 201)
plot(x = x, y = x * sin(x), type = "l")
```

![plot of chunk unnamed-chunk-66](figures/rfundamentals-unnamed-chunk-66-1.svg)

---

* This is how the `curve()` function determines what function to plot

```r
curve(x * sin(1 / x), from = -0.5, to = 0.5, n = 500)
```

![plot of chunk unnamed-chunk-67](figures/rfundamentals-unnamed-chunk-67-1.svg)

---

# Non-standard Evaluation

---

* Example: Find all subsets of size 2 from a set of size 5

* Easy to create all _ordered_ pairs using the `expand.grid()` function

```r
g <- expand.grid(a = 1:5, b = 1:5)
g
```

```
   a b
1  1 1
2  2 1
3  3 1
4  4 1
5  5 1
6  1 2
7  2 2
8  3 2
9  4 2
10 5 2
11 1 3
12 2 3
13 3 3
14 4 3
15 5 3
16 1 4
17 2 4
18 3 4
19 4 4
20 5 4
21 1 5
22 2 5
23 3 5
24 4 5
25 5 5
```

]

---

* We want to retain only the $5 \choose 2$ _distinct_ combinations of size 2

* Not difficult: Find the rows for which `a` is larger than `b`

```r
g$b > g$a
```

```
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE
[13] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
[25] FALSE
```

* Then use this as a row index

```r
g[g$b > g$a, ]
```

```
   a b
6  1 2
11 1 3
12 2 3
16 1 4
17 2 4
18 3 4
21 1 5
22 2 5
23 3 5
24 4 5
```

---

* As noted earlier, referring to `g` multiple times like this is not ideal

* We can avoid doing this by using the `eval()` function

```r
eval(quote(b > a), g) # instead of g$b > g$a
```

```
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE
[13] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
[25] FALSE
```

* A general function that makes this even simpler is `with()`

```r
with(g, b > a)
```

```
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE
[13] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
[25] FALSE
```

---

* Not surprisingly, `with()` is implemented using the `substitute()` function

```r
with
```

```
function (data, expr, ...) 
UseMethod("with")
<bytecode: 0x7fdeec6a6740>
<environment: namespace:base>
```

```r
getS3method("with", "default")
```

```
function (data, expr, ...) 
eval(substitute(expr), data, enclos = parent.frame())
<bytecode: 0x7fdee88fe000>
<environment: namespace:base>
```

---

* Combine all this to get all the $5 \choose 2$ distinct 2-subsets

```r
g[with(g, b > a), ]
```

```
   a b
6  1 2
11 1 3
12 2 3
16 1 4
17 2 4
18 3 4
21 1 5
22 2 5
23 3 5
24 4 5
```

---

* Obtaining subsets like this is very common, and has a built-in function

```r
subset(g, b > a)
```

```
   a b
6  1 2
11 1 3
12 2 3
16 1 4
17 2 4
18 3 4
21 1 5
22 2 5
23 3 5
24 4 5
```

---

* A popular add-on package called `dplyr` refers to this operation as filtering

```r
dplyr::filter(g, b > a)
```

```
   a b
1  1 2
2  1 3
3  2 3
4  1 4
5  2 4
6  3 4
7  1 5
8  2 5
9  3 5
10 4 5
```

---

* Why is this called _non-standard_ evaluation?

* Does not follow intuitive / standard expectations

* Depends critically on lazy evaluation

* Extremely common in R

---

# Questions?