Lattice: Introduction

Deepayan Sarkar

R graphics

  • R has a reputation for being a good system for graphics

  • This reputation rests mainly on its ability to produce publication-quality statistical plots

  • This course is about one specific graphics system in R called lattice

R graphics

  • R actually has two largely independent graphics subsystems

    • Traditional graphics

      • Available in R from the beginning
      • Rich collection of tools
      • Not very flexible
    • Grid graphics

      • Relatively recent (2000)
      • Low-level tool, highly flexible

Grid graphics, lattice and ggplot2

  • Grid graphics is not usually used directly by the user

  • But it forms the basis of two high-level graphics systems:

    • lattice: based on Trellis graphics (Cleveland)

    • ggplot2: inspired by “Grammar of Graphics” (Wilkinson)

  • These represent two very different philosophical approaches to graphics

  • lattice is in many ways a natural successor to traditional graphics

  • ggplot2 represents a completely different, declarative approach

An example: Anscombe’s quartet

  • I will try to illustrate this with an example

  • Anscombe (1973) introduced four artificial bivariate datasets to emphasize the importance of graphics

  • The four datasets have almost exactly the same means, standard deviations, and correlations

'data.frame':   11 obs. of  8 variables:
 $ x1: num  10 8 13 9 11 14 6 4 12 7 ...
 $ x2: num  10 8 13 9 11 14 6 4 12 7 ...
 $ x3: num  10 8 13 9 11 14 6 4 12 7 ...
 $ x4: num  8 8 8 8 8 8 8 19 8 8 ...
 $ y1: num  8.04 6.95 7.58 8.81 8.33 ...
 $ y2: num  9.14 8.14 8.74 8.77 9.26 8.1 6.13 3.1 9.13 7.26 ...
 $ y3: num  7.46 6.77 12.74 7.11 7.81 ...
 $ y4: num  6.58 5.76 7.71 8.84 8.47 7.04 5.25 12.5 5.56 7.91 ...

An example: Anscombe’s quartet

      x1       x2       x3       x4       y1       y2       y3       y4 
9.000000 9.000000 9.000000 9.000000 7.500909 7.500909 7.500000 7.500909 
      x1       x2       x3       x4       y1       y2       y3       y4 
3.316625 3.316625 3.316625 3.316625 2.031568 2.031657 2.030424 2.030579 
[1] 0.8164205 0.8162365 0.8162867 0.8165214
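The summaries above can be recomputed from the anscombe data frame that ships with R. This is a minimal sketch, assuming the three blocks of output are the column means, the column standard deviations, and the correlation within each (x, y) pair; the object names col_means, col_sds and pair_cors are illustrative and not from the original slides.

```r
# anscombe is part of the datasets package that comes with base R
str(anscombe)

col_means <- sapply(anscombe, mean)   # mean of every column
col_sds   <- sapply(anscombe, sd)     # standard deviation of every column

# correlation between x_i and y_i for each of the four datasets
pair_cors <- sapply(1:4, function(i) {
    cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})

col_means
col_sds
pair_cors
```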


  • How can we plot all four datasets together?

Anscombe’s quartet using traditional graphics

  • Traditional graphics thinks of this as four different datasets

  • The function to create scatter plots is plot()

  • Multiple plots can be put in the same figure using par(mfrow = ...)

  • There are several ways to refer to variables inside a dataset, such as the formula-data interface, with(), and $ extraction
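The points above can be sketched as follows. Which variants the original slide showed is not recoverable, so the four forms below are an assumption; each is a standard way of pointing plot() at columns of a data frame.

```r
# Arrange the next four plots in a 2 x 2 grid; keep the old settings in op
op <- par(mfrow = c(2, 2))

plot(y1 ~ x1, data = anscombe)             # formula-data interface
with(anscombe, plot(x2, y2))               # evaluate the call inside the data frame
plot(anscombe$x3, anscombe$y3)             # extract columns with $
plot(anscombe[["x4"]], anscombe[["y4"]])   # extract columns by name with [[ ]]

par(op)                                    # restore the previous layout
```

Each call to plot() starts a new plot, so the four datasets are handled by four separate calls.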

Anscombe’s quartet using traditional graphics

[Figure: Anscombe’s quartet drawn with traditional graphics]

Anscombe’s quartet using traditional graphics

[Figure: Anscombe’s quartet drawn with traditional graphics]

Anscombe’s quartet using lattice and ggplot2

  • Both lattice and ggplot2 are capable of producing a single plot with all four datasets

  • But this requires the dataset to be in the “long format” (one row per data point)

'data.frame':   44 obs. of  3 variables:
 $ x    : num  10 8 13 9 11 14 6 4 12 7 ...
 $ y    : num  8.04 6.95 7.58 8.81 8.33 ...
 $ which: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
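One way to build a data frame with the structure shown above is to stack the four x columns and the four y columns and add a factor identifying the dataset. This is a sketch only; the name anscombe.long is hypothetical and the original slides may have constructed the data differently.

```r
# Stack the four datasets: 4 x 11 = 44 rows, one row per data point
anscombe.long <- with(anscombe,
    data.frame(x = c(x1, x2, x3, x4),
               y = c(y1, y2, y3, y4),
               which = gl(4, 11)))   # factor with levels 1..4, each repeated 11 times

str(anscombe.long)
```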

Anscombe’s quartet using lattice

[Figure: Anscombe’s quartet drawn with lattice]
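A display of this kind can be requested with a single xyplot() call on the long-format data. The sketch below rebuilds the data under the hypothetical name anscombe.long so that it runs on its own; the exact call used in the original slides is not known.

```r
library(lattice)

# Long-format data (hypothetical name), one row per data point
anscombe.long <- with(anscombe,
    data.frame(x = c(x1, x2, x3, x4),
               y = c(y1, y2, y3, y4),
               which = gl(4, 11)))

# y ~ x | which: plot y against x, one panel per level of which
p <- xyplot(y ~ x | which, data = anscombe.long)
print(p)
```

The part of the formula after the vertical bar is the conditioning variable; it decides how the rows are split into panels.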

Anscombe’s quartet using ggplot2

[Figure: Anscombe’s quartet drawn with ggplot2]
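A comparable ggplot2 display combines an aesthetic mapping, a point geom, and a facetting specification. This is a sketch that assumes the ggplot2 package is installed; anscombe.long is a hypothetical name for the long-format data, rebuilt here so the example is self-contained.

```r
library(ggplot2)

# Long-format data (hypothetical name), one row per data point
anscombe.long <- with(anscombe,
    data.frame(x = c(x1, x2, x3, x4),
               y = c(y1, y2, y3, y4),
               which = gl(4, 11)))

p <- ggplot(anscombe.long, aes(x = x, y = y)) +  # map variables to coordinates
    geom_point() +                               # render the data as points
    facet_wrap(~ which)                          # one facet per dataset
print(p)
```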

Anscombe’s quartet using lattice and ggplot2

  • The approaches share many common features

  • Both are capable of plotting subsets of the data (indexed by categorical variables)

    • This idea is known by several names: small multiples, conditioning, facetting
  • Both make efficient use of the available space (common scales, common axes)

  • Different visual appearance, but that is superficial (different default themes)

Anscombe’s quartet using lattice and ggplot2

  • However, the way in which we specify the display is very different

    • lattice uses an extension of the formula-data interface (with function xyplot() instead of plot())

    • ggplot2 specifies the type of rendering (geom) and the mapping of variables to coordinates (aesthetics)

  • The differences become clearer if we try to customize the display further

  • A natural modification in this example is to add a linear regression line to each scatter plot

Anscombe’s quartet with regression lines

[Figure: Anscombe’s quartet with regression lines]
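With traditional graphics, a display like this can be produced by drawing each scatter plot and then adding the fitted line to it. The loop below is a sketch under that assumption, not necessarily the code behind the original figure.

```r
op <- par(mfrow = c(2, 2))               # 2 x 2 grid of plots
for (i in 1:4) {
    x <- anscombe[[paste0("x", i)]]
    y <- anscombe[[paste0("y", i)]]
    plot(x, y, main = paste("Dataset", i))   # draw the points first
    abline(lm(y ~ x))                        # then add the least-squares line
}
par(op)                                  # restore the previous layout
```

abline() draws on the most recent plot, which is why it has to follow the corresponding plot() call directly.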

Anscombe’s quartet with regression lines

  • The traditional graphics approach is to add the line after the plot is drawn

  • In general, a plot is never finished: you can always add more points, lines, text, …

  • This is possible because there is only one plot!

  • lattice and ggplot2 need alternative solutions

  • The ggplot2 solution is to allow plots to have multiple layers

  • The lattice solution is to allow the user to fully specify the procedure used to display the data

Regression lines: the ggplot2 solution

  • Plot with points only

[Figure: ggplot2 plot with points only]

Regression lines: the ggplot2 solution

  • Plot with regression line only

[Figure: ggplot2 plot with regression line only]

Regression lines: the ggplot2 solution

  • Plot with both points and regression lines

[Figure: ggplot2 plot with both points and regression lines]
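The three displays in this sequence can be sketched by adding different layers to one shared base plot. This assumes ggplot2 is installed; the object names are illustrative, and the choice of se = FALSE (no confidence band) is an assumption rather than something taken from the original slides.

```r
library(ggplot2)

# Long-format data (hypothetical name), one row per data point
anscombe.long <- with(anscombe,
    data.frame(x = c(x1, x2, x3, x4),
               y = c(y1, y2, y3, y4),
               which = gl(4, 11)))

# Base plot: data, aesthetic mapping and facets, but no layers yet
base <- ggplot(anscombe.long, aes(x = x, y = y)) + facet_wrap(~ which)

p_points <- base + geom_point()                        # points only
p_line   <- base + geom_smooth(method = "lm",          # regression line only
                               formula = y ~ x, se = FALSE)
p_both   <- base + geom_point() +                      # both layers together
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p_both)
```

Each geom adds one layer, and every layer is drawn in every facet.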

Regression lines: the lattice solution

  • The lattice solution is actually very similar to the traditional graphics solution

  • Basically, we want to do the following for each data subset x, y:

    • Draw points at (x, y)

    • Draw the linear regression line through x, y

  • For lattice, we need to encapsulate this procedure into a function

  • This function is then supplied to the xyplot() function as the panel argument

Regression lines: the lattice solution

  • Plot with points only

[Figure: lattice plot with points only]

Regression lines: the lattice solution

  • Plot with grid, points, and regression line

[Figure: lattice plot with grid, points, and regression line]
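The panel-function approach can be sketched as follows. The function receives the x and y values of one data subset at a time and draws a reference grid, the points, and the regression line using panel functions provided by lattice; anscombe.long is a hypothetical name for the long-format data.

```r
library(lattice)

# Long-format data (hypothetical name), one row per data point
anscombe.long <- with(anscombe,
    data.frame(x = c(x1, x2, x3, x4),
               y = c(y1, y2, y3, y4),
               which = gl(4, 11)))

p <- xyplot(y ~ x | which, data = anscombe.long,
            panel = function(x, y, ...) {
                panel.grid(h = -1, v = -1)   # grid lines aligned with the axis ticks
                panel.xyplot(x, y, ...)      # points at (x, y)
                panel.lmline(x, y, ...)      # least-squares line through x, y
            })
print(p)
```

The panel function is called once per panel, so the same procedure is applied to each of the four datasets.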

Regression lines: the lattice solution

  • lattice also supports a layering mechanism inspired by ggplot2

[Figure: lattice plot built with layers]
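A layered version might look like the sketch below. It assumes the layering mechanism meant here is the one in the latticeExtra package (which must be installed separately), where layer() wraps panel-level code that is added to an existing lattice plot with +.

```r
library(lattice)
library(latticeExtra)   # assumption: provides layer() and + for lattice plots

# Long-format data (hypothetical name), one row per data point
anscombe.long <- with(anscombe,
    data.frame(x = c(x1, x2, x3, x4),
               y = c(y1, y2, y3, y4),
               which = gl(4, 11)))

# Start from the points-only plot, then add the regression line as a layer;
# x and y inside layer() refer to the data of the panel being drawn
p <- xyplot(y ~ x | which, data = anscombe.long) +
    latticeExtra::layer(panel.lmline(x, y))
print(p)
```

The call is written as latticeExtra::layer() because ggplot2 also exports a function named layer().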

Regression lines: the lattice solution

  • Common customizations like these are also supported directly through optional arguments

[Figure: lattice plot customized through optional arguments]
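For this particular customization, the default panel function of xyplot() already understands the relevant options. The sketch below uses the type and grid arguments; whether the original slide used exactly these arguments is an assumption.

```r
library(lattice)

# Long-format data (hypothetical name), one row per data point
anscombe.long <- with(anscombe,
    data.frame(x = c(x1, x2, x3, x4),
               y = c(y1, y2, y3, y4),
               which = gl(4, 11)))

p <- xyplot(y ~ x | which, data = anscombe.long,
            type = c("p", "r"),   # "p" = points, "r" = regression line
            grid = TRUE)          # reference grid behind the data
print(p)
```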

Plan

  • Day 1

    • Background and basic usage of lattice high-level functions

    • Themes and annotation: customizing graphical parameters, legends, axes, labels

  • Day 2

    • Customization of panel display using panel functions and layers

    • Exercises: Recreate some regression diagnostic plots

    • Discuss audience problems

References

Anscombe, Francis J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27 (1): 17–21.