Lattice Graphics: Basic Usage

Deepayan Sarkar

What is the lattice package?

Implementation of Trellis graphics for R
Powerful high-level data visualization system
Traditional user interface:
- Collection of high level functions: xyplot(), dotplot(), etc.
- Interface based on formula and data source

Origins

As we know, R is a Free Software re-implementation / dialect of S
The original implementation, available commercially as S-PLUS, was developed at Bell Labs
Both traditional graphics and Trellis graphics were part of the original S
Remembering this helps in understanding the design philosophy of lattice

Good graphical principles

Two important influences on graphics in S:
- John W. Tukey
- William Cleveland

John Tukey

Among the most influential modern statisticians
Champion of “Exploratory Data Analysis”
Worked at Bell Labs (where the S language was created)
Did not write software, but influenced the spirit of S

William Cleveland

Also worked at Bell Labs for a long time, and directly influenced the design of S graphics
Two important books:
- The Elements of Graphing Data (1985)
- Visualizing Data (1993)
Trellis graphics is essentially an implementation of ideas in the second book

Philosophy of data graphics in S

There are various designs or types of graphs for displaying data
Each design usually has a name (scatter plot, histogram, box plot, bar chart)
S has a high-level function corresponding to each such design (to be directly invoked by user)
The display produced should have reasonable defaults
Some customization through optional arguments (esp. graphical parameters)
Further customization can be done by adding to or replacing the display procedurally

Implicit expectation: S users will eventually turn into programmers
John Chambers, Preface of “Programming with Data” (Chambers 1998):

“S encourages you to slide into programming, perhaps without noticing”
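
The progression described above (a high-level function, optional arguments, then procedural additions) can be sketched with traditional graphics. This is only an illustration, not part of the original material; the `faithful` dataset shipped with R is used as a stand-in, and the labels are made up for the example.

```r
## Assumption: 'faithful' (shipped with R) stands in for any numeric data.
x <- faithful$eruptions

## 1. High-level function with its default settings
h <- hist(x)

## 2. Some customization through optional arguments
h <- hist(x, breaks = 20, col = "grey80", border = "white",
          main = "Old Faithful eruptions", xlab = "Duration (minutes)")

## 3. Further customization by adding to the display procedurally
rug(x)                           # tick marks for individual observations
abline(v = median(x), lty = 2)   # dashed reference line at the median
```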

Examples of high-level traditional graphics functions

Function	Default Display
`plot()`	Scatter Plot, Time-series Plot (with `type="l"`)
`boxplot()`	Comparative Box-and-Whisker Plots
`barplot()`	Bar Plot
`dotchart()`	Cleveland Dot Plot
`hist()`	Histogram
`plot(density())`	Kernel Density Plot
`qqnorm()`	Normal Quantile-Quantile Plot
`qqplot()`	Two-sample Quantile-Quantile Plot
`stripchart()`	Stripchart (Comparative 1-D Scatter Plots)
`pairs()`	Scatter-Plot Matrix

lattice defines analogous functions with different names

Function	Default Display
`xyplot()`	Scatter Plot, Time-series Plot (with `type="l"`)
`bwplot()`	Comparative Box-and-Whisker Plots
`barchart()`	Bar Plot
`dotplot()`	Cleveland Dot Plot
`histogram()`	Histogram
`densityplot()`	Kernel Density Plot
`qqmath()`	Normal Quantile-Quantile Plot
`qq()`	Two-sample Quantile-Quantile Plot
`stripplot()`	Stripchart (Comparative 1-D Scatter Plots)
`splom()`	Scatter-Plot Matrix
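
As a rough side-by-side of the two tables, the sketch below draws the same histogram with a traditional function and with its lattice analogue. The `faithful` dataset shipped with R is an assumption made for illustration. One difference worth noting is that lattice functions return an object of class "trellis", which is drawn when it is printed.

```r
library(lattice)

## Assumption: 'faithful' (shipped with R) is used as stand-in data.

## Traditional graphics: the plot is drawn immediately as a side effect
hist(faithful$eruptions)

## lattice: the call returns a "trellis" object; printing it draws the plot
p <- histogram(~ eruptions, data = faithful)
class(p)
print(p)
```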

lattice defines analogous functions with different names

Learning lattice essentially means
- Learning about these functions (and a few others I didn’t mention)
- Learning how to customize the default displays through optional arguments
- Learning how to customize displays by writing alternative panel functions
- Learning how to customize other parts (annotation, axis, themes)
We will now quickly go through some examples covering the first two points

Examples

Dataset for illustration: The `Chem97` dataset

1997 A-level Chemistry examination in Britain

data(Chem97, package = "mlmRev")
str(Chem97)

'data.frame':   31022 obs. of  8 variables:
 $ lea      : Factor w/ 131 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ school   : Factor w/ 2410 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ student  : Factor w/ 31022 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ score    : num  4 10 10 10 8 10 6 8 4 10 ...
 $ gender   : Factor w/ 2 levels "M","F": 2 2 2 2 2 2 2 2 2 2 ...
 $ age      : num  3 -3 -4 -2 -1 4 1 4 3 0 ...
 $ gcsescore: num  6.62 7.62 7.25 7.5 6.44 ...
 $ gcsecnt  : num  0.339 1.339 0.964 1.214 0.158 ...

Dataset for illustration: The `Chem97` dataset

We are only interested in
- score : Point score on A-level Chemistry in 1997 (advanced level)
- gender : Student’s gender
- gcsescore : Average GCSE score of individual (secondary level)

head(Chem97[c("score", "gender", "gcsescore")])

  score gender gcsescore
1     4      F     6.625
2    10      F     7.625
3    10      F     7.250
4    10      F     7.500
5     8      F     6.444
6    10      F     7.750

A basic histogram

histogram(Chem97$gcsescore)       # analogous to hist(Chem97$gcsescore), but not recommended

A basic histogram using the formula interface

histogram(~ gcsescore, data = Chem97)

Histograms with multipanel conditioning

histogram(~ gcsescore | factor(score), data = Chem97)

Innovations

The most visible innovation in lattice over traditional graphics is multipanel conditioning
- Common scales and shared axis labeling by default
- Strips above each panel describing subset
- Optimal use of space (e.g., no extra space left for main label unless present)
This is the origin of the names “Trellis graphics” and “lattice”
Makes use of the formula-data interface, similar to modeling functions like lm()
Most high-level lattice calls have a formula and a data argument
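
The parallel with modeling functions can be sketched as follows. The datasets (`faithful` from R, `singer` from lattice) are assumptions chosen so that the example runs without extra packages.

```r
library(lattice)

## Modeling function: formula + data
fit <- lm(eruptions ~ waiting, data = faithful)

## lattice: the same kind of formula and data describe the display
p1 <- xyplot(eruptions ~ waiting, data = faithful)

## Adding '| g' to the formula requests one panel per level of g.
## Assumption: 'singer' (shipped with lattice) is used for illustration.
p2 <- histogram(~ height | voice.part, data = singer)
dim(p2)                       # number of panels
nlevels(singer$voice.part)    # number of levels of the conditioning variable
print(p2)
```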

Density plots with multipanel conditioning

densityplot(~ gcsescore | factor(score), data = Chem97, plot.points = FALSE)

Density plots with multipanel conditioning

densityplot(~ gcsescore | factor(score), data = Chem97, plot.points = FALSE, as.table = TRUE)

Density plots with multipanel conditioning

densityplot(~ gcsescore | factor(score) + gender, data = Chem97, plot.points = FALSE)

Density plots with conditioning and within-panel grouping

densityplot(~ gcsescore | factor(score), data = Chem97, plot.points = FALSE, 
            groups = gender, auto.key = TRUE)

Trellis Philosophy: Part I

Display specified in terms of
- Type of display (histogram, densityplot, etc.)
- Variables with specific roles

Typical roles for variables
- Primary variables: used for the main graphical display
- Conditioning variables: used to divide into subgroups and juxtapose (multipanel conditioning)
- Grouping variables: divide into subgroups and superpose

Primary interface: high-level functions
- Each function corresponds to a display type
- Specification of roles depends on display type
- Usually specified through a formula and the groups argument
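
The three roles can be seen together in a single call. The sketch below assumes the `barley` dataset shipped with lattice as a stand-in for `Chem97`; the choice of display is only illustrative.

```r
library(lattice)

## Assumption: 'barley' (shipped with lattice) stands in for Chem97.
##   primary variables     : variety, yield  (the main part of the formula)
##   conditioning variable : site            (after '|', one panel per level)
##   grouping variable     : year            (superposed within each panel)
p <- dotplot(variety ~ yield | site, data = barley,
             groups = year, auto.key = TRUE)
print(p)
```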

Plots to summarize univariate distributions

We have used histograms and density plots to understand the distribution of gcsescore
We will next see some variations and some other displays with the same goal
Useful to keep in mind that good data graphics should enable comparison

Variations: density histograms with 50 bins

histogram(~ gcsescore | factor(score), data = Chem97, nint = 50, type = "density")

Variations: histograms with unequal-width bins

histogram(~ gcsescore | factor(score), data = Chem97, 
          nint = 10, breaks = NULL, equal.widths = FALSE)
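
According to help(histogram), `breaks = NULL` combined with `equal.widths = FALSE` selects bins of approximately equal area. One way to obtain such bins is to place the breakpoints at sample quantiles; the sketch below does this by hand in base R. That lattice computes its breakpoints in exactly this way is an assumption, and `faithful` is used only as stand-in data.

```r
## Assumption: 'faithful' (shipped with R) is used as stand-in data.
x <- faithful$eruptions
nint <- 10

## Quantile-based breakpoints: nint bins need nint + 1 breakpoints
breaks <- quantile(x, probs = (0:nint) / nint, names = FALSE)

## Each bin holds roughly the same number of observations ...
counts <- table(cut(x, breaks = breaks, include.lowest = TRUE))
counts

## ... so the bin widths differ
diff(breaks)
```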

Variations: density plots with triangular kernel (ASH)

densityplot(~ gcsescore | factor(score), data = Chem97, plot.points = FALSE, 
            groups = gender, kernel = "triangular")

Variations: bandwidth chosen by biased cross-validation

densityplot(~ gcsescore | factor(score), data = Chem97, plot.points = FALSE, 
            groups = gender, bw = "bcv")

Normal quantile-quantile plots

qqmath(~ gcsescore | factor(score), data = Chem97, groups = gender, auto.key = TRUE, 
       grid = TRUE, alpha = 0.2)

Normal quantile-quantile plots with banking

qqmath(~ gcsescore | factor(score), data = Chem97, groups = gender, auto.key = TRUE, grid = TRUE,
       f.value = ppoints(100),   ## plot fewer quantiles
       aspect = "xy")            ## adjust aspect ratio to 'bank' to 45 degrees
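
To see what `f.value = ppoints(100)` asks for, the following base R sketch computes the quantile pairs by hand. It is meant as an illustration of the idea rather than a description of lattice internals, and `faithful` is used only as stand-in data.

```r
## Assumption: 'faithful' (shipped with R) is used as stand-in data.
x <- faithful$eruptions

## 100 evenly spaced probabilities strictly between 0 and 1
p <- ppoints(100)
range(p)

## Sample quantiles at those probabilities, paired with normal quantiles:
## 100 points to plot instead of one point per observation
q_sample <- quantile(x, probs = p, names = FALSE)
q_normal <- qnorm(p)
length(q_sample)
```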

Two-sample quantile-quantile plots

qq(gender ~ gcsescore | factor(score), data = Chem97, grid = TRUE, 
   f.value = ppoints(100), aspect = "iso")

Box and whisker plots for multi-sample comparisons

bwplot(factor(score) ~ gcsescore | gender, Chem97)

Box and whisker plots with categorical variable on x-axis

bwplot(gcsescore ~ gender | factor(score), Chem97)

Box and whisker plots with explicit panel layout and gaps

bwplot(gcsescore ~ gender | factor(score), Chem97, layout = c(6, 1), between = list(x = 0.5))

Box and whisker plots with notches and variable width

bwplot(gcsescore ~ gender | factor(score), Chem97, layout = c(6, 1), 
       notch = TRUE, varwidth = TRUE)

Optional arguments

What optional arguments are available?
Where can we find more details about them?
To answer this, we need to learn some details about how lattice works

Summary:
- Some optional arguments are common to all high-level lattice functions
- Some are specific to the high-level function
- Some are specific to the default display panel function

Common optional arguments

Documented in help(xyplot) (for the most part)
Main categories:
- as.table, between, layout, skip : control panel layout; see Chapter 2 of the Lattice book
- xlab, ylab, main, sub : labels
- xlim, ylim : axis limits
- scales : list controlling many details about scales
- aspect : aspect ratio
- key, auto.key : legend
- par.settings : default graphical parameters (theme)
- lattice.options : non-graphical settings
Will not discuss all, but will encounter some later (see documentation for details)
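
A few of these common arguments can be combined in one call, as in the sketch below. The `singer` dataset shipped with lattice is an assumption made for illustration, and the label text is invented for the example.

```r
library(lattice)

## Assumption: 'singer' (shipped with lattice) is used for illustration.
p <- histogram(~ height | voice.part, data = singer,
               layout = c(4, 2),                   # 4 columns, 2 rows
               as.table = TRUE,                    # order panels from the top-left
               between = list(x = 0.5, y = 0.5),   # gaps between panels
               xlab = "Height (inches)",           # x-axis label
               main = "Heights of singers by voice part",
               scales = list(alternating = FALSE)) # do not alternate tick labels
print(p)
```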

Display-specific optional arguments

Most high-level functions have some optional arguments of their own
Every high-level function has a default panel function that produces the default display
These panel functions have names of the form panel.<high-level-function>
For example, panel.histogram, panel.densityplot, panel.bwplot, etc.

Optional arguments of the panel function can also be specified in the high-level call
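
One way to find out which arguments a default panel function accepts is to inspect its formal arguments, as sketched below. The `singer` dataset shipped with lattice is an assumption made for illustration.

```r
library(lattice)

## Arguments accepted by two of the default panel functions
names(formals(panel.histogram))
names(formals(panel.bwplot))

## 'notch' is an argument of panel.bwplot(), yet it can be given in the
## high-level bwplot() call, which passes it on to the panel function.
## Assumption: 'singer' (shipped with lattice) is used for illustration.
p <- bwplot(voice.part ~ height, data = singer, notch = TRUE)
print(p)
```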

Display-specific optional arguments: `histogram`

     ## S3 method for class 'formula'
     histogram(x, data, allow.multiple, outer,
               auto.key = FALSE,
               aspect = "fill",
               panel = lattice.getOption("panel.histogram"),
               prepanel, scales, strip, groups,
               xlab, xlim, ylab, ylim,
               type = c("percent", "count", "density"),
               nint = if (is.factor(x)) nlevels(x) else round(log2(length(x)) + 1),
               endpoints = extend.limits(range(as.numeric(x), finite = TRUE), prop = 0.04),
               breaks,
               equal.widths = TRUE,
               drop.unused.levels = lattice.getOption("drop.unused.levels"),
               ...,
               lattice.options = NULL,
               default.scales = list(),
               default.prepanel = lattice.getOption("prepanel.default.histogram"),
               subscripts,
               subset)

     panel.histogram(x, breaks,
                     equal.widths = TRUE,
                     type = "density",
                     nint = round(log2(length(x)) + 1), 
                     alpha, col, border, lty, lwd,
                     ...,
                     identifier = "histogram")

Display-specific optional arguments: `bwplot`

     ## S3 method for class 'formula'
     bwplot(x, data, allow.multiple, outer,
            auto.key = FALSE,
            aspect = "fill",
            panel = lattice.getOption("panel.bwplot"),
            prepanel = NULL,
            scales = list(),
            strip = TRUE,
            groups = NULL,
            xlab, xlim, ylab, ylim,
            box.ratio = 1,
            horizontal = NULL,
            drop.unused.levels = lattice.getOption("drop.unused.levels"),
            ...,
            lattice.options = NULL,
            default.scales,
            default.prepanel = lattice.getOption("prepanel.default.bwplot"),
            subscripts, subset)

     panel.bwplot(x, y, box.ratio = 1,
                  box.width = box.ratio / (1 + box.ratio),
                  horizontal = TRUE,
                  pch, col, alpha, cex, 
                  font, fontfamily, fontface, 
                  fill, varwidth = FALSE,
                  notch = FALSE, notch.frac = 0.5,
                  ...,
                  levels.fos,
                  stats = boxplot.stats,
                  coef = 1.5, do.out = TRUE,
                  identifier = "bwplot")

Example: Specifying optional parameters in `bwplot()`

bwplot(factor(score) ~ gcsescore | gender, Chem97, layout = c(2, 1), coef = 0, 
       pch = "|", fill = hcl(h = 240, l = 85))
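
The effect of `coef = 0` can be checked directly with `boxplot.stats()`, which is the default `stats` function in the `panel.bwplot()` usage shown earlier. The sketch below uses `faithful` with one artificially extreme value added; both are assumptions made for illustration.

```r
## Assumption: 'faithful' (shipped with R) is used as stand-in data.
x <- c(faithful$eruptions, 10)    # add one artificially extreme value

## Default rule: points more than 1.5 * IQR beyond the box are flagged
s_default <- boxplot.stats(x, coef = 1.5)
s_default$out

## coef = 0: whiskers extend to the data extremes and nothing is flagged
s_extreme <- boxplot.stats(x, coef = 0)
s_extreme$stats                   # whisker ends, hinges and median
s_extreme$out
```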

Specifying graphical parameters

Graphical parameters are an important part of any graphical display
lattice allows common parameters to be specified as optional arguments to high-level functions
Again, this follows the standard practice in traditional graphics
However, this is not a good idea in general, particularly if the plot includes a legend (the legend does not automatically reflect parameters specified this way)
We will discuss customizing graphical parameters in the next presentation
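
As a brief preview of the issue with legends, the sketch below contrasts a parameter given directly with the same parameter supplied through `par.settings`. The `iris` dataset shipped with R is an assumption made for illustration.

```r
library(lattice)

## Assumption: 'iris' (shipped with R) is used for illustration.

## A parameter given directly affects the panels, but the legend produced
## by auto.key is built from the theme and may not match the plotted symbols
p1 <- xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species,
             pch = 16, auto.key = TRUE)

## Supplying the parameter through par.settings changes the theme for this
## plot, so the panels and the legend stay consistent
p2 <- xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species,
             par.settings = list(superpose.symbol = list(pch = 16)),
             auto.key = TRUE)

print(p1)
print(p2)
```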

Summary

Trellis Philosophy: Part I

Display specified in terms of
- Type of display (histogram, densityplot, etc.)
- Variables with specific roles
Typical roles for variables
- Primary variables: used for the main graphical display
- Conditioning variables: used to divide into subgroups and juxtapose (multipanel conditioning)
- Grouping variables: divide into subgroups and superpose
Primary interface: high-level functions
- Each function corresponds to a display type
- Specification of roles depends on display type
- Usually specified through a formula and the groups argument

Trellis Philosophy: Part II

Design goals:
- Enable effective graphics by encouraging good graphical practice; e.g., see Cleveland (1985)
- Remove the burden from the user as much as possible by building good defaults into the software
Some obvious examples:
- Use as much of the available space as possible
- Encourage direct comparison by superposition (grouping)
- Enable comparison when juxtaposing (conditioning):
  - use common axes
  - add common reference objects (such as grids)
Inevitable departure from traditional R graphics paradigms

Trellis Philosophy: Part III

Any serious graphics system must also be flexible
lattice tries to balance flexibility and ease of use using the following model:
- A display is made up of various elements
- Coordinated defaults provide meaningful results, but
- Each element can be controlled independently
- The main elements are:
  - the primary (panel) display
  - axis annotation
  - strip annotation (describing the conditioning process)
  - legends (typically describing the grouping process)
We will discuss some of these elements in the rest of this course

Exercises

Load the Cars93 dataset from the MASS package
We are interested in understanding features that explain the MPG.city of a car model
We start by comparing the distribution of MPG.city for the two levels of Man.trans.avail
Draw a strip plot (basically a one-dimensional scatter plot)
- The relevant lattice high-level function is stripplot()
- Put Man.trans.avail on the y-axis and MPG.city on the x-axis
Modify the plot by adding the optional argument jitter = TRUE
- Which help page documents the jitter argument? What does it do?
- Which version of the plot would you prefer? Why?

Exercises

Draw a box-and-whisker plot of the same data with notches
- Where is the meaning of notch = TRUE documented?
- Can you conclude from this that manual transmission cars are more fuel-efficient?
Perform a two-sample t-test to compare the means of the two distributions
- Is it reasonable to assume equal variance in the two subgroups?
- Would your answer change if we take log(MPG.city) as the response?

Exercises

Draw a scatter plot of MPG.city against Weight
Does fuel efficiency depend on weight?
In the scatter plot of MPG.city against Weight, add Man.trans.avail as a grouping variable
Does fuel efficiency depend on Man.trans.avail after accounting for weight?
Fit a linear model and perform a formal test for this question

References

Chambers, John M. 1998. Programming with Data: A Guide to the S Language. New York: Springer.

Cleveland, William S. 1985. The Elements of Graphing Data. Monterey, California: Wadsworth.
