Lattice Graphics: Annotation, Themes, and Scales

Deepayan Sarkar

Datasets for illustration

carData::Anscombe : U. S. states plus Washington, D. C. in 1970
- education : Per-capita education expenditures (USD)
- income : Per-capita income (USD)
- young : Proportion under 18 (per 1000)
- urban : Proportion urban (per 1000)

data(Anscombe, package = "carData")
head(Anscombe)

   education income young urban
ME       189   2824 350.7   508
NH       169   3259 345.9   564
VT       230   3072 348.5   322
MA       168   3835 335.3   846
RI       180   3549 327.1   871
CT       193   4256 341.0   774

Datasets for illustration

lattice::USMortality : Rate of mortality in the US by cause
Source: Rural Health Reform Policy Research Center, University of North Dakota

data(USMortality, package = "lattice")
str(USMortality)

'data.frame':   40 obs. of  5 variables:
 $ Status: Factor w/ 2 levels "Rural","Urban": 2 1 2 1 2 1 2 1 2 1 ...
 $ Sex   : Factor w/ 2 levels "Female","Male": 2 2 1 1 2 2 1 1 2 2 ...
 $ Cause : Factor w/ 10 levels "Alzheimers","Cancer",..: 6 6 6 6 2 2 2 2 7 7 ...
 $ Rate  : num  210 243 132 155 196 ...
 $ SE    : num  0.2 0.6 0.2 0.4 0.2 0.5 0.2 0.4 0.1 0.3 ...

Datasets for illustration

MASS::Cars93 : Data from cars on sale in the USA in 1993

data(Cars93, package = "MASS")
str(Cars93)

'data.frame':   93 obs. of  27 variables:
 $ Manufacturer      : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
 $ Model             : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
 $ Type              : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
 $ Min.Price         : num  12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
 $ Price             : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
 $ Max.Price         : num  18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
 $ MPG.city          : int  25 18 20 19 22 22 19 16 19 16 ...
 $ MPG.highway       : int  31 25 26 26 30 31 28 25 27 25 ...
 $ AirBags           : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
 $ DriveTrain        : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
 $ Cylinders         : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
 $ EngineSize        : num  1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
 $ Horsepower        : int  140 200 172 172 208 110 170 180 170 200 ...
 $ RPM               : int  6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
 $ Rev.per.mile      : int  2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
 $ Man.trans.avail   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
 $ Fuel.tank.capacity: num  13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
 $ Passengers        : int  5 5 5 6 4 6 6 6 5 6 ...
 $ Length            : int  177 195 180 193 186 189 200 216 198 206 ...
 $ Wheelbase         : int  102 115 102 106 109 105 111 116 108 114 ...
 $ Width             : int  68 71 67 70 69 69 74 78 73 73 ...
 $ Turn.circle       : int  37 38 37 37 39 41 42 45 41 43 ...
 $ Rear.seat.room    : num  26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
 $ Luggage.room      : int  11 15 14 17 13 16 17 21 14 18 ...
 $ Weight            : int  2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
 $ Origin            : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
 $ Make              : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...

Fuel efficiency by number of cylinders

library(lattice)  # provides stripplot(), xyplot(), barchart(), dotplot(), splom()
stripplot(Cylinders ~ MPG.highway, data = Cars93, jitter.data = TRUE)

Fuel efficiency by number of cylinders and weight

xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders)

Fuel efficiency by number of cylinders and weight

This plot can be improved in a number of ways
Most importantly, there is no legend by default; one can be added using auto.key = TRUE
To make a version of the plot for presentation, we would usually want to add
- Nice descriptive labels
- Units of variables plotted
- Reference grids and possibly other relevant reference objects

Fuel efficiency by number of cylinders and weight

xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, grid = TRUE, auto.key = TRUE, 
       xlab = "Weight (pounds)", ylab = "Fuel efficiency on highway \n (miles per gallon)", 
       main = "Cars on Sale in the USA in 1993")

Legends in lattice graphics

Two general purpose arguments: key and legend (see help(xyplot))
- key allows structured legends with columns of text, lines, points, and rectangles.
- legend allows arbitrary grid objects to be used as legends
- Both need detailed specification by user (will not discuss in detail)
More useful argument: auto.key = TRUE
- Uses groups argument and display type to construct a legend using key
- Allows limited customization by specifying as a list: auto.key = list(...)
- See help(simpleKey) and help(xyplot) for details
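
As a rough illustration of what "detailed specification" means for key, the sketch below builds a legend by hand, spelling out the text and the point symbols explicitly. The palette and the names cyl, cols and p_key are choices made for this example and do not come from the slides.

```r
library(lattice)
data(Cars93, package = "MASS")

cyl  <- levels(Cars93$Cylinders)            # one legend entry per group level
cols <- hcl.colors(length(cyl), "Dark 3")   # illustrative palette (assumption)

# With key, the legend is described column by column; nothing is inferred
# from groups, so the same pch and col are also passed on to the panel.
p_key <- xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders,
                pch = 16, col = cols,
                key = list(space = "right", title = "Cylinders", cex.title = 1,
                           text = list(cyl),
                           points = list(pch = 16, col = cols)))
```

Calling print(p_key) draws the plot. The legend and the panel agree here only because the same values were supplied twice, which is the bookkeeping that auto.key avoids.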

Legends in lattice graphics

The most useful components when specifying auto.key = list(...) are:
- space : location of legend, usually "left", "right", "top", "bottom"
- columns : number of columns into which to arrange the legend
- title : a title for the legend
- text : labels to replace default levels of groups
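
A minimal sketch combining all four components in one call is given below. The replacement labels in cyl_labels and the legend title are illustrative choices; the labels are assumed to be listed in the same order as levels(Cars93$Cylinders).

```r
library(lattice)
data(Cars93, package = "MASS")

# Replacement labels, in the same order as levels(Cars93$Cylinders)
cyl_labels <- c("3", "4", "5", "6", "8", "Rotary")

p_auto <- xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders,
                 auto.key = list(space = "bottom",    # legend below the panel
                                 columns = 3,         # entries arranged in 3 columns
                                 title = "Number of cylinders",
                                 text = cyl_labels))  # replaces the factor levels
```

As with any lattice call, the result is an object of class "trellis"; print(p_auto) renders it.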

Using `auto.key`

xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, grid = TRUE, 
       auto.key = list(columns = 6))

Using `auto.key`

xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, grid = TRUE, 
       auto.key = list(space = "right", title = "Cylinders"))

Modifying graphical parameters

xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, grid = TRUE, 
       auto.key = list(space = "right", title = "Cylinders"), pch = 16, cex = 1.5, alpha = 0.5)

Modifying graphical parameters

Some graphical parameters can be modified through optional arguments
Unfortunately, this does not change the corresponding legend
This happens because
- When it is rendered, a lattice display uses a theme consisting of graphical parameter settings
- The panel display and the legend are actually created by completely different functions
- The only common information they have access to is the theme

To change graphical parameters in the display and legend together, we need to change the theme itself
The good news is that this is very easy to do:
- We can change the global theme used for all subsequent plots
- We can temporarily change settings for a specific plot using par.settings
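
To make the idea of a theme more concrete, the sketch below inspects what simpleTheme() returns. It is only a nested list of graphical parameters, which is why the panel function and the legend function can both read from it. The name theme is an arbitrary choice for this example.

```r
library(lattice)

# simpleTheme() builds a list of settings; it does not draw anything
theme <- simpleTheme(pch = 16, cex = 1.5)

names(theme)                  # theme components touched by these arguments
str(theme$superpose.symbol)   # settings used for points in grouped displays
```

Passing such a list as par.settings applies it to a single plot only, whereas passing it to trellis.par.set() would change all subsequent plots on the current device.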

Modifying graphical parameters

xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, grid = TRUE,
       auto.key = list(space = "right", title = "Cylinders"), 
       par.settings = simpleTheme(pch = 16, cex = 1.5, alpha = 0.5))

Global themes

There are a few global themes defined in lattice (see help(trellis.device))
Themes, as well as individual components, can be set globally using trellis.par.set()
latticeExtra defines additional themes: see ?theEconomist.theme and ?ggplot2like
latticeExtra also defines a custom.theme() function to construct new themes
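
Because a global change affects every later plot, it can help to save the current settings first and restore them afterwards. The following is a sketch of that pattern; the use of pdf(NULL) as a throwaway device and the names old and new_cex are assumptions made for this example.

```r
library(lattice)

pdf(NULL)                  # null device, so that no file is written
old <- trellis.par.get()   # snapshot of the current theme

trellis.par.set(standard.theme(color = FALSE))              # switch the whole theme
trellis.par.set(list(superpose.symbol = list(cex = 1.5)))   # tweak one component

new_cex <- trellis.par.get("superpose.symbol")$cex          # value now in effect

trellis.par.set(old)       # restore the snapshot
invisible(dev.off())
```

Theme settings in lattice are tied to the graphics device, so the snapshot and the restore need to happen while the same device is open.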

Global themes

trellis.par.set(standard.theme("x11"))                    # 'classic' S-PLUS theme
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, 
       auto.key = list(space = "right", title = "Cylinders"), 
       par.settings = simpleTheme(pch = 16, cex = 1.5))   # further customization

Global themes

trellis.par.set(standard.theme(color = FALSE))            # black and white theme
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, 
       par.settings = simpleTheme(cex = 1.5),
       auto.key = list(space = "right", title = "Cylinders", padding.text = 4))

Global themes

library(package = "latticeExtra")
trellis.par.set(theEconomist.theme())                     # The Economist theme
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, 
       auto.key = list(space = "right", title = "Cylinders"))

Global themes

library(package = "latticeExtra")
trellis.par.set(ggplot2like())                                  # ggplot2 theme
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, 
       auto.key = list(space = "right", title = "Cylinders"))

Global themes and global settings

The last plot looks somewhat like a default ggplot2 plot, but not completely
This is because certain other (non-graphical) settings are also different
Many of these can be customized through a global “options” setting
- The main interface is through lattice.options()
- Can be temporarily modified through the optional argument lattice.options
- The latter is preferred unless you want to change the settings globally
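
The two routes can be sketched with one option, axis.padding, which controls how far axis limits are extended beyond the data. The particular values used below are arbitrary and only serve to illustrate the mechanics.

```r
library(lattice)
data(Cars93, package = "MASS")

# Query the current global value
lattice.getOption("axis.padding")

# Temporary override for one plot only; the global value is left unchanged
p_tight <- xyplot(MPG.highway ~ Weight, data = Cars93,
                  lattice.options = list(axis.padding = list(numeric = 0,
                                                             factor = 0.6)))

# Global change; the previous value is returned so that it can be restored
old_opts <- lattice.options(axis.padding = list(numeric = 0.1, factor = 0.6))
lattice.options(old_opts)
```

Like options() in base R, lattice.options() returns the previous values invisibly, which makes the save-and-restore idiom above possible.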

Global themes and global settings

trellis.device(new = FALSE, retain = FALSE) # reset to default theme
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, 
       auto.key = list(space = "right", title = "Cylinders"),
       par.settings = ggplot2like(), lattice.options = ggplot2like.opts())

Global themes and global settings

xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, 
       par.settings = custom.theme(hcl.colors(6, "Dark 3"), pch = 16, cex = 1.2, alpha = 0.5),
       auto.key = list(space = "right", title = "Cylinders")) # user-provided colors

Themes and legends in other high-level plots

barchart(Cause ~ Rate | Status, data = USMortality, groups = Sex)

Themes and legends in other high-level plots

barchart(Cause ~ Rate | Status, data = USMortality, groups = Sex, auto.key = list(columns = 2), 
         origin = 0, par.settings = custom.theme(fill = hcl.colors(2, "Pastel 1")))

Themes and legends in other high-level plots

barchart(reorder(Cause, Rate) ~ Rate | Sex, data = USMortality, groups = Status, 
         auto.key = list(columns = 2), origin = 0, stack = TRUE, 
         par.settings = custom.theme(fill = hcl.colors(2, "Pastel 1")))

Themes and legends in other high-level plots

dotplot(reorder(Cause, Rate, mean) ~ Rate | Sex, data = USMortality, groups = Status, 
        auto.key = list(columns = 2), 
        par.settings = custom.theme(pch = 16, col = hcl.colors(2, "Dark 3")))

Displaying tables: bar charts vs dot plots

The last few plots are typical visualizations of cross-tabulated (group-wise summary) data
The previous plot is known as a Cleveland dot plot
Recommended by Cleveland because
- Bar charts encode data by both position and length, which is redundant
- Position is a better encoding of a quantity than length (Cleveland and McGill 1984; Heer and Bostock 2010)
Cleveland also recommends reordering categories by outcome when there is no inherent ordering
This is accomplished by the reorder() function

Finer control of scales

So far we have used the default scales / axes, but we may want to customize these as well
This is achieved using the scales argument, which controls three things
- How the data ranges of individual panels are combined
- Whether an axis is log-transformed
- How the axis is annotated (with tick marks and labels)
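
A small sketch touching all three aspects in one call is given below, using the Cars93 data. The tick locations, labels and the choice of base 2 are arbitrary and only illustrate how the components of scales are spelled.

```r
library(lattice)
data(Cars93, package = "MASS")

p_scales <- xyplot(MPG.highway ~ Weight | Origin, data = Cars93,
                   scales = list(
                       # x-axis annotation: explicit tick locations and labels
                       x = list(at = c(2000, 3000, 4000),
                                labels = c("2,000", "3,000", "4,000")),
                       # y-axis: limits computed per panel, log scale (base 2)
                       y = list(relation = "free", log = 2)))
```

Components given directly inside scales apply to both axes, while those inside x or y apply to one axis only.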

Finer control of scales: examples

dotplot(reorder(Cause, Rate, mean) ~ Rate | Sex, data = USMortality, groups = Status, 
        auto.key = list(space = "right"), 
        par.settings = simpleTheme(pch = 16), scales = list(x = list(relation = "free")))

Finer control of scales: examples

dotplot(reorder(Cause, Rate, mean) ~ Rate | Sex, data = USMortality, groups = Status, 
        auto.key = list(space = "right"), par.settings = simpleTheme(pch = 16), 
        scales = list(x = list(log = TRUE, alternating = 3)))

Finer control of scales: examples

dotplot(reorder(Cause, Rate, mean) ~ Rate | Status, data = USMortality, groups = Sex, 
        auto.key = list(space = "right"), par.settings = simpleTheme(pch = 16), 
        scales = list(x = list(log = TRUE, equispaced.log = FALSE, alternating = FALSE)))

`Anscombe` data: model education expenditure

splom(Anscombe) # scatter plot matrix summarizes relationship between all variables

`Anscombe` data: model education expenditure

We do not necessarily want to see all pairs, only response vs predictors
lattice supports this by allowing multiple terms to be separated by + in the formula
By default all terms are plotted in the same panel (superposed as groups)
Can be split into different panels using outer = TRUE
Default labels usually need further customization
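
Multiple terms are also allowed on the left-hand side of the formula. The sketch below uses the Cars93 data to contrast the superposed default with outer = TRUE; the axis label and the names p_super and p_outer are choices made for this example.

```r
library(lattice)
data(Cars93, package = "MASS")

# Two terms on the left of ~ are superposed in one panel, like groups
p_super <- xyplot(MPG.city + MPG.highway ~ Weight, data = Cars93,
                  auto.key = list(columns = 2),
                  ylab = "Fuel efficiency (miles per gallon)")

# outer = TRUE gives each term its own panel instead
p_outer <- xyplot(MPG.city + MPG.highway ~ Weight, data = Cars93,
                  outer = TRUE,
                  ylab = "Fuel efficiency (miles per gallon)")

dim(p_outer)   # number of panels
```

In the superposed version, auto.key labels the terms; in the outer version, the strips above each panel do so.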

Scatter plot with multiple terms

xyplot(education ~ income + young + urban, data = Anscombe, grid = TRUE)

Scatter plot with multiple terms

xyplot(education ~ income + young + urban, data = Anscombe, outer = TRUE, grid = TRUE)

Scatter plot with multiple terms

xyplot(education ~ income + young + urban, data = Anscombe, outer = TRUE, grid = TRUE,
       scales = list(x = list(relation = "free")))

Scatter plot with multiple terms

xyplot(education ~ income + young + urban, data = Anscombe, outer = TRUE, grid = TRUE, 
       scales = list(x = list(relation = "free")), between = list(x = 1), 
       xlab = "predictor") # strips indicate term; safer for arbitrary layouts

Scatter plot with multiple terms

xyplot(education ~ income + young + urban, data = Anscombe, outer = TRUE, grid = TRUE, 
       scales = list(x = list(relation = "free")), between = list(x = 1), strip = FALSE, 
       xlab = c("income", "young", "urban"), layout = c(3, 1)) # vector labels (fixed layout)

Exercises

High-level lattice functions are S3 generic functions
The formula methods are the primary interface, but some specialized methods are also available
One such useful method is xyplot() for time-series objects
Visualize yearly number of sunspots using xyplot(sunspot.year)
Add the optional argument aspect = "xy". Does this make it easier to see some features of the time series?
Add the optional argument cut = 4. What does this do? Does it improve the visualization?

Exercises

Another useful class of methods is the barchart() and dotplot() methods for tables (array, matrix, etc.)
Use these methods to recreate the following plots for the VADeaths data set (see ?dotplot.table)

[Figure: target plots for the VADeaths data; image not preserved]

References

Cleveland, William S., and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79:531–54.

Heer, Jeffrey, and Michael Bostock. 2010. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 203–12. ACM.
