Deepayan Sarkar
carData::Anscombe
: U. S. states plus Washington, D. C. in 1970
education
: Per-capita education expenditures (USD)
income
: Per-capita income (USD)
young
: Proportion under 18 (per 1000)
urban
: Proportion urban (per 1000)
   education income young urban
ME       189   2824 350.7   508
NH       169   3259 345.9   564
VT       230   3072 348.5   322
MA       168   3835 335.3   846
RI       180   3549 327.1   871
CT       193   4256 341.0   774
lattice::USMortality
: Rate of mortality in the US by cause
Source: Rural Health Reform Policy Research Center, University of North Dakota
'data.frame': 40 obs. of 5 variables:
$ Status: Factor w/ 2 levels "Rural","Urban": 2 1 2 1 2 1 2 1 2 1 ...
$ Sex : Factor w/ 2 levels "Female","Male": 2 2 1 1 2 2 1 1 2 2 ...
$ Cause : Factor w/ 10 levels "Alzheimers","Cancer",..: 6 6 6 6 2 2 2 2 7 7 ...
$ Rate : num 210 243 132 155 196 ...
$ SE : num 0.2 0.6 0.2 0.4 0.2 0.5 0.2 0.4 0.1 0.3 ...
MASS::Cars93
: Data from cars on sale in the USA in 1993
'data.frame': 93 obs. of 27 variables:
$ Manufacturer : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
$ Model : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
$ Type : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
$ Min.Price : num 12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
$ Price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
$ Max.Price : num 18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
$ MPG.city : int 25 18 20 19 22 22 19 16 19 16 ...
$ MPG.highway : int 31 25 26 26 30 31 28 25 27 25 ...
$ AirBags : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
$ DriveTrain : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
$ Cylinders : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
$ EngineSize : num 1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
$ Horsepower : int 140 200 172 172 208 110 170 180 170 200 ...
$ RPM : int 6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
$ Rev.per.mile : int 2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
$ Man.trans.avail : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
$ Fuel.tank.capacity: num 13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
$ Passengers : int 5 5 5 6 4 6 6 6 5 6 ...
$ Length : int 177 195 180 193 186 189 200 216 198 206 ...
$ Wheelbase : int 102 115 102 106 109 105 111 116 108 114 ...
$ Width : int 68 71 67 70 69 69 74 78 73 73 ...
$ Turn.circle : int 37 38 37 37 39 41 42 45 41 43 ...
$ Rear.seat.room : num 26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
$ Luggage.room : int 11 15 14 17 13 16 17 21 14 18 ...
$ Weight : int 2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
$ Origin : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
$ Make : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...
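The listings above can be reproduced with a short setup script. This is a sketch that assumes the carData package is installed (lattice and MASS ship with standard R distributions). The final call is an assumed minimal starting plot with default settings, not necessarily the exact call behind the original figure.

```r
library(lattice)                        # also provides the USMortality data

data(Cars93, package = "MASS")          # cars on sale in the USA in 1993
str(USMortality)
str(Cars93)

# carData is not part of a default R installation, so guard its use
if (requireNamespace("carData", quietly = TRUE)) {
    data(Anscombe, package = "carData")
    print(head(Anscombe))
}

# Assumed starting point: grouped scatter plot with default settings
p0 <- xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders)
print(p0)
```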
This plot can be improved in a number of ways
Most importantly, there is no legend by default; one can be added using auto.key = TRUE
To make a version of the plot for presentation, we would usually want to add
Nice descriptive labels
Units of variables plotted
Reference grids and possibly other relevant reference objects
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, grid = TRUE, auto.key = TRUE,
xlab = "Weight (pounds)", ylab = "Fuel efficiency on highway \n (miles per gallon)",
main = "Cars on Sale in the USA in 1993")
Two general purpose arguments: key and legend (see help(xyplot))
key allows structured legends with columns of text, lines, points, and rectangles.
legend allows arbitrary grid objects to be used as legends
Both need detailed specification by user (will not discuss in detail)
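For orientation only, here is a minimal sketch of what a hand-built key can look like. The name my_key is hypothetical, and the sketch assumes the active theme does not change between building the key and printing the plot, since the symbols are copied from the theme by hand.

```r
library(lattice)
data(Cars93, package = "MASS")

# Copy the current theme's symbol settings so that the hand-built key
# matches what the panel function draws for each group
sym <- trellis.par.get("superpose.symbol")
lev <- levels(Cars93$Cylinders)
idx <- seq_along(lev)

# 'my_key' is a hypothetical name; its components follow the 'key'
# conventions described in help(xyplot)
my_key <- list(space = "right", title = "Cylinders", cex.title = 1,
               text = list(lev),
               points = list(pch = sym$pch[idx], col = sym$col[idx]))

p_key <- xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders,
                key = my_key)
print(p_key)
```

Every symbol and label has to be supplied explicitly here, which is the bookkeeping that auto.key takes care of.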
More useful argument: auto.key = TRUE
Uses the groups argument and display type to construct a legend using key
Allows limited customization by specifying as a list: auto.key = list(...)
See help(simpleKey) and help(xyplot) for details
The most useful components when specifying auto.key = list(...) are:
space
: location of legend, usually "left", "right", "top", "bottom"
columns
: number of columns into which to arrange the legend
title
: a title for the legend
text
: labels to replace default levels of groups
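As a sketch of how these components combine, the call below uses all four at once. The replacement labels in cyl_labels and the title string are made up for illustration.

```r
library(lattice)
data(Cars93, package = "MASS")

# Hypothetical replacement labels, one per level of the grouping factor
cyl_labels <- paste(levels(Cars93$Cylinders), "cyl")

p_auto <- xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders,
                 auto.key = list(space = "top", columns = 3,
                                 title = "Engine type", text = cyl_labels))
print(p_auto)
```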
auto.key
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, grid = TRUE,
auto.key = list(columns = 6))
auto.key
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, grid = TRUE,
auto.key = list(space = "right", title = "Cylinders"))
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, grid = TRUE,
auto.key = list(space = "right", title = "Cylinders"), pch = 16, cex = 1.5, alpha = 0.5)
Some graphical parameters can be modified through optional arguments
Unfortunately, this does not change the corresponding legend
This happens because
When it is rendered, a lattice display uses a theme consisting of graphical parameter settings
The panel display and the legend are actually created by completely different functions
The only common information they have access to is the theme
To change graphical parameters in the display and legend together, we need to change the theme itself
The good news is that this is very easy to do:
We can change the global theme used for all subsequent plots
We can temporarily change settings for a specific plot using par.settings
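The two routes can be sketched as follows. simpleTheme() just builds a list of settings; the same list can either be installed globally with trellis.par.set() or passed to a single plot as par.settings. The names th and old_theme are illustrative.

```r
library(lattice)

# simpleTheme() only builds a list of settings; nothing is changed yet
th <- simpleTheme(pch = 16, cex = 1.5, alpha = 0.5)
str(th$superpose.symbol)

# Global change: save the current theme, apply the new settings, then restore
old_theme <- trellis.par.get()
trellis.par.set(th)
trellis.par.get("superpose.symbol")$pch   # inspect the modified setting
trellis.par.set(old_theme)
```

Saving and restoring the full theme keeps the global change from leaking into later plots.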
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders, grid = TRUE,
auto.key = list(space = "right", title = "Cylinders"),
par.settings = simpleTheme(pch = 16, cex = 1.5, alpha = 0.5))
There are a few global themes defined in lattice (see help(trellis.device))
Themes can be set globally using trellis.par.set() (as well as individual components)
latticeExtra defines additional themes: see ?theEconomist.theme and ?ggplot2like
latticeExtra also defines a custom.theme() function to construct new themes
trellis.par.set(standard.theme("x11")) # 'classic' S-PLUS theme
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders,
auto.key = list(space = "right", title = "Cylinders"),
par.settings = simpleTheme(pch = 16, cex = 1.5)) # further customization
trellis.par.set(standard.theme(color = FALSE)) # black and white theme
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders,
par.settings = simpleTheme(cex = 1.5),
auto.key = list(space = "right", title = "Cylinders", padding.text = 4))
library(package = "latticeExtra")
trellis.par.set(theEconomist.theme()) # The Economist theme
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders,
auto.key = list(space = "right", title = "Cylinders"))
library(package = "latticeExtra")
trellis.par.set(ggplot2like()) # ggplot2 theme
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders,
auto.key = list(space = "right", title = "Cylinders"))
The last plot looks somewhat like a default ggplot2 plot, but not completely
This is because certain other (non-graphical) settings are also different
Many of these can be customized through a global “options” setting
The main interface is through lattice.options()
Can be temporarily modified through the optional argument lattice.options
The latter is preferred unless you want to change the settings globally
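A small sketch of both routes, using the axis.padding option as an example. The value 0.15 is arbitrary and chosen only to make the effect visible.

```r
library(lattice)
data(Cars93, package = "MASS")

# Global change: lattice.options() returns the previous values, like options()
old_opts <- lattice.options(axis.padding = list(numeric = 0.15))
lattice.getOption("axis.padding")$numeric   # inspect the new global value
lattice.options(old_opts)                   # restore the saved values

# Temporary change: applies to this plot only
p_opts <- xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders,
                 lattice.options = list(axis.padding = list(numeric = 0.15)))
print(p_opts)
```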
trellis.device(new = FALSE, retain = FALSE) # reset to default theme
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders,
auto.key = list(space = "right", title = "Cylinders"),
par.settings = ggplot2like(), lattice.options = ggplot2like.opts())
xyplot(MPG.highway ~ Weight, data = Cars93, groups = Cylinders,
par.settings = custom.theme(hcl.colors(6, "Dark 3"), pch = 16, cex = 1.2, alpha = 0.5),
auto.key = list(space = "right", title = "Cylinders")) # user-provided colors
barchart(Cause ~ Rate | Status, data = USMortality, groups = Sex, auto.key = list(columns = 2),
origin = 0, par.settings = custom.theme(fill = hcl.colors(2, "Pastel 1")))
barchart(reorder(Cause, Rate) ~ Rate | Sex, data = USMortality, groups = Status,
auto.key = list(columns = 2), origin = 0, stack = TRUE,
par.settings = custom.theme(fill = hcl.colors(2, "Pastel 1")))
dotplot(reorder(Cause, Rate, mean) ~ Rate | Sex, data = USMortality, groups = Status,
auto.key = list(columns = 2),
par.settings = custom.theme(pch = 16, col = hcl.colors(2, "Dark 3")))
The last few plots are typical visualizations of cross-tabulated (group-wise summary) data
The previous plot is known as a Cleveland dot plot
Recommended by Cleveland because
Barcharts encode data by both position and length, which is redundant
Position is a better encoding of a quantity than length (Cleveland and McGill 1984; Heer and Bostock 2010)
Cleveland also recommends reordering categories by outcome when there is no inherent ordering
This is accomplished by the reorder() function
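To see what reorder() does outside of a plot, the following sketch compares the factor levels before and after reordering. The name cause_by_rate is illustrative.

```r
library(lattice)   # provides the USMortality data

# Original level order of the factor
levels(USMortality$Cause)

# reorder() keeps the data but sorts the levels by a per-level summary,
# here the mean of Rate within each cause
cause_by_rate <- with(USMortality, reorder(Cause, Rate, mean))
levels(cause_by_rate)   # lowest mean rate first
```

Because lattice draws the levels of a factor in order along the axis, the reordered factor places the causes in increasing order of mean rate.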
So far we have used the default scales / axes, but we may want to customize these as well
This is achieved using the scales argument, which controls three things:
How the ranges of data in individual panels are combined
Whether an axis is log-transformed
How the axis is annotated (with tick marks and labels)
dotplot(reorder(Cause, Rate, mean) ~ Rate | Sex, data = USMortality, groups = Status,
auto.key = list(space = "right"),
par.settings = simpleTheme(pch = 16), scales = list(x = list(relation = "free")))
dotplot(reorder(Cause, Rate, mean) ~ Rate | Sex, data = USMortality, groups = Status,
auto.key = list(space = "right"), par.settings = simpleTheme(pch = 16),
scales = list(x = list(log = TRUE, alternating = 3)))
dotplot(reorder(Cause, Rate, mean) ~ Rate | Status, data = USMortality, groups = Sex,
auto.key = list(space = "right"), par.settings = simpleTheme(pch = 16),
scales = list(x = list(log = TRUE, equispaced.log = FALSE, alternating = FALSE)))
Anscombe data: model education expenditure
We do not necessarily want to see all pairs, only response vs predictors
lattice supports this by allowing multiple terms to be separated by + in the formula
By default all terms are plotted in the same panel (superposed as groups)
Can be split into different panels using outer = TRUE
Default labels usually need further customization
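For comparison, this sketch shows the default behaviour without outer = TRUE. It assumes the carData package is installed, and the legend layout is an illustrative choice.

```r
library(lattice)

if (requireNamespace("carData", quietly = TRUE)) {
    data(Anscombe, package = "carData")
    # Default (outer = FALSE): one panel, the three terms superposed as groups
    p_super <- xyplot(education ~ income + young + urban, data = Anscombe,
                      auto.key = list(columns = 3))
    print(p_super)
}
```

Since the three predictors are on very different scales, sharing a single x-axis is not very informative here, which is why separate panels with free x-scales are the better choice for this data set.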
xyplot(education ~ income + young + urban, data = Anscombe, outer = TRUE, grid = TRUE,
scales = list(x = list(relation = "free")))
xyplot(education ~ income + young + urban, data = Anscombe, outer = TRUE, grid = TRUE,
scales = list(x = list(relation = "free")), between = list(x = 1),
xlab = "predictor") # strips indicate term; safer for arbitrary layouts
xyplot(education ~ income + young + urban, data = Anscombe, outer = TRUE, grid = TRUE,
scales = list(x = list(relation = "free")), between = list(x = 1), strip = FALSE,
xlab = c("income", "young", "urban"), layout = c(3, 1)) # vector labels (fixed layout)
High-level lattice functions are S3 generic functions
The formula methods are the primary interface, but some specialized methods are also available
One such useful method is the xyplot() method for time-series objects
Visualize yearly number of sunspots using xyplot(sunspot.year)
Add the optional argument aspect = "xy". Does this make it easier to see some features of the time series?
Add the optional argument cut = 4. What does this do? Does it improve the visualization?
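A possible starting point for this exercise, showing only the method dispatch and leaving the optional arguments to be explored:

```r
library(lattice)

class(sunspot.year)            # a "ts" object, so xyplot() dispatches on it
methods(xyplot)                # lists the available xyplot() methods

p_ts <- xyplot(sunspot.year)   # no formula needed for the time-series method
print(p_ts)
```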
Other useful methods are the barchart() and dotplot() methods for tables (array, matrix, etc.)
Use these methods to recreate the following plots for the VADeaths data set (see ?dotplot.table)
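As a hedged starting point, the calls below apply the table methods directly to the VADeaths matrix. The arguments are illustrative and are not claimed to reproduce the target plots.

```r
library(lattice)

# VADeaths is a matrix of death rates (age group by population group)
dimnames(VADeaths)

# Assumed starting points; the arguments are illustrative, not the original calls
p_dot <- dotplot(VADeaths, groups = FALSE, layout = c(1, 4))
p_bar <- barchart(VADeaths, groups = TRUE, auto.key = list(columns = 2))

print(p_dot)
print(p_bar)
```

With groups = FALSE each column of the matrix gets its own panel; with groups = TRUE the columns are superposed within a single panel and distinguished by the legend.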
Cleveland, William S., and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79:531–54.
Heer, Jeffrey, and Michael Bostock. 2010. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 203–12. ACM.