class: center, middle

# Introduction to Data Visualization in R

## Introductory Computer Programming

### Deepayan Sarkar

---

layout: true

# Data Visualization

---

* Important component of data analysis

--

* Main purposes

    - Exploration

    - Presentation

---

* Learning objectives

    - What kind of visualization to use

    - How to create them

---

layout: false

# Example datasets: `airquality` (size: small)

```r
str(airquality) # built-in dataset
```

```
'data.frame': 153 obs. of 6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
```

???

The first dataset we will consider is a simple built-in dataset in R, giving daily air quality measurements in New York City over five months in 1973. It has 153 observations, one for each day from May through September.

---

# Example datasets: `airquality` (size: small)

```r
head(airquality, 15)
```

```
   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9
10    NA     194  8.6   69     5  10
11     7      NA  6.9   74     5  11
12    16     256  9.7   69     5  12
13    11     290  9.2   66     5  13
14    14     274 10.9   68     5  14
15    18      65 13.2   58     5  15
```

???

There are some NA values, which indicate missing data. Also notice that dates are specified separately in the `Month` and `Day` columns, and that `Month` is indicated by number rather than name.

---

# Example datasets: `gapminder` (size: moderate)

```r
gapminder <- read.table("data/gapminder.tsv", sep = "\t", header = TRUE)
str(gapminder)
```

```
'data.frame': 1698 obs. of 6 variables:
 $ country  : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
 $ continent: chr  "Asia" "Asia" "Asia" "Asia" ...
 $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
 $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
 $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
 $ gdpPercap: num  779 821 853 836 740 ...
```

---

# Example datasets: `gapminder` (size: moderate)

```r
subset(gapminder, country == "Australia")
```

```
     country continent year lifeExp      pop gdpPercap
61 Australia   Oceania 1952  69.120  8691212  10039.60
62 Australia   Oceania 1957  70.330  9712569  10949.65
63 Australia   Oceania 1962  70.930 10794968  12217.23
64 Australia   Oceania 1967  71.100 11872264  14526.12
65 Australia   Oceania 1972  71.930 13177000  16788.63
66 Australia   Oceania 1977  73.490 14074100  18334.20
67 Australia   Oceania 1982  74.740 15184200  19477.01
68 Australia   Oceania 1987  76.320 16257249  21888.89
69 Australia   Oceania 1992  77.560 17481977  23424.77
70 Australia   Oceania 1997  78.830 18565243  26997.94
71 Australia   Oceania 2002  80.370 19546792  30687.75
72 Australia   Oceania 2007  81.235 20434176  34435.37
```

???

The dataset has 1698 rows, but this large size is mainly due to the fact that it contains records for many countries. If we restrict our attention to the subset for Australia, for example, we see that there are only 12 observations.

---

# Example datasets: `NHANES` (size: somewhat large)

```r
library(package = "NHANES")
str(NHANES)
```

```
tbl_df [10,000 × 76] (S3: tbl_df/tbl/data.frame)
 $ ID              : int [1:10000] 51624 51624 51624 51625 51630 51638 51646 51647 51647 51647 ...
 $ SurveyYr        : Factor w/ 2 levels "2009_10","2011_12": 1 1 1 1 1 1 1 1 1 1 ...
 $ Gender          : Factor w/ 2 levels "female","male": 2 2 2 2 1 2 2 1 1 1 ...
 $ Age             : int [1:10000] 34 34 34 4 49 9 8 45 45 45 ...
 $ AgeDecade       : Factor w/ 8 levels " 0-9"," 10-19",..: 4 4 4 1 5 1 1 5 5 5 ...
 $ AgeMonths       : int [1:10000] 409 409 409 49 596 115 101 541 541 541 ...
 $ Race1           : Factor w/ 5 levels "Black","Hispanic",..: 4 4 4 5 4 4 4 4 4 4 ...
 $ Race3           : Factor w/ 6 levels "Asian","Black",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Education       : Factor w/ 5 levels "8th Grade","9 - 11th Grade",..: 3 3 3 NA 4 NA NA 5 5 5 ...
 $ MaritalStatus   : Factor w/ 6 levels "Divorced","LivePartner",..: 3 3 3 NA 2 NA NA 3 3 3 ...
 $ HHIncome        : Factor w/ 12 levels " 0-4999"," 5000-9999",..: 6 6 6 5 7 11 9 11 11 11 ...
 $ HHIncomeMid     : int [1:10000] 30000 30000 30000 22500 40000 87500 60000 87500 87500 87500 ...
 $ Poverty         : num [1:10000] 1.36 1.36 1.36 1.07 1.91 1.84 2.33 5 5 5 ...
 $ HomeRooms       : int [1:10000] 6 6 6 9 5 6 7 6 6 6 ...
 $ HomeOwn         : Factor w/ 3 levels "Own","Rent","Other": 1 1 1 1 2 2 1 1 1 1 ...
 $ Work            : Factor w/ 3 levels "Looking","NotWorking",..: 2 2 2 NA 2 NA NA 3 3 3 ...
 $ Weight          : num [1:10000] 87.4 87.4 87.4 17 86.7 29.8 35.2 75.7 75.7 75.7 ...
 $ Length          : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
 $ HeadCirc        : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
 $ Height          : num [1:10000] 165 165 165 105 168 ...
 $ BMI             : num [1:10000] 32.2 32.2 32.2 15.3 30.6 ...
 $ BMICatUnder20yrs: Factor w/ 4 levels "UnderWeight",..: NA NA NA NA NA NA NA NA NA NA ...
 $ BMI_WHO         : Factor w/ 4 levels "12.0_18.5","18.5_to_24.9",..: 4 4 4 1 4 1 2 3 3 3 ...
 $ Pulse           : int [1:10000] 70 70 70 NA 86 82 72 62 62 62 ...
 $ BPSysAve        : int [1:10000] 113 113 113 NA 112 86 107 118 118 118 ...
 $ BPDiaAve        : int [1:10000] 85 85 85 NA 75 47 37 64 64 64 ...
 $ BPSys1          : int [1:10000] 114 114 114 NA 118 84 114 106 106 106 ...
 $ BPDia1          : int [1:10000] 88 88 88 NA 82 50 46 62 62 62 ...
 $ BPSys2          : int [1:10000] 114 114 114 NA 108 84 108 118 118 118 ...
 $ BPDia2          : int [1:10000] 88 88 88 NA 74 50 36 68 68 68 ...
 $ BPSys3          : int [1:10000] 112 112 112 NA 116 88 106 118 118 118 ...
 $ BPDia3          : int [1:10000] 82 82 82 NA 76 44 38 60 60 60 ...
 $ Testosterone    : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
 $ DirectChol      : num [1:10000] 1.29 1.29 1.29 NA 1.16 1.34 1.55 2.12 2.12 2.12 ...
 $ TotChol         : num [1:10000] 3.49 3.49 3.49 NA 6.7 4.86 4.09 5.82 5.82 5.82 ...
 $ UrineVol1       : int [1:10000] 352 352 352 NA 77 123 238 106 106 106 ...
 $ UrineFlow1      : num [1:10000] NA NA NA NA 0.094 ...
 $ UrineVol2       : int [1:10000] NA NA NA NA NA NA NA NA NA NA ...
 $ UrineFlow2      : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
 $ Diabetes        : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ DiabetesAge     : int [1:10000] NA NA NA NA NA NA NA NA NA NA ...
 $ HealthGen       : Factor w/ 5 levels "Excellent","Vgood",..: 3 3 3 NA 3 NA NA 2 2 2 ...
 $ DaysPhysHlthBad : int [1:10000] 0 0 0 NA 0 NA NA 0 0 0 ...
 $ DaysMentHlthBad : int [1:10000] 15 15 15 NA 10 NA NA 3 3 3 ...
 $ LittleInterest  : Factor w/ 3 levels "None","Several",..: 3 3 3 NA 2 NA NA 1 1 1 ...
 $ Depressed       : Factor w/ 3 levels "None","Several",..: 2 2 2 NA 2 NA NA 1 1 1 ...
 $ nPregnancies    : int [1:10000] NA NA NA NA 2 NA NA 1 1 1 ...
 $ nBabies         : int [1:10000] NA NA NA NA 2 NA NA NA NA NA ...
 $ Age1stBaby      : int [1:10000] NA NA NA NA 27 NA NA NA NA NA ...
 $ SleepHrsNight   : int [1:10000] 4 4 4 NA 8 NA NA 8 8 8 ...
 $ SleepTrouble    : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
 $ PhysActive      : Factor w/ 2 levels "No","Yes": 1 1 1 NA 1 NA NA 2 2 2 ...
 $ PhysActiveDays  : int [1:10000] NA NA NA NA NA NA NA 5 5 5 ...
 $ TVHrsDay        : Factor w/ 7 levels "0_hrs","0_to_1_hr",..: NA NA NA NA NA NA NA NA NA NA ...
 $ CompHrsDay      : Factor w/ 7 levels "0_hrs","0_to_1_hr",..: NA NA NA NA NA NA NA NA NA NA ...
 $ TVHrsDayChild   : int [1:10000] NA NA NA 4 NA 5 1 NA NA NA ...
 $ CompHrsDayChild : int [1:10000] NA NA NA 1 NA 0 6 NA NA NA ...
 $ Alcohol12PlusYr : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
 $ AlcoholDay      : int [1:10000] NA NA NA NA 2 NA NA 3 3 3 ...
 $ AlcoholYear     : int [1:10000] 0 0 0 NA 20 NA NA 52 52 52 ...
 $ SmokeNow        : Factor w/ 2 levels "No","Yes": 1 1 1 NA 2 NA NA NA NA NA ...
 $ Smoke100        : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
 $ Smoke100n       : Factor w/ 2 levels "Non-Smoker","Smoker": 2 2 2 NA 2 NA NA 1 1 1 ...
 $ SmokeAge        : int [1:10000] 18 18 18 NA 38 NA NA NA NA NA ...
 $ Marijuana       : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
 $ AgeFirstMarij   : int [1:10000] 17 17 17 NA 18 NA NA 13 13 13 ...
 $ RegularMarij    : Factor w/ 2 levels "No","Yes": 1 1 1 NA 1 NA NA 1 1 1 ...
 $ AgeRegMarij     : int [1:10000] NA NA NA NA NA NA NA NA NA NA ...
 $ HardDrugs       : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
 $ SexEver         : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
 $ SexAge          : int [1:10000] 16 16 16 NA 12 NA NA 13 13 13 ...
 $ SexNumPartnLife : int [1:10000] 8 8 8 NA 10 NA NA 20 20 20 ...
 $ SexNumPartYear  : int [1:10000] 1 1 1 NA 1 NA NA 0 0 0 ...
 $ SameSex         : Factor w/ 2 levels "No","Yes": 1 1 1 NA 2 NA NA 2 2 2 ...
 $ SexOrientation  : Factor w/ 3 levels "Bisexual","Heterosexual",..: 2 2 2 NA 2 NA NA 1 1 1 ...
 $ PregnantNow     : Factor w/ 3 levels "Yes","No","Unknown": NA NA NA NA NA NA NA NA NA NA ...
```

???

The data originally come from a health and nutrition survey conducted regularly in the USA. Each row in this dataset represents a respondent in the study. The actual study uses a fairly complex survey design. This is not the full dataset, but rather a carefully chosen subset that can be treated as a _random sample_ from the US population.

---

# The goal of data visualization

* Visualizations help us study relationships

* This is enabled by comparison

???

Visual comparisons require the data values being plotted to be converted into something that _can_ be plotted. The most common and obvious mapping is from a value to a _coordinate position_ on the plot. But the mapping can also be to _length_, _area_, or even _color_. We will see some of these mappings in the examples that follow.

---

# What do we study using visualization?

- Univariate distributions

- Bivariate and trivariate (generally multivariate) relationships

- Special case: Relationship with time (time-series) or space (spatial)

???

Let us now dive into some visualization _examples_, keeping our earlier discussion in mind.
We will start with a simple univariate data vector, namely, the vector of ozone concentrations in the `airquality` dataset.

---

# The `plot()` function

```r
plot(airquality$Ozone)
```

---

# Univariate distributions: strip charts or dot plots

```r
stripchart(airquality$Ozone)
```

---

# Univariate distributions: strip charts or dot plots

```r
stripchart(airquality$Ozone, method = "stack", pch = 16)
```

---

# Univariate distributions: comparative strip charts

```r
stripchart(Ozone ~ factor(Month), data = airquality, method = "stack", pch = 16)
```

---

# Univariate distributions: comparative strip charts

```r
# Convert Month to a factor labelled by month name; later plots reuse this factor
airquality$Month <- factor(airquality$Month, labels = month.abb[5:9])
stripchart(Temp ~ Month, data = airquality, method = "stack", pch = 16)
```

---

# Univariate distributions: comparative strip charts

```r
stripchart(lifeExp ~ reorder(continent, lifeExp), data = gapminder,
           method = "stack", pch = 16, las = 1)
```

---

# Univariate distributions: comparative strip charts

```r
stripchart(lifeExp ~ reorder(continent, lifeExp), data = gapminder,
           method = "jitter", pch = 16, las = 1, col = rgb(0, 0, 0, alpha = 0.25))
```

---

# Univariate distributions: comparative strip charts

```r
rgb(0, 0, 0, alpha = 0.25)
```

```
[1] "#00000040"
```

```r
stripchart(DirectChol ~ Gender, data = NHANES, method = "jitter", pch = 16, col = "#00000040")
```

---

# Univariate distributions: comparative box and whisker plots

```r
boxplot(DirectChol ~ Gender, data = NHANES, horizontal = TRUE, range = 0)
```

---

# Univariate distributions: comparative box and whisker plots

```r
boxplot(DirectChol ~ Gender, data = NHANES, horizontal = TRUE)
```

---

# Univariate distributions: comparative box and whisker plots

```r
library(package = "lattice")
bwplot(Gender ~ BPSysAve, data = NHANES)
```

---

# Univariate distributions: comparative histograms

```r
histogram( ~ BPSysAve | Gender, data = NHANES, layout = c(1, 2), nint = 25)
```

---

# Univariate distributions: kernel density estimates

```r
densityplot( ~ BPSysAve | Gender, data = NHANES, layout = c(1, 2), plot.points = FALSE)
```

---

# Univariate distributions: comparative violin plots

```r
bwplot(Gender ~ BPSysAve, data = NHANES, panel = panel.violin)
```

---

# Univariate distributions: comparative violin plots

```r
bwplot(Race1 ~ Weight | Gender, data = NHANES, panel = panel.violin)
```

---

# Univariate distributions: comparative box and whisker plots

```r
bwplot(Race1 ~ Weight | Gender, data = NHANES)
```

???

Notice that the last two plots separate the data using a combination of two grouping attributes, race and gender: gender through conditioning, and race through the usual comparative layout within each panel. This kind of comparison among subgroups is something that a well-designed plot can convey very effectively. For example, this plot suggests that the weight distribution varies across racial subgroups, and that the pattern of variation is largely the same for males and females, except among Black respondents.
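---

# Univariate distributions: what the box and whiskers encode

The box and whisker slides above use `range = 0` in one call and the default in the other. The following is a minimal sketch, not part of the original examples, of the numbers behind the two versions. It uses the built-in `airquality` data and the base R functions `fivenum()` and `boxplot.stats()`; the variable names are chosen only for illustration.

```r
x <- airquality$Ozone          # built-in data; contains some NA values

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x, na.rm = TRUE)

# coef = 0 corresponds to boxplot(..., range = 0): the whiskers extend to the
# extremes, so the plotted values are exactly the five-number summary
stats0 <- boxplot.stats(x, coef = 0)$stats

# Default coef = 1.5: each whisker stops at the most extreme observation lying
# within 1.5 box-lengths of the box; points beyond that are returned in $out
bs <- boxplot.stats(x)
bs$stats   # whisker ends, hinges, and median actually drawn
bs$out     # observations drawn individually as outlying points
```

The box itself (hinges and median) is the same in both versions; only the whisker ends and the individually drawn points can differ.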
---

# Summary: Univariate distributions

* Basic design: Strip charts

* Generalizations: box and whisker plots, density estimates

---

# Bivariate distributions: scatter plot

* Encodes two variables as x- and y-coordinates

```r
plot(Ozone ~ Solar.R, data = airquality)
```

---

# Bivariate distributions: scatter plot

```r
xyplot(Ozone ~ Solar.R, data = airquality, grid = TRUE)
```

---

# Bivariate distributions: comparative scatter plots

```r
xyplot(Ozone ~ Solar.R | Month, data = airquality, grid = TRUE, layout = c(5, 1), pch = 16)
```

---

# Bivariate distributions: semi-transparent colors

```r
xyplot(Weight ~ Height | Gender, data = NHANES, grid = TRUE, pch = 16, alpha = 0.1)
```

---

# Bivariate distributions: ggplot2

```r
library(ggplot2)
# alpha is a fixed setting rather than a data mapping, so it goes outside aes()
ggplot(data = NHANES) + facet_grid(~ Gender) +
    geom_point(mapping = aes(x = Height, y = Weight), alpha = 0.1)
```

---

# Bivariate distributions: hexagonal binning

* Hexagons are preferable to rectangles

* Bin counts are usually indicated by color

```r
library(hexbin)
hexbinplot(Weight ~ Height | Gender, data = NHANES, aspect = 1)
```

---

# Bivariate distributions: hexagonal binning

```r
ggplot(data = NHANES) + facet_grid(~ Gender) +
    geom_hex(mapping = aes(x = Height, y = Weight))
```

---

# Bivariate distributions: kernel density estimates

```r
xyplot(Weight ~ Height | Gender, data = NHANES, grid = TRUE, panel = panel.smoothScatter)
```

---

# Trivariate data: projection into two-dimensional space

* Up to three variables can be mapped to x-, y-, and z-coordinates

```r
cloud(Ozone ~ Solar.R + Temp, data = airquality, scales = list(arrows = FALSE))
```

---

# Trivariate data: projection into two-dimensional space

* Better: Interactive version using `rgl` package (interface to OpenGL)

```r
library(rgl)
with(airquality, plot3d(Solar.R, Temp, Ozone, type = "s", col = "red", size = 1))
```

---

layout: true

# Conditioning / faceting

---

* Categories can be compared by juxtaposition: one panel per level of a categorical variable

```r
xyplot(lifeExp ~ log2(gdpPercap) | reorder(continent, lifeExp),
       data = gapminder, grid = TRUE)
```

???

We can see that life expectancy and per capita GDP are related. But this plot shows data for all years together, so we cannot see how things have changed over time.

---

* To avoid overlaps, we can look at subsets

```r
xyplot(lifeExp ~ log2(gdpPercap) | reorder(continent, lifeExp),
       data = gapminder, grid = TRUE, subset = (year == 2002))
```

---

* With too many comparisons, a single display page may not be enough

```r
xyplot(lifeExp ~ log2(gdpPercap) | reorder(continent, lifeExp) + factor(year),
       data = gapminder, grid = TRUE, layout = c(5, 1))
```

???

Here we need to make some sort of compromise. If we plot all continent-year combinations, the individual panels may become too small to see. Sometimes, it is useful to create multi-page plots that can be viewed one after the other, for example, using animation as we have done here.

---

* Another option: use a different attribute such as color (not very helpful here)

```r
xyplot(lifeExp ~ log2(gdpPercap) | reorder(continent, lifeExp), data = gapminder,
       grid = TRUE, groups = year, auto.key = list(columns = 6))
```

???

Of course, if we really wanted to study how life expectancy or per capita GDP has changed over time, this may not be the best possible plot for that.

---

layout: true

# Tables: Summary measures on categorical attributes

---

* Two common types of visualization: pie charts and bar charts

* These are used to visualize _tabular_ data

--

* Usually obtained from a larger dataset by cross-classification on categorical variables

* But actually much more common

```r
str(gapminder)
```

```
'data.frame': 1698 obs. of 6 variables:
 $ country  : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
 $ continent: chr  "Asia" "Asia" "Asia" "Asia" ...
 $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
 $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
 $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
 $ gdpPercap: num  779 821 853 836 740 ...
```

---

.scrollable500[

```r
gapminder_sub <- subset(gapminder, continent == "Asia" & year %in% c(1967, 1987, 2007))
(letab <- xtabs(lifeExp ~ country + year, data = gapminder_sub))
```

```
                    year
country                  1967     1987     2007
  Afghanistan        34.02000 40.82200 43.82800
  Bahrain            59.92300 70.75000 75.63500
  Bangladesh         43.45300 52.81900 64.06200
  Cambodia           45.41500 53.91400 59.72300
  China              58.38112 67.27400 72.96100
  Hong Kong, China   70.00000 76.20000 82.20800
  India              47.19300 58.55300 64.69800
  Indonesia          45.96400 60.13700 70.65000
  Iran               52.46900 63.04000 70.96400
  Iraq               54.45900 65.04400 59.54500
  Israel             70.75000 75.60000 80.74500
  Japan              71.43000 78.67000 82.60300
  Jordan             51.62900 65.86900 72.53500
  Korea, Dem. Rep.   59.94200 70.64700 67.29700
  Korea, Rep.        57.71600 69.81000 78.62300
  Kuwait             64.62400 74.17400 77.58800
  Lebanon            63.87000 67.92600 71.99300
  Malaysia           59.37100 69.50000 74.24100
  Mongolia           51.25300 60.22200 66.80300
  Myanmar            49.37900 58.33900 62.06900
  Nepal              41.47200 52.53700 63.78500
  Oman               46.98800 67.73400 75.64000
  Pakistan           49.80000 58.24500 65.48300
  Philippines        56.39300 64.15100 71.68800
  Saudi Arabia       49.90100 66.29500 72.77700
  Singapore          67.94600 73.56000 79.97200
  Sri Lanka          64.26600 69.01100 72.39600
  Syria              53.65500 66.97400 74.14300
  Taiwan             67.50000 73.40000 78.40000
  Thailand           58.28500 66.08400 70.61600
  Vietnam            47.83800 62.82000 74.24900
  West Bank and Gaza 51.63100 67.04600 73.42200
  Yemen, Rep.        36.98400 52.92200 62.69800
```

]

---

layout: false

# Pie charts

```r
pie(letab[, "2007"])
```

???

Pie charts can only compare values over one category, and they take up a large amount of space. Moreover, they cannot really convey small differences. In this example, it is essentially impossible to decipher which countries have low or high life expectancy.

---

# Bar charts

```r
letab <- xtabs(lifeExp ~ reorder(country, lifeExp, max) + year, data = gapminder_sub)
barchart(letab[, "2007"], origin = 0, xlab = "Life Expectancy in Asia (2007)")
```

???

Whenever pie charts can be used, bar charts are usually a better idea. Bar charts map values to a coordinate position, making comparisons much more effective. They also encode the value through the length of the bar, which provides an additional means for comparison. In this case, we have also ordered the countries by maximum life expectancy. This kind of data-driven ordering does not change the underlying data in any way, but visually it is much more helpful than, say, alphabetical ordering.

---

# Bar charts

```r
barchart(letab, ylab = "Life Expectancy in Asia (1967 - 2007)",
         horizontal = FALSE, stack = FALSE,
         auto.key = list(columns = 3), scales = list(x = list(rot = 45)))
```

???

The big advantage of bar charts over pie charts is that multiple subgroups can be compared using stacked or superposed bars. Here, we can see how life expectancy has changed, over two twenty-year periods, in each country in Asia.

---

# Dot plots

```r
dotplot(letab, ylab = "Life Expectancy in Asia (1967 - 2007)",
        horizontal = FALSE, par.settings = simpleTheme(pch = 16),
        auto.key = list(columns = 3), scales = list(x = list(rot = 45)))
```

???

But sometimes, the mapping of the values to two different attributes, position and length, does not serve any useful purpose. Dot plots are a simpler alternative that map the values being plotted only to the position of a point. This can often be more useful, especially when we compare different subgroups.

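---

# Ordering categories by data

The bar charts and dot plots above order countries with `reorder(country, lifeExp, max)`. The sketch below illustrates, on a small made-up data frame (the values are hypothetical and not taken from `gapminder`), that `reorder()` only changes the order of the factor levels used for display and leaves the observations themselves untouched.

```r
# Toy data: hypothetical values, for illustration only
d <- data.frame(group = c("b", "a", "c", "a", "b", "c"),
                value = c(5, 1, 2, 8, 7, 4))

g_alpha <- factor(d$group)                       # default: alphabetical level order
g_max   <- reorder(g_alpha, d$value, FUN = max)  # levels sorted by per-group maximum

levels(g_alpha)   # "a" "b" "c"
levels(g_max)     # "c" "b" "a"  (group maxima are 4, 7, 8)

# The observations are unchanged; only the level order differs
same_values <- identical(as.character(g_max), as.character(g_alpha))
```

Any plotting function that lays out categories in level order will then place the groups from smallest to largest maximum.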
---

# Dot plots

```r
dotplot(letab, ylab = "Life Expectancy in Asia (1967 - 2007)",
        horizontal = FALSE, type = "o", par.settings = simpleTheme(pch = 16),
        auto.key = list(columns = 3), scales = list(x = list(rot = 45)))
```

---

# Time-series plots

```r
xyplot(lifeExp ~ year | continent, data = gapminder, groups = country,
       type = "l", col = "black")
```

---

layout: true

# Graphics Systems in R

---

* Primary implementation in four packages:

    * `graphics`
    * `grid`
    * `lattice`
    * `ggplot2`

* Many other add-on packages that build on these basic packages

* Will not discuss in much detail, but will see some more practical examples
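
---

* One practical difference between the systems, sketched here under the assumption that only base R and the recommended `lattice` package are available: `graphics` functions draw as a side effect, whereas `lattice` functions return an object that is drawn when printed. The name `p` is used only for illustration.

```r
library(lattice)   # recommended package, shipped with standard R installations

# 'graphics': the plot is drawn immediately, as a side effect of the call
plot(Ozone ~ Solar.R, data = airquality)

# 'lattice': the call returns an object of class "trellis";
# the plot is drawn only when that object is printed
p <- xyplot(Ozone ~ Solar.R, data = airquality, grid = TRUE)
class(p)
print(p)
```

* At the top level of an interactive session the `lattice` object is printed automatically, which is why the earlier examples did not need an explicit `print()`; inside functions or loops the explicit call is needed.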