class: center, middle, inverse, title-slide # Data visualization with ggplot2
### AECN 396/896-002 --- <style type="text/css"> .remark-slide-content.hljs-github h1 { margin-top: 5px; margin-bottom: 25px; } .remark-slide-content.hljs-github { padding-top: 10px; padding-left: 30px; padding-right: 30px; } .panel-tabs { <!-- color: #062A00; --> color: #841F27; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; padding-bottom: 0px; } .panel-tab { margin-top: 0px; margin-bottom: 0px; margin-left: 3px; margin-right: 3px; padding-top: 0px; padding-bottom: 0px; } .panelset .panel-tabs .panel-tab { min-height: 40px; } .remark-slide th { border-bottom: 1px solid #ddd; } .remark-slide thead { border-bottom: 0px; } .gt_footnote { padding: 2px; } .remark-slide table { border-collapse: collapse; } .remark-slide tbody { border-bottom: 2px solid #666; } .important { background-color: lightpink; border: 2px solid blue; font-weight: bold; } .remark-code { display: block; overflow-x: auto; padding: .5em; background: #ffe7e7; } .hljs-github .hljs { background: #f2f2fd; } .remark-inline-code { padding-top: 0px; padding-bottom: 0px; background-color: #e6e6e6; } .r.hljs.remark-code.remark-inline-code{ font-size: 0.9em } .left-full { width: 80%; float: left; } .left-code { width: 38%; height: 92%; float: left; } .right-plot { width: 60%; float: right; padding-left: 1%; } .left6 { width: 60%; height: 92%; float: left; } .left5 { width: 49%; <!-- height: 92%; --> float: left; } .right5 { width: 49%; float: right; padding-left: 1%; } .right4 { width: 39%; float: right; padding-left: 1%; } .left3 { width: 29%; height: 92%; float: left; } .right7 { width: 69%; float: right; padding-left: 1%; } .left4 { width: 38%; float: left; } .right6 { width: 60%; float: right; padding-left: 1%; } ul li{ margin: 7px; } ul, li{ margin-left: 15px; padding-left: 0px; } ol li{ margin: 7px; } ol, li{ margin-left: 15px; padding-left: 0px; } </style> <style type="text/css"> .content-box { box-sizing: border-box; background-color: #e2e2e2; } .content-box-blue, .content-box-gray, 
.content-box-grey, .content-box-army, .content-box-green, .content-box-purple, .content-box-red, .content-box-yellow { box-sizing: border-box; border-radius: 5px; margin: 0 0 10px; overflow: hidden; padding: 0px 5px 0px 5px; width: 100%; } .content-box-blue { background-color: #F0F8FF; } .content-box-gray { background-color: #e2e2e2; } .content-box-grey { background-color: #F5F5F5; } .content-box-army { background-color: #737a36; } .content-box-green { background-color: #d9edc2; } .content-box-purple { background-color: #e2e2f9; } .content-box-red { background-color: #ffcccc; } .content-box-yellow { background-color: #fef5c4; } .content-box-blue .remark-inline-code, .content-box-blue .remark-inline-code, .content-box-gray .remark-inline-code, .content-box-grey .remark-inline-code, .content-box-army .remark-inline-code, .content-box-green .remark-inline-code, .content-box-purple .remark-inline-code, .content-box-red .remark-inline-code, .content-box-yellow .remark-inline-code { background: none; } .full-width { display: flex; width: 100%; flex: 1 1 auto; } </style> <style type="text/css"> blockquote, .blockquote { display: block; margin-top: 0.1em; margin-bottom: 0.2em; margin-left: 5px; margin-right: 5px; border-left: solid 10px #0148A4; border-top: solid 2px #0148A4; border-bottom: solid 2px #0148A4; border-right: solid 2px #0148A4; box-shadow: 0 0 6px rgba(0,0,0,0.5); /* background-color: #e64626; */ color: #e64626; padding: 0.5em; -moz-border-radius: 5px; -webkit-border-radius: 5px; } .blockquote p { margin-top: 0px; margin-bottom: 5px; } .blockquote > h1:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h2:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h3:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h4:first-of-type { margin-top: 0px; margin-bottom: 5px; } .text-shadow { text-shadow: 0 0 4px #424242; } </style> <style type="text/css"> /****************** * Slide scrolling * (non-functional) * not 
sure if it is a good idea anyway slides > slide { overflow: scroll; padding: 5px 40px; } .scrollable-slide .remark-slide { height: 400px; overflow: scroll !important; } ******************/ .scroll-box-8 { height:8em; overflow-y: scroll; } .scroll-box-10 { height:10em; overflow-y: scroll; } .scroll-box-12 { height:12em; overflow-y: scroll; } .scroll-box-14 { height:14em; overflow-y: scroll; } .scroll-box-16 { height:16em; overflow-y: scroll; } .scroll-box-18 { height:18em; overflow-y: scroll; } .scroll-box-20 { height:20em; overflow-y: scroll; } .scroll-box-24 { height:24em; overflow-y: scroll; } .scroll-box-30 { height:30em; overflow-y: scroll; } .scroll-output { height: 90%; overflow-y: scroll; } </style> # Before you start ## Learning objectives The objective of this chapter is to learn how to use the `ggplot2` package to create figures for effective communication. ## Table of contents 1. [`ggplot2` basics](#ggplot2-basics) 2. [Different types of figures](#dif-geoms) 3. [Placing more information in one figure](#more-info) 4. [Faceted figures](#faceting) 5. [Other supplementary `geom_*()`](#other-geoms) 6. [Make your figures presentable to others](#fine-tune) 7. [Tips](#tips) 8. [Gallery of other types of figures](#gallery) 9. [Animated figures](#animated) <br> <span style="color:red"> Tip: </span>pressing the "o" key gives you a panel view of the slides --- # `ggplot2` package .left-full[ Install the package if you have not already: ```r install.packages('ggplot2') ``` When you load the `tidyverse` package, `ggplot2` is loaded automatically.
```r #--- load ggplot2 along with others in the tidyverse package ---# library(tidyverse) #--- or ---# *library(ggplot2) ``` ] --- # The datasets we use .panelset[ .panel[.panel-name[Instruction] Go [here](https://www.dropbox.com/sh/63rlp4ydmyjm1ui/AACYSeN0f_WAgKPQKzgpGVe0a?dl=0), download **county_yield.rds**, and then read the file into R: ] .panel[.panel-name[R Code] ```r county_yield <- readRDS("county_yield.rds") %>% dplyr::select(soy_yield, corn_yield, year, county_code, state_name, d0_5_9, d1_5_9, d2_5_9, d3_5_9, d4_5_9) %>% filter(state_name %in% c("Nebraska", "Kansas", "Colorado")) ``` ] .panel[.panel-name[Output] ``` ## soy_yield corn_yield year county_code state_name d0_5_9 d1_5_9 d2_5_9 ## 1: NA NA 2018 053 Kansas 0.8980 3.8186 13.5279 ## 2: NA NA 2017 053 Kansas 3.9994 7.0006 0.0000 ## 3: NA NA 2016 053 Kansas 0.5724 0.0996 0.0000 ## 4: NA NA 2015 053 Kansas 4.4283 1.6177 0.0000 ## 5: NA NA 2014 053 Kansas 4.7032 9.9327 3.5824 ## --- ## 2960: 53 181 2004 073 Nebraska 0.0000 3.2915 19.7085 ## 2961: 57 195 2003 073 Nebraska 0.0000 7.7427 11.8459 ## 2962: 51 170 2002 073 Nebraska 0.0000 7.0000 1.2978 ## 2963: 56 195 2001 073 Nebraska 5.7915 0.0000 0.0000 ## 2964: 54 147 2000 073 Nebraska 0.0000 4.7386 17.6887 ## d3_5_9 d4_5_9 ## 1: 0.0000 0 ## 2: 0.0000 0 ## 3: 0.0000 0 ## 4: 0.0000 0 ## 5: 4.7817 0 ## --- ## 2960: 0.0000 0 ## 2961: 3.4114 0 ## 2962: 4.7022 9 ## 2963: 0.0000 0 ## 2964: 0.5727 0 ``` ] .panel[.panel-name[Variable Definitions] + `soy_yield`: soybean yield (bu/acre) + `corn_yield`: corn yield (bu/acre) + `d0_5_9`: ratio of weeks under drought severity of 0 from May to September + `d1_5_9`: ~ drought severity of 1 from May to September + `d2_5_9`: ~ drought severity of 2 from May to September + `d3_5_9`: ~ drought severity of 3 from May to September + `d4_5_9`: ~ drought severity of 4 from May to September ] ] <!-- #========================================= # ggplot2 Basics #========================================= --> --- class: inverse,
center, middle name: ggplot2-basics # `ggplot2` basics <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r *county_yield ``` ] .panel2-taste-user[ ``` ## soy_yield corn_yield year county_code state_name d0_5_9 d1_5_9 d2_5_9 ## 1: NA NA 2018 053 Kansas 0.8980 3.8186 13.5279 ## 2: NA NA 2017 053 Kansas 3.9994 7.0006 0.0000 ## 3: NA NA 2016 053 Kansas 0.5724 0.0996 0.0000 ## 4: NA NA 2015 053 Kansas 4.4283 1.6177 0.0000 ## 5: NA NA 2014 053 Kansas 4.7032 9.9327 3.5824 ## --- ## 2960: 53 181 2004 073 Nebraska 0.0000 3.2915 19.7085 ## 2961: 57 195 2003 073 Nebraska 0.0000 7.7427 11.8459 ## 2962: 51 170 2002 073 Nebraska 0.0000 7.0000 1.2978 ## 2963: 56 195 2001 073 Nebraska 5.7915 0.0000 0.0000 ## 2964: 54 147 2000 073 Nebraska 0.0000 4.7386 17.6887 ## d3_5_9 d4_5_9 ## 1: 0.0000 0 ## 2: 0.0000 0 ## 3: 0.0000 0 ## 4: 0.0000 0 ## 5: 4.7817 0 ## --- ## 2960: 0.0000 0 ## 2961: 3.4114 0 ## 2962: 4.7022 9 ## 2963: 0.0000 0 ## 2964: 0.5727 0 ``` ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% * ggplot(data = .) ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + * aes(x = factor(year)) ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_03_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + * aes(y = corn_yield) ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_04_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) 
+ aes(x = factor(year)) + aes(y = corn_yield) + * geom_boxplot() ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_05_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + * aes(fill = state_name) ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_06_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + * facet_grid(state_name ~ .) ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_07_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + * xlab("Year") ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_08_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + * ylab("Corn Yield (bu/acre)") ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_09_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) 
+ xlab("Year") + ylab("Corn Yield (bu/acre)") + * ylim(c(100, 200)) ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_10_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + * scale_fill_viridis_d( * name = "State", * guide = guide_legend( * title.position = "left" * ) * ) ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_11_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + * theme_bw() ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_12_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) 
+ xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + * theme(axis.text.x = element_text(angle = 90, size = 6)) ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_13_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + * theme(axis.text.y = element_text(size = 6)) ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_14_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + theme(axis.text.y = element_text(size = 6)) + * theme(legend.position = "bottom") ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_15_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) 
+ xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + theme(axis.text.y = element_text(size = 6)) + theme(legend.position = "bottom") + * theme( * legend.title = element_text(size = 6), * legend.text = element_text(size = 6) * ) ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_16_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + theme(axis.text.y = element_text(size = 6)) + theme(legend.position = "bottom") + theme( legend.title = element_text(size = 6), legend.text = element_text(size = 6) ) + * labs(title = "Corn Yield (bu/acre) by State") ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_17_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) 
+ xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + theme(axis.text.y = element_text(size = 6)) + theme(legend.position = "bottom") + theme( legend.title = element_text(size = 6), legend.text = element_text(size = 6) ) + labs(title = "Corn Yield (bu/acre) by State") + * labs(caption = "Design: Taro Mieno") ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_18_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Taste of ggplot2 .panel1-taste-user[ ```r county_yield %>% ggplot(data = .) + aes(x = factor(year)) + aes(y = corn_yield) + geom_boxplot() + aes(fill = state_name) + facet_grid(state_name ~ .) + xlab("Year") + ylab("Corn Yield (bu/acre)") + ylim(c(100, 200)) + scale_fill_viridis_d( name = "State", guide = guide_legend( title.position = "left" ) ) + theme_bw() + theme(axis.text.x = element_text(angle = 90, size = 6)) + theme(axis.text.y = element_text(size = 6)) + theme(legend.position = "bottom") + theme( legend.title = element_text(size = 6), legend.text = element_text(size = 6) ) + labs(title = "Corn Yield (bu/acre) by State") + labs(caption = "Design: Taro Mieno") + * labs(subtitle = "Data Source: USDA-NASS") ``` ] .panel2-taste-user[ <img src="data_visualization_x_files/figure-html/taste_user_19_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-taste-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-taste-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-taste-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Basics .panelset[ .panel[.panel-name[Step 1] .left-full[ The very first job you need to do to create a figure 
using the `ggplot2` package is to let R know the dataset you are trying to visualize, which can be done using `ggplot()` as follows: ```r g_fig <- ggplot(data = county_yield) ``` When you create a figure using the `ggplot2` package, `ggplot()` is always the function you call first. ] ] .panel[.panel-name[g_fig] .left-code[ Let's now see what is inside `g_fig`: ```r g_fig ``` Well, it's blank. `g_fig` does not yet have enough information to create any kind of figure, because you have not told R anything specific about how you would like to use the information in the dataset. ] .right-plot[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Step 2] .left-full[ So, the next thing you need to do is tell `g_fig` what type of figure you want using a `geom_*()` function. For example, we use `geom_point()` to create a scatter plot. To create a scatter plot, R needs to know which variables go on the x-axis and the y-axis. This information can be passed to `g_fig` with the following code: ```r g_fig_scatter <- g_fig + geom_point(aes(x = d3_5_9, y = corn_yield)) ``` Here, + `geom_point()` was added to `g_fig` to declare that you want a scatter plot + `aes(x = d3_5_9, y = corn_yield)` inside `geom_point()` tells R that you want to create a scatter plot with `d3_5_9` on the x-axis and `corn_yield` on the y-axis ] ] .panel[.panel-name[Output] .left-code[ This is what `g_fig_scatter` looks like: ```r g_fig_scatter ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/scatter-plot-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[`aes()`] .left-full[ Going back to the code, ```r g_fig_scatter <- g_fig + geom_point(aes(x = d3_5_9, y = corn_yield)) ``` Note that `x = d3_5_9` and `y = corn_yield` are inside `aes()`.
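To see concretely where `aes()` looks up variables, here is a minimal sketch on a small made-up data frame (`toy`, `dry_weeks`, and `yield_bu` are hypothetical stand-ins for `county_yield`, `d3_5_9`, and `corn_yield`). It uses `layer_data()` to inspect what ggplot2 actually draws, and `tryCatch()` to capture the error you get when the same names are given outside `aes()`.

```r
library(ggplot2)

# hypothetical toy data standing in for county_yield
toy <- data.frame(dry_weeks = c(1, 2, 3), yield_bu = c(180, 170, 150))

# inside aes(): dry_weeks and yield_bu are looked up as columns of toy
p_ok <- ggplot(data = toy) +
  geom_point(aes(x = dry_weeks, y = yield_bu))

# layer_data() returns the data ggplot2 draws for the first layer
drawn <- layer_data(p_ok)
print(drawn[, c("x", "y")])

# outside aes(): R evaluates dry_weeks as an ordinary object, which does
# not exist in the workspace, so building the layer fails with an error
err_msg <- tryCatch(
  {
    ggplot(data = toy) + geom_point(x = dry_weeks, y = yield_bu)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)
```

The first plot draws the columns of `toy`; the second attempt stops with an "object not found" message, which is the same kind of error you should see with `g_fig + geom_point(x = d3_5_9, y = corn_yield)`.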
<span style="color:red"> Important:</span> `aes()` is used to make the <span style='color:red'>aes</span>thetics of the figure a function of variables in the dataset that you told `ggplot()` to use (here, `county_yield`). `aes(x = d3_5_9, y = corn_yield)` tells `ggplot()` to use the `d3_5_9` and `corn_yield` variables in the `county_yield` dataset for the x-axis and y-axis, respectively. If `x = d3_5_9` and `y = corn_yield` are not inside `aes()`, R looks for objects named `d3_5_9` and `corn_yield` in your workspace (not in `county_yield`), and because you have not defined them, you get an error. Try ```r g_fig + geom_point(x = d3_5_9, y = corn_yield) ``` ] ] .panel[.panel-name[summary] .left-full[ + `ggplot(data = dataset)` to initiate the process of creating a figure + add `geom_*()` to declare what kind of figure you would like to make + specify what variables in the dataset to use and how they are used inside `aes()` + place the `aes()` you defined above in the `geom_*()` you specified above ] ] ] <!-- #========================================= # Different types of figures #========================================= --> --- class: inverse, center, middle name: dif-geoms # Different types of figures <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Different types of figures .panelset[ .panel[.panel-name[Figure types] <br> `ggplot2` lets you create lots of different kinds of figures via various `geom_*()` functions. + `geom_histogram()`/`geom_density()` + `geom_line()` + `geom_boxplot()` + `geom_bar()` How to specify aesthetics varies by `geom_*()`. ] .panel[.panel-name[Histogram] .left-code[ ```r g_fig + geom_histogram( aes(x = corn_yield) ) ``` `geom_histogram()` only needs `x`. ] .right-plot[ <img src="data_visualization_x_files/figure-html/hist-ex-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Density Plot] .left-code[ ```r g_fig + geom_density( aes(x = corn_yield) ) ``` `geom_density()` only needs `x`.
] .right-plot[ <img src="data_visualization_x_files/figure-html/density-ex-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Line plot] .left-code[ Create a dataset first: ```r mean_yield <- county_yield %>% group_by(year) %>% summarize( corn_yield = mean(corn_yield, na.rm = TRUE) ) %>% filter(!is.na(year)) ``` Create a line plot: ```r ggplot(data = mean_yield) + geom_line(aes(x = year, y = corn_yield)) ``` + `geom_line()` needs `x` and `y`. ] .right-plot[ <img src="data_visualization_x_files/figure-html/line-ex-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Boxplot] .left-code[ ```r ggplot(data = county_yield) + geom_boxplot( aes(x = factor(year), y = corn_yield) ) ``` + `geom_boxplot()` needs `x` and `y`. ] .right-plot[ <img src="data_visualization_x_files/figure-html/box-ex-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Bar plot] .left-code[ ```r ggplot(data = mean_yield) + geom_bar( aes( x = year, y = corn_yield ), stat = "identity" ) ``` + `geom_bar()` with `stat = "identity"` needs `x` and `y` (by default, `geom_bar()` counts rows and needs only `x`) ] .right-plot[ <img src="data_visualization_x_files/figure-html/bar-ex-1.png" width="90%" style="display: block; margin: auto;" /> ] ] ] --- # Modifying how figures look .left-full[ All the elements in the figures we have created so far are in black and white. You can change how figure elements look by providing options inside `geom_*()`.
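One thing to watch: these fixed options go inside `geom_*()` but *outside* `aes()`. The minimal sketch below, on a hypothetical toy data frame (`toy`, `dry_weeks`, `yield_bu` are made up), contrasts setting `color = "red"` outside `aes()` with putting the same value inside `aes()`, and uses `layer_data()` to show the colors ggplot2 actually draws.

```r
library(ggplot2)

# hypothetical toy data
toy <- data.frame(dry_weeks = c(1, 2, 3), yield_bu = c(180, 170, 150))

# setting: color = "red" OUTSIDE aes() paints every point red, no legend
p_set <- ggplot(data = toy) +
  geom_point(aes(x = dry_weeks, y = yield_bu), color = "red")

# mapping: color = "red" INSIDE aes() treats "red" as a data value
# (a one-level variable), so ggplot2 picks a color from its default
# palette for that level and adds a legend
p_map <- ggplot(data = toy) +
  geom_point(aes(x = dry_weeks, y = yield_bu, color = "red"))

set_colors <- unique(layer_data(p_set)$colour)
map_colors <- unique(layer_data(p_map)$colour)
print(set_colors)
print(map_colors)
```

Only `p_set` produces red points; `p_map` produces points in a default palette color plus a legend, which is usually not what you intended.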
Here is a list of options that control the aesthetics of figures: + fill + color + size + shape + linetype The elements you can modify differ by `geom` type, and the same element name can mean different things for different `geom` types (for example, `color` is the color of the points in a scatter plot but the color of the bar outlines in a histogram). ] --- count: false # Scatter Plot .panel1-fig-scatter-f-non_seq[ ```r g_fig + geom_point( aes(x = d3_5_9, y = corn_yield), ) ``` ] .panel2-fig-scatter-f-non_seq[ <img src="data_visualization_x_files/figure-html/fig-scatter-f_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Scatter Plot .panel1-fig-scatter-f-non_seq[ ```r g_fig + geom_point( aes(x = d3_5_9, y = corn_yield), * color = "red", ) ``` ] .panel2-fig-scatter-f-non_seq[ <img src="data_visualization_x_files/figure-html/fig-scatter-f_non_seq_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Scatter Plot .panel1-fig-scatter-f-non_seq[ ```r g_fig + geom_point( aes(x = d3_5_9, y = corn_yield), color = "red", * size = 0.7, ) ``` ] .panel2-fig-scatter-f-non_seq[ <img src="data_visualization_x_files/figure-html/fig-scatter-f_non_seq_03_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Scatter Plot .panel1-fig-scatter-f-non_seq[ ```r g_fig + geom_point( aes(x = d3_5_9, y = corn_yield), color = "red", size = 0.7, * shape = 0 ) ``` ] .panel2-fig-scatter-f-non_seq[ <img src="data_visualization_x_files/figure-html/fig-scatter-f_non_seq_04_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-fig-scatter-f-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-fig-scatter-f-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-fig-scatter-f-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> + `color = "red"`: makes all the squares red + `size = 0.7`: makes the size of the
squares smaller + `shape = 0`: changes the shape of the points (find other shapes [here](http://www.sthda.com/english/wiki/ggplot2-point-shapes)) --- count: false # Histogram .panel1-hist-f-non_seq[ ```r g_fig + geom_histogram( aes(x = corn_yield), ) ``` ] .panel2-hist-f-non_seq[ <img src="data_visualization_x_files/figure-html/hist-f_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Histogram .panel1-hist-f-non_seq[ ```r g_fig + geom_histogram( aes(x = corn_yield), * color = "blue", ) ``` ] .panel2-hist-f-non_seq[ <img src="data_visualization_x_files/figure-html/hist-f_non_seq_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Histogram .panel1-hist-f-non_seq[ ```r g_fig + geom_histogram( aes(x = corn_yield), color = "blue", * fill = "green", ) ``` ] .panel2-hist-f-non_seq[ <img src="data_visualization_x_files/figure-html/hist-f_non_seq_03_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Histogram .panel1-hist-f-non_seq[ ```r g_fig + geom_histogram( aes(x = corn_yield), color = "blue", fill = "green", * size = 2, ) ``` ] .panel2-hist-f-non_seq[ <img src="data_visualization_x_files/figure-html/hist-f_non_seq_04_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Histogram .panel1-hist-f-non_seq[ ```r g_fig + geom_histogram( aes(x = corn_yield), color = "blue", fill = "green", size = 2, * shape = 2 ) ``` ] .panel2-hist-f-non_seq[ <img src="data_visualization_x_files/figure-html/hist-f_non_seq_05_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-hist-f-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-hist-f-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-hist-f-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style> + `color = "blue"`: makes the outlines of the bars blue + `fill = "green"`: fills the bars with green + `size = 2`: makes the outlines of the bars thicker (in ggplot2 3.4.0 and later, use `linewidth` instead of `size` for line widths) + `shape = 2`: does nothing, because bars have no shape --- count: false # Box Plot .panel1-box-f-non_seq[ ```r ggplot(data = county_yield) + geom_boxplot( aes(x = factor(year), y = corn_yield), ) ``` ] .panel2-box-f-non_seq[ <img src="data_visualization_x_files/figure-html/box-f_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Box Plot .panel1-box-f-non_seq[ ```r ggplot(data = county_yield) + geom_boxplot( aes(x = factor(year), y = corn_yield), * color = "red", ) ``` ] .panel2-box-f-non_seq[ <img src="data_visualization_x_files/figure-html/box-f_non_seq_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Box Plot .panel1-box-f-non_seq[ ```r ggplot(data = county_yield) + geom_boxplot( aes(x = factor(year), y = corn_yield), color = "red", * fill = "orange", ) ``` ] .panel2-box-f-non_seq[ <img src="data_visualization_x_files/figure-html/box-f_non_seq_03_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Box Plot .panel1-box-f-non_seq[ ```r ggplot(data = county_yield) + geom_boxplot( aes(x = factor(year), y = corn_yield), color = "red", fill = "orange", * size = 0.2, ) ``` ] .panel2-box-f-non_seq[ <img src="data_visualization_x_files/figure-html/box-f_non_seq_04_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Box Plot .panel1-box-f-non_seq[ ```r ggplot(data = county_yield) + geom_boxplot( aes(x = factor(year), y = corn_yield), color = "red", fill = "orange", size = 0.2, * shape = 1 ) ``` ] .panel2-box-f-non_seq[ <img src="data_visualization_x_files/figure-html/box-f_non_seq_05_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-box-f-non_seq { color: black; width: 49%; hight: 32%; float: left;
padding-left: 1%; font-size: 80% } .panel2-box-f-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-box-f-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> + `color = "red"`: makes the outlines of the boxes red + `fill = "orange"`: fills the boxes with orange + `size = 0.2`: makes the outlines of the boxes thinner (`linewidth` in ggplot2 3.4.0 and later) + `shape = 1`: does nothing --- count: false # Line Plot .panel1-line-f-non_seq[ ```r ggplot(data = mean_yield) + geom_line( aes(x = year, y = corn_yield), ) ``` ] .panel2-line-f-non_seq[ <img src="data_visualization_x_files/figure-html/line-f_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Line Plot .panel1-line-f-non_seq[ ```r ggplot(data = mean_yield) + geom_line( aes(x = year, y = corn_yield), * color = "blue", ) ``` ] .panel2-line-f-non_seq[ <img src="data_visualization_x_files/figure-html/line-f_non_seq_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Line Plot .panel1-line-f-non_seq[ ```r ggplot(data = mean_yield) + geom_line( aes(x = year, y = corn_yield), color = "blue", * size = 1.5, ) ``` ] .panel2-line-f-non_seq[ <img src="data_visualization_x_files/figure-html/line-f_non_seq_03_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Line Plot .panel1-line-f-non_seq[ ```r ggplot(data = mean_yield) + geom_line( aes(x = year, y = corn_yield), color = "blue", size = 1.5, * fill = "red", ) ``` ] .panel2-line-f-non_seq[ <img src="data_visualization_x_files/figure-html/line-f_non_seq_04_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Line Plot .panel1-line-f-non_seq[ ```r ggplot(data = mean_yield) + geom_line( aes(x = year, y = corn_yield), color = "blue", size = 1.5, fill = "red", * linetype = "dotted", ) ``` ] .panel2-line-f-non_seq[ <img
src="data_visualization_x_files/figure-html/line-f_non_seq_05_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-line-f-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-line-f-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-line-f-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> + `color = "blue"`: makes the line blue + `size = 1.5`: makes the line thicker (`linewidth` in ggplot2 3.4.0 and later) + `fill = "red"`: does nothing, because a line has no fill + `linetype = "dotted"`: makes the line dotted --- # Exercises .panelset[ .panel[.panel-name[Instruction] This exercise uses the `diamonds` dataset from the `ggplot2` package. First, load the dataset and extract observations with `Premium` cut whose color is one of `E`, `I`, and `F`: ```r data('diamonds') premium <- diamonds %>% filter( cut=='Premium' & color %in% c('E','I','F') ) #--- take a look ---# premium ``` ``` ## # A tibble: 6,096 x 10 ## carat cut color clarity depth table price x y z ## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> ## 1 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 ## 2 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63 ## 3 0.22 Premium F SI1 60.4 61 342 3.88 3.84 2.33 ## 4 0.2 Premium E SI2 60.2 62 345 3.79 3.75 2.27 ## 5 0.32 Premium E I1 60.9 58 345 4.38 4.42 2.68 ## 6 0.24 Premium I VS1 62.5 57 355 3.97 3.94 2.47 ## 7 0.290 Premium F SI1 62.4 58 403 4.24 4.26 2.65 ## 8 0.22 Premium E VS2 61.6 58 404 3.93 3.89 2.41 ## 9 0.42 Premium I SI2 61.5 59 552 4.78 4.84 2.96 ## 10 0.24 Premium E VVS1 60.7 58 553 4.01 4.03 2.44 ## # … with 6,086 more rows ``` ] .panel[.panel-name[Exercise 1] <br> Using the `carat` and `price` variables from `premium`, generate the figure below: <img src="data_visualization_x_files/figure-html/diamond_fig_1-1.png" width="50%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Exercise 2] <br> Using the `price` variable
from `premium`, generate the histogram of `price` shown below: <img src="data_visualization_x_files/figure-html/diamond_fig_2-1.png" width="50%" style="display: block; margin: auto;" /> ] ] <!-- #========================================= # Placing more information #========================================= --> --- class: inverse, center, middle name: more-info # Placing more information in one figure <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Placing more information in one figure .panelset[ .panel[.panel-name[Motivation] <br> So far, we have learned how to create popular types of figures. We can make a figure much more informative by making its aesthetics data-dependent. For example, suppose you want to compare the history of irrigated corn yield across states in a line plot. You would then draw one line per state and make the lines distinguishable so that readers know which line belongs to which state, like this: <img src="data_visualization_x_files/figure-html/more-info-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[How] .left-full[ We can make the aesthetics of a figure data-dependent by specifying the variable that differentiates them <span style="color:red"> INSIDE </span>`aes()`. Here is an example: <code class ='r hljs remark-code'>ggplot(data = county_yield_mean) +<br> geom_line(<br> aes(y = corn_yield, x = year, <span style='background-color:#ffff7f'>color = state_name</span>)<br> )</code> In this code, `color = state_name` sits inside `aes()`, which tells R to split the data into groups by `state_name` (state) and draw one line per group, each in its own color. A legend is generated automatically.
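Where `color` goes matters. The minimal sketch below contrasts a variable mapped inside `aes()` with a constant set outside it. `toy_yield` is a hypothetical stand-in for `county_yield_mean` with made-up numbers. `layer_data()` from `ggplot2` returns the data each layer actually draws, which makes the grouping visible:

```r
library(ggplot2)

#--- hypothetical stand-in for county_yield_mean (made-up numbers) ---#
toy_yield <- data.frame(
  state_name = rep(c("Colorado", "Kansas", "Nebraska"), each = 3),
  year = rep(2000:2002, times = 3),
  corn_yield = c(140, 150, 145, 130, 135, 150, 160, 170, 165)
)

#--- color mapped INSIDE aes(): one line per state, plus a legend ---#
g_mapped <- ggplot(data = toy_yield) +
  geom_line(aes(y = corn_yield, x = year, color = state_name))

#--- color set OUTSIDE aes(): one constant color and no grouping, ---#
#--- so all nine points are joined into a single line, no legend ---#
g_fixed <- ggplot(data = toy_yield) +
  geom_line(aes(y = corn_yield, x = year), color = "blue")

#--- count the groups ggplot2 actually draws ---#
length(unique(layer_data(g_mapped)$group)) # 3: one group per state
length(unique(layer_data(g_fixed)$group))  # 1: a single group
```

A rule of thumb: variable names go inside `aes()`, and fixed values such as `"blue"` go outside it.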
] ] .panel[.panel-name[Let's do it] <br> .left-code[ Create a data set of corn yield by state-year first: ```r county_yield_mean <- county_yield %>% group_by(state_name, year) %>% summarize(corn_yield = mean(corn_yield, na.rm = T)) ``` Create a plot: ```r ggplot(data = county_yield_mean) + geom_line( aes( y = corn_yield, x = year, * color = state_name ) ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/do-it-1.png" width="90%" style="display: block; margin: auto;" /> ] ] ] --- count: false # More examples: Density Plot .panel1-density-more-non_seq[ ```r ggplot(data = county_yield_mean) + geom_density( aes( x = corn_yield, ), alpha = 0.3 ) ``` ] .panel2-density-more-non_seq[ <img src="data_visualization_x_files/figure-html/density-more_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # More examples: Density Plot .panel1-density-more-non_seq[ ```r ggplot(data = county_yield_mean) + geom_density( aes( x = corn_yield, * fill = state_name ), alpha = 0.3 ) ``` ] .panel2-density-more-non_seq[ <img src="data_visualization_x_files/figure-html/density-more_non_seq_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-density-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-density-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-density-more-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false # More examples: Boxplot .panel1-box-more-non_seq[ ```r county_yield %>% filter(state_name %in% c("Nebraska", "Kansas")) %>% ggplot(data = .) 
+ geom_boxplot( aes( x = factor(year), y = corn_yield, ) ) ``` ] .panel2-box-more-non_seq[ <img src="data_visualization_x_files/figure-html/box-more_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # More examples: Boxplot .panel1-box-more-non_seq[ ```r county_yield %>% filter(state_name %in% c("Nebraska", "Kansas")) %>% ggplot(data = .) + geom_boxplot( aes( x = factor(year), y = corn_yield, * fill = state_name ) ) ``` ] .panel2-box-more-non_seq[ <img src="data_visualization_x_files/figure-html/box-more_non_seq_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-box-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-box-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-box-more-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false # More examples: Scatter Plot .panel1-scatter-more-non_seq[ ```r county_yield %>% filter(state_name %in% c("Nebraska", "Kansas")) %>% ggplot(data = .) + geom_point( aes( x = d3_5_9, y = corn_yield, ), size = 0.7 ) ``` ] .panel2-scatter-more-non_seq[ <img src="data_visualization_x_files/figure-html/scatter-more_non_seq_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # More examples: Scatter Plot .panel1-scatter-more-non_seq[ ```r county_yield %>% filter(state_name %in% c("Nebraska", "Kansas")) %>% ggplot(data = .) 
+ geom_point( aes( x = d3_5_9, y = corn_yield, * color = state_name, ), size = 0.7 ) ``` ] .panel2-scatter-more-non_seq[ <img src="data_visualization_x_files/figure-html/scatter-more_non_seq_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # More examples: Scatter Plot .panel1-scatter-more-non_seq[ ```r county_yield %>% filter(state_name %in% c("Nebraska", "Kansas")) %>% ggplot(data = .) + geom_point( aes( x = d3_5_9, y = corn_yield, color = state_name, * shape = state_name ), size = 0.7 ) ``` ] .panel2-scatter-more-non_seq[ <img src="data_visualization_x_files/figure-html/scatter-more_non_seq_03_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-scatter-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-scatter-more-non_seq { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-scatter-more-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Exercises .panelset[ .panel[.panel-name[Exercise 1] <br> Using `premium`, create a scatter plot of `price` (y-axis) against `depth` (x-axis) by `clarity` as shown below: <img src="data_visualization_x_files/figure-html/ex_2_1-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Exercise 2] <br> Using `premium`, create density plots of `carat` by `color` as shown below (set `alpha` to 0.5): <img src="data_visualization_x_files/figure-html/ex_2_2-1.png" width="60%" style="display: block; margin: auto;" /> ] ] <!-- #========================================= # Faceting #========================================= --> --- class: inverse, center, middle name: faceting # Faceting <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Faceting: Basics .panelset[ .panel[.panel-name[Motivation] Sometimes, you would like to visualize information 
across groups on separate panels. .left5[ Too much information in one panel? <img src="data_visualization_x_files/figure-html/box_all-1.png" width="100%" style="display: block; margin: auto;" /> ] .right5[ On separate panels (faceting)? <img src="data_visualization_x_files/figure-html/box-faceted-1-1.png" width="100%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[How] .left-full[ We can make faceted figures by adding either `facet_wrap()` or `facet_grid()`, inside which you specify the variable(s) to use for faceting. Here is an example: <code class ='r hljs remark-code'>ggplot(data = county_yield) + <br> geom_boxplot(<br> aes(x = factor(year), y = corn_yield)<br> ) +<br> <span style='background-color:#ffff7f'>facet_wrap(state_name ~ .)</span></code> In this code, `facet_wrap(state_name ~ .)` is added to a simple boxplot, which tells R to draw a separate boxplot panel for each value of `state_name` (state). What does `~ .` do? ] ] ] --- count: false # Faceting: an Example .panel1-facet-ex-user[ ```r *ggplot(data = county_yield) + * geom_boxplot( * aes(x = factor(year), y = corn_yield) * ) ``` ] .panel2-facet-ex-user[ <img src="data_visualization_x_files/figure-html/facet-ex_user_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Faceting: an Example .panel1-facet-ex-user[ ```r ggplot(data = county_yield) + geom_boxplot( aes(x = factor(year), y = corn_yield) ) + * facet_wrap(state_name ~ .)
``` ] .panel2-facet-ex-user[ <img src="data_visualization_x_files/figure-html/facet-ex_user_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-facet-ex-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-facet-ex-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-facet-ex-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Faceting: Two-way .panelset[ .panel[.panel-name[two-way faceting] .left-full[ Two-way faceting will + divide the data into groups where each group has a unique combination of the two faceting variables + create a plot for each group ```r ggplot(data = county_yield) + geom_histogram( aes(x = corn_yield) ) + * facet_wrap(state_name ~ year) ``` This code will create a histogram of corn yield for each unique state-year combination. ] ] .panel[.panel-name[Figure 2] .left-code[ Filter `county_yield` to the observations in 2017 and 2018. ```r county_yield_s <- county_yield %>% filter(year %in% c(2017, 2018)) ``` Create faceted histograms.
```r ggplot(data = county_yield_s) + geom_histogram( aes(x = corn_yield) ) + facet_wrap(state_name ~ year) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/two-ex-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] ] --- # Faceting with `facet_grid()` .panelset[ .panel[.panel-name[compare] .left5[ `facet_wrap()` ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_wrap(state_name ~ year) ``` <img src="data_visualization_x_files/figure-html/wrap-ex-1-1.png" width="100%" style="display: block; margin: auto;" /> ] .right5[ `facet_grid()` ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid(state_name ~ year) ``` <img src="data_visualization_x_files/figure-html/frig-ex-1-1.png" width="100%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[facet_grid()] .left-code[ Unlike with `facet_wrap()`, the side of `~` on which you place a faceting variable matters a lot: + left-hand side: rows + right-hand side: columns ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid(state_name ~ year) ``` In the code above, the `state_name` values become the rows, and the `year` values become the columns.
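`facet_grid()` also accepts rows and columns as separate arguments, `rows = vars(state_name)` and `cols = vars(year)`. These spell out the same layout without the formula. The sketch below uses `toy_s`, a hypothetical simulated stand-in for `county_yield_s` (three states, two years, random yields), and counts the panels with `layer_data()`:

```r
library(ggplot2)

#--- hypothetical stand-in for county_yield_s: 20 counties x 2 years x 3 states ---#
set.seed(123)
toy_s <- expand.grid(
  county = 1:20,
  year = c(2017, 2018),
  state_name = c("Colorado", "Kansas", "Nebraska")
)
toy_s$corn_yield <- rnorm(nrow(toy_s), mean = 170, sd = 20)

#--- formula interface: rows ~ columns ---#
g_formula <- ggplot(data = toy_s) +
  geom_histogram(aes(x = corn_yield), bins = 15) +
  facet_grid(state_name ~ year)

#--- equivalent interface that names rows and columns explicitly ---#
g_vars <- ggplot(data = toy_s) +
  geom_histogram(aes(x = corn_yield), bins = 15) +
  facet_grid(rows = vars(state_name), cols = vars(year))

#--- 3 states (rows) x 2 years (columns) = 6 panels ---#
length(unique(layer_data(g_vars)$PANEL)) # 6
```

Both versions place the three states in rows and the two years in columns. Which one you use is a matter of taste.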
] .right-plot[ <img src="data_visualization_x_files/figure-html/grid-matter-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[order] .left5[ ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid(state_name ~ year) ``` <img src="data_visualization_x_files/figure-html/wrap-left-1.png" width="100%" style="display: block; margin: auto;" /> ] .right5[ ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid(year ~ state_name) ``` <img src="data_visualization_x_files/figure-html/grid-right-1.png" width="100%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[scale] .left-full[ Through its `scales` option, `facet_grid()` allows + panels in different columns to have different x-axis scales (panels in the same column share the same x-axis scale) + panels in different rows to have different y-axis scales (panels in the same row share the same y-axis scale) ] ] <!-- panel ends here --> .panel[.panel-name[free x] .left-code[ ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid( state_name ~ year, * scales = "free_x" ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/free-x-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[free y] .left-code[ ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid( state_name ~ year, * scales = "free_y" ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/free-y-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[both free] .left-code[ ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid( state_name ~ year, * scales = "free" ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/both-free-fig-1.png" width="90%" style="display: block;
margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[strip label] .left-code[ You can change strip labels using the `labeller =` option inside `facet_grid()` (or `facet_wrap()`). To do this, create a named vector whose values are the new labels and whose element names are the corresponding values of the faceting variable. Define labels first: ```r #--- the vector values are new strip labels ---# year_labels <- paste0("Year = ", c(2017, 2018)) #--- the element names are the values to replace ---# names(year_labels) <- c("2017", "2018") ``` Create a faceted figure with new labels: ```r ggplot(data = county_yield_s) + geom_histogram(aes(x = corn_yield)) + facet_grid( state_name ~ year, * labeller = labeller(year = year_labels) ) ``` By `year = year_labels`, you are applying `year_labels` to the faceting variable `year`. ] .right-plot[ <img src="data_visualization_x_files/figure-html/strip-label-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[or] .left-code[ Or, you could simply create a variable that holds the values you want as labels and use it as the faceting variable: ```r county_yield_s %>% mutate( * year_text = paste0("Year = ", year) ) %>% ggplot(data = .)
+ geom_histogram(aes(x = corn_yield)) + facet_grid( * state_name ~ year_text ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/alt-strip-label-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # Exercises .panelset[ .panel[.panel-name[Exercise 1] <br> Using `premium`, create scatter plots of `price` (y-axis) against `carat` (x-axis) by `color` on separate panels as shown below: <img src="data_visualization_x_files/figure-html/ex_3_1-1.png" width="50%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Exercise 2] <br> Using `premium`, create histograms of `carat` by `color` and `clarity` on separate panels as shown below: <img src="data_visualization_x_files/figure-html/ex_3_2-1.png" width="50%" style="display: block; margin: auto;" /> ] ] --- # Density plot, histogram, boxplot .panelset[ .panel[.panel-name[density-histogram] Density plots and histograms convey essentially the same information. .content-box-green[**Key difference**]: + Density plots are smoothed, normalized versions of histograms: the area under each density curve is 1.
+ Histograms also convey the number of observations, in addition to the shape of the distribution .left5[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-14-1.png" width="80%" style="display: block; margin: auto;" /> ] .right5[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-15-1.png" width="80%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[density-box] .content-box-green[**Key difference**]: + a density plot provides complete information about the distribution of a single variable, but important summary statistics like the mean or median are not shown + a boxplot provides only partial information about the distribution of a single variable (median, quartiles, whiskers, and outliers), but it takes up much less space in a figure .left3[ For this reason, boxplots are particularly useful when you want to show the distribution of a single variable across groups and over time in a single panel (see the figure to the right as an example). You could convey similar information with density plots faceted by year, but full distribution information is often unnecessary. ] .right7[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-16-1.png" width="80%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] <!-- panel set ends here --> --- # Preparing datasets .left-full[ We have seen + figures whose main elements (points, lines, boxes, etc.) are color-differentiated (e.g., with `aes(color = var)` inside a `geom_*()` function) + faceted figures .content-box-blue[.red[Important]: the dataset has to be in long format to create these types of figures!!]
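Here is a sketch of the reshaping step. It assumes the `tidyr` and `dplyr` packages are installed and uses a hypothetical wide table, `yield_wide`, that copies a few rows of the wide dataset shown next. `pivot_longer()` stacks the year columns into a single `year` variable that `aes()` and `facet_grid()` can refer to:

```r
library(dplyr)
library(tidyr)

#--- hypothetical wide table: one yield column per year ---#
yield_wide <- tibble(
  county_code = c("001", "003", "007"),
  state_name = c("Nebraska", "Nebraska", "Kansas"),
  `2000` = c(161, 159, 160),
  `2001` = c(185, 165, 169)
)

#--- stack the year columns into one year column and one yield column ---#
yield_long <- yield_wide %>%
  pivot_longer(
    cols = c(`2000`, `2001`),
    names_to = "year",
    values_to = "corn_yield"
  ) %>%
  #--- the former column names arrive as text; convert them to numbers ---#
  mutate(year = as.numeric(year))

yield_long
```

The long table has one row per county-year, so `facet_grid(. ~ year)` or `aes(fill = state_name)` can use these columns directly.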
<br> For example, consider the following dataset in wide format: ``` ## county_code state_name 2000 2001 ## 1: 001 Nebraska 161 185 ## 2: 003 Nebraska 159 165 ## 3: 005 Nebraska 130 135 ## 4: 007 Kansas 160 169 ## 5: 007 Nebraska 125 143 ## --- ## 152: 191 Kansas 162 118 ## 153: 193 Kansas 169 197 ## 154: 195 Kansas 122 168 ## 155: 199 Kansas 161 158 ## 156: 203 Kansas 167 170 ``` This dataset has county-level yields for Nebraska, Colorado, and Kansas stored in variables named `2000` and `2001` (the variable names themselves represent years). Imagine creating boxplots of corn yield with fill colors differentiated by state and panels faceted by year. You will have trouble specifying `facet_grid()` because no single variable represents `year`. Reshaping wide datasets into long format with `pivot_longer()` (from the `tidyr` package) solves this problem, and you will find it very useful in creating figures. ] <!-- #========================================= # Other useful geom_* #========================================= --> --- class: inverse, center, middle name: other-geoms # Other supplementary `geom_*()` <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Other supplementary `geom_*()` .panelset[ .panel[.panel-name[geom_*] .left-full[ Here is a list of useful `geom_*()` functions: + `geom_vline()`: draw a vertical line + `geom_hline()`: draw a horizontal line + `geom_abline()`: draw a line with the specified intercept and slope + `geom_smooth()`: draw a fitted smooth line (LOESS or GAM by default, OLS with `method = "lm"`, other methods available) + `geom_ribbon()`: create a shaded area + `geom_text()` and `annotate()`: add text to the figure We will use `g_fig_scatter` to illustrate how these functions work.
] ] .panel[.panel-name[vline and hline] .left-code[ ```r g_fig_scatter + geom_vline( xintercept = 10, color = "blue" ) + geom_hline( yintercept = 100, color = "red" ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/hv-line-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[abline] .left-code[ ```r g_fig_scatter + geom_abline( #--- a ---# intercept = 50, #--- b ---# slope = 4, color = "blue" ) ``` `$$y = a + b\times x$$` + `intercept`: `\(a\)` + `slope`: `\(b\)` ] .right-plot[ <img src="data_visualization_x_files/figure-html/ab-line-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[smooth] .left-code[ ```r g_fig_scatter + geom_smooth( aes( y = corn_yield, x = d3_5_9 ) ) ``` Also try ```r g_fig_scatter + geom_smooth( aes( y = corn_yield, x = d3_5_9 ), method = "lm" ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/smooth-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[ribbon] .left-code[ ```r g_fig_scatter + geom_ribbon( aes( x = d3_5_9, ymin = 100, ymax = 200 ), fill = "green", alpha = 0.3 ) ``` + `ymin`: lower bound of the ribbon + `ymax`: upper bound of the ribbon Useful when drawing confidence intervals. 
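In the example above, `ymin` and `ymax` are constants, so the ribbon is a flat band. More often, the bounds are mapped to columns that vary along `x`. Here is a minimal sketch, assuming a hypothetical yearly summary `toy_band` with made-up means and standard deviations:

```r
library(ggplot2)

#--- hypothetical yearly summary (made-up numbers) ---#
toy_band <- data.frame(
  year = 2000:2004,
  mean_yield = c(150, 155, 148, 160, 165),
  sd_yield = c(10, 12, 9, 11, 10)
)
#--- band = mean plus/minus one standard deviation ---#
toy_band$lower <- toy_band$mean_yield - toy_band$sd_yield
toy_band$upper <- toy_band$mean_yield + toy_band$sd_yield

g_band <- ggplot(data = toy_band) +
  #--- shaded area whose bounds change from year to year ---#
  geom_ribbon(
    aes(x = year, ymin = lower, ymax = upper),
    fill = "green",
    alpha = 0.3
  ) +
  #--- the mean drawn on top of the band ---#
  geom_line(aes(x = year, y = mean_yield))

g_band
```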
] .right-plot[ <img src="data_visualization_x_files/figure-html/ribbon-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[text] .left-code[ ```r g_fig_scatter + geom_text( aes( x = d3_5_9, y = corn_yield, label = state_name ) ) ``` + `x`, `y`: where the text is placed + `label`: the variable whose values are printed ] .right-plot[ <img src="data_visualization_x_files/figure-html/text-ex-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[annotate] .left-code[ ```r g_fig_scatter + annotate( 'text', x = 10, y = 50, label = 'Drought hurts \n a lot!!', size = 3, color = "red" ) ``` + `x`: position on the x-axis + `y`: position on the y-axis + `label`: text to print (`\n` breaks the line) + `size`: font size ] .right-plot[ <img src="data_visualization_x_files/figure-html/annotate-fig-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # Multiple datasets in one figure .panelset[ .panel[.panel-name[multiple datasets] .left-full[ <span style="color:red">Important: </span>`data = county_yield` declared inside `ggplot()` applies to ALL the subsequent `geom_*()`s unless overwritten locally inside individual `geom_*()`s. Try this and see what happens: ```r ggplot() + geom_point(data = county_yield, aes(y = corn_yield, x = d3_5_9)) + geom_smooth(aes(y = corn_yield, x = d3_5_9)) ``` Since no dataset is declared in either `ggplot()` or `geom_smooth()`, `geom_smooth()` has no data in which to find `corn_yield` and `d3_5_9`. It is easy to use multiple datasets inside a single `ggplot` object (or a figure). You just need to specify which dataset to use locally inside individual `geom_*()`s.
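Here is a minimal sketch of the idea. It assumes two small hypothetical data frames with made-up values: `toy_points` for the scatter and `toy_line` for a line drawn from a different dataset. No `data` is given to `ggplot()`, so each layer uses only the dataset declared inside it:

```r
library(ggplot2)

#--- two hypothetical data frames with different numbers of rows ---#
toy_points <- data.frame(
  d3_5_9 = c(1, 2, 3, 4, 5, 6),
  corn_yield = c(185, 180, 170, 172, 160, 150)
)
toy_line <- data.frame(
  d3_5_9 = c(1, 6),
  fit = c(186, 151)
)

g_two <- ggplot() +
  #--- layer 1 draws from toy_points ---#
  geom_point(data = toy_points, aes(x = d3_5_9, y = corn_yield)) +
  #--- layer 2 draws from toy_line; its columns need not match layer 1 ---#
  geom_line(data = toy_line, aes(x = d3_5_9, y = fit), color = "blue")

nrow(layer_data(g_two, 1)) # 6: rows drawn by the point layer
nrow(layer_data(g_two, 2)) # 2: rows drawn by the line layer
```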
Let's see how this works with an example: drawing the confidence interval around the regression line of the following regression: <br> `$$corn\_yield = \beta_0 + \beta_1 d3\_5\_9 + v$$` ] ] .panel[.panel-name[Preparation] .left-full[ ```r #--- regression ---# reg <- lm(corn_yield ~ d3_5_9, data = county_yield) #--- find confidence interval ---# min_d3 <- county_yield$d3_5_9 %>% min(na.rm = TRUE) # minimum d3 observed max_d3 <- county_yield$d3_5_9 %>% max(na.rm = TRUE) # maximum d3 observed eval_points <- data.frame(d3_5_9 = seq(min_d3, max_d3, length.out = 1000)) # evaluation points ci_bound <- predict(reg, newdata = eval_points, interval = "confidence", level = 0.9) # fitted values with lower and upper bounds of the 90% CI ci_bound_data <- cbind(eval_points, ci_bound) # combine evaluation points and ci ``` ```r head(ci_bound_data) ``` ``` ## d3_5_9 fit lwr upr ## 1 0.00000000 180.4965 179.5620 181.4311 ## 2 0.02202202 180.4657 179.5332 181.3981 ## 3 0.04404404 180.4349 179.5045 181.3652 ## 4 0.06606607 180.4041 179.4758 181.3324 ## 5 0.08808809 180.3733 179.4470 181.2995 ## 6 0.11011011 180.3424 179.4182 181.2667 ``` ] ] <!-- panel ends here --> ] --- count: false # Multiple datasets in one figure .panel1-mult-geom-user[ ```r *ggplot() + #--- scatter plot ---# * geom_point( * data = county_yield, * aes(y = corn_yield, x = d3_5_9) * ) ``` ] .panel2-mult-geom-user[ <img src="data_visualization_x_files/figure-html/mult-geom_user_01_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Multiple datasets in one figure .panel1-mult-geom-user[ ```r ggplot() + #--- scatter plot ---# geom_point( data = county_yield, aes(y = corn_yield, x = d3_5_9) ) + #--- regression line ---# * geom_line( * data = ci_bound_data, * aes(x = d3_5_9, y = fit), * color = "blue", * size = 1.2 * ) ``` ] .panel2-mult-geom-user[ <img src="data_visualization_x_files/figure-html/mult-geom_user_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Multiple
datasets in one figure .panel1-mult-geom-user[ ```r ggplot() + #--- scatter plot ---# geom_point( data = county_yield, aes(y = corn_yield, x = d3_5_9) ) + #--- regression line ---# geom_line( data = ci_bound_data, aes(x = d3_5_9, y = fit), color = "blue", size = 1.2 ) + #--- confidence interval ---# * geom_ribbon( * data = ci_bound_data, * aes(x = d3_5_9, ymin = lwr, ymax = upr), * fill = "red", * alpha = 0.4 * ) ``` ] .panel2-mult-geom-user[ <img src="data_visualization_x_files/figure-html/mult-geom_user_03_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-mult-geom-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-mult-geom-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-mult-geom-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> <!-- #========================================= # Make your figures presentable to others #========================================= --> --- class: inverse, center, middle name: fine-tune # Make your figures presentable to others <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Make your figures presentable to others .panelset[ .panel[.panel-name[Motivation] .left-full[ + The figures we have created so far are not suitable for formal presentations or publications. They are simply too crude. + We need to fine-tune raw figures before they are publishable. + You can control virtually every element of a figure under the `ggplot2` framework.
+ See [here](https://ggplot2.tidyverse.org/reference/theme.html) for the complete list of options you can use to modify the theme of a figure <span style="color:red"> Key:</span> The most important thing is to know which part of a figure a theme option refers to (e.g., `axis.text`) ] ] .panel[.panel-name[two types] .left-full[ ## Two types of operations Operations to make your figures presentable can be categorized into two types: + Content-altering + Theme-altering They are two separate things. ## Examples: For the y-axis title, + The axis title text itself (say "Corn Yield (bu/acre)") falls under the **content** category. + The position and the font size of the axis title fall under the **theme** category. The content itself does not change when the theme is altered. ] ] <!-- panel ends here --> .panel[.panel-name[content-altering] .left5[ Original <img src="data_visualization_x_files/figure-html/original-f-1.png" width="100%" style="display: block; margin: auto;" /> ] .right5[ Altered <img src="data_visualization_x_files/figure-html/altered-f-1.png" width="100%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[theme-altering] .left5[ Original <img src="data_visualization_x_files/figure-html/t-original-f-1.png" width="100%" style="display: block; margin: auto;" /> ] .right5[ Altered <img src="data_visualization_x_files/figure-html/t-altered-f-1.png" width="100%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[Note] .left-full[ <br> + The distinction between the two types of operations is not always clear + But, typically, you use * the `scale_*()` family of functions to alter contents * the `theme()` function to alter the theme + There are also shorthand convenience functions (e.g., `labs()`, `xlab()`, `ylim()`) for altering commonly changed contents ] ] <!-- panel ends here --> ] --- # Axes content .panelset[ .panel[.panel-name[Preparation] We are going to build on this figure in this
section: .left-code[ ```r county_yield_s_b2010 <- county_yield %>% filter(year >= 2005, year <= 2010) g_box <- ggplot(data = county_yield_s_b2010) + geom_boxplot( aes( x = factor(year), y = corn_yield, fill = state_name ) ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/g-box-f-a-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[How] .left-full[ We can use + `scale_x_discrete()`/`scale_x_continuous()` for the x-axis + `scale_y_discrete()`/`scale_y_continuous()` for the y-axis to control the following elements of axes: + `name`: the axis title + `limits`: the range of the axis + `breaks`: the positions of the axis ticks + `labels`: the axis text at the ticks <html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html> We use `scale_x_discrete()` if `x` is a discrete variable (not numeric) and `scale_x_continuous()` if `x` is a continuous variable (numeric). The same applies to `y`. ] ] <!-- panel ends here --> .panel[.panel-name[axis title] .left5[ ```r g_box + * scale_x_discrete(name = "Year") + * scale_y_continuous(name = "Corn Yield (bu/acre)") ``` Or just this, ```r g_box + * xlab("Year") + * ylab("Corn Yield (bu/acre)") ``` ] .right5[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-24-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[limit] .left5[ ```r g_box + scale_x_discrete( name = "Year" ) + scale_y_continuous( name = "Corn Yield (bu/acre)", #--- first min, second max ---# * limits = c(100, 200) ) ``` Or just, ```r g_box + xlab("Year") + ylab("Corn Yield (bu/acre)") + * ylim(100, 200) ``` Or, you can filter the data first and then use the filtered data.
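There is one caveat, illustrated below with made-up data. `limits` inside `scale_y_continuous()` and `ylim()` turn observations outside the range into `NA` and drop them *before* the boxplot statistics are computed. The boxes themselves can therefore change, just as when you filter the data first. To zoom in while keeping all observations, the usual alternative is `coord_cartesian(ylim = c(100, 200))`. The data frame `toy` below is hypothetical:

```r
library(ggplot2)

#--- hypothetical data: ten ordinary values plus one outlier at 100 ---#
toy <- data.frame(grp = "a", y = c(1:10, 100))

#--- limits on the scale: the outlier becomes NA and is dropped ---#
#--- before the median and quartiles are computed (ggplot2 warns) ---#
g_scale <- ggplot(data = toy) +
  geom_boxplot(aes(x = grp, y = y)) +
  scale_y_continuous(limits = c(0, 20))

#--- limits on the coordinate system: all observations are used ---#
#--- for the statistics, and the panel is only zoomed in ---#
g_coord <- ggplot(data = toy) +
  geom_boxplot(aes(x = grp, y = y)) +
  coord_cartesian(ylim = c(0, 20))

layer_data(g_scale)$middle # 5.5: median of 1:10
layer_data(g_coord)$middle # 6: median of c(1:10, 100)
```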
] .right5[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-26-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[breaks and labels] .left5[ ```r g_box + scale_x_discrete( name = "Year", * labels = gsub("^20", "", as.character(2005:2010)) ) + scale_y_continuous( name = "Corn Yield (bu/acre)", limits = c(100, 200), * breaks = seq(100, 200, by = 10) ) ``` <br> + `breaks`: determines where the ticks are located + `labels`: defines the text at the ticks ] .right5[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-27-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # Exercise .panelset[ .panel[.panel-name[Instruction] .left-code[ Run the following code to create `gg_delay`, which you will build on. ```r library(nycflights13) gg_delay <- flights %>% group_by(origin, month) %>% summarize(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %>% ggplot(.) + geom_line(aes(y = mean_arr_delay, x = month, color = origin)) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/prep-ex-axes-f-1.png" width="100%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[Exercise 1] .left5[ Change the axes content to create the figure on the right using `scale_x_continuous()` and `scale_y_continuous()`. Here is the list of changes you need to make: + x-axis * change the x-axis title to "Month" * change the x-axis limits to 4 through 8 * change the x-axis breaks (and their labels) to 4 through 8 + y-axis * change the y-axis title to "Average Arrival Delay (minutes)" * change the y-axis limits to 0 through 25 ] .right5[ <img src="data_visualization_x_files/figure-html/g-ex-1-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[Exercise 2] .left5[ Change the axes content to create the figure on the right.
This time, use `scale_x_continuous()` only to change the x-axis breaks. Here is the list of changes you need to make: + x-axis * change the x-axis title to "Month" * change the x-axis limits to 4 through 8 * change the x-axis breaks (and their labels) to 4 through 8 + y-axis * change the y-axis title to "Average Arrival Delay (minutes)" * change the y-axis limits to 0 through 25 ] .right5[ <img src="data_visualization_x_files/figure-html/g-ex-2-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] <!-- panel set ends here --> --- # Legends content .panelset[ .panel[.panel-name[Preparation] We are going to build on this figure in this section: .left-code[ ```r g_axis <- g_box + scale_x_discrete( name = "Year", labels = gsub("^20", "", as.character(2005:2010)) ) + scale_y_continuous( name = "Corn Yield (bu/acre)", limits = c(100, 200), breaks = seq(100, 200, by = 10) ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/start-leg-content-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[title] .left-code[ ```r g_axis + * scale_fill_brewer(name = "State") ``` Or, ```r g_axis + * labs(fill = "State") ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-29-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[fill color] .left-code[ ```r g_axis + scale_fill_brewer( name = "state", * palette = "Set1" ) ``` <span style="color:red"> We are going to spend a lot of time on color schemes later.
</span> ] .right-plot[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-30-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[title position] .left-code[ ```r g_axis + scale_fill_brewer( name = "state", palette = "Set1", * guide = guide_legend( * title.position = "left" * ) ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-31-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[wrapping] .left-code[ ```r g_axis + scale_fill_brewer( name = "state", palette = "Set1", guide = guide_legend( title.position = "left", * nrow = 2 ) ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-32-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # Exercise .panelset[ .panel[.panel-name[Instruction] .left-code[ Run the following code to create `gg_delay`, which you will build on. ```r library(nycflights13) gg_delay <- flights %>% group_by(origin, month) %>% summarize(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %>% ggplot(.) + geom_line(aes(y = mean_arr_delay, x = month, color = origin)) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/prep-ex-legends-f-1.png" width="100%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[Exercise] .left5[ Change the legend contents to create the figure on the right using `scale_*_brewer()`. You need to figure out what goes into `*` in `scale_*_brewer()`.
Here is the list of changes you need to make: * change the legend title to "Airports in NY" * change the legend title position to "bottom" * spread the legend items over 3 columns ] .right5[ <img src="data_visualization_x_files/figure-html/g-ex-legend-1-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] <!-- panel set ends here --> --- # Theme .panelset[ .panel[.panel-name[Naming rules] .left-full[ When specifying the theme of figure elements, it helps to know how the elements are named. For example: + `axis.title` This refers to the titles of both the x- and y-axes. Any theme setting you apply to this element is reflected in both axis titles. + `axis.title.x` This refers to the title of the x-axis only. Any theme setting you apply to this element is reflected in the x-axis title only. So, appending a suffix (here, `.x`) narrows the scope of the figure elements the name refers to. ] ] <!-- panel ends here --> .panel[.panel-name[Common functions] .left-full[ There are common functions we use to specify the appearance of figure elements, depending on the type of the element: <br> + `element_text()`: for text elements like `axis.text`, `axis.title`, `legend.text` Inside the function, you specify things like font size, font family, angle, etc. + `element_rect()`: for box-like elements like `legend.background`, `plot.background`, `strip.background` Inside the function, you specify things like background fill color, border line color, etc. + `element_line()`: for line elements like `panel.grid.major`, `axis.line.x` Inside the function, you specify things like line thickness, line color, etc. + `element_blank()`: for any element It makes the specified element disappear.
+ `unit()`: for sizes and distances of figure elements like `legend.key.width`, `legend.box.spacing` ] ] ] --- # Axis theme .panelset[ .panel[.panel-name[Preparation] We are going to build on this figure in this section: .left-code[ ```r g_axis ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/start-axis-theme-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[title and text] .left5[ ```r g_axis + theme( * axis.title.x = element_text(size = 8, color = "red"), * axis.text = element_text(size = 14, family = "Times") ) ``` ] .right5[ <img src="data_visualization_x_files/figure-html/at-font-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[line] .left5[ ```r g_axis + theme( axis.title.x = element_text(size = 8, color = "red"), axis.text = element_text(size = 14, family = "Times"), * axis.line.y = element_line(size = 2, color = "blue") ) ``` ] .right5[ <img src="data_visualization_x_files/figure-html/at-line-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[ticks] .left5[ ```r g_axis + theme( axis.title.x = element_text(size = 8, color = "red"), axis.text = element_text(size = 14, family = "Times"), axis.line.y = element_line(size = 2, color = "blue"), * axis.ticks.length.x = unit(2, "cm") ) ``` ] .right5[ <img src="data_visualization_x_files/figure-html/at-ticks-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # Legends theme .panelset[ .panel[.panel-name[How] <br> We can use `theme()` to change the appearance of legends. Some of the legend elements include + title + position + key + text + direction + background See [here](https://ggplot2.tidyverse.org/reference/theme.html) for the full list of options related to legends. We will discuss how to change the color scheme of legends in more detail later.
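`direction` from the list above is not demonstrated in the tabs that follow. Here is a minimal sketch of it that uses the `mpg` dataset shipped with `ggplot2` rather than our corn yield data (the object name `g_dir` is ours). The legend sits below the plot, and `legend.direction = "vertical"` stacks its keys vertically instead of laying them out side by side.

```r
library(ggplot2)

# Legend placed below the plot, with its keys stacked vertically
g_dir <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy, color = drv)) +
  theme(
    legend.position = "bottom",
    legend.direction = "vertical"
  )

g_dir
```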
] .panel[.panel-name[Preparation] This is what we will build on: .left-code[ ```r g_l <- g_axis + scale_fill_brewer( palette = "Paired", guide = guide_legend( title.position = "left", nrow = 2 ) ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/starting-point-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[position] .left-code[ ```r g_l + labs(fill = "State") + * theme( * legend.position = "bottom" * ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/l-position-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[key] .left-code[ ```r g_l + labs(fill = "State") + theme( legend.position = "bottom", * legend.key.height = unit(0.5, "cm"), * legend.key.width = unit(2, "cm") ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/l-key-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[font] .left-code[ ```r g_l + labs(fill = "State") + theme( legend.position = "bottom", legend.key.height = unit(0.5, "cm"), legend.key.width = unit(2, "cm"), * legend.text = element_text( * size = 16, * family = "Times" ), * legend.title = element_text( * size = 6, * family = "Courier", * color = "red" ) ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/l-font-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[background] .left-code[ ```r g_l + labs(fill = "State") + theme( legend.position = "bottom", legend.key.height = unit(0.5, "cm"), legend.key.width = unit(2, "cm"), legend.text = element_text( size = 16, family = "Times" ), legend.title = element_text( size = 6, family = "Courier", color = "red" ), * legend.background = element_rect( * fill = "lightblue", * linetype = "solid" * ) ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/l-bcg-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # Pre-made themes
.panelset[ .panel[.panel-name[Instruction] .left-full[ There are many pre-made themes from the `ggplot2` and `ggthemes` packages that can quickly change how figures look. Install and load the `ggthemes` package first: ```r #--- install ---# install.packages("ggthemes") #--- library ---# library("ggthemes") ``` See the full list of pre-made themes [here](https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/). ] ] .panel[.panel-name[bw] .left-code[ ```r g_axis + theme_bw() ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/bw-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[void] .left-code[ ```r g_axis + theme_void() ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/void-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[stata] .left-code[ ```r g_axis + theme_stata() ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/stata-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[gdocs] .left-code[ ```r g_axis + theme_gdocs() ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/gdocs-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[economist] .left-code[ ```r g_axis + theme_economist() ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/economist-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[excel] .left-code[ ```r g_axis + theme_excel() ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/excel-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] ] --- # Panel (build on a pre-made theme) .panelset[ .panel[.panel-name[How] .left-full[ ## Build on a pre-made theme You can simply override parts of a pre-made theme by adding theme options like this (see more on this [here](#custom-theme)): ```r g_axis + theme_bw() + theme( panel.grid.minor = element_blank() ) ``` So,
you can pick the pre-made theme that looks closest to what you want, and then override the theme elements you do not like. ] ] .panel[.panel-name[Preparation] .left-code[ This is what we will be building on: ```r g_axis + theme_bw() ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/start-panel-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[minor grid] .left-code[ ```r g_axis + theme_bw() + theme( * panel.grid.minor = element_blank() ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/panel-grid-minor-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[major grid] .left-code[ ```r g_axis + theme_bw() + theme( panel.grid.minor = element_blank(), * panel.grid.major.x = element_blank(), * panel.grid.major.y = element_line( * size = 1, * color = "blue", * linetype = "dotted" * ) ) ``` See [here](http://sape.inf.usi.ch/quick-reference/ggplot2/linetype) for the available line types. ] .right-plot[ <img src="data_visualization_x_files/figure-html/panel-grid-major-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] ] --- # Faceted figure theme .panelset[ .panel[.panel-name[Instruction] <br> Faceted figures have elements that have no effect on non-faceted figures, such as + `strip.background` + `strip.placement` + `strip.text` + `panel.spacing` We will learn how to modify these elements.
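`strip.placement` is easy to miss because it only has a visible effect when strips are moved next to an axis, for example with the `switch` argument of `facet_grid()`. Here is a minimal sketch on the `mpg` dataset bundled with `ggplot2` (not our corn yield data). The object name `g_strip` is ours. With `switch = "y"`, the row strips are drawn on the left, so their text is styled through `strip.text.y.left`.

```r
library(ggplot2)

# Rows: drive type; columns: model year. switch = "y" moves row strips to the left
g_strip <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  facet_grid(drv ~ year, switch = "y") +
  theme(
    strip.placement = "outside",                   # put the left strips outside the y-axis
    strip.text.y.left = element_text(angle = 0),   # horizontal text in the left strips
    panel.spacing = unit(0.5, "cm")                # space between panels
  )

g_strip
```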
] .panel[.panel-name[Preparation] .left-code[ Create a dataset for this section: ```r county_yield_f <- county_yield %>% filter(state_name %in% c("Nebraska", "Colorado", "Kansas")) %>% filter(year %in% c(2005, 2006)) ``` Create a faceted figure we will build on: ] .right-plot[ <img src="data_visualization_x_files/figure-html/f-prep-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[text] .left-code[ ```r g_f + theme( * strip.text.x = element_text( * size = 12, * family = "Times", * color = "red" * ), * strip.text.y = element_text( * angle = 0, * size = 6, * color = "blue" * ) ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/st-text-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[background] .left-code[ ```r g_f + theme( strip.text.x = element_text( size = 12, family = "Times", color = "red" ), strip.text.y = element_text( angle = 0, size = 6, color = "blue" ), * strip.background.x = element_rect( * color = "blue" * ), * strip.background.y = element_blank() ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/st-background-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[panel spacing] .left-code[ ```r g_f + theme( strip.text.x = element_text( size = 12, family = "Times", color = "red" ), strip.text.y = element_text( angle = 0, size = 6, color = "blue" ), strip.background.x = element_rect( color = "blue" ), strip.background.y = element_blank(), * panel.spacing.x = unit(2, "cm"), * panel.spacing.y = unit(0.01, "cm") ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/p-spacing-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # More font families .panelset[ .panel[.panel-name[Instruction] .left-full[ You can use more font families by taking advantage of the `extrafont` package. 
Install and load the package, and then run `font_import()` to import font families (you need to do this only once). ```r install.packages("extrafont") library(extrafont) font_import() ``` In later sessions, just run ```r library(extrafont) #--- load fonts ---# loadfonts() ``` ] ] .panel[.panel-name[font families] You can use `fonttable()` to see the list of available fonts: ```r fonttable() %>% dplyr::select(FontName, Bold, Italic) %>% head(20) ``` ``` ## FontName Bold Italic ## 1 -Keyboard FALSE FALSE ## 2 -SFNSDisplay FALSE FALSE ## 3 -SFNSText FALSE FALSE ## 4 -SFNSText-Italic FALSE TRUE ## 5 AndaleMono FALSE FALSE ## 6 AppleBraille FALSE FALSE ## 7 AppleBraille-Outline6Dot FALSE FALSE ## 8 AppleBraille-Outline8Dot FALSE FALSE ## 9 AppleBraille-Pinpoint6Dot FALSE FALSE ## 10 AppleBraille-Pinpoint8Dot FALSE FALSE ## 11 AppleMyungjo FALSE FALSE ## 12 Arial-Black FALSE FALSE ## 13 Arial-BoldItalicMT TRUE TRUE ## 14 Arial-BoldMT TRUE FALSE ## 15 Arial-ItalicMT FALSE TRUE ## 16 ArialMT FALSE FALSE ## 17 ArialNarrow FALSE FALSE ## 18 ArialNarrow-Bold TRUE FALSE ## 19 ArialNarrow-BoldItalic TRUE TRUE ## 20 ArialNarrow-Italic FALSE TRUE ``` ] .panel[.panel-name[try a family] .left-code[ ```r g_f + theme( strip.text.y = element_text( * family = "Georgia", color = "red" ) ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/f-fam-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # More flexible color options with HEX .panelset[ .panel[.panel-name[Instruction] .left-full[ Instead of naming the color you want to use, you can specify it with a **HEX color code**.
<span style="color:red"> Direction: </span> + Visit [here](https://www.color-hex.com/) + Click on any color you like + You will then see two color gradients (darker shades and lighter tints of the color you picked) + Pick the one you like from the color bar and copy the HEX color code beneath it You could alternatively use RGB codes, but I see no reason to do so because HEX codes are sufficient. ] ] .panel[.panel-name[Example] .left-code[ ```r ggplot(data = county_yield) + geom_point( aes(y = corn_yield, x = d3_5_9), * color = "#824283" ) ``` You can use HEX color codes for any color-related element in a figure. ] .right-plot[ <img src="data_visualization_x_files/figure-html/color-hex-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] ] --- # Color scale .panelset[ .panel[.panel-name[Intro 1] .left-full[ The choice of color scheme for your figures is very important (not so much for academic journals ...) We use `scale_A_B()` functions for color specification: + **A** is the name of the aesthetic (`color` or `fill`) + **B** is the type of color specification method ] ] .panel[.panel-name[Intro 2] .left-full[ For example, consider the following code: <code class ='r hljs remark-code'>ggplot(data = county_yield) +<br> geom_point(aes(y = corn_yield, x = d3_5_9, <span style='background-color:#ffff7f'>color = corn_yield</span>))</code> Since it is the `color` aesthetic that we want to work on, **A** = `color`. There are many options for **B**. Indeed, there are so many that it gets confusing!
+ `scale_color_brewer()` (discrete) + `scale_color_distiller()` (continuous) + `scale_color_viridis_d()` (discrete) + `scale_color_viridis_c()` (continuous) + `scale_color_continuous()` (continuous) + `scale_color_discrete()` (discrete) + `scale_color_hue()` (discrete) One thing to remember: you need to know whether the aesthetic variable (here, `corn_yield`) is numeric (continuous) or not (discrete), because that determines which types of **B** you can use. ] ] ] --- # Viridis .panelset[ .panel[.panel-name[Instruction] <br> We have four `scale` functions for the Viridis color map: + `scale_color_viridis_c()`: for the `color` aesthetic with a continuous variable + `scale_color_viridis_d()`: for the `color` aesthetic with a discrete variable + `scale_fill_viridis_c()`: for the `fill` aesthetic with a continuous variable + `scale_fill_viridis_d()`: for the `fill` aesthetic with a discrete variable There are five color schemes under the Viridis color map: + magma + inferno + plasma + viridis + cividis You can use `option` inside the `scale` functions to specify which one of them you want to use. These color schemes are designed to be color-blind friendly.
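To see what these scales use under the hood, you can call the `viridisLite` package directly. That package supplies the Viridis colors to `ggplot2` through the `scales` package. Below is a minimal sketch using the `mpg` dataset bundled with `ggplot2` instead of the corn yield data. It passes the color scheme to `option` by name. The object names `magma_5` and `g_vir` are ours.

```r
library(ggplot2)
library(viridisLite)

# Five colors from the "magma" scheme, returned as HEX codes
# (the last two hex digits are the alpha/transparency channel)
magma_5 <- viridis(5, option = "magma")
magma_5

# A discrete variable (drv) mapped to fill -> scale_fill_viridis_d()
g_vir <- ggplot(data = mpg) +
  geom_bar(aes(x = class, fill = drv)) +
  scale_fill_viridis_d(option = "cividis")

g_vir
```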
] <!-- panel ends here --> .panel[.panel-name[types] <img src="data_visualization_x_files/figure-html/viridis-ex-1.png" width="80%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Example 1] .left-code[ ```r ggplot(data = filter(county_yield, corn_yield > 50)) + geom_point( aes( y = corn_yield, x = d3_5_9, color = corn_yield ) ) + * scale_color_viridis_c() ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/viridis-ex-1-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Example 2] .left-code[ ```r ggplot(data = filter(county_yield, corn_yield > 50)) + geom_point( aes( y = corn_yield, x = d3_5_9, color = corn_yield ) ) + * scale_color_viridis_c(option = 2) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/viridis-ex-2-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[reverse] .left-code[ ```r ggplot(data = filter(county_yield, corn_yield > 50)) + geom_point( aes( y = corn_yield, x = d3_5_9, color = corn_yield ) ) + scale_color_viridis_c( option = 2, * direction = -1 ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/viridis-reverse-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # RColorBrewer .panelset[ .panel[.panel-name[Instruction] .left-full[ The `RColorBrewer` package provides a number of color palettes of three types: + **sequential**: suitable for a variable that has ordinal meaning (e.g., temperature, precipitation) + **diverging**: suitable for variables that take both negative and positive values (e.g., changes in groundwater level) + **qualitative**: suitable for qualitative or categorical variables These palettes are particularly suitable for maps. <span style="color:red"> Direction: </span>visit [here](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3).
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html> We use two types of scale functions for the palettes: + `scale_A_brewer()`: for a discrete aesthetic variable + `scale_A_distiller()`: for a continuous aesthetic variable ] ] .panel[.panel-name[sequential] ```r RColorBrewer::display.brewer.all(type = "seq") ``` <img src="data_visualization_x_files/figure-html/seq-s-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[diverging] ```r RColorBrewer::display.brewer.all(type = "div") ``` <img src="data_visualization_x_files/figure-html/div-s-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[qualitative] ```r RColorBrewer::display.brewer.all(type = "qual") ``` <img src="data_visualization_x_files/figure-html/qua-s-1.png" width="60%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Example 1] .left-code[ Generate a dataset for visualization: ```r county_yield_s_b2010 <- county_yield %>% filter(year >= 2005, year <= 2010) ``` Create a figure: ```r ggplot(data = county_yield_s_b2010) + geom_boxplot( aes( x = factor(year), y = corn_yield, fill = state_name ) ) + * scale_fill_brewer(palette = "Set2") ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/rb-ex-1-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[Example 2] .left-code[ ```r ggplot(data = filter(county_yield, corn_yield > 50)) + geom_point( aes( y = corn_yield, x = d3_5_9, color = corn_yield ) ) + * scale_color_distiller(palette = "RdYlGn") ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/rb-ex-2-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # Set discrete color scale manually .panelset[ .panel[.panel-name[Instruction] .left-full[ Sometimes, you just want to pick colors yourself.
In that case, you can use + `scale_color_manual()` + `scale_fill_manual()` Inside the `scale_*_manual()` function, you pass, via the `values` option, a named vector that pairs each group name with its color. For example, consider the box plot of corn yield for four states: Colorado, Kansas, Nebraska, and South Dakota. A named vector for them looks like this: ```r ( cols <- c("Colorado" = "red", "Nebraska" = "blue", "Kansas" = "orange", "South Dakota" = "#ff0080") ) ``` With the named vector in hand, add the following to apply the color scheme you just defined: ```r scale_fill_manual(values = cols) ``` ] ] .panel[.panel-name[Example] .left-code[ Define a named color vector: ```r cols <- c("Colorado" = "red", "Nebraska" = "blue", "Kansas" = "orange", "South Dakota" = "#ff0080") ``` Create a figure: ```r ggplot(data = county_yield_s_b2010) + geom_boxplot( aes( x = factor(year), y = corn_yield, fill = state_name ) ) + * scale_fill_manual(values = cols) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/manu-ex-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] ] --- # Set continuous color scale manually .panelset[ .panel[.panel-name[Instruction] .left-full[ .content-box-green[**How**] You can use `scale_*_gradientn()` to create your own continuous color scale. .content-box-green[**Syntax**] ```r scale_*_gradientn(colors, values, limits) ``` + `colors`: a vector of colors + `values`: a vector of positions between 0 and 1, one for each color in `colors`, giving where that color sits on the color bar (0 = lower limit, 1 = upper limit) + `limits`: the lower and upper bounds of the color bar Colors are interpolated between the positions given in `values`. If `values` is omitted, the colors are spaced evenly along the color bar.
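`values` is on a 0-to-1 scale while `limits` is in data units. Converting between the two tells you which data values a given position corresponds to. Below is a small worked sketch. It assumes the limits `c(100, 250)` used in the Example tab, and the positions are illustrative. `scales::rescale()` does the reverse conversion. The object names are ours.

```r
library(scales)

# Limits of the color bar (corn yield, bu/acre)
limits <- c(100, 250)

# Illustrative positions on the 0-1 scale
positions <- c(0, 0.2, 0.9, 1)

# 0-1 position -> yield: lower limit + position * (upper - lower)
yield_at_positions <- limits[1] + positions * (limits[2] - limits[1])
yield_at_positions   # 100 130 235 250

# yield -> 0-1 position (the inverse mapping)
positions_back <- rescale(c(130, 235), from = limits)
positions_back       # 0.2 0.9
```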
] ] .panel[.panel-name[Example] .left4[ Create a figure: ```r ggplot(data = county_yield) + geom_point( aes( x = d1_5_9, y = corn_yield, color = corn_yield ), size = 0.3 ) + scale_color_gradientn( colors = c("red", "orange", "green", "blue"), values = c(0, 0.1, 0.2, 0.9, 1), limits = c(100, 250) ) ``` In this example, green dominates the color bar because the long stretch from 0.2 to 0.9 on the 0-1 scale, that is, yields from 130 to 235 (100 + (250 - 100) × 0.2 = 130 and 100 + (250 - 100) × 0.9 = 235), is mapped to the part of the gradient around `"green"`. (Note that `values` has five entries for four colors. In that case the five positions are spread evenly along the four-color gradient rather than being paired with the colors one by one.) ] .right6[ <img src="data_visualization_x_files/figure-html/manu-gradient-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] ] --- # Patchwork (grid of figures) .panelset[ .panel[.panel-name[Instruction] .left-code[ The `patchwork` package allows you to combine and arrange multiple figures (and even tables and text) like the figure to the right: ```r install.packages("patchwork") library("patchwork") ``` See the [`patchwork` package website](https://patchwork.data-imaginist.com/index.html) for a fuller treatment of this package. ] .right-plot[ <img src="data_visualization_x_files/figure-html/patch-1-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Preparation] .left-full[ We are going to use the following figures: ```r g_1 <- ggplot(data = county_yield) + geom_histogram(aes(x = corn_yield)) + ggtitle("g_1") g_2 <- ggplot(data = county_yield) + geom_boxplot(aes(x = factor(year), y = corn_yield)) + ggtitle("g_2") g_3 <- ggplot(data = county_yield) + geom_density(aes(x = corn_yield)) + ggtitle("g_3") g_4 <- ggplot(data = mean_yield) + geom_line(aes(x = year, y = corn_yield)) + ggtitle("g_4") ``` Note: `mean_yield` for `g_4` is created in the **Line Plot** tab in slide 9.
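If you want to try the `patchwork` operators without `county_yield` and `mean_yield`, below is a self-contained stand-in. It uses the `mpg` dataset shipped with `ggplot2`. The figure names `p_hist`, `p_box`, and `p_dens` are ours. The layout mirrors the `()` tab: one figure on the left and two stacked figures on the right.

```r
library(ggplot2)
library(patchwork)

# Three small figures from the mpg dataset bundled with ggplot2
p_hist <- ggplot(data = mpg) + geom_histogram(aes(x = hwy), bins = 30) + ggtitle("p_hist")
p_box  <- ggplot(data = mpg) + geom_boxplot(aes(x = drv, y = hwy)) + ggtitle("p_box")
p_dens <- ggplot(data = mpg) + geom_density(aes(x = hwy)) + ggtitle("p_dens")

# p_hist on the left; p_box stacked on top of p_dens on the right
combined <- p_hist | (p_box / p_dens)
combined
```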
] ] .panel[.panel-name[+] ```r g_1 + g_2 ``` <img src="data_visualization_x_files/figure-html/plus-1-1.png" width="75%" style="display: block; margin: auto;" /> ] <!-- panel ends here --> .panel[.panel-name[++] ```r g_1 + g_2 + g_3 + g_4 ``` <img src="data_visualization_x_files/figure-html/pp-1-1.png" width="75%" style="display: block; margin: auto;" /> ] <!-- panel ends here --> .panel[.panel-name[/] ```r g_1 / g_2 ``` <img src="data_visualization_x_files/figure-html/v-1-1.png" width="75%" style="display: block; margin: auto;" /> ] <!-- panel ends here --> .panel[.panel-name[|] ```r g_1 | g_2 ``` <img src="data_visualization_x_files/figure-html/h-1-1.png" width="75%" style="display: block; margin: auto;" /> ] <!-- panel ends here --> .panel[.panel-name[||] ```r g_1 | g_2 | g_3 | g_4 ``` <img src="data_visualization_x_files/figure-html/hh-1-1.png" width="75%" style="display: block; margin: auto;" /> ] <!-- panel ends here --> .panel[.panel-name[()] ```r g_1 | (g_3 / g_4) ``` <img src="data_visualization_x_files/figure-html/group-p-1-1.png" width="75%" style="display: block; margin: auto;" /> ] <!-- panel ends here --> .panel[.panel-name[faceting?] .left-full[ The difference between faceted figures and a panel of independent figures: + `facet_*()`: faceted figures share the same legend + `patchwork`: individual figures can have independent legends Faceting is not suitable for presenting multiple distinct variables because they have to share the same legend. Imagine plotting temperature (in Celsius) and precipitation (in mm) as faceted figures. <br> <span style="color:red"> Note: </span> + Before you use the `patchwork` package to arrange figures, think about whether you really need it. Couldn't you just arrange the individual figures in Word or LaTeX?
+ It is very useful when the panel of figures is destined for HTML (e.g., Shiny, flexdashboard), because arranging figures the way `patchwork` does is not trivial there (you would need to know how **CSS** works). ] ] ] <!-- #========================================= # Random tips #========================================= --> --- class: inverse, center, middle name: tips # Tips <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Change the order .panelset[ .panel[.panel-name[Preparation] .left-full[ Create a dataset for this section: ```r county_yield_f <- county_yield %>% filter(state_name %in% c("Nebraska", "Colorado", "Kansas")) %>% filter(year %in% c(2005, 2006)) ``` Create a faceted figure we will work on: ```r g_f <- ggplot(data = county_yield_f) + geom_histogram(aes(x = corn_yield)) + facet_grid( state_name ~ year, scales = "free_x" ) ``` ] ] .panel[.panel-name[Problem] <br> + You want the panels to appear in the order Nebraska, Kansas, Colorado. + But, by default, `ggplot2` orders panels alphabetically <img src="data_visualization_x_files/figure-html/problem-f-1.png" width="70%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Solution] .left-full[ You can turn `state_name` into a factor whose levels follow the preferred order of state names.
```r county_yield_f <- county_yield_f %>% mutate( state_name_f = factor(state_name, levels = c("Nebraska", "Kansas", "Colorado") ) ) ``` ```r county_yield_f$state_name_f %>% head() ``` ``` ## [1] Kansas Kansas Kansas Kansas Kansas Kansas ## Levels: Nebraska Kansas Colorado ``` ] ] <!-- panel ends here --> .panel[.panel-name[Problem solved] .left-code[ <code class ='r hljs remark-code'>ggplot(data = county_yield_f) + <br> geom_histogram(aes(x = corn_yield)) +<br> facet_grid(<br> <span style='background-color:#ffff7f'>state_name_f</span> ~ year, <br> scales = "free_x"<br> )</code> ] .right-plot[ <img src="data_visualization_x_files/figure-html/problem-f-s-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # More on box plots .panelset[ .panel[.panel-name[this works] .left-code[ ```r county_yield_s_b2010 <- county_yield %>% filter(year >= 2005, year <= 2010) ``` <code class ='r hljs remark-code'>ggplot(data = county_yield_s_b2010) + <br> geom_boxplot(<br> aes(<br> x = <span style='background-color:#ffff7f'>factor(year)</span>, <br> y = corn_yield, <br> fill = state_name<br> )<br> )</code> + The `x` variable has to be discrete (character or factor). + `factor(year)` converts `year` into a `factor` ] .right-plot[ <img src="data_visualization_x_files/figure-html/box-disc-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[not this] .left-code[ ```r county_yield_s_b2010 <- county_yield %>% filter(year >= 2005, year <= 2010) ``` <code class ='r hljs remark-code'>ggplot(data = county_yield_s_b2010) + <br> geom_boxplot(<br> aes(<br> <span style='background-color:#ffff7f'>x = year</span>, <br> y = corn_yield, <br> fill = state_name<br> )<br> )</code> + The `x` variable has to be discrete (character or factor).
+ `year` is numeric ] .right-plot[ <img src="data_visualization_x_files/figure-html/box-num-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[this works] .left-code[ ```r county_yield_s_b2010 <- county_yield %>% filter(year >= 2005, year <= 2010) ``` <code class ='r hljs remark-code'>ggplot(data = county_yield_s_b2010) + <br> geom_boxplot(<br> aes(<br> <span style='background-color:#ffff7f'>x = as.character(year)</span>, <br> y = corn_yield, <br> fill = state_name<br> )<br> )</code> + The `x` variable has to be discrete (character or factor). + `as.character(year)` converts `year` into a `character` variable ] .right-plot[ <img src="data_visualization_x_files/figure-html/box-char-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[this?] .left-code[ Load the `nycflights13` package to get access to the `weather` dataset. ```r library(nycflights13) ``` Create a box plot of temperature by month: <code class ='r hljs remark-code'>ggplot(data = weather) +<br> geom_boxplot(<br> aes(<br> y = temp, <br> <span style='background-color:#ffff7f'>x = as.character(month)</span><br> )<br> )</code> + Remember that using `month` (a numeric variable) would not have worked + Notice that 10, 11, and 12 come right after 1 ] .right-plot[ <img src="data_visualization_x_files/figure-html/weather-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[this works] .left-code[ Load the `nycflights13` package to get access to the `weather` dataset. ```r library(nycflights13) ``` Create a box plot of temperature by month: ```r ggplot(data = weather) + geom_boxplot( aes( y = temp, * x = factor(month) ) ) ``` When a factor is created using `factor()`, its levels are set in alphabetical order for a character variable and in numeric order for a numeric variable (that's why this works).
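You can check the two orderings outside of any plot. Below is a minimal sketch with a made-up vector of month numbers (`months` is just for illustration). Our understanding is that `ggplot2` orders a character `x` the way `sort()` orders text, which is why 10, 11, and 12 jumped ahead of 2 in the previous tab.

```r
# A made-up vector of month numbers (numeric)
months <- c(1, 2, 10, 11, 12)

# As character, values are sorted as text: "10", "11", "12" come before "2"
month_chr_order <- sort(unique(as.character(months)))
month_chr_order

# factor() on the numeric vector sets the levels in numeric order
month_fct_levels <- levels(factor(months))
month_fct_levels
```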
] .right-plot[ <img src="data_visualization_x_files/figure-html/weather-2-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] --- # More on bar charts .panelset[ .panel[.panel-name[data prep] Create the following data: ```r mean_yield_sy <- county_yield %>% filter(year >= 2010) %>% group_by(state_name, year) %>% summarize( corn_yield = mean(corn_yield, na.rm = TRUE) ) %>% filter(!is.na(year)) ``` ``` ## # A tibble: 27 x 3 ## # Groups: state_name [3] ## state_name year corn_yield ## <chr> <int> <dbl> ## 1 Colorado 2010 196. ## 2 Colorado 2011 186. ## 3 Colorado 2012 160. ## 4 Colorado 2013 184. ## 5 Colorado 2014 187. ## 6 Colorado 2015 181. ## 7 Colorado 2016 173. ## 8 Colorado 2017 206 ## 9 Colorado 2018 175. ## 10 Kansas 2010 182. ## # … with 17 more rows ``` ] <!-- panel ends here --> .panel[.panel-name[Default] .left5[ By default, `geom_bar()` creates a bar plot where the height of each bar is proportional to the number of observations at each value of x, so you do not need to supply `y` in `aes()`. ```r ggplot(data = mean_yield_sy) + geom_bar( aes( x = year ) ) ``` <img src="data_visualization_x_files/figure-html/bar-1-1.png" width="80%" style="display: block; margin: auto;" /> ] .right5[ You can make the height of the bars equal to the value of a variable by adding `y = variable` to `aes()` and adding the `stat = "identity"` option. ```r ggplot(data = mean_yield_sy) + geom_bar( aes( x = year, * y = corn_yield ), stat = "identity" ) ``` <img src="data_visualization_x_files/figure-html/bar-2-1.png" width="80%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[geom_col()] If you would like the height of the bars to be the value of a variable, it is simpler to use `geom_col()`, which does not require `stat = "identity"`.
<br> .left-code[ ```r ggplot(data = mean_yield_sy) + * geom_col( aes( x = year, y = corn_yield ) ) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/bar-col-f-1.png" width="80%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[position] You might have wondered why `corn_yield` looks so high. That's because the values of `corn_yield` are automatically stacked vertically when there is more than one observation per `x` value (here, one per state in each year). .left5[ Without `fill`: ```r ggplot(data = mean_yield_sy) + geom_col( aes( x = year, y = corn_yield ) ) ``` <img src="data_visualization_x_files/figure-html/position-1-1.png" width="70%" style="display: block; margin: auto;" /> ] .right5[ Fill color mapped to `state_name`: ```r ggplot(data = mean_yield_sy) + geom_col( aes( x = year, y = corn_yield, * fill = state_name ) ) ``` <img src="data_visualization_x_files/figure-html/position-2-1.png" width="70%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> .panel[.panel-name[position] .left5[ By default, bars are stacked (`position = "stack"`): ```r ggplot(data = mean_yield_sy) + geom_col( aes(x = year, y = corn_yield, fill = state_name) ) ``` <img src="data_visualization_x_files/figure-html/position-s-1.png" width="70%" style="display: block; margin: auto;" /> ] .right5[ You can add `position = "dodge"` to place the bars side by side instead of stacking them: ```r ggplot(data = mean_yield_sy) + geom_col( aes(x = year, y = corn_yield, fill = state_name), * position = "dodge" ) ``` <img src="data_visualization_x_files/figure-html/position-d-1.png" width="70%" style="display: block; margin: auto;" /> ] ] <!-- panel ends here --> ] <!-- panel set ends here --> --- name: custom-theme # Custom theme .panelset[ .panel[.panel-name[Custom theme] .left-full[ You can create your own theme, save it, and reuse it later. Here, I am creating my own theme based on `theme_economist()` (from the `ggthemes` package), with axis titles and major grid lines removed.
```r my_theme <- theme_economist() + theme( axis.title = element_blank(), panel.grid.major = element_blank() ) ``` You can add `my_theme` to a plot just like a regular pre-made theme: ```r ggplot(data = weather) + geom_boxplot( aes(y = temp, x = factor(month)) ) + my_theme ``` ] ] .panel[.panel-name[Compare] .left5[ ```r ggplot(data = weather) + geom_boxplot( aes(y = temp, x = factor(month)) ) + * theme_economist() ``` <img src="data_visualization_x_files/figure-html/t-econ-f-1.png" width="80%" style="display: block; margin: auto;" /> ] .right5[ ```r ggplot(data = weather) + geom_boxplot( aes(y = temp, x = factor(month)) ) + * my_theme ``` <img src="data_visualization_x_files/figure-html/t-econ-custom-f-1.png" width="80%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[theme_set()] .left-full[ To apply your theme to all the figures you generate, use `theme_set()`: ```r theme_set(my_theme) ``` After this, all subsequent figures will use `my_theme` unless you add a different theme to them. ] ] <!-- panel ends here --> ] <!-- #========================================= # Extensions #========================================= --> --- class: inverse, center, middle name: gallery # Gallery of other types of figures <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Pie charts .left4[ .content-box-green[**When to use**]: Pie charts are useful to visualize the shares of multiple categories of the same variable (e.g., the shares of water use by the industrial, public, and agricultural sectors). .content-box-green[**Data preparation**] Create variables holding the starting and ending angles of each category, based on each group's share (see the next slide). .content-box-green[**How**] You can use `geom_arc_bar()` from the `ggforce` package to create a pie chart: ```r library(ggforce) ``` Then use `geom_arc_bar()` to draw the pie, providing the starting and ending angles of each slice.
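Because the angles are plain arithmetic on the shares, the preparation step can be sketched in base R. This assumes the shares are already computed and sum to 1; here they are the rounded `yield_ratio` values shown in the step-by-step slides, and `c(0, head(..., -1))` stands in for `lag(end_angle, default = 0)`:

```r
# Hypothetical shares: the rounded yield_ratio values from the step-by-step slides
shares <- c(Colorado = 0.322, Kansas = 0.331, Nebraska = 0.347)

# Each slice ends at the cumulative share, expressed in radians
end_angle <- 2 * pi * cumsum(shares)
# Each slice starts where the previous one ended (like dplyr::lag(end_angle, default = 0))
start_angle <- c(0, head(unname(end_angle), -1))
# The midpoint of each slice is where its label goes
mid_angle <- (start_angle + end_angle) / 2

# Label coordinates at 60% of a unit pie radius, using the same
# x = r * sin(angle), y = r * cos(angle) placement as the geom_text() step
rlabel <- 0.6
label_x <- rlabel * sin(mid_angle)
label_y <- rlabel * cos(mid_angle)

round(end_angle, 2)  # 2.02 4.10 6.28; the last slice closes the circle at 2 * pi
```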
Note: You can use the following code as a template to create a pie chart. Replace `state_name` in `fill = state_name` with the variable representing the groups in your share data. ] .right5[ <img src="data_visualization_x_files/figure-html/pie-chart-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# *rpie = 1 # pie radius *rlabel = 0.6 * rpie # radius of the labels *county_yield ``` ] .panel2-pie-chart-flip-user[ ``` ## soy_yield corn_yield year county_code state_name d0_5_9 d1_5_9 d2_5_9 ## 1: NA NA 2018 053 Kansas 0.8980 3.8186 13.5279 ## 2: NA NA 2017 053 Kansas 3.9994 7.0006 0.0000 ## 3: NA NA 2016 053 Kansas 0.5724 0.0996 0.0000 ## 4: NA NA 2015 053 Kansas 4.4283 1.6177 0.0000 ## 5: NA NA 2014 053 Kansas 4.7032 9.9327 3.5824 ## --- ## 2960: 53 181 2004 073 Nebraska 0.0000 3.2915 19.7085 ## 2961: 57 195 2003 073 Nebraska 0.0000 7.7427 11.8459 ## 2962: 51 170 2002 073 Nebraska 0.0000 7.0000 1.2978 ## 2963: 56 195 2001 073 Nebraska 5.7915 0.0000 0.0000 ## 2964: 54 147 2000 073 Nebraska 0.0000 4.7386 17.6887 ## d3_5_9 d4_5_9 ## 1: 0.0000 0 ## 2: 0.0000 0 ## 3: 0.0000 0 ## 4: 0.0000 0 ## 5: 4.7817 0 ## --- ## 2960: 0.0000 0 ## 2961: 3.4114 0 ## 2962: 4.7022 9 ## 2963: 0.0000 0 ## 2964: 0.5727 0 ``` ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# rpie = 1 # pie radius rlabel = 0.6 * rpie # radius of the labels county_yield %>% * dplyr::select(state_name, corn_yield) ``` ] .panel2-pie-chart-flip-user[ ``` ## state_name corn_yield ## 1: Kansas NA ## 2: Kansas NA ## 3: Kansas NA ## 4: Kansas NA ## 5: Kansas NA ## --- ## 2960: Nebraska 181 ## 2961: Nebraska 195 ## 2962: Nebraska 170 ## 2963: Nebraska 195 ## 2964: Nebraska 147 ``` ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# rpie = 1 # pie radius rlabel = 0.6 * rpie # radius of the labels county_yield %>% dplyr::select(state_name, corn_yield) %>% 
* group_by(state_name) ``` ] .panel2-pie-chart-flip-user[ ``` ## # A tibble: 2,964 x 2 ## # Groups: state_name [3] ## state_name corn_yield ## <chr> <dbl> ## 1 Kansas NA ## 2 Kansas NA ## 3 Kansas NA ## 4 Kansas NA ## 5 Kansas NA ## 6 Kansas NA ## 7 Kansas NA ## 8 Kansas NA ## 9 Kansas NA ## 10 Kansas NA ## # … with 2,954 more rows ``` ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# rpie = 1 # pie radius rlabel = 0.6 * rpie # radius of the labels county_yield %>% dplyr::select(state_name, corn_yield) %>% group_by(state_name) %>% * summarize(yield = mean(corn_yield, na.rm = T)) ``` ] .panel2-pie-chart-flip-user[ ``` ## # A tibble: 3 x 2 ## state_name yield ## <chr> <dbl> ## 1 Colorado 168. ## 2 Kansas 173. ## 3 Nebraska 182. ``` ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# rpie = 1 # pie radius rlabel = 0.6 * rpie # radius of the labels county_yield %>% dplyr::select(state_name, corn_yield) %>% group_by(state_name) %>% summarize(yield = mean(corn_yield, na.rm = T)) %>% * mutate(yield_ratio = yield/sum(yield)) ``` ] .panel2-pie-chart-flip-user[ ``` ## # A tibble: 3 x 3 ## state_name yield yield_ratio ## <chr> <dbl> <dbl> ## 1 Colorado 168. 0.322 ## 2 Kansas 173. 0.331 ## 3 Nebraska 182. 0.347 ``` ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# rpie = 1 # pie radius rlabel = 0.6 * rpie # radius of the labels county_yield %>% dplyr::select(state_name, corn_yield) %>% group_by(state_name) %>% summarize(yield = mean(corn_yield, na.rm = T)) %>% mutate(yield_ratio = yield/sum(yield)) %>% * mutate( * end_angle = 2*pi*cumsum(yield_ratio), * start_angle = lag(end_angle, default = 0), * mid_angle = (end_angle + start_angle)/2 * ) ``` ] .panel2-pie-chart-flip-user[ ``` ## # A tibble: 3 x 6 ## state_name yield yield_ratio end_angle start_angle mid_angle ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Colorado 168. 0.322 2.02 0 1.01 ## 2 Kansas 173. 
0.331 4.10 2.02 3.06 ## 3 Nebraska 182. 0.347 6.28 4.10 5.19 ``` ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# rpie = 1 # pie radius rlabel = 0.6 * rpie # radius of the labels county_yield %>% dplyr::select(state_name, corn_yield) %>% group_by(state_name) %>% summarize(yield = mean(corn_yield, na.rm = T)) %>% mutate(yield_ratio = yield/sum(yield)) %>% mutate( end_angle = 2*pi*cumsum(yield_ratio), start_angle = lag(end_angle, default = 0), mid_angle = (end_angle + start_angle)/2 ) %>% * ggplot(data = .) ``` ] .panel2-pie-chart-flip-user[ <img src="data_visualization_x_files/figure-html/pie-chart-flip_user_07_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# rpie = 1 # pie radius rlabel = 0.6 * rpie # radius of the labels county_yield %>% dplyr::select(state_name, corn_yield) %>% group_by(state_name) %>% summarize(yield = mean(corn_yield, na.rm = T)) %>% mutate(yield_ratio = yield/sum(yield)) %>% mutate( end_angle = 2*pi*cumsum(yield_ratio), start_angle = lag(end_angle, default = 0), mid_angle = (end_angle + start_angle)/2 ) %>% ggplot(data = .) + * geom_arc_bar( * aes( * x0 = 0, y0 = 0, r0 = 0, r = rpie, * start = start_angle, end = end_angle, * fill = state_name * ) * ) ``` ] .panel2-pie-chart-flip-user[ <img src="data_visualization_x_files/figure-html/pie-chart-flip_user_08_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# rpie = 1 # pie radius rlabel = 0.6 * rpie # radius of the labels county_yield %>% dplyr::select(state_name, corn_yield) %>% group_by(state_name) %>% summarize(yield = mean(corn_yield, na.rm = T)) %>% mutate(yield_ratio = yield/sum(yield)) %>% mutate( end_angle = 2*pi*cumsum(yield_ratio), start_angle = lag(end_angle, default = 0), mid_angle = (end_angle + start_angle)/2 ) %>% ggplot(data = .) 
+ geom_arc_bar( aes( x0 = 0, y0 = 0, r0 = 0, r = rpie, start = start_angle, end = end_angle, fill = state_name ) ) + * geom_text( * aes( * x = rlabel * sin(mid_angle), * y = rlabel * cos(mid_angle), * label = state_name * ), hjust = 0.5, vjust = 0.5 * ) ``` ] .panel2-pie-chart-flip-user[ <img src="data_visualization_x_files/figure-html/pie-chart-flip_user_09_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# rpie = 1 # pie radius rlabel = 0.6 * rpie # radius of the labels county_yield %>% dplyr::select(state_name, corn_yield) %>% group_by(state_name) %>% summarize(yield = mean(corn_yield, na.rm = T)) %>% mutate(yield_ratio = yield/sum(yield)) %>% mutate( end_angle = 2*pi*cumsum(yield_ratio), start_angle = lag(end_angle, default = 0), mid_angle = (end_angle + start_angle)/2 ) %>% ggplot(data = .) + geom_arc_bar( aes( x0 = 0, y0 = 0, r0 = 0, r = rpie, start = start_angle, end = end_angle, fill = state_name ) ) + geom_text( aes( x = rlabel * sin(mid_angle), y = rlabel * cos(mid_angle), label = state_name ), hjust = 0.5, vjust = 0.5 ) + * coord_fixed() ``` ] .panel2-pie-chart-flip-user[ <img src="data_visualization_x_files/figure-html/pie-chart-flip_user_10_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# rpie = 1 # pie radius rlabel = 0.6 * rpie # radius of the labels county_yield %>% dplyr::select(state_name, corn_yield) %>% group_by(state_name) %>% summarize(yield = mean(corn_yield, na.rm = T)) %>% mutate(yield_ratio = yield/sum(yield)) %>% mutate( end_angle = 2*pi*cumsum(yield_ratio), start_angle = lag(end_angle, default = 0), mid_angle = (end_angle + start_angle)/2 ) %>% ggplot(data = .) 
+ geom_arc_bar( aes( x0 = 0, y0 = 0, r0 = 0, r = rpie, start = start_angle, end = end_angle, fill = state_name ) ) + geom_text( aes( x = rlabel * sin(mid_angle), y = rlabel * cos(mid_angle), label = state_name ), hjust = 0.5, vjust = 0.5 ) + coord_fixed() + * theme_void() ``` ] .panel2-pie-chart-flip-user[ <img src="data_visualization_x_files/figure-html/pie-chart-flip_user_11_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # .panel1-pie-chart-flip-user[ ```r #--- define two parameters ---# rpie = 1 # pie radius rlabel = 0.6 * rpie # radius of the labels county_yield %>% dplyr::select(state_name, corn_yield) %>% group_by(state_name) %>% summarize(yield = mean(corn_yield, na.rm = T)) %>% mutate(yield_ratio = yield/sum(yield)) %>% mutate( end_angle = 2*pi*cumsum(yield_ratio), start_angle = lag(end_angle, default = 0), mid_angle = (end_angle + start_angle)/2 ) %>% ggplot(data = .) + geom_arc_bar( aes( x0 = 0, y0 = 0, r0 = 0, r = rpie, start = start_angle, end = end_angle, fill = state_name ) ) + geom_text( aes( x = rlabel * sin(mid_angle), y = rlabel * cos(mid_angle), label = state_name ), hjust = 0.5, vjust = 0.5 ) + coord_fixed() + theme_void() + * scale_fill_viridis_d("State") ``` ] .panel2-pie-chart-flip-user[ <img src="data_visualization_x_files/figure-html/pie-chart-flip_user_12_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-pie-chart-flip-user { color: black; width: 39.2%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-pie-chart-flip-user { color: black; width: 58.8%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-pie-chart-flip-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Correlation plot .panelset[ .panel[.panel-name[Instruction] .left-full[ + Correlation plots visualize the degree of correlation between variables from a correlation matrix + Here, we use the `ggcorrplot` package, 
which is one of several packages for making correlation plots (others include the `corrplot` and `GGally` packages). ```r library(ggcorrplot) ``` ] ] .panel[.panel-name[How] .left-full[ + Create a correlation matrix by applying `cor()` to a dataset of numeric variables + Apply `ggcorrplot()` to the correlation matrix ] ] <!-- panel ends here --> .panel[.panel-name[Example] .left4[ ```r weather %>% na.omit() %>% dplyr::select(where(is.numeric)) %>% dplyr::select(- year) %>% cor() %>% # create a cor matrix #--- create a cor plot ---# ggcorrplot(., type = "lower", hc.order = TRUE ) + theme( legend.position = "bottom" ) ``` + You can add `lab = TRUE` to `ggcorrplot()` to also display the correlation coefficients in the squares ] .right6[ <img src="data_visualization_x_files/figure-html/cor-plot-f-1.png" width="100%" style="display: block; margin: auto;" /> ] ] ] --- # Diverging chart (variant of a bar chart) .left4[ .content-box-green[**When to use**]: Diverging charts can be useful to visualize the heterogeneity of a single variable across groups. .content-box-green[**Data preparation**] Diverging charts are simply bar charts with the x- and y-axes flipped, so you just need a single value per group in long format. .content-box-green[**How**] You can use `geom_bar()` (with `stat = "identity"`) together with `coord_flip()`. See the next slide for a demonstration.
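The data preparation can be sketched in base R on a hypothetical four-county table (`yields` and its values are made up): standardize the variable, label each group as below or above average, and fix the bar order through the factor levels. The plotting line uses `geom_col()` as a shorthand for `geom_bar(stat = "identity")`; note that with `coord_flip()`, `labs(x = ...)` still titles the x aesthetic (the counties), which is drawn on the vertical axis.

```r
library(ggplot2)

# Hypothetical county averages (made-up values, not the Nebraska data)
yields <- data.frame(
  county_code = c("001", "003", "005", "007"),
  corn_yield = c(195, 150, 180, 170)
)

# Standardize: distance from the mean in standard-deviation units
yields$yield_norm <- (yields$corn_yield - mean(yields$corn_yield)) / sd(yields$corn_yield)
yields$below_above <- ifelse(yields$yield_norm < 0, "below", "above")

# Sort by the normalized value, then lock that order into the factor levels
# so bars run from lowest to highest instead of alphabetically by code
yields <- yields[order(yields$yield_norm), ]
yields$county_code_f <- factor(yields$county_code, levels = yields$county_code)

levels(yields$county_code_f)  # "003" "007" "005" "001"

# labs(x = ...) titles the counties, which coord_flip() puts on the vertical axis
p <- ggplot(yields) +
  geom_col(aes(x = county_code_f, y = yield_norm, fill = below_above), width = 0.5) +
  coord_flip() +
  labs(x = "County Code", y = "Normalized Yield")
```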
] .right6[ <img src="data_visualization_x_files/figure-html/diverging2-1.png" width="90%" style="display: block; margin: auto;" /> ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r *county_yield ``` ] .panel2-divergin-plot-user[ ``` ## soy_yield corn_yield year county_code state_name d0_5_9 d1_5_9 d2_5_9 ## 1: NA NA 2018 053 Kansas 0.8980 3.8186 13.5279 ## 2: NA NA 2017 053 Kansas 3.9994 7.0006 0.0000 ## 3: NA NA 2016 053 Kansas 0.5724 0.0996 0.0000 ## 4: NA NA 2015 053 Kansas 4.4283 1.6177 0.0000 ## 5: NA NA 2014 053 Kansas 4.7032 9.9327 3.5824 ## --- ## 2960: 53 181 2004 073 Nebraska 0.0000 3.2915 19.7085 ## 2961: 57 195 2003 073 Nebraska 0.0000 7.7427 11.8459 ## 2962: 51 170 2002 073 Nebraska 0.0000 7.0000 1.2978 ## 2963: 56 195 2001 073 Nebraska 5.7915 0.0000 0.0000 ## 2964: 54 147 2000 073 Nebraska 0.0000 4.7386 17.6887 ## d3_5_9 d4_5_9 ## 1: 0.0000 0 ## 2: 0.0000 0 ## 3: 0.0000 0 ## 4: 0.0000 0 ## 5: 4.7817 0 ## --- ## 2960: 0.0000 0 ## 2961: 3.4114 0 ## 2962: 4.7022 9 ## 2963: 0.0000 0 ## 2964: 0.5727 0 ``` ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r county_yield %>% * filter(state_name == "Nebraska") %>% * filter(!is.na(corn_yield)) %>% * group_by(county_code) %>% * summarize(corn_yield = mean(corn_yield)) ``` ] .panel2-divergin-plot-user[ ``` ## # A tibble: 82 x 2 ## county_code corn_yield ## <chr> <dbl> ## 1 001 195. ## 2 003 195. ## 3 005 147 ## 4 007 144. ## 5 009 145. ## 6 011 186. ## 7 013 162. ## 8 015 183. ## 9 017 179. ## 10 019 197. 
## # … with 72 more rows ``` ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r county_yield %>% filter(state_name == "Nebraska") %>% filter(!is.na(corn_yield)) %>% group_by(county_code) %>% summarize(corn_yield = mean(corn_yield)) %>% * mutate( * yield_norm = (corn_yield - mean(corn_yield))/sd(corn_yield), * below_above = ifelse(yield_norm < 0, "below", "above") * ) ``` ] .panel2-divergin-plot-user[ ``` ## # A tibble: 82 x 4 ## county_code corn_yield yield_norm below_above ## <chr> <dbl> <dbl> <chr> ## 1 001 195. 1.08 above ## 2 003 195. 1.12 above ## 3 005 147 -2.23 below ## 4 007 144. -2.43 below ## 5 009 145. -2.36 below ## 6 011 186. 0.485 above ## 7 013 162. -1.18 below ## 8 015 183. 0.231 above ## 9 017 179. -0.0206 below ## 10 019 197. 1.20 above ## # … with 72 more rows ``` ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r county_yield %>% filter(state_name == "Nebraska") %>% filter(!is.na(corn_yield)) %>% group_by(county_code) %>% summarize(corn_yield = mean(corn_yield)) %>% mutate( yield_norm = (corn_yield - mean(corn_yield))/sd(corn_yield), below_above = ifelse(yield_norm < 0, "below", "above") ) %>% * head(20) ``` ] .panel2-divergin-plot-user[ ``` ## # A tibble: 20 x 4 ## county_code corn_yield yield_norm below_above ## <chr> <dbl> <dbl> <chr> ## 1 001 195. 1.08 above ## 2 003 195. 1.12 above ## 3 005 147 -2.23 below ## 4 007 144. -2.43 below ## 5 009 145. -2.36 below ## 6 011 186. 0.485 above ## 7 013 162. -1.18 below ## 8 015 183. 0.231 above ## 9 017 179. -0.0206 below ## 10 019 197. 1.20 above ## 11 021 181. 0.0950 above ## 12 023 187. 0.552 above ## 13 027 189. 0.683 above ## 14 029 191. 0.841 above ## 15 031 176. -0.202 below ## 16 033 156. -1.60 below ## 17 035 191. 0.812 above ## 18 037 184. 0.355 above ## 19 039 194. 0.990 above ## 20 041 186. 
0.463 above ``` ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r county_yield %>% filter(state_name == "Nebraska") %>% filter(!is.na(corn_yield)) %>% group_by(county_code) %>% summarize(corn_yield = mean(corn_yield)) %>% mutate( yield_norm = (corn_yield - mean(corn_yield))/sd(corn_yield), below_above = ifelse(yield_norm < 0, "below", "above") ) %>% head(20) %>% * arrange(yield_norm) ``` ] .panel2-divergin-plot-user[ ``` ## # A tibble: 20 x 4 ## county_code corn_yield yield_norm below_above ## <chr> <dbl> <dbl> <chr> ## 1 007 144. -2.43 below ## 2 009 145. -2.36 below ## 3 005 147 -2.23 below ## 4 033 156. -1.60 below ## 5 013 162. -1.18 below ## 6 031 176. -0.202 below ## 7 017 179. -0.0206 below ## 8 021 181. 0.0950 above ## 9 015 183. 0.231 above ## 10 037 184. 0.355 above ## 11 041 186. 0.463 above ## 12 011 186. 0.485 above ## 13 023 187. 0.552 above ## 14 027 189. 0.683 above ## 15 035 191. 0.812 above ## 16 029 191. 0.841 above ## 17 039 194. 0.990 above ## 18 001 195. 1.08 above ## 19 003 195. 1.12 above ## 20 019 197. 1.20 above ``` ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r county_yield %>% filter(state_name == "Nebraska") %>% filter(!is.na(corn_yield)) %>% group_by(county_code) %>% summarize(corn_yield = mean(corn_yield)) %>% mutate( yield_norm = (corn_yield - mean(corn_yield))/sd(corn_yield), below_above = ifelse(yield_norm < 0, "below", "above") ) %>% head(20) %>% arrange(yield_norm) %>% * mutate( * county_code_f = factor(county_code, levels = .$county_code) * ) ``` ] .panel2-divergin-plot-user[ ``` ## # A tibble: 20 x 5 ## county_code corn_yield yield_norm below_above county_code_f ## <chr> <dbl> <dbl> <chr> <fct> ## 1 007 144. -2.43 below 007 ## 2 009 145. -2.36 below 009 ## 3 005 147 -2.23 below 005 ## 4 033 156. -1.60 below 033 ## 5 013 162. -1.18 below 013 ## 6 031 176. -0.202 below 031 ## 7 017 179. -0.0206 below 017 ## 8 021 181. 0.0950 above 021 ## 9 015 183. 
0.231 above 015 ## 10 037 184. 0.355 above 037 ## 11 041 186. 0.463 above 041 ## 12 011 186. 0.485 above 011 ## 13 023 187. 0.552 above 023 ## 14 027 189. 0.683 above 027 ## 15 035 191. 0.812 above 035 ## 16 029 191. 0.841 above 029 ## 17 039 194. 0.990 above 039 ## 18 001 195. 1.08 above 001 ## 19 003 195. 1.12 above 003 ## 20 019 197. 1.20 above 019 ``` ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r county_yield %>% filter(state_name == "Nebraska") %>% filter(!is.na(corn_yield)) %>% group_by(county_code) %>% summarize(corn_yield = mean(corn_yield)) %>% mutate( yield_norm = (corn_yield - mean(corn_yield))/sd(corn_yield), below_above = ifelse(yield_norm < 0, "below", "above") ) %>% head(20) %>% arrange(yield_norm) %>% mutate( county_code_f = factor(county_code, levels = .$county_code) ) %>% * ggplot(data = .) ``` ] .panel2-divergin-plot-user[ <img src="data_visualization_x_files/figure-html/divergin-plot_user_07_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r county_yield %>% filter(state_name == "Nebraska") %>% filter(!is.na(corn_yield)) %>% group_by(county_code) %>% summarize(corn_yield = mean(corn_yield)) %>% mutate( yield_norm = (corn_yield - mean(corn_yield))/sd(corn_yield), below_above = ifelse(yield_norm < 0, "below", "above") ) %>% head(20) %>% arrange(yield_norm) %>% mutate( county_code_f = factor(county_code, levels = .$county_code) ) %>% ggplot(data = .) 
+ * geom_bar( * aes(fill = below_above, x = county_code_f, y = yield_norm, label = county_code_f), * stat = 'identity', * width = 0.5 * ) ``` ] .panel2-divergin-plot-user[ <img src="data_visualization_x_files/figure-html/divergin-plot_user_08_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r county_yield %>% filter(state_name == "Nebraska") %>% filter(!is.na(corn_yield)) %>% group_by(county_code) %>% summarize(corn_yield = mean(corn_yield)) %>% mutate( yield_norm = (corn_yield - mean(corn_yield))/sd(corn_yield), below_above = ifelse(yield_norm < 0, "below", "above") ) %>% head(20) %>% arrange(yield_norm) %>% mutate( county_code_f = factor(county_code, levels = .$county_code) ) %>% ggplot(data = .) + geom_bar( aes(fill = below_above, x = county_code_f, y = yield_norm, label = county_code_f), stat = 'identity', width = 0.5 ) + * scale_fill_manual( * name = "Productivity", * labels = c("Above Average", "Below Average"), * values = c("above" = "#00ba38", "below" = "#f8766d") * ) ``` ] .panel2-divergin-plot-user[ <img src="data_visualization_x_files/figure-html/divergin-plot_user_09_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r county_yield %>% filter(state_name == "Nebraska") %>% filter(!is.na(corn_yield)) %>% group_by(county_code) %>% summarize(corn_yield = mean(corn_yield)) %>% mutate( yield_norm = (corn_yield - mean(corn_yield))/sd(corn_yield), below_above = ifelse(yield_norm < 0, "below", "above") ) %>% head(20) %>% arrange(yield_norm) %>% mutate( county_code_f = factor(county_code, levels = .$county_code) ) %>% ggplot(data = .) 
+ geom_bar( aes(fill = below_above, x = county_code_f, y = yield_norm, label = county_code_f), stat = 'identity', width = 0.5 ) + scale_fill_manual( name = "Productivity", labels = c("Above Average", "Below Average"), values = c("above" = "#00ba38", "below" = "#f8766d") ) + * coord_flip() ``` ] .panel2-divergin-plot-user[ <img src="data_visualization_x_files/figure-html/divergin-plot_user_10_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r county_yield %>% filter(state_name == "Nebraska") %>% filter(!is.na(corn_yield)) %>% group_by(county_code) %>% summarize(corn_yield = mean(corn_yield)) %>% mutate( yield_norm = (corn_yield - mean(corn_yield))/sd(corn_yield), below_above = ifelse(yield_norm < 0, "below", "above") ) %>% head(20) %>% arrange(yield_norm) %>% mutate( county_code_f = factor(county_code, levels = .$county_code) ) %>% ggplot(data = .) + geom_bar( aes(fill = below_above, x = county_code_f, y = yield_norm, label = county_code_f), stat = 'identity', width = 0.5 ) + scale_fill_manual( name = "Productivity", labels = c("Above Average", "Below Average"), values = c("above" = "#00ba38", "below" = "#f8766d") ) + coord_flip() + * labs(x = "County Code", y = "Normalized Yield") ``` ] .panel2-divergin-plot-user[ <img src="data_visualization_x_files/figure-html/divergin-plot_user_11_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Diverging chart: step by step .panel1-divergin-plot-user[ ```r county_yield %>% filter(state_name == "Nebraska") %>% filter(!is.na(corn_yield)) %>% group_by(county_code) %>% summarize(corn_yield = mean(corn_yield)) %>% mutate( yield_norm = (corn_yield - mean(corn_yield))/sd(corn_yield), below_above = ifelse(yield_norm < 0, "below", "above") ) %>% head(20) %>% arrange(yield_norm) %>% mutate( county_code_f = factor(county_code, levels = .$county_code) ) %>% ggplot(data = .)
+ geom_bar( aes(fill = below_above, x = county_code_f, y = yield_norm, label = county_code_f), stat = 'identity', width = 0.5 ) + scale_fill_manual( name = "Productivity", labels = c("Above Average", "Below Average"), values = c("above" = "#00ba38", "below" = "#f8766d") ) + coord_flip() + labs(x = "County Code", y = "Normalized Yield") + * theme( * legend.position = "bottom", * axis.text.y = element_text(size = 6) * ) ``` ] .panel2-divergin-plot-user[ <img src="data_visualization_x_files/figure-html/divergin-plot_user_12_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-divergin-plot-user { color: black; width: 39.2%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-divergin-plot-user { color: black; width: 58.8%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-divergin-plot-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Area chart .left4[ <br> .content-box-green[**When to use**]: Area charts are useful to visualize the shares of multiple categories of the same quantity (e.g., energy production by energy source) together with their magnitudes. .content-box-green[**Data preparation**]: You just need `x` and `y`, just like a line plot. You do not have to calculate the stacked heights yourself: `geom_area()` automatically stacks the `y` values vertically (see the next slide).
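The automatic stacking can be checked with `ggplot2::layer_data()` on a hypothetical two-state data frame `toy` (made-up values). This is a sketch of what `geom_area()` computes, not of the corn yield data itself:

```r
library(ggplot2)

# Hypothetical yearly values for two states (x and y, as for a line plot)
toy <- data.frame(
  year = c(2000, 2001, 2000, 2001),
  state_name = c("A", "A", "B", "B"),
  corn_yield = c(150, 160, 170, 175)
)

p <- ggplot(toy) +
  geom_area(aes(x = year, y = corn_yield, fill = state_name), stat = "identity")

# layer_data() shows the stacked bands ggplot2 computed: the top of the
# stack in each year is the sum of the two states' values
stacked <- layer_data(p)
tapply(stacked$ymax, stacked$x, max)  # 2000: 320, 2001: 335
```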
] .right6[ <img src="data_visualization_x_files/figure-html/area-chart-1.png" width="90%" style="display: block; margin: auto;" /> ] --- count: false # Area chart: step by step .panel1-area-chart-flip-user[ ```r *county_yield ``` ] .panel2-area-chart-flip-user[ ``` ## soy_yield corn_yield year county_code state_name d0_5_9 d1_5_9 d2_5_9 ## 1: NA NA 2018 053 Kansas 0.8980 3.8186 13.5279 ## 2: NA NA 2017 053 Kansas 3.9994 7.0006 0.0000 ## 3: NA NA 2016 053 Kansas 0.5724 0.0996 0.0000 ## 4: NA NA 2015 053 Kansas 4.4283 1.6177 0.0000 ## 5: NA NA 2014 053 Kansas 4.7032 9.9327 3.5824 ## --- ## 2960: 53 181 2004 073 Nebraska 0.0000 3.2915 19.7085 ## 2961: 57 195 2003 073 Nebraska 0.0000 7.7427 11.8459 ## 2962: 51 170 2002 073 Nebraska 0.0000 7.0000 1.2978 ## 2963: 56 195 2001 073 Nebraska 5.7915 0.0000 0.0000 ## 2964: 54 147 2000 073 Nebraska 0.0000 4.7386 17.6887 ## d3_5_9 d4_5_9 ## 1: 0.0000 0 ## 2: 0.0000 0 ## 3: 0.0000 0 ## 4: 0.0000 0 ## 5: 4.7817 0 ## --- ## 2960: 0.0000 0 ## 2961: 3.4114 0 ## 2962: 4.7022 9 ## 2963: 0.0000 0 ## 2964: 0.5727 0 ``` ] --- count: false # Area chart: step by step .panel1-area-chart-flip-user[ ```r county_yield %>% * filter(!is.na(corn_yield)) ``` ] .panel2-area-chart-flip-user[ ``` ## soy_yield corn_yield year county_code state_name d0_5_9 d1_5_9 d2_5_9 ## 1: 42.0 123.0 2000 053 Kansas 2.4856 2.8664 0.1336 ## 2: NA 188.2 2017 095 Kansas 8.7246 0.0000 0.0000 ## 3: 58.4 168.7 2016 095 Kansas 1.0000 0.0000 0.0000 ## 4: NA 197.9 2015 095 Kansas 1.7604 1.2120 2.0878 ## 5: NA 151.7 2012 095 Kansas 6.2804 1.4699 9.5382 ## --- ## 1952: 53.0 181.0 2004 073 Nebraska 0.0000 3.2915 19.7085 ## 1953: 57.0 195.0 2003 073 Nebraska 0.0000 7.7427 11.8459 ## 1954: 51.0 170.0 2002 073 Nebraska 0.0000 7.0000 1.2978 ## 1955: 56.0 195.0 2001 073 Nebraska 5.7915 0.0000 0.0000 ## 1956: 54.0 147.0 2000 073 Nebraska 0.0000 4.7386 17.6887 ## d3_5_9 d4_5_9 ## 1: 0.0000 0 ## 2: 0.0000 0 ## 3: 0.0000 0 ## 4: 0.0000 0 ## 5: 4.4618 0 ## --- ## 1952: 0.0000 0 ## 1953: 
3.4114 0 ## 1954: 4.7022 9 ## 1955: 0.0000 0 ## 1956: 0.5727 0 ``` ] --- count: false # Area chart: step by step .panel1-area-chart-flip-user[ ```r county_yield %>% filter(!is.na(corn_yield)) %>% * group_by(state_name, year) ``` ] .panel2-area-chart-flip-user[ ``` ## # A tibble: 1,956 x 10 ## # Groups: state_name, year [56] ## soy_yield corn_yield year county_code state_name d0_5_9 d1_5_9 d2_5_9 d3_5_9 ## <dbl> <dbl> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 42 123 2000 053 Kansas 2.49 2.87 0.134 0 ## 2 NA 188. 2017 095 Kansas 8.72 0 0 0 ## 3 58.4 169. 2016 095 Kansas 1 0 0 0 ## 4 NA 198. 2015 095 Kansas 1.76 1.21 2.09 0 ## 5 NA 152. 2012 095 Kansas 6.28 1.47 9.54 4.46 ## 6 42 170 2007 095 Kansas 0 0 0 0 ## 7 49 193 2005 095 Kansas 4.32 0 0 0 ## 8 47 173 2003 095 Kansas 2.29 5.16 4.46 1.09 ## 9 40 165 2002 095 Kansas 3.71 1.48 1.90 0 ## 10 52 171 2001 095 Kansas 9.88 0.188 0 0 ## # … with 1,946 more rows, and 1 more variable: d4_5_9 <dbl> ``` ] --- count: false # Area chart: step by step .panel1-area-chart-flip-user[ ```r county_yield %>% filter(!is.na(corn_yield)) %>% group_by(state_name, year) %>% * summarize(corn_yield = mean(corn_yield)) ``` ] .panel2-area-chart-flip-user[ ``` ## # A tibble: 56 x 3 ## # Groups: state_name [3] ## state_name year corn_yield ## <chr> <int> <dbl> ## 1 Colorado 2000 157. ## 2 Colorado 2001 159. ## 3 Colorado 2002 139. ## 4 Colorado 2003 150. ## 5 Colorado 2004 169. ## 6 Colorado 2005 168. ## 7 Colorado 2006 187. ## 8 Colorado 2007 186. ## 9 Colorado 2008 164. ## 10 Colorado 2010 196. ## # … with 46 more rows ``` ] --- count: false # Area chart: step by step .panel1-area-chart-flip-user[ ```r county_yield %>% filter(!is.na(corn_yield)) %>% group_by(state_name, year) %>% summarize(corn_yield = mean(corn_yield)) %>% * ggplot(data = .) 
``` ] .panel2-area-chart-flip-user[ <img src="data_visualization_x_files/figure-html/area-chart-flip_user_05_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Area chart: step by step .panel1-area-chart-flip-user[ ```r county_yield %>% filter(!is.na(corn_yield)) %>% group_by(state_name, year) %>% summarize(corn_yield = mean(corn_yield)) %>% ggplot(data = .) + * geom_area( * aes( * y = corn_yield, * x = year, * fill = state_name * ), * stat = "identity" * ) ``` ] .panel2-area-chart-flip-user[ <img src="data_visualization_x_files/figure-html/area-chart-flip_user_06_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Area chart: step by step .panel1-area-chart-flip-user[ ```r county_yield %>% filter(!is.na(corn_yield)) %>% group_by(state_name, year) %>% summarize(corn_yield = mean(corn_yield)) %>% ggplot(data = .) + geom_area( aes( y = corn_yield, x = year, fill = state_name ), stat = "identity" ) + * labs(x = "Year", y = "Corn Yield") ``` ] .panel2-area-chart-flip-user[ <img src="data_visualization_x_files/figure-html/area-chart-flip_user_07_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Area chart: step by step .panel1-area-chart-flip-user[ ```r county_yield %>% filter(!is.na(corn_yield)) %>% group_by(state_name, year) %>% summarize(corn_yield = mean(corn_yield)) %>% ggplot(data = .) + geom_area( aes( y = corn_yield, x = year, fill = state_name ), stat = "identity" ) + labs(x = "Year", y = "Corn Yield") + * scale_fill_viridis_d(name = "State") ``` ] .panel2-area-chart-flip-user[ <img src="data_visualization_x_files/figure-html/area-chart-flip_user_08_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Area chart: step by step .panel1-area-chart-flip-user[ ```r county_yield %>% filter(!is.na(corn_yield)) %>% group_by(state_name, year) %>% summarize(corn_yield = mean(corn_yield)) %>% ggplot(data = .) 
+ geom_area( aes( y = corn_yield, x = year, fill = state_name ), stat = "identity" ) + labs(x = "Year", y = "Corn Yield") + scale_fill_viridis_d(name = "State") + * theme_bw() ``` ] .panel2-area-chart-flip-user[ <img src="data_visualization_x_files/figure-html/area-chart-flip_user_09_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Area chart: step by step .panel1-area-chart-flip-user[ ```r county_yield %>% filter(!is.na(corn_yield)) %>% group_by(state_name, year) %>% summarize(corn_yield = mean(corn_yield)) %>% ggplot(data = .) + geom_area( aes( y = corn_yield, x = year, fill = state_name ), stat = "identity" ) + labs(x = "Year", y = "Corn Yield") + scale_fill_viridis_d(name = "State") + theme_bw() + * theme(legend.position = "bottom") ``` ] .panel2-area-chart-flip-user[ <img src="data_visualization_x_files/figure-html/area-chart-flip_user_10_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-area-chart-flip-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-area-chart-flip-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-area-chart-flip-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Mean and SE chart .left4[ .content-box-green[**When to use**]: Mean and SE charts are useful for reporting the means of variables together with their uncertainty, expressed as standard errors. .content-box-green[**Data preparation**]: For each group, you need to supply the mean of the variable and the lower and upper bounds of an interval around it (e.g., a 95% confidence interval).
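The bounds used later in this section come from a normal approximation. For a group with $n$ observations, sample mean $\bar{x}$, and sample standard deviation $s$:

$$
SE = \frac{s}{\sqrt{n}}, \qquad \bar{x} \pm 1.96 \times SE
$$

For example, for assistant professors in the `Salaries` data ($n = 67$, $s \approx 8174$), $SE \approx 8174/\sqrt{67} \approx 999$, so the 95% bounds are roughly $80776 \pm 1958$.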
] .right6[ <img src="data_visualization_x_files/figure-html/mean-se-1.png" width="90%" style="display: block; margin: auto;" /> ] --- count: false # Mean and SE chart: step by step .panel1-mean-se-flip-user[ ```r *data(Salaries, package="carData") *Salaries ``` ] .panel2-mean-se-flip-user[ ``` ## rank discipline yrs.since.phd yrs.service sex salary ## 1 Prof B 19 18 Male 139750 ## 2 Prof B 20 16 Male 173200 ## 3 AsstProf B 4 3 Male 79750 ## 4 Prof B 45 39 Male 115000 ## 5 Prof B 40 41 Male 141500 ## 6 AssocProf B 6 6 Male 97000 ## 7 Prof B 30 23 Male 175000 ## 8 Prof B 45 45 Male 147765 ## 9 Prof B 21 20 Male 119250 ## 10 Prof B 18 18 Female 129000 ## 11 AssocProf B 12 8 Male 119800 ## 12 AsstProf B 7 2 Male 79800 ## 13 AsstProf B 1 1 Male 77700 ## 14 AsstProf B 2 0 Male 78000 ## 15 Prof B 20 18 Male 104800 ## 16 Prof B 12 3 Male 117150 ## 17 Prof B 19 20 Male 101000 ## 18 Prof A 38 34 Male 103450 ## 19 Prof A 37 23 Male 124750 ## 20 Prof A 39 36 Female 137000 ## 21 Prof A 31 26 Male 89565 ## 22 Prof A 36 31 Male 102580 ## 23 Prof A 34 30 Male 93904 ## 24 Prof A 24 19 Male 113068 ## 25 AssocProf A 13 8 Female 74830 ## 26 Prof A 21 8 Male 106294 ## 27 Prof A 35 23 Male 134885 ## 28 AsstProf B 5 3 Male 82379 ## 29 AsstProf B 11 0 Male 77000 ## 30 Prof B 12 8 Male 118223 ## 31 Prof B 20 4 Male 132261 ## 32 AsstProf B 7 2 Male 79916 ## 33 Prof B 13 9 Male 117256 ## 34 AsstProf B 4 2 Male 80225 ## 35 AsstProf B 4 2 Female 80225 ## 36 AsstProf B 5 0 Female 77000 ## 37 Prof B 22 21 Male 155750 ## 38 AsstProf B 7 4 Male 86373 ## 39 Prof B 41 31 Male 125196 ## 40 AssocProf B 9 9 Male 100938 ## 41 Prof B 23 2 Male 146500 ## 42 AssocProf B 23 23 Male 93418 ## 43 Prof B 40 27 Male 101299 ## 44 Prof B 38 38 Male 231545 ## 45 Prof B 19 19 Male 94384 ## 46 Prof B 25 15 Male 114778 ## 47 Prof B 40 28 Male 98193 ## 48 Prof B 23 19 Female 151768 ## 49 Prof B 25 25 Female 140096 ## 50 AsstProf B 1 1 Male 70768 ## 51 Prof B 28 28 Male 126621 ## 52 Prof B 12 11 Male 108875 ## 53 
AsstProf B 11 3 Female 74692 ## 54 Prof B 16 9 Male 106639 ## 55 AssocProf B 12 11 Male 103760 ## 56 AssocProf B 14 5 Male 83900 ## 57 Prof B 23 21 Male 117704 ## 58 AssocProf B 9 8 Male 90215 ## 59 AssocProf B 10 9 Male 100135 ## 60 AsstProf B 8 3 Male 75044 ## 61 AssocProf B 9 8 Male 90304 ## 62 AsstProf B 3 2 Male 75243 ## 63 Prof B 33 31 Male 109785 ## 64 AssocProf B 11 11 Female 103613 ## 65 AsstProf B 4 3 Male 68404 ## 66 AssocProf B 9 8 Male 100522 ## 67 Prof B 22 12 Male 101000 ## 68 Prof B 35 31 Male 99418 ## 69 Prof B 17 17 Female 111512 ## 70 Prof B 28 36 Male 91412 ## 71 Prof B 17 2 Male 126320 ## 72 Prof B 45 45 Male 146856 ## 73 Prof B 29 19 Male 100131 ## 74 Prof B 35 34 Male 92391 ## 75 Prof B 28 23 Male 113398 ## 76 AsstProf B 8 3 Male 73266 ## 77 Prof B 17 3 Male 150480 ## 78 Prof B 26 19 Male 193000 ## 79 AsstProf B 3 1 Male 86100 ## 80 AsstProf B 6 2 Male 84240 ## 81 Prof B 43 28 Male 150743 ## 82 Prof B 17 16 Male 135585 ## 83 Prof B 22 20 Male 144640 ## 84 AsstProf B 6 2 Male 88825 ## 85 Prof B 17 18 Female 122960 ## 86 Prof B 15 14 Male 132825 ## 87 Prof B 37 37 Male 152708 ## 88 AsstProf B 2 2 Male 88400 ## 89 Prof B 25 25 Male 172272 ## 90 AssocProf B 9 7 Male 107008 ## 91 AsstProf B 10 5 Female 97032 ## 92 AssocProf B 10 7 Male 105128 ## 93 AssocProf B 10 7 Male 105631 ## 94 Prof B 38 38 Male 166024 ## 95 Prof B 21 20 Male 123683 ## 96 AsstProf B 4 0 Male 84000 ## 97 AssocProf B 17 12 Male 95611 ## 98 Prof B 13 7 Male 129676 ## 99 Prof B 30 14 Male 102235 ## 100 Prof B 41 26 Male 106689 ## 101 Prof B 42 25 Male 133217 ## 102 Prof B 28 23 Male 126933 ## 103 Prof B 16 5 Male 153303 ## 104 Prof B 20 14 Female 127512 ## 105 AssocProf A 18 10 Male 83850 ## 106 Prof A 31 28 Male 113543 ## 107 AssocProf A 11 8 Male 82099 ## 108 AssocProf A 10 8 Male 82600 ## 109 AssocProf A 15 8 Male 81500 ## 110 Prof A 40 31 Male 131205 ## 111 Prof A 20 16 Male 112429 ## 112 AssocProf A 19 16 Male 82100 ## 113 AsstProf A 3 1 Male 72500 ## 114 Prof A 37 37 Male 
104279 ## 115 Prof A 12 0 Female 105000 ## 116 Prof A 21 9 Male 120806 ## 117 Prof A 30 29 Male 148500 ## 118 Prof A 39 36 Male 117515 ## 119 AsstProf A 4 1 Male 72500 ## 120 AsstProf A 5 3 Female 73500 ## 121 Prof A 14 14 Male 115313 ## 122 Prof A 32 32 Male 124309 ## 123 Prof A 24 22 Male 97262 ## 124 AssocProf A 25 22 Female 62884 ## 125 Prof A 24 22 Male 96614 ## 126 Prof A 54 49 Male 78162 ## 127 Prof A 28 26 Male 155500 ## 128 AsstProf A 2 0 Female 72500 ## 129 Prof A 32 30 Male 113278 ## 130 AsstProf A 4 2 Male 73000 ## 131 AssocProf A 11 9 Male 83001 ## 132 Prof A 56 57 Male 76840 ## 133 AssocProf A 10 8 Female 77500 ## 134 AsstProf A 3 1 Female 72500 ## 135 Prof A 35 25 Male 168635 ## 136 Prof A 20 18 Male 136000 ## 137 Prof A 16 14 Male 108262 ## 138 Prof A 17 14 Male 105668 ## 139 AssocProf A 10 7 Male 73877 ## 140 Prof A 21 18 Male 152664 ## 141 AssocProf A 14 8 Male 100102 ## 142 AssocProf A 15 10 Male 81500 ## 143 Prof A 19 11 Male 106608 ## 144 AsstProf B 3 3 Male 89942 ## 145 Prof B 27 27 Male 112696 ## 146 Prof B 28 28 Male 119015 ## 147 AsstProf B 4 4 Male 92000 ## 148 Prof B 27 27 Male 156938 ## 149 Prof B 36 26 Female 144651 ## 150 AsstProf B 4 3 Male 95079 ## 151 Prof B 14 12 Male 128148 ## 152 AsstProf B 4 4 Male 92000 ## 153 Prof B 21 9 Male 111168 ## 154 AssocProf B 12 10 Female 103994 ## 155 AsstProf B 4 0 Male 92000 ## 156 Prof B 21 21 Male 118971 ## 157 AssocProf B 12 18 Male 113341 ## 158 AsstProf B 1 0 Male 88000 ## 159 AssocProf B 6 6 Male 95408 ## 160 Prof B 15 16 Male 137167 ## 161 AsstProf B 2 2 Male 89516 ## 162 Prof B 26 19 Male 176500 ## 163 AssocProf B 22 7 Male 98510 ## 164 AsstProf B 3 3 Male 89942 ## 165 AsstProf B 1 0 Male 88795 ## 166 Prof B 21 8 Male 105890 ## 167 Prof B 16 16 Male 167284 ## 168 Prof B 18 19 Male 130664 ## 169 AssocProf B 8 6 Male 101210 ## 170 Prof B 25 18 Male 181257 ## 171 AsstProf B 5 5 Male 91227 ## 172 Prof B 19 19 Male 151575 ## 173 Prof B 37 24 Male 93164 ## 174 Prof B 20 20 Male 134185 ## 175 
AssocProf B 17 6 Male 105000 ## 176 Prof B 28 25 Male 111751 ## 177 AssocProf B 10 7 Male 95436 ## 178 AssocProf B 13 9 Male 100944 ## 179 Prof B 27 14 Male 147349 ## 180 AsstProf B 3 3 Female 92000 ## 181 Prof B 11 11 Male 142467 ## 182 Prof B 18 5 Male 141136 ## 183 AssocProf B 8 8 Male 100000 ## 184 Prof B 26 22 Male 150000 ## 185 Prof B 23 23 Male 101000 ## 186 Prof B 33 30 Male 134000 ## 187 AssocProf B 13 10 Female 103750 ## 188 Prof B 18 10 Male 107500 ## 189 AssocProf B 28 28 Male 106300 ## 190 Prof B 25 19 Male 153750 ## 191 Prof B 22 9 Male 180000 ## 192 Prof B 43 22 Male 133700 ## 193 Prof B 19 18 Male 122100 ## 194 AssocProf B 19 19 Male 86250 ## 195 AssocProf B 48 53 Male 90000 ## 196 AssocProf B 9 7 Male 113600 ## 197 AsstProf B 4 4 Male 92700 ## 198 AsstProf B 4 4 Male 92000 ## 199 Prof B 34 33 Male 189409 ## 200 Prof B 38 22 Male 114500 ## 201 AsstProf B 4 4 Male 92700 ## 202 Prof B 40 40 Male 119700 ## 203 Prof B 28 17 Male 160400 ## 204 Prof B 17 17 Male 152500 ## 205 Prof B 19 5 Male 165000 ## 206 Prof B 21 2 Male 96545 ## 207 Prof B 35 33 Male 162200 ## 208 Prof B 18 18 Male 120000 ## 209 AsstProf B 7 2 Male 91300 ## 210 Prof B 20 20 Male 163200 ## 211 AsstProf B 4 3 Male 91000 ## 212 Prof B 39 39 Male 111350 ## 213 Prof B 15 7 Male 128400 ## 214 Prof B 26 19 Male 126200 ## 215 AssocProf B 11 1 Male 118700 ## 216 Prof B 16 11 Male 145350 ## 217 Prof B 15 11 Male 146000 ## 218 AssocProf B 29 22 Male 105350 ## 219 AssocProf B 14 7 Female 109650 ## 220 Prof B 13 11 Male 119500 ## 221 Prof B 21 21 Male 170000 ## 222 Prof B 23 10 Male 145200 ## 223 AssocProf B 13 6 Male 107150 ## 224 Prof B 34 20 Male 129600 ## 225 Prof A 38 35 Male 87800 ## 226 Prof A 20 20 Male 122400 ## 227 AsstProf A 3 1 Male 63900 ## 228 AssocProf A 9 7 Male 70000 ## 229 Prof A 16 11 Male 88175 ## 230 Prof A 39 38 Male 133900 ## 231 Prof A 29 27 Female 91000 ## 232 AssocProf A 26 24 Female 73300 ## 233 Prof A 38 19 Male 148750 ## 234 Prof A 36 19 Female 117555 ## 235 AsstProf A 
8 3 Male 69700 ## 236 Prof A 28 17 Male 81700 ## 237 Prof A 25 25 Male 114000 ## 238 AsstProf A 7 6 Female 63100 ## 239 Prof A 46 40 Male 77202 ## 240 Prof A 19 6 Male 96200 ## 241 AsstProf A 5 3 Male 69200 ## 242 Prof A 31 30 Male 122875 ## 243 Prof A 38 37 Male 102600 ## 244 Prof A 23 23 Male 108200 ## 245 Prof A 19 23 Male 84273 ## 246 Prof A 17 11 Female 90450 ## 247 Prof A 30 23 Male 91100 ## 248 Prof A 21 18 Male 101100 ## 249 Prof A 28 23 Male 128800 ## 250 Prof A 29 7 Male 204000 ## 251 Prof A 39 39 Male 109000 ## 252 Prof A 20 8 Male 102000 ## 253 Prof A 31 12 Male 132000 ## 254 AsstProf A 4 2 Female 77500 ## 255 Prof A 28 7 Female 116450 ## 256 AssocProf A 12 8 Male 83000 ## 257 Prof A 22 22 Male 140300 ## 258 AssocProf A 30 23 Male 74000 ## 259 AsstProf A 9 3 Male 73800 ## 260 Prof A 32 30 Male 92550 ## 261 AssocProf A 41 33 Male 88600 ## 262 Prof A 45 45 Male 107550 ## 263 Prof A 31 26 Male 121200 ## 264 Prof A 31 31 Male 126000 ## 265 Prof A 37 35 Male 99000 ## 266 Prof A 36 30 Male 134800 ## 267 Prof A 43 43 Male 143940 ## 268 Prof A 14 10 Male 104350 ## 269 Prof A 47 44 Male 89650 ## 270 Prof A 13 7 Male 103700 ## 271 Prof A 42 40 Male 143250 ## 272 Prof A 42 18 Male 194800 ## 273 AsstProf A 4 1 Male 73000 ## 274 AsstProf A 8 4 Male 74000 ## 275 AsstProf A 8 3 Female 78500 ## 276 Prof A 12 6 Male 93000 ## 277 Prof A 52 48 Male 107200 ## 278 Prof A 31 27 Male 163200 ## 279 Prof A 24 18 Male 107100 ## 280 Prof A 46 46 Male 100600 ## 281 Prof A 39 38 Male 136500 ## 282 Prof A 37 27 Male 103600 ## 283 Prof A 51 51 Male 57800 ## 284 Prof A 45 43 Male 155865 ## 285 AssocProf A 8 6 Male 88650 ## 286 AssocProf A 49 49 Male 81800 ## 287 Prof A 28 27 Male 115800 ## 288 AsstProf A 2 0 Male 85000 ## 289 Prof A 29 27 Male 150500 ## 290 AsstProf A 8 5 Male 74000 ## 291 Prof A 33 7 Male 174500 ## 292 Prof A 32 28 Male 168500 ## 293 Prof A 39 9 Male 183800 ## 294 AssocProf A 11 1 Male 104800 ## 295 Prof A 19 7 Male 107300 ## 296 Prof A 40 36 Male 97150 ## 297 Prof A 
18 18 Male 126300 ## 298 Prof A 17 11 Male 148800 ## 299 Prof A 49 43 Male 72300 ## 300 AssocProf A 45 39 Male 70700 ## 301 Prof A 39 36 Male 88600 ## 302 Prof A 27 16 Male 127100 ## 303 Prof A 28 13 Male 170500 ## 304 Prof A 14 4 Male 105260 ## 305 Prof A 46 44 Male 144050 ## 306 Prof A 33 31 Male 111350 ## 307 AsstProf A 7 4 Male 74500 ## 308 Prof A 31 28 Male 122500 ## 309 AsstProf A 5 0 Male 74000 ## 310 Prof A 22 15 Male 166800 ## 311 Prof A 20 7 Male 92050 ## 312 Prof A 14 9 Male 108100 ## 313 Prof A 29 19 Male 94350 ## 314 Prof A 35 35 Male 100351 ## 315 Prof A 22 6 Male 146800 ## 316 AsstProf B 6 3 Male 84716 ## 317 AssocProf B 12 9 Female 71065 ## 318 Prof B 46 45 Male 67559 ## 319 Prof B 16 16 Male 134550 ## 320 Prof B 16 15 Male 135027 ## 321 Prof B 24 23 Male 104428 ## 322 AssocProf B 9 9 Male 95642 ## 323 AssocProf B 13 11 Male 126431 ## 324 Prof B 24 15 Female 161101 ## 325 Prof B 30 31 Male 162221 ## 326 AsstProf B 8 4 Male 84500 ## 327 Prof B 23 15 Male 124714 ## 328 Prof B 37 37 Male 151650 ## 329 AssocProf B 10 10 Male 99247 ## 330 Prof B 23 23 Male 134778 ## 331 Prof B 49 60 Male 192253 ## 332 Prof B 20 9 Male 116518 ## 333 Prof B 18 10 Female 105450 ## 334 Prof B 33 19 Male 145098 ## 335 AssocProf B 19 6 Female 104542 ## 336 Prof B 36 38 Male 151445 ## 337 Prof B 35 23 Male 98053 ## 338 Prof B 13 12 Male 145000 ## 339 Prof B 32 25 Male 128464 ## 340 Prof B 37 15 Male 137317 ## 341 Prof B 13 11 Male 106231 ## 342 Prof B 17 17 Female 124312 ## 343 Prof B 38 38 Male 114596 ## 344 Prof B 31 31 Male 162150 ## 345 Prof B 32 35 Male 150376 ## 346 Prof B 15 10 Male 107986 ## 347 Prof B 41 27 Male 142023 ## 348 Prof B 39 33 Male 128250 ## 349 AsstProf B 4 3 Male 80139 ## 350 Prof B 27 28 Male 144309 ## 351 Prof B 56 49 Male 186960 ## 352 Prof B 38 38 Male 93519 ## 353 Prof B 26 27 Male 142500 ## 354 Prof B 22 20 Male 138000 ## 355 AsstProf B 8 1 Male 83600 ## 356 Prof B 25 21 Male 145028 ## 357 Prof A 49 40 Male 88709 ## 358 Prof A 39 35 Male 107309 ## 
359 Prof A 28 14 Female 109954 ## 360 AsstProf A 11 4 Male 78785 ## 361 Prof A 14 11 Male 121946 ## 362 Prof A 23 15 Female 109646 ## 363 Prof A 30 30 Male 138771 ## 364 AssocProf A 20 17 Male 81285 ## 365 Prof A 43 43 Male 205500 ## 366 Prof A 43 40 Male 101036 ## 367 Prof A 15 10 Male 115435 ## 368 AssocProf A 10 1 Male 108413 ## 369 Prof A 35 30 Male 131950 ## 370 Prof A 33 31 Male 134690 ## 371 AssocProf A 13 8 Male 78182 ## 372 Prof A 23 20 Male 110515 ## 373 Prof A 12 7 Male 109707 ## 374 Prof A 30 26 Male 136660 ## 375 Prof A 27 19 Male 103275 ## 376 Prof A 28 26 Male 103649 ## 377 AsstProf A 4 1 Male 74856 ## 378 AsstProf A 6 3 Male 77081 ## 379 Prof A 38 38 Male 150680 ## 380 AssocProf A 11 8 Male 104121 ## 381 AsstProf A 8 3 Male 75996 ## 382 Prof A 27 23 Male 172505 ## 383 AssocProf A 8 5 Male 86895 ## 384 Prof A 44 44 Male 105000 ## 385 Prof A 27 21 Male 125192 ## 386 Prof A 15 9 Male 114330 ## 387 Prof A 29 27 Male 139219 ## 388 Prof A 29 15 Male 109305 ## 389 Prof A 38 36 Male 119450 ## 390 Prof A 33 18 Male 186023 ## 391 Prof A 40 19 Male 166605 ## 392 Prof A 30 19 Male 151292 ## 393 Prof A 33 30 Male 103106 ## 394 Prof A 31 19 Male 150564 ## 395 Prof A 42 25 Male 101738 ## 396 Prof A 25 15 Male 95329 ## 397 AsstProf A 8 4 Male 81035 ``` ] --- count: false # Mean and SE chart: step by step .panel1-mean-se-flip-user[ ```r data(Salaries, package="carData") Salaries %>% * group_by(rank) %>% * summarize( * n = n(), * mean_salary = mean(salary), * sd = sd(salary), * se = sd / sqrt(n) * ) ``` ] .panel2-mean-se-flip-user[ ``` ## # A tibble: 3 x 5 ## rank n mean_salary sd se ## <fct> <int> <dbl> <dbl> <dbl> ## 1 AsstProf 67 80776. 8174. 999. ## 2 AssocProf 64 93876. 13832. 1729. ## 3 Prof 266 126772. 27719. 1700. 
``` ] --- count: false # Mean and SE chart: step by step .panel1-mean-se-flip-user[ ```r data(Salaries, package="carData") Salaries %>% group_by(rank) %>% summarize( n = n(), mean_salary = mean(salary), sd = sd(salary), se = sd / sqrt(n) ) %>% * ggplot(data = .) ``` ] .panel2-mean-se-flip-user[ <img src="data_visualization_x_files/figure-html/mean-se-flip_user_03_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Mean and SE chart: step by step .panel1-mean-se-flip-user[ ```r data(Salaries, package="carData") Salaries %>% group_by(rank) %>% summarize( n = n(), mean_salary = mean(salary), sd = sd(salary), se = sd / sqrt(n) ) %>% ggplot(data = .) + * aes(x = rank, y = mean_salary) ``` ] .panel2-mean-se-flip-user[ <img src="data_visualization_x_files/figure-html/mean-se-flip_user_04_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Mean and SE chart: step by step .panel1-mean-se-flip-user[ ```r data(Salaries, package="carData") Salaries %>% group_by(rank) %>% summarize( n = n(), mean_salary = mean(salary), sd = sd(salary), se = sd / sqrt(n) ) %>% ggplot(data = .) + aes(x = rank, y = mean_salary) + * geom_point(size = 3) ``` ] .panel2-mean-se-flip-user[ <img src="data_visualization_x_files/figure-html/mean-se-flip_user_05_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Mean and SE chart: step by step .panel1-mean-se-flip-user[ ```r data(Salaries, package="carData") Salaries %>% group_by(rank) %>% summarize( n = n(), mean_salary = mean(salary), sd = sd(salary), se = sd / sqrt(n) ) %>% ggplot(data = .) 
+ aes(x = rank, y = mean_salary) + geom_point(size = 3) + * geom_errorbar( * aes( * ymin = mean_salary - 1.96 * se, * ymax = mean_salary + 1.96 * se * ), * width = .1 * ) ``` ] .panel2-mean-se-flip-user[ <img src="data_visualization_x_files/figure-html/mean-se-flip_user_06_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-mean-se-flip-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-mean-se-flip-user { color: black; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-mean-se-flip-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Heat map .left5[ .content-box-green[**When to use**]: Heat maps are useful for 3-dimensional data (`x`, `y`, `z`). The magnitude of the third dimension (`z`) is represented by color instead of by height along a third axis, as in a 3D surface plot. .content-box-green[**Data preparation**]: A dataset in long format that has a single `z` value for each (`x`, `y`) combination. .content-box-green[**How**]: Supply `x`, `y`, and `fill = z` to `geom_tile()` (see the next slide for a demonstration).
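If your data come in wide format (one column per `x` value), a minimal sketch like the following reshapes them with `tidyr::pivot_longer()` so that each row holds one (`x`, `y`, `z`) cell, which is what `geom_tile()` expects. The data frame `wide` and its columns (`gene`, `trt_A`, `trt_B`, `trt_C`) are made-up toy data, not part of `gene_data`.

```r
library(tidyr)
library(ggplot2)

#--- hypothetical wide data: one row per gene, one column per treatment ---#
wide <- data.frame(
  gene  = c("g1", "g2"),
  trt_A = c(0.2, 0.8),
  trt_B = c(0.5, 0.1),
  trt_C = c(0.9, 0.4)
)

#--- reshape to long format: one row per (treatment, gene) cell ---#
long <- pivot_longer(
  wide,
  cols = -gene,
  names_to = "treatment",
  values_to = "value"
)

#--- each row becomes one tile: x = treatment, y = gene, fill = value ---#
g_heat <- ggplot(data = long) +
  geom_tile(aes(x = treatment, y = gene, fill = value)) +
  scale_fill_distiller(palette = "YlOrRd")
```

The long table has 2 genes × 3 treatments = 6 rows, so every tile in the heat map corresponds to exactly one row.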
Read the following data for replication: ```r gene_data <- readRDS("gene.rds") ``` ] .right5[ <img src="data_visualization_x_files/figure-html/unnamed-chunk-68-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Heat map: step by step .panel1-heatmap-user[ ```r *gene_data ``` ] .panel2-heatmap-user[ ``` ## # A tibble: 8,748 x 5 ## x gene value group gene_txt ## <chr> <chr> <dbl> <chr> <chr> ## 1 Zm00001d004894 20_WW_BL_TP1 0.739 BL 20_WWTP1 ## 2 Zm00001d004894 20_WW_BL_TP3 0.772 BL 20_WWTP3 ## 3 Zm00001d004894 20_WW_BL_TP5 0.136 BL 20_WWTP5 ## 4 Zm00001d004894 10_WW_BL_TP1 0.222 BL 10_WWTP1 ## 5 Zm00001d004894 10_WW_BL_TP3 0.672 BL 10_WWTP3 ## 6 Zm00001d004894 10_WW_BL_TP5 0.504 BL 10_WWTP5 ## 7 Zm00001d004894 5_WW_BL_TP1 0.991 BL 5_WWTP1 ## 8 Zm00001d004894 5_WW_BL_TP3 0.201 BL 5_WWTP3 ## 9 Zm00001d004894 5_WW_BL_TP5 0.669 BL 5_WWTP5 ## 10 Zm00001d004894 20_D_BL_TP1 0.534 BL 20_DTP1 ## # … with 8,738 more rows ``` ] --- count: false # Heat map: step by step .panel1-heatmap-user[ ```r gene_data %>% * ggplot(.) ``` ] .panel2-heatmap-user[ <img src="data_visualization_x_files/figure-html/heatmap_user_02_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Heat map: step by step .panel1-heatmap-user[ ```r gene_data %>% ggplot(.) + * geom_tile(aes(gene_txt, x, fill= value)) ``` ] .panel2-heatmap-user[ <img src="data_visualization_x_files/figure-html/heatmap_user_03_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Heat map: step by step .panel1-heatmap-user[ ```r gene_data %>% ggplot(.) + geom_tile(aes(gene_txt, x, fill= value)) + * scale_fill_distiller(palette = "YlOrRd") ``` ] .panel2-heatmap-user[ <img src="data_visualization_x_files/figure-html/heatmap_user_04_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Heat map: step by step .panel1-heatmap-user[ ```r gene_data %>% ggplot(.) + geom_tile(aes(gene_txt, x, fill= value)) + scale_fill_distiller(palette = "YlOrRd") + * facet_grid(.
~ group) ``` ] .panel2-heatmap-user[ <img src="data_visualization_x_files/figure-html/heatmap_user_05_output-1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Heat map: step by step .panel1-heatmap-user[ ```r gene_data %>% ggplot(.) + geom_tile(aes(gene_txt, x, fill= value)) + scale_fill_distiller(palette = "YlOrRd") + facet_grid(. ~ group) + * theme( * axis.ticks.y = element_blank(), * axis.title = element_blank(), * axis.text.y = element_blank(), * axis.text.x = element_text(angle = 90), * legend.position = "bottom" * ) ``` ] .panel2-heatmap-user[ <img src="data_visualization_x_files/figure-html/heatmap_user_06_output-1.png" width="100%" style="display: block; margin: auto;" /> ] <style> .panel1-heatmap-user { color: black; width: 39.2%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-heatmap-user { color: black; width: 58.8%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-heatmap-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # More supplementary `geom_*()` .panelset[ .panel[.panel-name[encircle] .left5[ Create a dataset first: ```r library(ggalt) yield_group <- county_yield %>% filter(state_name %in% c("Colorado", "Nebraska")) %>% filter(!(state_name == "Colorado" & corn_yield > 130)) %>% filter(!(state_name == "Nebraska" & corn_yield > 150)) ``` Create a figure: ```r ggplot(data = yield_group) + geom_point( aes(y = corn_yield, x = d3_5_9, color = state_name) ) + * geom_encircle( * aes(y = corn_yield, x = d3_5_9, color = state_name) * ) ``` + Can be useful for illustrating clusters ] .right5[ <img src="data_visualization_x_files/figure-html/encircle-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[segment] .left5[ ```r ggplot(data = yield_group) + geom_point( aes(y = corn_yield, x = d3_5_9, color = state_name) ) + * geom_segment( * x = 10, * y = 50, * xend = 18.8, * yend = 88, * arrow = arrow(length = unit(0.5, "cm"))
* ) ``` ] .right5[ <img src="data_visualization_x_files/figure-html/segment-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[curve] .left5[ ```r ggplot(data = yield_group) + geom_point( aes(y = corn_yield, x = d3_5_9, color = state_name) ) + * geom_curve( * x = 10, * y = 50, * xend = 18.8, * yend = 88, * curvature = 0.2, * arrow = arrow(length = unit(0.5, "cm")) * ) ``` ] .right5[ <img src="data_visualization_x_files/figure-html/curve-f-1.png" width="90%" style="display: block; margin: auto;" /> ] ] ] <!-- #========================================= # Animation #========================================= --> --- class: inverse, center, middle name: animated # Animated figures <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Animated figures using the `gganimate` package .panelset[ .panel[.panel-name[Instruction] .left-full[ + Install the `gganimate` package: ```r install.packages("gganimate") ``` + Install the `png` and `gifski` packages as well: ```r install.packages("png") install.packages("gifski") ``` + You only need to load the `gganimate` package: ```r library(gganimate) ``` ] ] .panel[.panel-name[How] .left-full[ **One state at a time**: + Create a regular ggplot object **without** the dimension you intend to animate over + Add `transition_states(transition variable)` to make the `ggplot` object animated **Reveal a state at a time**: + Create a regular ggplot object **with** the dimension you intend to animate over + Add `transition_reveal(transition variable)` to make the `ggplot` object animated ] ] .panel[.panel-name[state] .left-code[ ```r library(nycflights13) weather %>% ggplot(data = .) + geom_boxplot( aes(y = temp, x = origin, fill = origin) ) + * transition_states(month) ``` + `weather`, from the `nycflights13` package, contains hourly weather records for three airports in NY (EWR, JFK, and LGA). + Each frame shows boxplots of temperature at the three airports for one month.
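To write an animation to a file instead of only viewing it, `gganimate` provides `animate()`, which renders the frames, and `anim_save()`, which saves the result. A minimal sketch, assuming the `png` and `gifski` packages from the Instruction tab are installed; the object names and the file name `weather_boxplot.gif` are arbitrary choices:

```r
library(ggplot2)
library(gganimate)
library(nycflights13) # provides the `weather` data

#--- same animated boxplot as above, stored as an object ---#
g_anim <- ggplot(data = weather) +
  geom_boxplot(aes(y = temp, x = origin, fill = origin)) +
  transition_states(month)

#--- render the frames (gifski renderer is the default) ---#
anim <- animate(g_anim, nframes = 100, fps = 10)

#--- write the rendered GIF to the working directory ---#
anim_save("weather_boxplot.gif", animation = anim)
```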
] .right-plot[ <img src="data_visualization_x_files/figure-html/animate-f-1-1.gif" width="90%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[reveal] .left-code[ ```r weather %>% filter(month == 8) %>% filter(day <= 10) %>% ggplot(data = .) + geom_boxplot( aes(y = temp, x = factor(day), fill = origin) ) + * transition_reveal(day) ``` ] .right-plot[ <img src="data_visualization_x_files/figure-html/animate-2-f-1.gif" width="90%" style="display: block; margin: auto;" /> ] ] ] --- # Animated figures .panelset[ .panel[.panel-name[Instruction] .left-full[ You can also use the `plotly` package to create an animated figure. ```r install.packages("plotly") library(plotly) ``` + It is very easy to create an animated figure using `plotly` + But embedding animated figures generated by `plotly` in slides or Rmd documents takes a few extra steps ] ] .panel[.panel-name[How 1] .left-full[ + Create a regular `ggplot` object (figure) where `frame = transition variable` is added in `aes()` along with other necessary arguments + Apply `ggplotly()` to the `ggplot` object + Save the result as an html file + Embed the html file in your document using an **iframe** ] ] .panel[.panel-name[How 2] .left-full[ ```r #--- create a ggplot object ---# g_box <- county_yield %>% filter(state_name %in% c("Nebraska", "Colorado", "Kansas")) %>% ggplot(data = .)
+ geom_boxplot( aes( y = corn_yield, x = state_name, #--- add frame ---# * frame = year ) ) + labs(x = "State", y = "Corn Yield (bu/acre)") #--- apply ggplotly() to the ggplot object ---# *ggplotly(g_box, width = 800, height = 400) %>% #--- save as html file ---# * htmltools::save_html(file = "g_box.html") ``` Then, add this in your Rmd file: ``` <iframe src="g_box.html" width="1000" height="550" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> ``` ] ] <!-- panel ends here --> .panel[.panel-name[time-series box plot] <iframe src="g_box.html" width="1000" height="550" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> ] .panel[.panel-name[time-series scatter plot] <iframe src="fig.html" width="900" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> ] <!-- panel ends here --> ] <!-- #========================================= # Exporting a figure as an image #========================================= --> --- class: inverse, center, middle name: inputoutput # Exporting a figure as an image <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Exporting a figure as an image .panelset[ .panel[.panel-name[Instruction] .left-full[ You can use the `ggsave()` function with the following syntax: ```r #--- Syntax (NOT RUN) ---# ggsave(filename = file name, plot = ggplot object) #--- or just this ---# ggsave(file name, ggplot object) ``` ## Example ```r ggsave("ex_boxplot.pdf", g_box) ``` This will save `g_box` as **ex_boxplot.pdf** in the working directory (the file format is inferred from the file extension). ] ] .panel[.panel-name[output file format] .left-full[ + `ggsave()` supports many file formats, including pdf, svg, eps, png, jpg, and tif. Keep in mind which type of graphics each format produces: * vector graphics (pdf, svg, eps) * raster graphics (jpg, png, tif) + Vector graphics can be scaled to any size without losing quality; raster graphics cannot.
+ If you enlarge raster graphics, the pixels making up the figure become visible, making the figure look blocky. + Unless raster output is required, save figures as vector graphics. + **pdf** is almost always a good choice ] ] .panel[.panel-name[Options] .left-full[ ## Image width and height + You can control the width and height of the output image using the `width` and `height` options (the default unit is inches): ```r ggsave("ex_boxplot.pdf", g_box, height = 5, width = 7) ``` ## Image resolution + You can control the resolution of the output image by specifying DPI (dots per inch) using the `dpi` option (this matters only for raster formats such as png). + The default DPI value is 300, but you can specify any value suitable for the output image, including “retina” (320) or “screen” (72). + A value of 600 or higher is recommended when high-resolution output is required. ```r #--- dpi = 320 ---# ggsave("nc_dpi_320.png", g_nc, height = 5, width = 7, dpi = "retina") ggsave("nc_dpi_600.png", g_nc, height = 5, width = 7, dpi = 600) ``` ] ] ] <!-- #========================================= # Resources #========================================= --> --- class: inverse, center, middle name: inputoutput # Resources <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- # Resources ## Books and tutorials + [ggplot2: Elegant Graphics for Data Analysis](https://ggplot2-book.org/) + [Data Visualization with R](https://rkabacoff.github.io/datavis/) + [ggplot2 tutorial by Cedric Scherer](https://cedricscherer.netlify.app/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/) + [Interactive html document](https://plotly-r.com/index.html) + [R Graphics Cookbook, 2nd edition](https://r-graphics.org/) ## Packages + [ggplot2 extensions](https://exts.ggplot2.tidyverse.org/) + [gganimate](https://gganimate.com/)