04-3: Data visualization with ggplot2: Fine Tuning

Tips to make the most of the lecture notes

  • Click on the three horizontally stacked lines at the bottom left corner of the slide to see the table of contents, from which you can jump to the section you want

  • Hit the letter “o” on your keyboard to get a panel view of all the slides

  • The box area with a hint of blue as the background color is where you can write code (hereafter referred to as the “code area”).
  • Hit the “Run Code” button to execute all the code inside the code area.
  • You can evaluate (run) code selectively by highlighting the parts you want to run and hitting Command + Enter on Mac (Ctrl + Enter on Windows).
  • If you want to run the code on your computer, first click on the icon with two sheets of paper stacked on top of each other (top right corner of the code chunk), which copies the code in the code area. You can then paste it onto your computer.
  • You can click on the reload button (top right corner of the code chunk, to the left of the copy button) to revert to the original code.

Make your figures presentable to others



  • Figures we have created so far cannot be used for formal presentations or publications. They are simply too crude.

  • We need to fine-tune raw figures before they are publishable.

  • You can control virtually every element of a figure under the ggplot2 framework.

  • See the ggplot2 documentation for theme() for the complete list of options you can use to modify the theme of figures

Key

The most important thing is actually to know which part of a figure a theme option refers to (e.g., axis.text)

Two types of operations

Operations to make your figures presentable can be categorized into two types:

  • Content-altering
  • Theme-altering


Examples

For the y-axis title,

  • The axis title text itself (say “Corn Yield (bu/acre)”) falls under the content category.

  • The position and the font size of the axis title fall under the theme category.

The content itself does not change when the theme is altered.

[Figures: original versus altered]

  • Distinctions between the two types of actions are not always clear

  • But, typically, you use

    • scale_*() function series to alter contents
    • theme() function to alter the theme
  • Note that there are shorthand convenience functions to alter figure contents for commonly altered parts of figures
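As a rough illustration of the last point, labs() and ylim() are two such shorthand functions in ggplot2. The sketch below uses the built-in mtcars data as a stand-in (an assumption; it is not the dataset used in these notes), and the object names g_base, g_long, and g_short are made up for illustration.

```r
library(ggplot2)

# mtcars ships with R and stands in for the lecture's data
g_base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Long form: set the y-axis title and limits through the scale function
g_long <- g_base +
  scale_y_continuous(name = "Miles per gallon", limits = c(0, 40))

# Shorthand: labs() sets the title, ylim() sets the limits
g_short <- g_base +
  labs(y = "Miles per gallon") +
  ylim(0, 40)

g_short
```

Both versions should produce the same figure; the shorthand is simply less to type when you only need to change a title or a range.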

Axes content

We are going to build on this figure in this section:

We can use

  • scale_x_discrete()/scale_x_continuous() for x-axis
  • scale_y_discrete()/scale_y_continuous() for y-axis

to control the following elements of axes:

  • name: the axis title
  • limits: the range of the axis
  • breaks: the positions of the axis ticks
  • labels: the texts shown at the ticks

Note

  • We use scale_x_discrete() if x is a discrete variable (not numeric) and scale_x_continuous() if x is a continuous variable (numeric).
  • The same applies for y.



Or just this,


Or just,


Or,

You can filter the data first and then use the filtered data.

  • breaks: determines where the ticks are located
  • labels: defines the texts at the ticks
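The four arguments can be combined in a single scale call. Here is a minimal sketch, assuming the built-in mtcars data as a stand-in for the lecture's dataset; the object names and the label texts are hypothetical.

```r
library(ggplot2)

# mtcars ships with R; wt is numeric, so scale_x_continuous() applies
g_base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

g_scaled <- g_base +
  scale_x_continuous(
    name = "Weight (1000 lbs)",          # axis title
    limits = c(1, 6),                    # range of the axis
    breaks = c(2, 4, 6),                 # where the ticks are located
    labels = c("two", "four", "six")     # texts at the ticks, one per break
  )

g_scaled
```

Note that labels must have the same length as breaks, since each label is attached to one tick.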

Run the following code to create gg_delay, which you will build on.

Change the axes content to create the figure on the right using scale_x_continuous() and scale_y_continuous().

Here is the list of changes you need to make:

  • x-axis
    • change the x-axis title to “Month”
    • change the limits of the x-axis to 4 through 8
    • change the x-axis breaks and their labels to 4 through 8
  • y-axis
    • change the y-axis title to “Average Arrival Delay (minutes)”
    • change the limits of the y-axis to 0 through 25
Code
gg_delay +
  scale_x_continuous(
    name = "Month",
    limits = c(4, 8),
    breaks = 4:8
  ) +
  scale_y_continuous(
    name = "Average Arrival Delay (minutes)",
    limits = c(0, 25)
  )

Change the axes content to create the figure on the right, but use scale_x_continuous() only to change the x-axis breaks.

Here is the list of changes you need to make:

  • x-axis
    • change the x-axis title to “Month”
    • change the limits of the x-axis to 4 through 8
    • change the x-axis breaks and their labels to 4 through 8
  • y-axis
    • change the y-axis title to “Average Arrival Delay (minutes)”
    • change the limits of the y-axis to 0 through 25
Code
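gg_delay +
  # scale_x_continuous() is used only for the breaks
  scale_x_continuous(breaks = 4:8) +
  # labs() is a shorthand for setting the axis titles
  labs(
    x = "Month",
    y = "Average Arrival Delay (minutes)"
  ) +
  # xlim() would replace the x scale above, so the ranges are set here;
  # unlike scale limits, coord_cartesian() zooms without dropping data
  coord_cartesian(xlim = c(4, 8), ylim = c(0, 25))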

Legends content

We are going to build on this figure in this section:

Run the following code to create gg_delay, which you will build on.

Change the legend contents to create the figure on the right using scale_*_brewer(). You need to identify what goes into * in scale_*_brewer().

Here is the list of changes you need to make:

  • change the legend title to “Airports in NY”
  • change the legend title position to “bottom”
  • change the legend items to be spread in 3 columns
  • change the color palette to Set2
Code
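gg_delay +
  scale_color_brewer(
    name = "Airports in NY",
    palette = "Set2",
    guide = guide_legend(
      # note: ggplot2 3.5.0 and later deprecate title.position here
      # in favor of theme(legend.title.position = "bottom")
      title.position = "bottom",
      ncol = 3
    )
  )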

Theme

When specifying the theme of figure elements, it is good to know the naming convention of figure elements:

For example:

  • axis.title

This refers to the titles of both the x- and y-axes. Any theme setting you apply to this element is reflected in the titles of both axes.

  • axis.title.x

This refers to the title of the x-axis only. Any theme setting you apply to this element is reflected only in the title of the x-axis.

So, basically, appending .name (e.g., .x) narrows down the scope of the figure elements the element name refers to.
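The narrowing can be seen in a small sketch. It assumes the built-in mtcars data as a stand-in, and the object names are hypothetical.

```r
library(ggplot2)

g_base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

g_titles <- g_base +
  theme(
    # broad element: applies to both the x- and y-axis titles
    axis.title = element_text(size = 14),
    # narrow element: applies to the x-axis title only;
    # the size is left unset here, so it is inherited from axis.title
    axis.title.x = element_text(color = "blue")
  )

g_titles
```

Both titles are drawn at size 14, but only the x-axis title turns blue.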

There are common functions we use to specify the aesthetic nature of figure elements based on the type of the elements:


  • element_text(): for text elements like axis.text, axis.title, legend.text

Inside the function, you specify things like font size, font family, angle, etc.

  • element_rect(): for box-like elements like legend.background, plot.background, strip.background

Inside the function, you specify things like background (fill) color, border line color, etc.

  • element_line(): for line elements like panel.grid.major, axis.line.x

Inside the function, you specify things like line thickness, line color, etc.

  • element_blank(): for any component

It makes the specified component disappear.

  • unit(): for attributes of figure elements like legend.key.width, legend.box.spacing
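A minimal sketch that uses each of these functions once, assuming the built-in mtcars data as a stand-in and ggplot2 3.4.0 or later (for the linewidth argument of element_line()). The specific colors and sizes are arbitrary choices for illustration.

```r
library(ggplot2)

# color is mapped to a factor so that the figure has a legend to style
g_base <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

g_elements <- g_base +
  theme(
    # text element: font size and angle
    axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
    # box-like element: fill color and border color
    legend.background = element_rect(fill = "grey95", color = "black"),
    # line element: line color and thickness
    panel.grid.major = element_line(color = "grey70", linewidth = 0.3),
    # remove the component entirely
    panel.grid.minor = element_blank(),
    # size attribute: a number plus its unit
    legend.key.width = unit(1, "cm")
  )

g_elements
```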

Axis theme

We are going to build on this figure in this section:

Legends theme

We can use theme() to change the aesthetics of legends. Some of the elements include

  • title
  • position
  • key
  • text
  • direction
  • background

See here for the full list of options related to legends.

We will discuss how to change the color scheme of legends in more detail later.
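As a hedged sketch of how the legend elements listed above map to theme() options, the example below touches each of them once. It assumes the built-in mtcars data as a stand-in; the values are arbitrary.

```r
library(ggplot2)

g_base <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

g_legend <- g_base +
  theme(
    legend.title = element_text(face = "bold"),         # title
    legend.position = "bottom",                         # position
    legend.key = element_rect(fill = "white"),          # key
    legend.text = element_text(size = 9),               # text
    legend.direction = "horizontal",                    # direction
    legend.background = element_rect(color = "grey50")  # background
  )

g_legend
```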

This is what we will build on:

Pre-made and customized themes

There are a bunch of pre-made themes from the ggplot2 and ggthemes packages that can quickly change how figures look.

Install and load the ggthemes package first:

#--- install ---#
install.packages("ggthemes") 

#--- library ---#
library("ggthemes") 


See the full list of pre-made themes here.

You can simply override parts of the pre-made theme by adding theme options like this (see more on this here):

g_axis +
  theme_bw() +
  theme(
    panel.grid.minor = element_blank()
  )


So, you can pick the pre-made theme that looks the closest to what you would like, and then override the parts you do not like with theme().

We will build from this figure:

See here for the line types available.

You can create your own theme, save it, and then use it later.

Here, I am creating my own theme off of theme_economist(), where axis titles and major panel grids are absent.
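One way such a theme could be defined is sketched below. This is an assumption about the setup based on the description above, not necessarily the exact code used in these notes.

```r
library(ggplot2)
library(ggthemes)

# Start from theme_economist() and remove the axis titles
# and the major panel grid lines
my_theme <- theme_economist() +
  theme(
    axis.title = element_blank(),
    panel.grid.major = element_blank()
  )
```

Because my_theme is an ordinary theme object, it can be saved in a script and reused across figures.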

You can add my_theme like below just like a regular pre-made theme:

ggplot(data = weather) +
  geom_boxplot(
    aes(y = temp, x = factor(month))
  ) +
  my_theme

If you would like to apply your theme to all the figures you generate, then use theme_set() like below:

theme_set(my_theme)

After this, all of your figures will follow my_theme.

Faceted figure theme

Faceted figures have strip elements that do not exist for non-faceted figures like

  • strip.background
  • strip.placement
  • strip.text
  • panel.spacing

We will learn how to modify these elements.
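As a preview, here is a minimal sketch that sets each of the four elements. It assumes the built-in mtcars data as a stand-in for the dataset created in this section; the values are arbitrary.

```r
library(ggplot2)

g_facet <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl)

g_facet_themed <- g_facet +
  theme(
    # box behind the facet labels
    strip.background = element_rect(fill = "grey20"),
    # facet label text
    strip.text = element_text(color = "white", face = "bold"),
    # place strips outside the axes ("inside" is the default)
    strip.placement = "outside",
    # gap between panels
    panel.spacing = unit(1, "lines")
  )

g_facet_themed
```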

Create a dataset for this section:


Create a faceted figure we will build on:

Color


More flexible color options with HEX

Instead of naming the color you want to use, you can use HEX color codes.

Directions

  • Visit here
  • Click on any color you like
  • Then you will see two sets of color gradients (darker and lighter than the color you picked)
  • Pick the color you like from the color bar and copy the HEX color code beneath the color you picked

You could alternatively use RGB codes, but I do not see any reason to do so because HEX codes are sufficient.

You can use HEX color codes for any color-related elements in a figure.
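For instance, a HEX code can be passed both to a geom and to a theme element. This is a minimal sketch with the built-in mtcars data as a stand-in; the two HEX codes are arbitrary picks.

```r
library(ggplot2)

g_hex <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # HEX code for the points
  geom_point(color = "#ff0080") +
  # HEX code for the panel background
  theme(panel.background = element_rect(fill = "#f0f4f8"))

g_hex
```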

Try

Pick a HEX color and try it yourself.

Color scale

The choice of color schemes for your figures is very important (not so much for academic journals …)

We use scale_A_B() functions for color specification:

  • A is the name of aesthetic (color or fill)
  • B is the type of color specification method

For example, consider the following code:

Since it is the color aesthetic that we want to work on, A = color.

There are many options for B. Indeed, there are so many that it gets confusing!

  • scale_color_brewer() (discrete)
  • scale_color_distiller() (continuous)
  • scale_color_viridis_d() (discrete)
  • scale_color_viridis_c() (continuous)
  • scale_color_continuous() (continuous)
  • scale_color_discrete() (discrete)
  • scale_color_hue() (discrete)

One thing to remember is that you need to be aware of whether the aesthetic variable (here, corn_yield) is numeric or not, as that determines which types of B are acceptable.

Viridis

We have four scale functions for the Viridis color map:

  • scale_color_viridis_c(): for color aesthetic with a continuous variable
  • scale_color_viridis_d(): for color aesthetic with a discrete variable
  • scale_fill_viridis_c(): for fill aesthetic with a continuous variable
  • scale_fill_viridis_d(): for fill aesthetic with a discrete variable

The Viridis color map offers several color schemes, including:

  • magma
  • inferno
  • plasma
  • viridis
  • cividis

You can use the option argument inside the scale functions to specify which one of them you want to use.

These color schemes are color-blind safe.
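A minimal sketch of the option argument, using the built-in mtcars data as a stand-in (hp is numeric, factor(cyl) is discrete). The object names are hypothetical.

```r
library(ggplot2)

# Continuous variable on the color aesthetic: use the _c version
g_vir_c <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point() +
  scale_color_viridis_c(option = "magma")

# Discrete variable on the color aesthetic: use the _d version
g_vir_d <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_viridis_d(option = "cividis")

g_vir_c
g_vir_d
```

Swapping the _c and _d versions results in an error, because the type of the scale has to match the type of the variable.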

RColorBrewer

The RColorBrewer package provides a number of color palettes of three types:

  • sequential: suitable for a variable that has ordinal meaning (e.g., temperature, precipitation)
  • diverging: suitable for variables that take both negative and positive values (e.g., changes in groundwater level)
  • qualitative: suitable for qualitative or categorical variables

We use two types of scale functions for the palettes:

  • scale_A_brewer(): for a discrete aesthetic variable
  • scale_A_distiller(): for a continuous aesthetic variable
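The two functions can be sketched side by side as follows, with A = color. The example assumes the built-in mtcars data as a stand-in; the palette choices ("Set2", a qualitative palette, and "Blues", a sequential one) are arbitrary.

```r
library(ggplot2)

# Discrete variable: scale_color_brewer() with a qualitative palette
g_brewer <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_brewer(palette = "Set2")

# Continuous variable: scale_color_distiller() with a sequential palette
g_distiller <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point() +
  scale_color_distiller(palette = "Blues")

g_brewer
g_distiller
```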

Generate a dataset for visualization:


Create a figure:

Set color scale manually

Sometimes, you just want to pick colors yourself. In that case, you can use

  • scale_color_manual()
  • scale_fill_manual()

Inside the scale_*_manual() function, you pass a named vector to the values option, in which each group name is paired with its color.

For example, consider the box plot of corn yield for four states: Colorado, Kansas, Nebraska, and South Dakota. Then, a sample named vector looks like this:

(
cols <- c("Colorado" = "red", "Nebraska" = "blue", "Kansas" = "orange", "South Dakota" = "#ff0080")
)


Now that a named vector is created, you can do the following to impose the color scheme you just defined.

scale_fill_manual(values = cols)
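Put together, a complete example might look like the sketch below. The yields are random numbers made up purely for illustration (they are not real data), and toy_yield and g_manual are hypothetical names.

```r
library(ggplot2)

cols <- c(
  "Colorado" = "red", "Nebraska" = "blue",
  "Kansas" = "orange", "South Dakota" = "#ff0080"
)

# Made-up yields for illustration only: 10 random draws per state
set.seed(1)
toy_yield <- data.frame(
  state = rep(names(cols), each = 10),
  corn_yield = rnorm(40, mean = 150, sd = 20)
)

g_manual <- ggplot(toy_yield) +
  geom_boxplot(aes(x = state, y = corn_yield, fill = state)) +
  # colors are matched to groups by name, not by position
  scale_fill_manual(values = cols)

g_manual
```

Because the vector is named, the order of its elements does not matter: each state gets the color attached to its name.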

Define a named color vector:


Create a figure:

How

You can use scale_*_gradientn() to create your own continuous color scale.


Syntax

scale_*_gradientn(colors, values, limits)
  • colors: a vector of colors
  • values: a numeric vector of positions between 0 and 1, one for each color in colors
  • limits: define the lower and upper bounds of the scale bar

The nth color in colors is placed at the nth position in values, and the colors between two adjacent positions are interpolated.

Create a figure:


In this example, green is dominant in the color bar because the interval [0.2, 0.9] is for "green" in colors, where the interval corresponds to [130, 235] on the data scale ([100 + 0.2 × (250 - 100), 100 + 0.9 × (250 - 100)]).
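The rescaling can be checked with a short sketch. The data frame and the color vector below are assumptions chosen to reproduce the behavior described above (green placed at both 0.2 and 0.9, with limits of 100 and 250); they are not the code used to create the original figure.

```r
library(ggplot2)

# Hypothetical data: values spread evenly over the range of the limits
toy <- data.frame(x = 1:16, y = 1, corn_yield = seq(100, 250, by = 10))

lims <- c(100, 250)
pos <- c(0, 0.2, 0.9, 1)

# Positions in values translated to the data scale
data_pos <- lims[1] + pos * diff(lims)
data_pos

g_grad <- ggplot(toy) +
  geom_tile(aes(x = x, y = y, fill = corn_yield)) +
  scale_fill_gradientn(
    # green sits at both 0.2 and 0.9, so the whole interval is green
    colors = c("red", "green", "green", "blue"),
    values = pos,
    limits = lims
  )

g_grad
```

Here data_pos works out to 100, 130, 235, and 250, so the tiles with values between 130 and 235 are drawn in green.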