04-2: Data visualization with ggplot2: More in One

Placing more information in one figure



So far, we have learned the basics of ggplot2 and how to create popular types of figures. We can make a figure much more informative by making its aesthetics data-dependent.

For example, suppose you want to compare the history of irrigated corn yield by state in a line plot. You would create one line per state and make the lines distinguishable, so that readers know which line belongs to which state.

We can make the aesthetics of a figure data-dependent by specifying, INSIDE aes(), which variable to use to differentiate an aesthetic.

Here is an example:

In this code, color = state_name is inside aes(). It tells R to group the data by state_name and draw one line per state, with the lines differentiated by color.

A legend is automatically generated.
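The data used in the lecture is not included here, so the following is only a minimal sketch of the kind of call described above. The data frame state_yield, the column corn_yield, and all of the numbers are made up for illustration.

```r
library(ggplot2)

# Made-up state-year yields, for illustration only (not the lecture's data)
state_yield <- data.frame(
  state_name = rep(c("Nebraska", "Kansas", "Colorado"), each = 3),
  year = rep(2000:2002, times = 3),
  corn_yield = c(150, 155, 160, 140, 142, 149, 130, 138, 141)
)

# color = state_name is inside aes(), so one colored line is drawn per state
g_line <- ggplot(data = state_yield) +
  geom_line(aes(x = year, y = corn_yield, color = state_name))

g_line
```

If color = "blue" were placed outside aes() instead, every line would get the same color and no legend would be drawn.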

Create a data set of corn yield by state-year first:
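One possible way to do this with dplyr is sketched below. It assumes that the state-year value is the simple mean across counties, which the text does not state. The data frame toy_county_yield and its values are made up.

```r
library(dplyr)

# Made-up county-level records, for illustration only
toy_county_yield <- data.frame(
  state_name = rep(c("Nebraska", "Kansas"), each = 4),
  year = rep(c(2000, 2000, 2001, 2001), times = 2),
  corn_yield = c(150, 160, 158, 162, 140, 144, 145, 151)
)

# Collapse to one row per state-year (assumption: simple mean across counties)
toy_state_yield <- toy_county_yield %>%
  group_by(state_name, year) %>%
  summarize(corn_yield = mean(corn_yield), .groups = "drop")

toy_state_yield
```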

This exercise uses the diamonds dataset from the ggplot2 package. First, load the dataset and extract observations with Premium cut whose color is one of E, I, and F:
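The original code for this step is not shown here. A minimal sketch using dplyr::filter() (one of several ways to subset the data) could look like this:

```r
library(ggplot2) # provides the diamonds dataset
library(dplyr)

# Keep Premium-cut diamonds whose color is E, I, or F
premium <- diamonds %>%
  filter(cut == "Premium", color %in% c("E", "I", "F"))

premium
```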

Using premium, create a scatter plot of price (y-axis) against depth (x-axis) by clarity:

Code
ggplot(data = premium) +
  geom_point(aes(y = price, x = depth, color = clarity))

Using premium, create density plots of carat by color (set alpha to 0.5):

Code
ggplot(data = premium) +
  geom_density(aes(x = carat, fill = color), alpha = 0.5)

Faceting



Sometimes, you would like to visualize information across groups on separate panels.

Too much information in one panel?

On separate panels (faceting)?

We can make faceted figures by adding either facet_wrap() or facet_grid(), in which you specify which variable to use for faceting.

Here is an example:


In this code, facet_wrap(state_name ~ .) is added to a simple boxplot, which tells R to make a boxplot by state_name (state).


Note

The . in state_name ~ . means none (no faceting variable on that side of the formula).
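Since the corn yield data is not included here, the sketch below swaps in the diamonds dataset from ggplot2 to show the same pattern: a simple boxplot with facet_wrap() added, using cut in place of state_name.

```r
library(ggplot2)

# A simple boxplot of price, split into one panel per cut
g_box <- ggplot(data = diamonds) +
  geom_boxplot(aes(y = price)) +
  facet_wrap(cut ~ .)

g_box
```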

Two-way faceting will

  • divide the data into groups where each group has a unique combination of the two faceting variables

  • create a plot for each group

Example

Filter county_yield to those in 2017 and 2018.

Create faceted density plots.

facet_wrap

facet_grid

Note

  • Unlike in facet_wrap(), the side of the formula on which you put each faceting variable matters a lot in facet_grid().

    • left hand side: rows
    • right hand side: columns
  • In the code above, state_name values become the rows, and year values become columns.
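To illustrate the row and column rule, here is a sketch that uses a subset of diamonds as a stand-in for the county yield data (an assumption, since that data is not included here). cut plays the role of state_name and color the role of year.

```r
library(ggplot2)

# Two cuts and two colors, so there are four cut-color combinations
d <- droplevels(subset(
  diamonds,
  cut %in% c("Premium", "Ideal") & color %in% c("E", "F")
))

# facet_wrap(): one panel per combination, laid out in reading order
g_wrap <- ggplot(data = d) +
  geom_density(aes(x = price)) +
  facet_wrap(cut ~ color)

# facet_grid(): left-hand side (cut) forms rows, right-hand side (color) forms columns
g_grid <- ggplot(data = d) +
  geom_density(aes(x = price)) +
  facet_grid(cut ~ color)

g_grid
```

Swapping the formula to color ~ cut would transpose the grid in facet_grid().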

facet_grid() allows

  • the figures in different columns to have different scales for the x-axis (figures in the same column have the same scale for the x-axis)

  • the figures in different rows to have different scales for the y-axis (figures in the same row have the same scale for the y-axis)
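In facet_grid(), this behavior is controlled by the scales argument. The sketch below again uses a subset of diamonds as a stand-in dataset (an assumption) and frees both axes with scales = "free".

```r
library(ggplot2)

# Stand-in data: two cuts (rows) and two colors (columns)
d_free <- droplevels(subset(
  diamonds,
  cut %in% c("Premium", "Ideal") & color %in% c("E", "F")
))

# scales = "free": x-axis scales may differ across columns,
# y-axis scales may differ across rows
g_free <- ggplot(data = d_free) +
  geom_point(aes(x = carat, y = price)) +
  facet_grid(cut ~ color, scales = "free")

g_free
```

Use scales = "free_x" or scales = "free_y" to free only one of the two axes. The default is scales = "fixed".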

Create a variable that has the values you want to use as labels and use it as a faceting variable:
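A minimal sketch of this approach is shown below, using diamonds as a stand-in dataset. The variable name cut_label and the label text are hypothetical choices made for illustration.

```r
library(ggplot2)
library(dplyr)

# Build the panel labels as a new variable
diamonds_labeled <- diamonds %>%
  mutate(cut_label = paste0("Cut: ", cut))

# Facet by the label variable instead of the original one
g_label <- ggplot(data = diamonds_labeled) +
  geom_histogram(aes(x = carat), bins = 30) +
  facet_wrap(cut_label ~ .)

g_label
```

Because paste0() returns a character variable, the panels are ordered alphabetically by label. Convert the label to a factor with the desired level order if the panel order matters.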

Using premium, create scatter plots of price (y-axis) against carat (x-axis) by color on separate panels as shown on the right.

Code
ggplot(data = premium) +
  geom_point(aes(x = carat, y = price)) +
  facet_grid(color ~ .)

Using premium, create histograms of carat by color and clarity on separate panels as shown on the right.

Code
ggplot(data = premium) +
  geom_histogram(aes(x = carat)) +
  facet_grid(color ~ clarity)

Preparing datasets for visualization

We have seen

  • figures whose main elements (points, lines, boxes, etc.) are color-differentiated (e.g., with aes(color = var) inside the geom_*() function)
  • faceted figures

Important

The dataset has to be in long format to create these types of figures!!


For example, consider the following dataset in a wide format:


This dataset has county-level yields for Nebraska, Colorado, and Kansas stored in variables named 2000 and 2001 (they themselves represent years).

Imagine creating boxplots of corn yield with fill color differentiated by state and faceted by year. You cannot specify facet_grid() properly because you do not have a single variable that represents year.

You will find that reshaping wide datasets using pivot_longer() is very useful in creating figures.
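The sketch below shows the idea on a small made-up wide dataset with year columns named 2000 and 2001. The data frame names, the county_code column, and all values are hypothetical.

```r
library(ggplot2)
library(tidyr)

# Made-up wide data: one row per county, one column per year
yield_wide <- data.frame(
  state_name = rep(c("Nebraska", "Colorado", "Kansas"), each = 2),
  county_code = rep(c(1, 3), times = 3),
  `2000` = c(150, 160, 130, 136, 140, 144),
  `2001` = c(158, 162, 138, 141, 145, 151),
  check.names = FALSE
)

# Stack the year columns so that year becomes a single variable
yield_long <- pivot_longer(
  yield_wide,
  cols = c(`2000`, `2001`),
  names_to = "year",
  values_to = "corn_yield"
)

# With year as a variable, it can be used for faceting
g_long <- ggplot(data = yield_long) +
  geom_boxplot(aes(x = state_name, y = corn_yield, fill = state_name)) +
  facet_grid(. ~ year)

g_long
```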

Multiple datasets in one figure



Important

  • (Global) When a dataset is specified inside ggplot(), that dataset is used in ALL of the subsequent geom_*() layers unless otherwise specified.
  • (Local) When a dataset is specified inside a geom_*(), that dataset is used only for that geom_*(), overriding the global dataset set inside ggplot().

This works with county_yield used in both geom_point() and geom_smooth().

This does not work because no global dataset is set inside ggplot() and no dataset is supplied to geom_smooth().
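The two cases can be sketched as follows. The first 1,000 rows of diamonds stand in for county_yield (an assumption, since that dataset is not included here), and method = "lm" is an arbitrary choice for the smoother.

```r
library(ggplot2)

d_small <- diamonds[1:1000, ]

# Works: the dataset is set globally, so both geoms inherit it
g_global <- ggplot(data = d_small) +
  geom_point(aes(x = carat, y = price)) +
  geom_smooth(aes(x = carat, y = price), method = "lm")

# Does not work: geom_smooth() has no local dataset and no global one to fall back on
g_broken <- ggplot() +
  geom_point(data = d_small, aes(x = carat, y = price)) +
  geom_smooth(aes(x = carat, y = price), method = "lm")

# The problem surfaces when the plot is built (or printed), not when it is defined
build_result <- tryCatch(ggplot_build(g_broken), error = function(e) e)
inherits(build_result, "error")
```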

To use multiple datasets inside a single ggplot object (or a figure), you just need to specify what dataset to use locally inside individual geom_*()s.
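As a minimal sketch, the figure below draws boxplots from the full diamonds data and overlays group means taken from a second, summarized dataset. The name mean_price and the styling choices are hypothetical.

```r
library(ggplot2)

# Second dataset: mean price by cut (one row per cut)
mean_price <- aggregate(price ~ cut, data = diamonds, FUN = mean)

# Each geom gets its own dataset locally; no global dataset is set
g_multi <- ggplot() +
  geom_boxplot(data = diamonds, aes(x = cut, y = price)) +
  geom_point(data = mean_price, aes(x = cut, y = price), color = "red", size = 3)

g_multi
```

Setting one dataset globally in ggplot() and overriding it locally in a single geom_*() would work equally well here.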