Learn ggplot2 basics to create simple figures.
Click on the three horizontally stacked lines at the bottom left corner of the slide to see the table of contents and jump to the section you want.
Press the letter “o” on your keyboard to get a panel view of all the slides.
This lecture does NOT provide a complete treatment of the basics of the ggplot2 package.
Rather, it provides the minimal knowledge of the package so that readers who are not familiar with the package can still keep up with the lecture on map creation.
ggplot2 is a general and extensive data visualization tool. It is very popular among R users due to its elegance and ease of use in generating high-quality figures.
It is designed following the “grammar of graphics,” which makes it possible to visualize data in an easy and consistent manner irrespective of the type of figure generated, whether it is a simple scatter plot or a complicated map.
This means that learning the basics of how ggplot2 works directly helps in creating maps as well. This chapter goes over the basics of how ggplot2 works in general.
We use the mpg data to create a simple scatter plot. Here is what the mpg dataset looks like:
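A minimal way to take a look at the data, assuming the ggplot2 package is installed (mpg ships with it as a built-in dataset):

```r
library(ggplot2)

# mpg is a data frame bundled with ggplot2 (fuel economy data)
head(mpg)

# number of rows and columns
dim(mpg)
```

The variables used in this lecture (displ, hwy, cyl, model, and trans) are all columns of this dataset.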
In ggplot2, you first specify what data to use. The following code declares to R that we will be using mpg as the data for this figure.
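A sketch of this step, storing the result under the name g_base, which is the name the text uses for this object later on:

```r
library(ggplot2)

# declare the dataset; no geom_*() layer has been added yet
g_base <- ggplot(data = mpg)

# printing the object draws the figure as it currently stands
g_base
```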
The result is a blank canvas. This makes sense because you have not yet told R how to use the data for visualization.
Now that you have specified the data for R to use, we are ready to explain how to use it for visualization.
You can achieve this using one of the geom_*() functions available in the ggplot2 package. Here is a short list of some commonly used ones:
geom_point(): scatter plot
geom_line(): line plot
geom_histogram(): histogram
geom_boxplot(): box plot
geom_sf(): map
Here, let’s create a scatter plot.
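A minimal sketch of the scatter plot, assuming g_base was created with ggplot(data = mpg); the name g_point is only illustrative:

```r
library(ggplot2)

g_base <- ggplot(data = mpg)

# add a scatter plot layer: displ on the x-axis, hwy on the y-axis
g_point <- g_base + geom_point(aes(x = displ, y = hwy))
g_point
```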
Note here that you added a layer defined by geom_point(aes(x = displ, y = hwy)) to g_base.
Let’s now look at what is happening inside geom_point().
In aes(), x = displ and y = hwy tell R that we want displ on the x-axis and hwy on the y-axis.
Note that different geom_*()s accept/require different options. For example, geom_histogram() does not need y because the y-axis is the count by default.
What happens if we remove aes()? It does not seem to be doing anything. Why can’t we just do this?
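The sketch below tries exactly that. It assumes no objects named displ or hwy exist in your workspace, and it wraps the call in tryCatch() only so that the error can be inspected instead of stopping the script:

```r
library(ggplot2)

# x and y are passed without aes(), so R evaluates displ and hwy
# as ordinary objects rather than as columns of mpg
result <- tryCatch(
  ggplot(data = mpg) + geom_point(x = displ, y = hwy),
  error = function(e) e
)

# the call fails; the message says which object R could not find
conditionMessage(result)
```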
aes() tells R to look for variables inside the data you specified earlier in ggplot(data = mpg). Without aes(), R looks for standalone objects named displ and hwy. They exist only as columns inside mpg, which results in the error.
Inside geom_*(), you can specify a number of options to make the figure look different. Different geom_*()s accept different options. The main options by geom_*() type:
For points:
color: color of the points
shape: shape of the points
size: size of the points
For bars:
color: color of the borders of the bars
fill: color of the inside of the bars
shape: no effect
linewidth: width of the borders of the bars
For lines:
color: color of the line
fill: no effect
shape: no effect
linewidth: width of the line
You can easily have multiple layers in a single figure by simply adding another geom_*() on top of the previous one.
Let’s now add a line plot layer to this.
Declaring the dataset with ggplot(data = mpg) tells R that mpg will be used for every subsequent geom_*() unless otherwise specified.
In the code below, mpg is used for both geom_point() and geom_line().
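A sketch of the two-layer figure with the dataset declared once in ggplot(); the name g_global is only illustrative:

```r
library(ggplot2)

# data is declared once in ggplot(); both layers inherit mpg
g_global <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  geom_line(aes(x = displ, y = hwy))
g_global
```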
Alternatively, you could specify the dataset locally inside each geom_*() like below, resulting in the same figure as above.
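A sketch of the local version, where ggplot() receives no data and each layer names its own dataset; the name g_local is only illustrative:

```r
library(ggplot2)

# no data in ggplot(); each geom_*() declares its dataset via data =
g_local <- ggplot() +
  geom_point(data = mpg, aes(x = displ, y = hwy)) +
  geom_line(data = mpg, aes(x = displ, y = hwy))
g_local
```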
Now, remove data = mpg from the geom_line() above and see what happens. It will result in an error because the dataset is declared in neither ggplot() nor geom_line(), so geom_line() does not know what dataset to use.
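The sketch below reproduces this situation, again assuming no object named displ or hwy exists in your workspace. Building the plot with ggplot_build() stands in for printing it, and tryCatch() is used only to capture the error:

```r
library(ggplot2)

g_missing <- ggplot() +
  geom_point(data = mpg, aes(x = displ, y = hwy)) +
  geom_line(aes(x = displ, y = hwy)) # no dataset available to this layer

# the problem surfaces when the figure is built (for example, when printed)
result <- tryCatch(ggplot_build(g_missing), error = function(e) e)
inherits(result, "error")
```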
With this behavior understood, it is not hard to use multiple datasets in a single figure.
You might have noticed that the line plot looks a bit weird. That is because there are multiple distinct values of hwy observed at the same value of displ. Let’s get the average value of hwy conditional on displ.
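One way to compute these averages is base R's aggregate(), sketched below. The original slides may have used a different tool, and the names mpg_mean and mean_hwy are chosen here only for illustration:

```r
library(ggplot2)

# average hwy for each distinct value of displ
mpg_mean <- aggregate(hwy ~ displ, data = mpg, FUN = mean)

# rename the averaged column so it is not confused with the raw hwy
names(mpg_mean)[names(mpg_mean) == "hwy"] <- "mean_hwy"
head(mpg_mean)
```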
Let’s plot it now:
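A sketch that uses two datasets in one figure: the raw mpg data for the points and the averaged data for the line. The averaged dataset is recomputed here so that the block runs on its own; mpg_mean, mean_hwy, and g_two_data are illustrative names:

```r
library(ggplot2)

# average hwy by displ (illustrative helper dataset)
mpg_mean <- aggregate(hwy ~ displ, data = mpg, FUN = mean)
names(mpg_mean)[names(mpg_mean) == "hwy"] <- "mean_hwy"

# points come from mpg, the line comes from mpg_mean
g_two_data <- ggplot() +
  geom_point(data = mpg, aes(x = displ, y = hwy)) +
  geom_line(data = mpg_mean, aes(x = displ, y = mean_hwy))
g_two_data
```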
In the previous examples, all the points and lines had the same color. However, you can use different colors based on the value of a variable.
To do so, you need to specify the option inside aes().
This code changes the color of the points based on the value of the model variable.
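A sketch of such a mapping, with color placed inside aes() so that it varies with the model column of mpg; the name g_color is only illustrative:

```r
library(ggplot2)

# color is inside aes(), so it is mapped to a variable in the data
g_color <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy, color = model))
g_color
```

By contrast, writing color = "red" outside aes() would give every point the same fixed color.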
This code changes the shape of the points based on the value of the model variable.
This code changes the type and color of the lines based on the value of the cyl variable.
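A sketch of this mapping. Wrapping cyl in factor() is an assumption made here: cyl is stored as a number in mpg, and line type can only be mapped to a discrete variable. The name g_lines is illustrative:

```r
library(ggplot2)

# both linetype and color vary with the number of cylinders
g_lines <- ggplot(data = mpg) +
  geom_line(aes(
    x = displ,
    y = hwy,
    linetype = factor(cyl),
    color = factor(cyl)
  ))
g_lines
```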
To split a figure into multiple panels by the values of a variable (faceting), use facet_wrap() or facet_grid().
Syntax: facet_wrap(var_1 ~ var_2)
You can facet by up to two variables. If you want to facet by only one variable, put . in place of the other.
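A sketch of facet_wrap() with a single faceting variable, using . in the “var_1” position and factor(cyl) in the “var_2” position. The object names are illustrative, and the second variant uses the ncol argument of facet_wrap() to control the panel layout:

```r
library(ggplot2)

g_scatter <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy))

# one panel per value of cyl
g_wrap <- g_scatter + facet_wrap(. ~ factor(cyl))
g_wrap

# same panels, arranged in two columns
g_wrap_2col <- g_scatter + facet_wrap(. ~ factor(cyl), ncol = 2)
g_wrap_2col
```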
Syntax: facet_grid(var_1 ~ var_2)
You can facet by up to two variables. If you want to facet by only one variable, put . in place of the other.
Yes, it is basically the same as facet_wrap(), but the order of var_1 and var_2 matters more than in facet_wrap(), as you will see later.
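A sketch of why the order matters in facet_grid(): the variable on the left of ~ defines the rows of the panel grid and the variable on the right defines the columns. The object names are illustrative:

```r
library(ggplot2)

g_scatter <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy))

# rows: trans, columns: cyl
g_grid <- g_scatter + facet_grid(trans ~ factor(cyl))
g_grid

# switching the order transposes the grid (rows: cyl, columns: trans)
g_grid_switched <- g_scatter + facet_grid(factor(cyl) ~ trans)
g_grid_switched
```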
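facet_wrap() examples:
One variable: the “var_1” part is ., and the “var_2” part is factor(cyl). The values of the faceting variable (cyl) are printed within a strip.
Note: Switch . and factor(cyl) and see what happens.
Two variables: trans and factor(cyl).
You can set the number of columns and rows with ncol and nrow, respectively.
facet_grid() examples:
Note: Switch . and factor(cyl) and see what happens.
Two variables: trans and factor(cyl).
You cannot set ncol and nrow as in facet_wrap(), as the number of levels of the faceting variables dictates them.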