ggplot2: Basics
Install the ggplot2 package if you have not already, then load it with library(ggplot2).
Alternatively, loading the tidyverse package loads ggplot2 automatically.
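A minimal sketch of the two ways to load the package described above. The install line is commented out on the assumption that ggplot2 may already be installed on your machine.

```r
# Install once (uncomment if ggplot2 is not yet installed)
# install.packages("ggplot2")

# Option 1: load ggplot2 directly
library(ggplot2)

# Option 2: load the tidyverse, which attaches ggplot2 along with other packages
# library(tidyverse)
```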
We use county_yield, which records corn and soybean yield data by county over multiple years.
soy_yield: soybean yield (bu/acre)
corn_yield: corn yield (bu/acre)
d0_5_9: ratio of weeks under drought severity of 0 from May to September
d1_5_9: ratio of weeks under drought severity of 1 from May to September
d2_5_9: ratio of weeks under drought severity of 2 from May to September
d3_5_9: ratio of weeks under drought severity of 3 from May to September
d4_5_9: ratio of weeks under drought severity of 4 from May to September
We also use a dataset derived from county_yield, which records average corn yield by year.
ggplot2 basics
The very first job in creating a figure with the ggplot2 package is to let R know which dataset you are trying to visualize. You do this by passing the dataset to ggplot(), for example g_fig <- ggplot(data = county_yield).
When you create a figure using the ggplot2 package, ggplot() is always the function you call first.
Let’s now see what is inside g_fig by printing it.
Well, it’s blank. g_fig does not yet have enough information to create any kind of figure: you have not told R anything specific about how you would like to use the information in the dataset.
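The blank figure can be reproduced with a small sketch. The county_yield data is not bundled here, so the data frame below is a toy stand-in with made-up values that only borrows the column names from the description above.

```r
library(ggplot2)

# Toy stand-in for county_yield (values are made up for illustration)
county_yield <- data.frame(
  year = rep(2000:2003, each = 3),
  d3_5_9 = c(0, 0.1, 0.2, 0.5, 0.6, 0.4, 0, 0, 0.1, 0.3, 0.2, 0.25),
  corn_yield = c(170, 165, 160, 120, 110, 130, 175, 180, 168, 140, 150, 145)
)

# Tell ggplot2 which dataset to use; no geom has been added yet
g_fig <- ggplot(data = county_yield)

# Printing g_fig draws an empty panel because the plot has no layers
g_fig
length(g_fig$layers)
```

The layer count is zero at this point, which is why nothing is drawn.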
The next thing you need to do is tell g_fig what type of figure you want by adding a geom_*() function. For example, geom_point() creates a scatter plot. To create a scatter plot, R needs to know which variables should be on the x-axis and y-axis. This information can be passed to g_fig as follows: g_fig_scatter <- g_fig + geom_point(aes(x = d3_5_9, y = corn_yield)).
Here,
geom_point() was added to g_fig to declare that you want a scatter plot
aes(x = d3_5_9, y = corn_yield) inside geom_point() tells R that you want a scatter plot with d3_5_9 on the x-axis and corn_yield on the y-axis
Printing g_fig_scatter then draws the scatter plot.
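A sketch of the scatter plot step, again using a toy stand-in for county_yield with made-up values (the real dataset is not included here):

```r
library(ggplot2)

# Toy stand-in for county_yield (values are made up for illustration)
county_yield <- data.frame(
  year = rep(2000:2003, each = 3),
  d3_5_9 = c(0, 0.1, 0.2, 0.5, 0.6, 0.4, 0, 0, 0.1, 0.3, 0.2, 0.25),
  corn_yield = c(170, 165, 160, 120, 110, 130, 175, 180, 168, 140, 150, 145)
)

g_fig <- ggplot(data = county_yield)

# Add a point layer and map columns of county_yield to the x- and y-axes
g_fig_scatter <- g_fig + geom_point(aes(x = d3_5_9, y = corn_yield))

# Printing the object renders the scatter plot
g_fig_scatter
```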
Going back to the code, note that x = d3_5_9 and y = corn_yield are inside aes().
Important
aes() makes an aesthetic of the figure a function of variables in the dataset that you told ggplot() to use (here, county_yield).
aes(x = d3_5_9, y = corn_yield) tells ggplot() to use the d3_5_9 and corn_yield variables in the county_yield dataset for the x-axis and y-axis, respectively.
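If x = d3_5_9 and y = corn_yield are not inside aes(), R looks for objects named d3_5_9 and corn_yield themselves (not the columns of county_yield), and you have not defined any such objects.
Try calling geom_point(x = d3_5_9, y = corn_yield) without aes() and see what happens.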
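One way to see this failure without stopping your session is to wrap the call in tryCatch(). This is a sketch on a toy stand-in for county_yield (made-up values); it assumes no objects named d3_5_9 or corn_yield exist in your workspace.

```r
library(ggplot2)

# Toy stand-in for county_yield (values are made up for illustration)
county_yield <- data.frame(
  d3_5_9 = c(0, 0.1, 0.2, 0.5, 0.6, 0.4),
  corn_yield = c(170, 165, 160, 120, 110, 130)
)

g_fig <- ggplot(data = county_yield)

# Without aes(), R evaluates d3_5_9 as an ordinary object, not a column,
# so the call fails before any layer is created
result <- tryCatch(
  g_fig + geom_point(x = d3_5_9, y = corn_yield),
  error = function(e) e
)

# Inspect the error that R raised
conditionMessage(result)
```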
call ggplot(data = dataset) to initiate the process of creating a figure
add geom_*() to declare what kind of figure you would like to make
specify which variables in the dataset to use, and how they are used, inside aes()
place the aes() you defined above in the geom_*() you specified above
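The steps above can be sketched on any dataset. The example below assumes the mpg dataset that ships with ggplot2, chosen only because it is available without extra files; the variable choice (displ, hwy) is illustrative.

```r
library(ggplot2)

# Step 1: initiate the figure with a dataset
g <- ggplot(data = mpg)

# Steps 2 to 4: pick a geom, map variables with aes(), and place aes() inside the geom
g_scatter <- g + geom_point(aes(x = displ, y = hwy))

g_scatter
```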
ggplot2 lets you create many different kinds of figures via various geom_*() functions, including:
geom_histogram() / geom_density()
geom_line()
geom_boxplot()
geom_bar()
How to specify aesthetics varies by geom_*().
Note
geom_histogram() only needs x.
Note
geom_density() only needs x.
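A sketch of both single-variable geoms on a toy stand-in for county_yield (made-up values). The bins = 5 setting is an illustrative choice for this tiny dataset, not something taken from the original slides.

```r
library(ggplot2)

# Toy stand-in for county_yield (values are made up for illustration)
county_yield <- data.frame(
  year = rep(2000:2003, each = 3),
  corn_yield = c(170, 165, 160, 120, 110, 130, 175, 180, 168, 140, 150, 145)
)

# Histogram: only x is mapped; the y-axis (count) is computed by ggplot2
g_hist <- ggplot(data = county_yield) +
  geom_histogram(aes(x = corn_yield), bins = 5)

# Density plot: only x is mapped; the y-axis (density) is computed by ggplot2
g_density <- ggplot(data = county_yield) +
  geom_density(aes(x = corn_yield))

g_hist
g_density
```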
Note
geom_line() needs x and y.
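A sketch of a line plot of average corn yield by year. The name mean_yield is hypothetical (the source does not name the derived dataset), and the values come from a toy stand-in for county_yield.

```r
library(ggplot2)

# Toy stand-in for county_yield (values are made up for illustration)
county_yield <- data.frame(
  year = rep(2000:2003, each = 3),
  corn_yield = c(170, 165, 160, 120, 110, 130, 175, 180, 168, 140, 150, 145)
)

# Average corn yield by year (mean_yield is a hypothetical name)
mean_yield <- aggregate(corn_yield ~ year, data = county_yield, FUN = mean)

# Line plot: both x and y must be mapped
g_line <- ggplot(data = mean_yield) +
  geom_line(aes(x = year, y = corn_yield))

g_line
```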
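Note
geom_boxplot() needs x and y.
Why factor(year)? When x is a numeric variable such as year, wrapping it in factor() makes ggplot2 treat each year as a separate group, so one box is drawn per year.
Note
geom_bar() needs x and y when used with stat = "identity". With its default settings, geom_bar() counts observations and needs only x.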
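A sketch of a box plot and a bar chart on a toy stand-in for county_yield (made-up values). The name mean_yield is hypothetical, and stat = "identity" is used so that bar heights come directly from the y variable.

```r
library(ggplot2)

# Toy stand-in for county_yield (values are made up for illustration)
county_yield <- data.frame(
  year = rep(2000:2003, each = 3),
  corn_yield = c(170, 165, 160, 120, 110, 130, 175, 180, 168, 140, 150, 145)
)
mean_yield <- aggregate(corn_yield ~ year, data = county_yield, FUN = mean)

# Box plot: factor(year) yields one box per year
g_box <- ggplot(data = county_yield) +
  geom_boxplot(aes(x = factor(year), y = corn_yield))

# Bar chart: stat = "identity" uses corn_yield itself as the bar height
g_bar <- ggplot(data = mean_yield) +
  geom_bar(aes(x = year, y = corn_yield), stat = "identity")

g_box
g_bar
```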
All the elements in the figures we have created so far are in black and white.
You can change how figure elements look by providing options inside geom_*().
Common options that control the aesthetics of figures include color, fill, size, shape, and alpha. Keep in mind:
Elements of figures that you can modify differ by geom type.
The same element name can mean different things depending on the geom type.
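A sketch of setting options inside geom_*() on a toy stand-in for county_yield (made-up values). The specific colors and sizes are arbitrary. It also illustrates the second point: color sets the point color in geom_point() but the bar outline in geom_histogram(), where fill sets the interior.

```r
library(ggplot2)

# Toy stand-in for county_yield (values are made up for illustration)
county_yield <- data.frame(
  d3_5_9 = c(0, 0.1, 0.2, 0.5, 0.6, 0.4, 0, 0, 0.1, 0.3, 0.2, 0.25),
  corn_yield = c(170, 165, 160, 120, 110, 130, 175, 180, 168, 140, 150, 145)
)

# Options placed outside aes() apply to every point
p_points <- ggplot(data = county_yield) +
  geom_point(
    aes(x = d3_5_9, y = corn_yield),
    color = "red", size = 3, shape = 17, alpha = 0.6
  )

# For a histogram, color is the bar outline and fill is the bar interior
p_hist <- ggplot(data = county_yield) +
  geom_histogram(
    aes(x = corn_yield),
    bins = 5, color = "white", fill = "steelblue"
  )

p_points
p_hist
```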
This exercise uses the diamonds dataset from the ggplot2 package. First, load the dataset and extract the observations with Premium cut whose color is one of E, I, and F, storing the result as premium.
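One possible way to build premium is sketched below with base R subset(); the original slides may well have used dplyr::filter() instead, which would give the same rows.

```r
library(ggplot2)

# diamonds ships with ggplot2; keep Premium-cut diamonds of color E, I, or F
premium <- subset(diamonds, cut == "Premium" & color %in% c("E", "I", "F"))

# Quick check of what was kept
table(droplevels(premium$color))
```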
Using the carat and price variables from premium, generate the figure below:
geom_*()s
Here is a list of useful geom_*()s.
geom_vline(): draw a vertical line
geom_hline(): draw a horizontal line
geom_abline(): draw a line with the specified intercept and slope
geom_smooth(): draw a smoothed fit line (a loess or GAM smoother by default; method = "lm" gives an OLS-estimated regression line)
geom_ribbon(): create a shaded area
geom_text() and annotate(): add text to the figure
We will use g_fig_scatter to illustrate how these functions work.
Note
xintercept in geom_vline(): where the vertical line is placed
yintercept in geom_hline(): where the horizontal line is placed
Note
geom_abline() draws the line \[y = a + b\times x\]
intercept: \(a\)
slope: \(b\)
Note
Also try adding method = "lm" to geom_smooth().
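A sketch of the reference-line and smoothing geoms added to g_fig_scatter, built here on a toy stand-in for county_yield (made-up values). The intercept, slope, and line positions are arbitrary illustrative numbers. Because g_fig_scatter defines aes() inside geom_point(), geom_smooth() needs its own aes().

```r
library(ggplot2)

# Toy stand-in for county_yield (values are made up for illustration)
county_yield <- data.frame(
  d3_5_9 = c(0, 0.1, 0.2, 0.5, 0.6, 0.4, 0, 0, 0.1, 0.3, 0.2, 0.25),
  corn_yield = c(170, 165, 160, 120, 110, 130, 175, 180, 168, 140, 150, 145)
)
g_fig_scatter <- ggplot(data = county_yield) +
  geom_point(aes(x = d3_5_9, y = corn_yield))

# Vertical and horizontal reference lines
g_lines <- g_fig_scatter +
  geom_vline(xintercept = 0.3) +
  geom_hline(yintercept = 150)

# A line y = a + b * x with a = 175 and b = -100
g_abline <- g_fig_scatter + geom_abline(intercept = 175, slope = -100)

# OLS regression line with its confidence band
g_smooth <- g_fig_scatter +
  geom_smooth(aes(x = d3_5_9, y = corn_yield), method = "lm")

g_lines
g_abline
g_smooth
```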
Note
ymin: lower bound of the ribbon
ymax: upper bound of the ribbon
geom_ribbon() is useful when drawing confidence intervals.
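A sketch of a shaded band around a line. The data are a toy stand-in with made-up yearly averages, mean_yield is a hypothetical name, and the band of plus or minus 10 is arbitrary rather than a real confidence interval.

```r
library(ggplot2)

# Toy yearly averages (values are made up for illustration)
mean_yield <- data.frame(
  year = 2000:2003,
  corn_yield = c(165, 120, 174, 145)
)

# Arbitrary band for illustration only, not an estimated confidence interval
mean_yield$lower <- mean_yield$corn_yield - 10
mean_yield$upper <- mean_yield$corn_yield + 10

# The ribbon is drawn first so the line sits on top of the shaded area
g_ribbon <- ggplot(data = mean_yield) +
  geom_ribbon(aes(x = year, ymin = lower, ymax = upper), alpha = 0.3) +
  geom_line(aes(x = year, y = corn_yield))

g_ribbon
```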
Note
In geom_text():
x, y: position where the text is placed
label: variable to print
Note
In annotate():
x: where on the x-axis
y: where on the y-axis
label: text to print (use \n to break the line)
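A sketch contrasting the two text functions on a toy stand-in for county_yield (made-up values). geom_text() prints a variable for every row, while annotate() places a single piece of text at a position you choose; the position and wording below are arbitrary.

```r
library(ggplot2)

# Toy stand-in for county_yield (values are made up for illustration)
county_yield <- data.frame(
  year = rep(2000:2003, each = 3),
  d3_5_9 = c(0, 0.1, 0.2, 0.5, 0.6, 0.4, 0, 0, 0.1, 0.3, 0.2, 0.25),
  corn_yield = c(170, 165, 160, 120, 110, 130, 175, 180, 168, 140, 150, 145)
)
g_fig_scatter <- ggplot(data = county_yield) +
  geom_point(aes(x = d3_5_9, y = corn_yield))

# geom_text(): label each observation with its year
g_text <- g_fig_scatter +
  geom_text(aes(x = d3_5_9, y = corn_yield, label = year))

# annotate(): one fixed piece of text; "\n" starts a new line
g_annotate <- g_fig_scatter +
  annotate("text", x = 0.4, y = 170, label = "Toy data:\nmore drought, lower yield")

g_text
g_annotate
```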