We use the pizzaplace dataset, which is available in the gt package.
Date
R has an object class called Date.
A date can be stored either as a character string or as a Date object.
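To make the distinction concrete, here is a minimal sketch in base R. The variable names `date_chr` and `date_obj` are chosen for illustration and are not from the original slides.

```r
# A date stored as a character string
date_chr <- "2010-12-15"
class(date_chr) # "character"

# The same date stored as a Date object
date_obj <- as.Date(date_chr)
class(date_obj) # "Date"
```

The two print almost identically, so checking `class()` is the reliable way to tell them apart.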
Recording dates as a Date object instead of a string has several benefits:
You can do math on Date objects
You can filter() based on the chronological order of dates
Dates (as strings) come in various formats. Several of them are:
2010-12-15
12/15/2010
Dec 15 10
15 December 2010
They all represent the same date.
We can use as.Date() to transform dates stored as characters into Dates.
In format, you specify how day, month, and year are represented in the date strings you intend to convert, using special symbols including:
%d: day of the month as a number (01-31)
%m: month as a number (01-12)
%b: abbreviated month name (Jan, \(\dots\), Dec)
%B: full month name (January, \(\dots\), December)
%y: 2-digit year (96 for 1996, 02 for 2002)
%Y: 4-digit year (1996, 2012)
Example
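As an illustration, the four string formats listed earlier can each be converted by matching the format symbols to the layout of the string. This is a sketch; the variable names `d1` to `d4` are made up here, and the month-name formats (%b, %B) assume an English locale.

```r
# Each call describes the layout of the input string with format symbols
d1 <- as.Date("2010-12-15", format = "%Y-%m-%d")
d2 <- as.Date("12/15/2010", format = "%m/%d/%Y")

# Month names are locale-dependent; these two assume an English locale
d3 <- as.Date("Dec 15 10", format = "%b %d %y")
d4 <- as.Date("15 December 2010", format = "%d %B %Y")

d1
d2
```

If the format does not match the string, as.Date() returns NA rather than raising an error, so it is worth checking the result for missing values.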
Alternatively, you can use the lubridate package to easily convert dates recorded as characters into Dates.
Using lubridate, you do not need to provide the format information, unlike as.Date().
Instead, you simply use y (year), m (month), and d (day) in the order they appear in the character dates.
Example
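A minimal sketch of this approach, assuming the lubridate package is installed. The function name is just the order of the components in the string; the variable names are illustrative.

```r
library(lubridate)

# year-month-day order
l1 <- ymd("2010-12-15")

# month-day-year order
l2 <- mdy("12/15/2010")

# day-month-year order (month names assume an English locale)
l3 <- dmy("15 December 2010")

l1
l2
```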
It is often the case that date values are not formatted in the way you want (e.g., when you are creating figures).
While you can use string manipulation functions to reformat dates (which we learn next in this lecture), it is easier to just use the format() function.
You can use the same rules for the format argument as the ones we saw earlier when using as.Date().
Example
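For instance, a single Date can be rendered in several layouts. This sketch uses a made-up date, `a_date`, and note that format() returns a character string, not a Date.

```r
a_date <- as.Date("2020-07-08")

# month/day/2-digit year
short_fmt <- format(a_date, "%m/%d/%y")
short_fmt # "07/08/20"

# 4-digit year only
year_only <- format(a_date, "%Y")
year_only # "2020"

# The result is a character string
class(short_fmt)
```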
You can extract components (year, month, day) from a Date object using various helper functions offered by lubridate.
year(): year
month(): month
mday(): day of month
yday(): day of year
wday(): day of week
Examples
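The helpers listed above can be tried on a single date. This is a sketch with an arbitrary date; `week_start = 7` is set explicitly so that Sunday is counted as day 1 regardless of user options.

```r
library(lubridate)

a_date <- ymd("2021-05-10") # a Monday

year(a_date)  # 2021
month(a_date) # 5
mday(a_date)  # 10
yday(a_date)  # 130
wday(a_date, week_start = 7) # 2 (Sunday = 1)
```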
Unlike dates in character, you can do some math on Date objects.
You can use years(), months(), and days() from the lubridate package to add the specified number of years, months, and days, respectively.
You can use seq() to create a sequence of dates, where the incremental step is defined by the by option.
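A short sketch of these operations, using two arbitrary dates. The variable names are illustrative only.

```r
library(lubridate)

a_date <- ymd("2021-05-10")
b_date <- ymd("2021-06-01")

# Difference between two dates (in days)
day_diff <- b_date - a_date
day_diff

# Add years, months, and days
plus_year <- a_date + years(1)   # 2022-05-10
plus_months <- a_date + months(2) # 2021-07-10
plus_days <- a_date + days(30)   # 2021-06-09

# A sequence of three dates, one month apart
date_seq <- seq(a_date, by = "month", length.out = 3)
date_seq
```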
Package
For string (character) manipulation, we use the stringr package, which is part of the tidyverse. So, you have already installed it.
stringr is loaded automatically when you load tidyverse. So, just load tidyverse.
Resources
Functions
Here are the selected functions we cover in this lecture:
stringr::str_c()
stringr::str_split() (tidyr::separate())
stringr::str_replace()
stringr::str_detect()
stringr::str_trim()
stringr::str_pad()
stringr::str_c() lets you concatenate a vector of strings. It is basically the same as paste().
concatenate
order matters
separator
more than two strings
a string and a vector of strings
Each element of the vector (verbs) is concatenated with the string ("R"), with the separator ("+") applied to all the vector elements.
collapsing a vector of strings to a single string
The collapse option collapses all the vector elements into a single string, with the collapse separator (here, %) placed between the individual vector elements. sep = "+" is applied when concatenating a vector of strings and a string, and collapse = "%" is applied when collapsing the resulting vector of strings.
two vectors of equal length
The nth element of one vector (software_types) is paired with the nth element of the other vector (verbs).
two vectors of different lengths
The nth element of one vector (software_types) is paired with the nth element of the other vector (verbs), with verbs recycled for the elements in software_types that are missing positional matches. Note that stringr 1.5.0 and later follows the tidyverse recycling rules, so only a vector of length 1 is recycled and other length mismatches raise an error; paste() still recycles shorter vectors.
all combinations
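The cases above can be sketched as follows. The contents of `verbs` and `software_types` are assumptions made for illustration (only the names appear in the slides), and the `outer()` call is one possible way to build all combinations, not necessarily the approach used in the original.

```r
library(stringr)

# Hypothetical example vectors (values are assumptions)
verbs <- c("love", "use")
software_types <- c("R", "Python")

str_c("R", "rocks")            # concatenate: "Rrocks"
str_c("rocks", "R")            # order matters: "rocksR"
str_c("R", "rocks", sep = " ") # separator: "R rocks"
str_c("R", "really", "rocks", sep = " ") # more than two strings

# a string and a vector of strings: "love+R" "use+R"
with_sep <- str_c(verbs, "R", sep = "+")

# collapse the resulting vector into one string: "love+R%use+R"
collapsed <- str_c(verbs, "R", sep = "+", collapse = "%")

# two vectors of equal length, paired element by element
paired <- str_c(software_types, verbs, sep = " ")

# all combinations of the two vectors
all_comb <- as.vector(outer(software_types, verbs, str_c, sep = " "))
all_comb
```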
Sometimes, you want to concatenate two (or more) string variables into one variable.
For example, suppose you would like to combine pizza size and type into a single variable to make it easier to create figures faceted by size-type.
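A minimal sketch of this step, assuming the gt, dplyr, and stringr packages are installed. The new column name `size_type` and the "-" separator are choices made here for illustration.

```r
library(dplyr)
library(stringr)

# Combine the size and type columns of pizzaplace into one variable
pizza <- gt::pizzaplace %>%
  mutate(size_type = str_c(size, type, sep = "-"))

# Inspect the new variable alongside its sources
pizza %>%
  select(size, type, size_type) %>%
  head()
```

The combined variable can then be passed to a faceting function in place of the two separate columns.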
You can use stringr::str_c() to create a vector of file names that share a common pattern.
For example, suppose you have files that are named following this convention: “corn_yield_X.csv”, where X represents the year.
You have such csv files for 2000 through 2020. Then,
[1] "corn_yield_2000.csv" "corn_yield_2001.csv" "corn_yield_2002.csv"
[4] "corn_yield_2003.csv" "corn_yield_2004.csv" "corn_yield_2005.csv"
Now, you can easily read each of them iteratively using a loop.
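The file names above can be generated in one call, since the single strings are recycled across the vector of years. The object name `file_names` is an assumption, and the loop is left as a comment because the csv files themselves are hypothetical.

```r
library(stringr)

# "corn_yield_" and ".csv" are recycled across the 21 years
file_names <- str_c("corn_yield_", 2000:2020, ".csv")
head(file_names)

# A loop over the names could then look like this (files not provided here):
# for (f in file_names) {
#   yield_data <- read.csv(f)
# }
```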
stringr::str_split() splits a string based on a pattern you provide.
But if you are splitting a variable into two variables, tidyr::separate() is a better option.
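The difference between the two can be sketched as follows. The example string and the data frame `crop_df` are made up for illustration.

```r
library(stringr)
library(tidyr)

# str_split() returns a list with one character vector per input string
parts <- str_split("corn_2020_NE", "_")
parts[[1]] # "corn" "2020" "NE"

# separate() splits a data frame column into new columns
crop_df <- data.frame(id = c("corn_2020", "soy_2021"))
crop_split <- separate(crop_df, id, into = c("crop", "year"), sep = "_")
crop_split
```

Because str_split() returns a list, it is awkward to use inside a data frame, which is why separate() is usually more convenient for columns.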
How
You can use stringr::str_replace() to replace the parts of the strings that match a pattern with user-specified text.
Example
Note that only the first occurrence of “rock” in each element of the string vector is replaced with “rock big time.”
You need to use stringr::str_replace_all() to replace all the occurrences.
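A small sketch of the difference between the two functions, using a made-up vector `texts`:

```r
library(stringr)

texts <- c("you rock", "rock and rock")

# Only the first "rock" in each element is replaced
first_only <- str_replace(texts, "rock", "rock big time")
first_only # "you rock big time" "rock big time and rock"

# Every "rock" in each element is replaced
all_occ <- str_replace_all(texts, "rock", "rock big time")
all_occ # "you rock big time" "rock big time and rock big time"
```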
Suppose you would like to use a particular date format in a figure you are creating from pizzaplace: e.g., 07/08/20 (month, day, and year without the first 2 digits).
Pretend that date_text is the variable that holds the date and it looks like this:
So, you would like to replace “20” with “” (nothing).
Now you can create a figure with the dates in the desired format. With pizzaplace, you could have just done this:
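Both routes can be sketched as follows. The values in `date_text` and `date_chr` are hypothetical stand-ins (the latter mimics dates stored as "YYYY-MM-DD" strings). The pattern below is anchored to the end of the string, a slightly more defensive variant than replacing a bare "20", which would hit the day in a date such as "01/20/2015".

```r
library(stringr)

# Hypothetical date strings (month/day/4-digit year)
date_text <- c("07/08/2015", "01/20/2015")

# Drop the "20" only when it starts the 4-digit year at the end of the string
short_text <- str_replace(date_text, "20(\\d{2})$", "\\1")
short_text # "07/08/15" "01/20/15"

# Starting from "YYYY-MM-DD" strings, format() gets there directly
date_chr <- c("2015-07-08", "2015-01-20")
short_fmt <- format(as.Date(date_chr), "%m/%d/%y")
short_fmt # "07/08/15" "01/20/15"
```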
You can use stringr::str_detect() to check whether a user-specified text pattern is part of each string.
It takes a vector of strings and a text pattern, and then returns a vector of TRUE/FALSE.
Example
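A minimal sketch with a made-up vector `phrases`. The logical vector that str_detect() returns can be used directly for subsetting.

```r
library(stringr)

phrases <- c("R rocks", "Python rocks", "I like SQL")

# One TRUE/FALSE per element
has_rocks <- str_detect(phrases, "rocks")
has_rocks # TRUE TRUE FALSE

# Keep only the elements that contain the pattern
phrases[has_rocks]
```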
First clone this repository.
Inside data/data-for-loop-demo, there are two sets of files in a single folder: corn_experiment_x.rds and soy_experiment_y.rds, where both x and y range from 1 to 30.
You want to read only the soy files.
First, let’s get the names of all the files in that folder:
all_files <-
  list.files(
    here::here("supplementary-material/data/data-for-loop-demo"),
    full.names = TRUE
  )
head(all_files, 2)
tail(all_files, 2)
[1] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/corn_experiment_1.rds"
[2] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/corn_experiment_10.rds"
[1] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_8.rds"
[2] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_9.rds"
Now use stringr::str_detect() to find which elements of all_files include “soy.”
Okay, so here is the list of all the “soy” files:
[1] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_1.rds"
[2] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_10.rds"
[3] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_11.rds"
[4] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_12.rds"
[5] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_13.rds"
[6] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_14.rds"
[7] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_15.rds"
[8] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_16.rds"
[9] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_17.rds"
[10] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_18.rds"
[11] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_19.rds"
[12] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_2.rds"
[13] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_20.rds"
[14] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_21.rds"
[15] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_22.rds"
[16] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_23.rds"
[17] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_24.rds"
[18] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_25.rds"
[19] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_26.rds"
[20] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_27.rds"
[21] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_28.rds"
[22] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_29.rds"
[23] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_3.rds"
[24] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_30.rds"
[25] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_4.rds"
[26] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_5.rds"
[27] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_6.rds"
[28] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_7.rds"
[29] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_8.rds"
[30] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/supplementary-material/data/data-for-loop-demo/soy_experiment_9.rds"
Now, you can loop to read all the files.
# A tibble: 60,000 × 4
N_rate v corn_yield field_id
<dbl> <dbl> <dbl> <dbl>
1 248. 85.7 106. 1
2 237. 56.4 105. 1
3 227. 15.5 105. 1
4 175. 33.3 105. 1
5 236. 25.6 105. 1
6 169. -13.6 105. 1
7 237. 30.8 105. 1
8 240. 32.4 105. 1
9 158. -18.8 105. 1
10 247. -81.3 106. 1
# ℹ 59,990 more rows
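The detect-then-read steps above can be sketched end to end as follows. To keep the example self-contained, it writes a few small stand-in files to a temporary folder; these files, their contents, and the object names `tmp_dir`, `soy_files`, and `soy_data` are assumptions, not the course data.

```r
library(stringr)

# Create stand-in files in a temporary folder (hypothetical data)
tmp_dir <- file.path(tempdir(), "loop-demo")
dir.create(tmp_dir, showWarnings = FALSE)
for (i in 1:3) {
  saveRDS(
    data.frame(field_id = i, yield = c(100, 110)),
    file.path(tmp_dir, str_c("corn_experiment_", i, ".rds"))
  )
  saveRDS(
    data.frame(field_id = i, yield = c(50, 55)),
    file.path(tmp_dir, str_c("soy_experiment_", i, ".rds"))
  )
}

# List every file, then keep only those whose file name includes "soy"
all_files <- list.files(tmp_dir, full.names = TRUE)
soy_files <- all_files[str_detect(basename(all_files), "soy")]

# Read each soy file and stack the results into one data frame
soy_data <- do.call(rbind, lapply(soy_files, readRDS))
soy_data
```

Matching against basename() rather than the full path avoids false positives when a folder name happens to contain the pattern.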
Consider the following dataset of plant genes.
id gene
1 Zm_1 20_WW_BL_TP1
2 Zm_2 20_WW_BL_TP1
3 Zm_1 20_WW_BL_TP
4 Zm_2 20_WW_BL_TP
5 Zm_1 20_WW_ML_TP1
6 Zm_2 20_WW_ML_TP1
7 Zm_1 20_WW_ML_TP
8 Zm_2 20_WW_ML_TP
9 Zm_1 20_WW_TL_TP1
10 Zm_2 20_WW_TL_TP1
11 Zm_1 20_WW_TL_TP3
12 Zm_2 20_WW_TL_TP3
There are three different types of genes: those that have _BL_, _ML_, and _TL_. The objective here is to make a variable that indicates the gene group from the gene variable.
gene_data %>%
  mutate(gene_group = case_when(
    stringr::str_detect(gene, "_BL_") ~ "BL",
    stringr::str_detect(gene, "_ML_") ~ "ML",
    stringr::str_detect(gene, "_TL_") ~ "TL"
  ))
id gene gene_group
1 Zm_1 20_WW_BL_TP1 BL
2 Zm_2 20_WW_BL_TP1 BL
3 Zm_1 20_WW_BL_TP BL
4 Zm_2 20_WW_BL_TP BL
5 Zm_1 20_WW_ML_TP1 ML
6 Zm_2 20_WW_ML_TP1 ML
7 Zm_1 20_WW_ML_TP ML
8 Zm_2 20_WW_ML_TP ML
9 Zm_1 20_WW_TL_TP1 TL
10 Zm_2 20_WW_TL_TP1 TL
11 Zm_1 20_WW_TL_TP3 TL
12 Zm_2 20_WW_TL_TP3 TL
Here is a collection of functions that let you change the letter case of strings.
To upper case
To lower case
Only the first letter is capitalized
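The case-conversion functions in stringr can be sketched as follows, with a made-up input string. Which function the last heading refers to is an assumption: str_to_title() capitalizes the first letter of every word, while str_to_sentence() capitalizes only the first letter of the string, so both are shown.

```r
library(stringr)

x <- "hello world from stringr"

upper_x <- str_to_upper(x)       # "HELLO WORLD FROM STRINGR"
lower_x <- str_to_lower(upper_x) # "hello world from stringr"
title_x <- str_to_title(x)       # "Hello World From Stringr"
sentence_x <- str_to_sentence(x) # "Hello world from stringr"

upper_x
title_x
sentence_x
```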
We will work with the following data:
Use stringr::str_c() to combine year, month, and day using “-” as the separator, and convert the combined text to Date using lubridate.
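One possible solution sketch is shown below. The data frame `date_data` and its values are hypothetical stand-ins for the exercise data, which is assumed to hold year, month, and day in separate columns.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical stand-in for the exercise data
date_data <- data.frame(
  year = c(2010, 2011),
  month = c(12, 1),
  day = c(15, 3)
)

# Combine the three columns with "-" and parse the result as a Date
date_data <- date_data %>%
  mutate(date = ymd(str_c(year, month, day, sep = "-")))

date_data
```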