Read datasets in various formats (csv, xlsx, dta, and rds) containing corn yields in Nebraska counties for 2008.
Write R objects as files in various formats.
We will use the tidyverse and haven packages later to read and write files.
Note: The tidyverse package does far more than just reading and writing files. We will learn it extensively later.
Check the format in which the dataset is stored by looking at the extension of the file (what comes after the file name and a dot)
corn.csv: a file format Microsoft Excel supports.
corn.xlsx: another format supported by Microsoft Excel, which may contain more than one sheet (tab) of data.
corn.dta: a format that Stata supports (Stata is statistical software that is immensely popular among economists).
corn.rds: a format that R supports.
When you import a dataset, you need to use the function that matches the format the dataset is stored in.
You can use read.csv() from base R (it lives in the utils package, which is loaded by default).
Syntax
Examples
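The following is a minimal sketch of how read.csv() is typically called. So that it runs on its own, it first writes a small stand-in csv file to a temporary folder; the file name corn_demo.csv and the values in it are made up for illustration and are not the actual corn data.

```r
# Create a small stand-in csv file (hypothetical values, not the real corn data)
tmp_csv <- file.path(tempdir(), "corn_demo.csv")
writeLines(
  c("County_name,Irrigated,Yield",
    "BUFFALO,0,100",
    "BUFFALO,1,200"),
  tmp_csv
)

# Syntax: data <- read.csv(path_to_file)
corn_demo <- read.csv(tmp_csv)

# read.csv() returns a data.frame
class(corn_demo)
head(corn_demo)
```

With your own data, replace tmp_csv with the path to your csv file.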
You can use read_csv() from the readr package.
Syntax
Examples
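Here is a minimal sketch of read_csv() in use, assuming the readr package is installed. As with any stand-in, the file corn_demo.csv and its values are hypothetical; the only point is the shape of the call and the type of object returned.

```r
library(readr)

# Create a small stand-in csv file (hypothetical values, not the real corn data)
tmp_csv <- file.path(tempdir(), "corn_demo.csv")
writeLines(
  c("County_name,Irrigated,Yield",
    "BUFFALO,0,100",
    "BUFFALO,1,200"),
  tmp_csv
)

# Syntax: data <- read_csv(path_to_file)
# show_col_types = FALSE only silences the column-type message
corn_demo <- read_csv(tmp_csv, show_col_types = FALSE)

# read_csv() returns a tibble rather than a plain data.frame
class(corn_demo)
corn_demo
```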
In the previous slide, we provided the full path to the csv file to read into R.
If you expect to import and/or export (save) datasets and R objects often from/to a particular directory, it would be nice to tell R to look for files in that directory by default, so that you can refer to files by name alone.
This will save us from writing out the full path every time we either import or export datasets.
You can do so by designating the directory as the working directory.
Once the working directory is set, R looks for files in that directory unless told otherwise.
It is not just when importing datasets. When you export an R object as a file, R will create a file in the working directory by default.
You can use setwd() to designate a directory as the working directory.
You can check the current working directory using the getwd() function.
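A minimal sketch of setwd() and getwd() working together. A temporary folder stands in for your data folder, and demo.csv is a hypothetical file name; in practice you would pass the path to your own folder to setwd().

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Use a temporary folder as a stand-in for your data folder
data_dir <- tempdir()
setwd(data_dir)

# Confirm where R is now looking for files
getwd()

# With the working directory set, a bare file name is enough
write.csv(data.frame(x = 1:3), "demo.csv", row.names = FALSE)
demo <- read.csv("demo.csv")

# Restore the original working directory
setwd(old_wd)
```

Restoring the original working directory at the end is only a courtesy for this demonstration; in your own scripts you would normally set it once at the top.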
Suppose it is convenient for you to set the working directory somewhere other than the folder where the datasets reside.
You can then provide the path to the file relative to the working directory like this:
This is equivalent to:
You can use .. to move up a folder. For example, if you want to import corn_yields.csv stored in “~/Dropbox/TeachingUNL”, then the following works:
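Relative paths and .. can be tried out safely on a throwaway folder tree. The sketch below builds one in a temporary location; the folder names demo_root, work, and Data are hypothetical and are not the Dropbox folders mentioned above.

```r
# Build a small stand-in folder tree:
#   demo_root/corn_yields.csv
#   demo_root/work/Data/corn.csv
root <- file.path(tempdir(), "demo_root")
dir.create(file.path(root, "work", "Data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(root, "corn_yields.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:3), file.path(root, "work", "Data", "corn.csv"), row.names = FALSE)

# Set the working directory to demo_root/work
old_wd <- getwd()
setwd(file.path(root, "work"))

# A path relative to the working directory; the leading "./" is optional
in_sub <- read.csv("Data/corn.csv")
same <- read.csv("./Data/corn.csv")

# ".." moves up one folder, from demo_root/work to demo_root
one_up <- read.csv("../corn_yields.csv")

# Restore the original working directory
setwd(old_wd)
```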
You can create an R Project using RStudio:
You can use read_excel() from the readxl package, which is part of the tidyverse, to read data sheets from an xls(x) file.
The readxl package is installed when you install the tidyverse package.
However, it is not loaded automatically when you load the tidyverse package.
So, you need to load it with library(readxl) even if you have already loaded the tidyverse package.
Syntax
x: sheet number
Examples
Import a sheet of an xls(x) file using read_excel():
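A minimal sketch of read_excel(), assuming the readxl package is installed. Because corn.xlsx may not be at hand, the example workbook datasets.xlsx that ships with readxl is used as a stand-in; swap in the path to your own file.

```r
library(readxl)

# Path to an example workbook bundled with readxl (stand-in for corn.xlsx)
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets available in the workbook
excel_sheets(xlsx_path)

# Syntax: data <- read_excel(path_to_file, sheet = x)
# Here x = 1 reads the first sheet
first_sheet <- read_excel(xlsx_path, sheet = 1)
head(first_sheet)
```

The sheet argument also accepts a sheet name as a string, which is often easier to read than a number.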
Use the read_dta() function from the haven package.
Syntax
Examples
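A minimal sketch of read_dta(), assuming the haven package is installed. To keep it self-contained, a tiny dta file is first written with write_dta(); the file name corn_demo.dta and its contents are hypothetical stand-ins for corn.dta.

```r
library(haven)

# Write a small stand-in dta file (hypothetical values, not the real corn data)
tmp_dta <- file.path(tempdir(), "corn_demo.dta")
write_dta(
  data.frame(county = c("BUFFALO", "CUSTER"), irrigated = c(0, 1)),
  tmp_dta
)

# Syntax: data <- read_dta(path_to_file)
corn_demo <- read_dta(tmp_dta)
corn_demo
```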
An rds (R data set) file is a file format native to R.
You can use the readRDS() function to read an rds file.
No special packages are necessary.
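A minimal sketch of readRDS() using base R only. The file corn_demo.rds and the values in it are hypothetical; the file is created on the spot so the call has something to read.

```r
# Save a small stand-in object as an rds file (hypothetical values)
tmp_rds <- file.path(tempdir(), "corn_demo.rds")
saveRDS(
  data.frame(county = c("BUFFALO", "CUSTER"), irrigated = c(0L, 1L)),
  tmp_rds
)

# Syntax: data <- readRDS(path_to_file)
corn_demo <- readRDS(tmp_rds)

# The object comes back with its class and column types intact
str(corn_demo)
```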
Exporting datasets works much the same way as importing them.
Here is the list of functions that let you export a data.frame (or tibble) in different formats:
write_csv()
write_dta()
saveRDS()
Syntax
Examples
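A minimal sketch of the three export functions side by side, assuming the readr and haven packages are installed. The object demo and the output file names are hypothetical; each function takes the object first and the file path second.

```r
library(readr)
library(haven)

# A small stand-in dataset (hypothetical values, not the real corn data)
demo <- data.frame(county = c("BUFFALO", "CUSTER"), irrigated = c(0, 1))
out_dir <- tempdir()

# Syntax: export_function(obj_name, file_name)
write_csv(demo, file.path(out_dir, "demo_out.csv"))  # csv (readr)
write_dta(demo, file.path(out_dir, "demo_out.dta"))  # Stata dta (haven)
saveRDS(demo, file.path(out_dir, "demo_out.rds"))    # rds (base R)

# Confirm that all three files were created
file.exists(file.path(out_dir, c("demo_out.csv", "demo_out.dta", "demo_out.rds")))
```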
You can export any kind of R object as an rds file.
a_list <- list(a = c("R", "rocks"), b = corn_yields)
saveRDS(a_list, "a_list.rds")
readRDS("a_list.rds")
$a
[1] "R" "rocks"
$b
# A tibble: 161 × 9
Year State FIPS County_name State_name Commodity `Data item` Irrigated
<int> <int> <int> <chr> <chr> <chr> <chr> <int>
1 2008 31 31019 BUFFALO NEBRASKA CORN CORN, GRAIN - Y… 0
2 2008 31 31019 BUFFALO NEBRASKA CORN CORN, GRAIN, IR… 1
3 2008 31 31041 CUSTER NEBRASKA CORN CORN, GRAIN - Y… 0
4 2008 31 31041 CUSTER NEBRASKA CORN CORN, GRAIN, IR… 1
5 2008 31 31047 DAWSON NEBRASKA CORN CORN, GRAIN - Y… 0
6 2008 31 31047 DAWSON NEBRASKA CORN CORN, GRAIN, IR… 1
7 2008 31 31077 GREELEY NEBRASKA CORN CORN, GRAIN - Y… 0
8 2008 31 31077 GREELEY NEBRASKA CORN CORN, GRAIN, IR… 1
9 2008 31 31079 HALL NEBRASKA CORN CORN, GRAIN - Y… 0
10 2008 31 31079 HALL NEBRASKA CORN CORN, GRAIN, IR… 1
# ℹ 151 more rows
# ℹ 1 more variable: Yield <int>
As you can see, a list is saved as an rds file, and when imported, it is still a list.
Check the size of the corn data files in different formats.
Which one is the smallest?
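As a hint, one way to check file sizes from within R is the base function file.size(), which returns sizes in bytes. The sketch below uses a made-up dataset and hypothetical file names rather than the corn files, so it shows only the mechanics and not the answer to the exercise.

```r
# A stand-in dataset (hypothetical values)
demo <- data.frame(id = 1:1000, value = rep(c(0, 1), 500))

# Save it in two formats in a temporary folder
out_dir <- tempdir()
csv_path <- file.path(out_dir, "size_demo.csv")
rds_path <- file.path(out_dir, "size_demo.rds")
write.csv(demo, csv_path, row.names = FALSE)
saveRDS(demo, rds_path)

# file.size() accepts a vector of paths and returns sizes in bytes
sizes <- file.size(c(csv_path, rds_path))
names(sizes) <- c("csv", "rds")
sizes
```

To do the exercise, pass the paths to the corn files in each format to file.size() instead.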