03-1: Importing Files and Exporting to Files

Importing and Exporting Datasets


  • Read datasets in various formats (csv, xlsx, dta, and rds) containing corn yields in Nebraska counties in 2008.

  • Write R objects as files in various formats

  • Go here and clone the repository that hosts datasets used in this lecture
  • Install the tidyverse and haven packages, which we will use later to read/write files.
install.packages(c("tidyverse", "haven"))

Note

The tidyverse package does far more than just reading and writing files. We will learn it extensively later.

Check the format in which the dataset is stored by looking at the file extension (what comes after the last dot in the file name).

  • corn.csv: a plain-text, comma-separated values format that Microsoft Excel (and most other software) can open.

  • corn.xlsx: a Microsoft Excel workbook format, which may contain more than one sheet (tab) of data.

  • corn.dta: a format that STATA supports (software that is immensely popular among economists).

  • corn.rds: a format that R supports.

When you import a dataset, you need to use a particular function that is appropriate for the particular type of format the dataset is in.
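As a rough illustration of matching the reader to the format, the sketch below picks a function name from the file extension. The helper reader_for() is hypothetical (it is not part of any package) and only covers the four formats in this lecture; tools::file_ext() is a real function that ships with R.

```r
# Hypothetical helper: map a file extension to the reader covered in this lecture
reader_for <- function(path) {
  ext <- tolower(tools::file_ext(path)) # text after the last dot
  switch(ext,
    csv  = "readr::read_csv",
    xls  = ,                            # falls through to the xlsx entry
    xlsx = "readxl::read_excel",
    dta  = "haven::read_dta",
    rds  = "readRDS",
    stop("No reader listed for extension: ", ext)
  )
}

reader_for("corn.xlsx")
reader_for("corn.rds")
```

The point is only that the extension tells you which import function to reach for; in practice you would call the function directly.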

Read a CSV file

You can use read.csv(), which is available in base R without loading any package (it lives in the utils package).


Syntax

#--- NOT RUN ---#  
data = read.csv(path to the file to import)


Examples

corn_yields_df <- read.csv("~/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/Lectures/Chapter-3-DataWrangling/corn_yields.csv")

You can use read_csv() from the readr package.


Syntax

#--- NOT RUN ---#  
data = readr::read_csv(path to the file to import)


Examples

corn_yields_tbl <- readr::read_csv("~/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/Lectures/Chapter-3-DataWrangling/corn_yields.csv")

Direction: evaluate corn_yields_df and corn_yields_tbl to see the differences.


Data read using read.csv():

class(corn_yields_df)
[1] "data.frame"


Data read using read_csv():

class(corn_yields_tbl) 
[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame" 
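Besides the class, one difference you may notice is how column names with spaces are handled. The sketch below assumes the readr package is installed and uses a tiny made-up CSV written to a temporary file (the yield value is not real data), so it does not depend on the course files.

```r
library(readr)

# Assumption: a two-line CSV stands in for corn_yields.csv
tmp_csv <- tempfile(fileext = ".csv")
writeLines(
  c("County_name,Data item,Yield",
    "BUFFALO,\"CORN, GRAIN\",150"),
  tmp_csv
)

demo_df  <- read.csv(tmp_csv)                                # base R reader
demo_tbl <- readr::read_csv(tmp_csv, show_col_types = FALSE) # readr reader

names(demo_df)  # read.csv() makes names syntactic: the space becomes a dot
names(demo_tbl) # read_csv() keeps the name as written in the file
```

With read.csv(), passing check.names = FALSE keeps the original names as well.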

Setting the working directory

  • In the previous slide, we provided the full path to the csv file to read into R.

  • If you expect to import and/or export (save) datasets and R objects often from/to a particular directory, it would be nice to tell R to look for files in the directory by default. So, the R code looks more like this:

corn_yield <- read.csv("corn_yields.csv")


  • This will save us from writing out the full path every time we either import or export datasets.

  • You can do so by designating the directory as the working directory.

  • Once the working directory is set, R looks for files in that directory unless told otherwise.

  • This applies to exporting as well as importing: when you export an R object as a file, R will create the file in the working directory by default.

You can use setwd() to designate a directory as the working directory:

#--- Setting a directory (path) in your computer---#
setwd("/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/Lectures/Chapter-3-DataWrangling")


You can check the current working directory using the getwd() function:

#--- find the current working directory ---#
getwd()
[1] "/Users/tmieno2/Dropbox/TeachingUNL/Data-Science-with-R-Quarto/lectures/Chapter-3-DataWrangling"
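If you want to try setwd() and getwd() without touching your own folders, here is a minimal sketch. It assumes a throwaway folder under tempdir() stands in for your data folder, and the file demo.csv is made up for illustration.

```r
# Remember the current working directory so it can be restored at the end
old_wd <- getwd()

# Assumption: a temporary folder plays the role of the data folder
demo_dir <- file.path(tempdir(), "wd_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(
  data.frame(county = "BUFFALO", year = 2008),
  file.path(demo_dir, "demo.csv"),
  row.names = FALSE
)

setwd(demo_dir)
found_by_name <- file.exists("demo.csv") # bare file name is resolved in demo_dir
demo_df <- read.csv("demo.csv")          # no full path needed

# Restore the original working directory
setwd(old_wd)
found_by_name
```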

Suppose it is more convenient for you to set the working directory somewhere other than the folder where the datasets reside.

setwd("~/Dropbox/TeachingUNL/DataScience")


You can then provide the path to the file relative to the working directory like this:

data <- read_csv("Datasets/Chapter_3_data_wrangling/corn_yields.csv")


This is equivalent to:

data <- read_csv("~/Dropbox/TeachingUNL/DataScience/Datasets/Chapter_3_data_wrangling/corn_yields.csv")


You can use .. to move up a folder. For example, if you want to import corn_yields.csv stored in “~/Dropbox/TeachingUNL”, then the following works:

data <- read_csv("../corn_yields.csv")
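To see how .. is resolved, the sketch below builds a small folder tree under tempdir(). The folder names TeachingUNL_demo and DataScience and the one-row file are assumptions made for illustration, not the actual course folders.

```r
old_wd <- getwd()

# Assumption: parent_dir plays the role of ~/Dropbox/TeachingUNL,
# and child_dir plays the role of the working directory inside it
parent_dir <- file.path(tempdir(), "TeachingUNL_demo")
child_dir  <- file.path(parent_dir, "DataScience")
dir.create(child_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(c("county,year", "BUFFALO,2008"), file.path(parent_dir, "corn_yields.csv"))

setwd(child_dir)
in_child  <- file.exists("corn_yields.csv")    # looked up in the working directory
in_parent <- file.exists("../corn_yields.csv") # looked up one folder above it
setwd(old_wd)

c(in_child = in_child, in_parent = in_parent)
```

You can chain the notation ("../../file.csv") to move up more than one folder.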

You can create an R Project using RStudio:

  • click on a blue transparent box with a plus sign at the upper left corner of RStudio
  • click on “new directory” (to initiate a new folder) or “existing directory” (to designate an existing folder)

Let’s try

  • Create an R project
  • When you open an R Project, the working directory is set to the project folder. Confirm this:
getwd() 

Read a sheet from an xls(x) file

  • You can use read_excel() from the readxl package, which is part of the tidyverse, to read data sheets from an xls(x) file.

  • The readxl package is installed when you install the tidyverse package.

  • However, it is not loaded automatically when you load the tidyverse package.

  • So, you need to load it with library() even if you have already loaded the tidyverse package.

library(readxl)

Syntax

read_excel(path to the file, sheet = x)
  • x: sheet number (position) or sheet name (as a string)

Examples

Import a sheet of an xls(x) file using read_excel():

corn_08 <- read_excel("corn_yields.xls", sheet = 1) # 1st sheet 
corn_09 <- read_excel("corn_yields.xls", sheet = 2) # 2nd sheet
#--- check the class ---#
class(corn_08) 
[1] "tbl_df"     "tbl"        "data.frame"

Notice that the data is imported as a tibble (because the readxl package is part of the tidyverse).
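When you do not know how many sheets a workbook has, you can list them first. The sketch below uses datasets.xlsx, an example workbook that ships with the readxl package, as a stand-in for the corn file; that substitution is an assumption made so the code does not depend on the course data.

```r
library(readxl)

# Assumption: readxl's bundled example workbook stands in for corn_yields.xls
example_path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
sheet_names <- excel_sheets(example_path)
sheet_names

# A sheet can be selected by position or by name
first_by_position <- read_excel(example_path, sheet = 1)
first_by_name     <- read_excel(example_path, sheet = sheet_names[1])

class(first_by_position)
```

Selecting by name is often safer than by position, because it keeps working if someone reorders the sheets.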

Read a STATA data file (.dta)

Use the read_dta() function from the haven package.

#--- load the package ---#
library(haven) 


Syntax

#--- Syntax (NOT RUN) ---#
haven::read_dta(file path)


Examples

#--- import the data ---#
corn_yields <- haven::read_dta("corn_yields.dta")
#--- check the class ---#
class(corn_yields) 
[1] "tbl_df"     "tbl"        "data.frame"

Notice that the data is imported as a tibble, not a plain data.frame: its class vector starts with "tbl_df".
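To check what read_dta() returns without the course files, you can round trip a small data frame through a temporary .dta file. This is a minimal sketch that assumes the haven package is installed; the two-row dataset is made up for illustration.

```r
library(haven)

# Assumption: a made-up two-row dataset stands in for the corn data
demo_df <- data.frame(
  county    = c("BUFFALO", "CUSTER"),
  irrigated = c(0L, 1L)
)

tmp_dta <- tempfile(fileext = ".dta")
write_dta(demo_df, tmp_dta)    # export as a STATA file
demo_back <- read_dta(tmp_dta) # import it again

class(demo_df)   # what we started with
class(demo_back) # what read_dta() hands back
```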

Read an rds file

  • An rds (R data set) file is a file type native to R that stores a single R object.

  • You can use the readRDS() function to read an rds file.

  • No special packages are necessary.

Syntax

readRDS("path to the file") 


Examples

corn_yields <- readRDS("corn_yields.rds") 
class(corn_yields)
[1] "tbl_df"     "tbl"        "data.frame"


Notice that the imported dataset is already a tibble. This is because the R object exported as corn_yields.rds was a tibble.
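More generally, an rds file keeps the object as it was in R, including column types, whereas a csv file only keeps the text. The sketch below compares the two with base R functions only; the small dataset and its dates are made up for illustration.

```r
# Assumption: a made-up dataset with a factor column and a Date column
demo_df <- data.frame(
  county      = factor(c("BUFFALO", "CUSTER")),
  survey_date = as.Date(c("2008-01-15", "2008-02-15"))
)

tmp_rds <- tempfile(fileext = ".rds")
tmp_csv <- tempfile(fileext = ".csv")
saveRDS(demo_df, tmp_rds)
write.csv(demo_df, tmp_csv, row.names = FALSE)

from_rds <- readRDS(tmp_rds)
from_csv <- read.csv(tmp_csv)

identical(from_rds, demo_df) # is the rds copy the same object?
sapply(from_rds, class)      # column types after the rds round trip
sapply(from_csv, class)      # column types after the csv round trip
```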

Export an R object

  • Exporting datasets works much the same way as importing them.

  • Here is the list of functions that let you export a data.frame (or tibble) in different formats:

    • csv: write_csv()
    • dta: write_dta()
    • rds: saveRDS()

Syntax

export_function(object name, file name)


Examples

#--- export as csv ---#
readr::write_csv(corn_yields, "corn_yields_exp_rownames.csv")

#--- export as dta ---#
haven::write_dta(corn_yields, "corn_yields_exp.dta")

#--- export as rds ---#
saveRDS(corn_yields, "corn_yields_exp.rds")

#--- export as xls file ---#
# just don't do it

You can export any kind of R object as an rds file.

a_list <- list(a = c("R", "rocks"), b = corn_yields)   

saveRDS(a_list, "a_list.rds")

readRDS("a_list.rds")
$a
[1] "R"     "rocks"

$b
# A tibble: 161 × 9
    Year State  FIPS County_name State_name Commodity `Data item`      Irrigated
   <int> <int> <int> <chr>       <chr>      <chr>     <chr>                <int>
 1  2008    31 31019 BUFFALO     NEBRASKA   CORN      CORN, GRAIN - Y…         0
 2  2008    31 31019 BUFFALO     NEBRASKA   CORN      CORN, GRAIN, IR…         1
 3  2008    31 31041 CUSTER      NEBRASKA   CORN      CORN, GRAIN - Y…         0
 4  2008    31 31041 CUSTER      NEBRASKA   CORN      CORN, GRAIN, IR…         1
 5  2008    31 31047 DAWSON      NEBRASKA   CORN      CORN, GRAIN - Y…         0
 6  2008    31 31047 DAWSON      NEBRASKA   CORN      CORN, GRAIN, IR…         1
 7  2008    31 31077 GREELEY     NEBRASKA   CORN      CORN, GRAIN - Y…         0
 8  2008    31 31077 GREELEY     NEBRASKA   CORN      CORN, GRAIN, IR…         1
 9  2008    31 31079 HALL        NEBRASKA   CORN      CORN, GRAIN - Y…         0
10  2008    31 31079 HALL        NEBRASKA   CORN      CORN, GRAIN, IR…         1
# ℹ 151 more rows
# ℹ 1 more variable: Yield <int>

As you can see, a list is saved as an rds file, and when imported, it is still a list.

Check the size of the corn data files in different formats.

Which one is the smallest?
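One way to compare sizes from within R is file.size(), which returns the size in bytes. The sketch below is only a template: it writes a made-up dataset to temporary files with base R functions, so the numbers it prints say nothing about the corn files themselves.

```r
# Assumption: a made-up dataset written to temporary files
demo_df <- data.frame(id = 1:1000, irrigated = rep(c(0L, 1L), 500))

files <- c(
  csv = tempfile(fileext = ".csv"),
  rds = tempfile(fileext = ".rds")
)
write.csv(demo_df, files[["csv"]], row.names = FALSE)
saveRDS(demo_df, files[["rds"]])

# File sizes in bytes, labelled by format
sizes <- setNames(file.size(files), names(files))
sizes
```

To answer the question for the lecture data, pass the paths of the corn files to file.size() instead.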