2.3 Reading and writing vector data

Many people still use ArcGIS to handle spatial data, and ArcGIS has its own system of storing spatial data, called a shapefile.[37] So, chances are that your collaborators use shapefiles. Moreover, many GIS datasets available online are distributed only as shapefiles. So, it is important to learn how to read and write shapefiles.

2.3.1 Reading a shapefile

We can use st_read() to read a shapefile. It reads in a shapefile and then turns the data into an sf object. Let’s take a look at an example.

#--- read a NC county boundary shapefile ---#
nc_loaded <- st_read(dsn = "Data", layer = "nc")

Typically, you specify two arguments for st_read(). The first one is dsn, which is the path to the folder in which the shapefile you want to import is stored. The second one is layer, which is the name of the shapefile. Notice that you do not add the .shp extension to the file name: nc, not nc.shp.[38]
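As a minimal sketch of the two ways to point st_read() at a shapefile, the code below uses the nc.shp file that ships with the sf package, so it does not rely on the Data folder used above. The object names (shp_path, nc_by_path, nc_by_layer) are made up for illustration, and quiet = TRUE only suppresses the summary that st_read() normally prints.

```r
library(sf)

#--- full path to the nc.shp file bundled with the sf package ---#
shp_path <- system.file("shape/nc.shp", package = "sf")

#--- option 1: give the full path, including the .shp extension ---#
nc_by_path <- st_read(shp_path, quiet = TRUE)

#--- option 2: give the folder as dsn and the file name (no extension) as layer ---#
nc_by_layer <- st_read(dsn = dirname(shp_path), layer = "nc", quiet = TRUE)

#--- both are sf objects with the same number of rows ---#
class(nc_by_path)
nrow(nc_by_path) == nrow(nc_by_layer)
```

Note that the extension is dropped only when the name is passed as layer. When the whole path goes into dsn, the .shp extension is part of the path.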

2.3.2 Writing to a shapefile

Writing an sf object as a shapefile is just as easy. You use the st_write() function, with the first argument being the sf object you are exporting, dsn being the folder to write to, and layer being the name of the new shapefile. For example, the code below exports an sf object called nc_loaded as nc2.shp (along with other supporting files).

st_write(
  nc_loaded,
  dsn = "Data",
  layer = "nc2",
  driver = "ESRI Shapefile",
  append = FALSE
)

append = FALSE overwrites the existing layer when a file with the same name already exists. Without this option, writing to an existing file fails with an error:

st_write(
  nc_loaded,
  dsn = "Data",
  layer = "nc2",
  driver = "ESRI Shapefile"
)

Layer nc2 in dataset Data already exists:
use either append=TRUE to append to layer or append=FALSE to overwrite layer
Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options), : Dataset already exists.
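To see which supporting files come with a shapefile, and that append = FALSE lets the same write run twice without an error, here is a small sketch that writes to a temporary folder rather than Data. It assumes the nc.shp file bundled with sf as input. The names nc_example, out_dir, and shp_file are hypothetical, and the full path to nc2.shp is passed as dsn so that st_write() can infer the driver from the file extension.

```r
library(sf)

#--- example data: the nc.shp file bundled with the sf package ---#
nc_example <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

#--- a scratch folder so nothing in your project is touched ---#
out_dir <- file.path(tempdir(), "shp_demo")
dir.create(out_dir, showWarnings = FALSE)
shp_file <- file.path(out_dir, "nc2.shp")

#--- first write creates the files; the second overwrites them ---#
st_write(nc_example, dsn = shp_file, append = FALSE, quiet = TRUE)
st_write(nc_example, dsn = shp_file, append = FALSE, quiet = TRUE)

#--- list everything that was written alongside nc2.shp ---#
list.files(out_dir)
```

The listing should show several files sharing the nc2 prefix, which is what "supporting files" refers to above.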

2.3.3 Better alternatives

Now, if your collaborator uses ArcGIS and insists on a shapefile for their work, you can of course use the above command to write one. But there is really no need to work with the shapefile system. One alternative data format that is considered superior to the shapefile system is GeoPackage,[39] which overcomes various limitations associated with shapefiles.[40] Unlike the shapefile system, it produces only a single file with the .gpkg extension.[41] Note that GeoPackage files can also be easily read into ArcGIS. So, it might be worthwhile to convince your collaborators to stop using shapefiles and start using GeoPackage.

#--- write as a gpkg file ---#
st_write(nc, dsn = "Data/nc.gpkg", append = FALSE)

#--- read a gpkg file ---#
nc <- st_read("Data/nc.gpkg")
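A GeoPackage can also hold more than one layer in the same file, which is one practical consequence of the single-file design. The sketch below is illustrative only: it uses the nc.shp file bundled with sf, writes to a temporary file, and the layer names counties and first_ten are invented for the example.

```r
library(sf)

#--- example data: the nc.shp file bundled with the sf package ---#
nc_example <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

#--- a temporary .gpkg file ---#
gpkg_file <- tempfile(fileext = ".gpkg")

#--- write two layers into the same file ---#
st_write(nc_example, dsn = gpkg_file, layer = "counties", append = FALSE, quiet = TRUE)
st_write(nc_example[1:10, ], dsn = gpkg_file, layer = "first_ten", append = FALSE, quiet = TRUE)

#--- list the layers stored in the file ---#
st_layers(gpkg_file)$name

#--- read one specific layer back ---#
first_ten <- st_read(gpkg_file, layer = "first_ten", quiet = TRUE)
nrow(first_ten)
```

When a GeoPackage holds several layers, naming the layer in st_read() avoids relying on whichever layer happens to come first.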

Or better yet, if your collaborator uses R (or if you are the only one who is going to use the data), then just save it as an rds file using saveRDS(), which can of course be read back using readRDS().

#--- save as an rds ---#
saveRDS(nc, "Data/nc_county.rds")

#--- read an rds ---#
nc <- readRDS("Data/nc_county.rds")

The use of rds files can be particularly attractive when the dataset is large because rds files are typically more compact than shapefiles, taking up less disk space.
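Whether an rds file is actually smaller depends on the data, so it is worth checking for your own dataset. The sketch below is one way to do that, using the nc.shp file bundled with sf as a stand-in. It sums the sizes of the files that make up the shapefile and compares the total with the size of a single rds file. All object and file names here are hypothetical, and no particular outcome is implied.

```r
library(sf)

#--- example data: the nc.shp file bundled with the sf package ---#
nc_example <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

#--- a scratch folder for the comparison ---#
out_dir <- file.path(tempdir(), "size_demo")
dir.create(out_dir, showWarnings = FALSE)

#--- save the same object as a shapefile and as an rds file ---#
st_write(nc_example, dsn = file.path(out_dir, "nc_size.shp"), append = FALSE, quiet = TRUE)
saveRDS(nc_example, file.path(out_dir, "nc_size.rds"))

#--- total size (bytes) of the main shapefile components vs the rds file ---#
shp_parts <- list.files(out_dir, pattern = "^nc_size\\.(shp|shx|dbf|prj)$", full.names = TRUE)
shp_bytes <- sum(file.size(shp_parts))
rds_bytes <- file.size(file.path(out_dir, "nc_size.rds"))
c(shapefile = shp_bytes, rds = rds_bytes)

#--- the object read back from rds is still an sf object ---#
nc_back <- readRDS(file.path(out_dir, "nc_size.rds"))
class(nc_back)
```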

As you can see here, it is a myth that spatial datasets have to be stored as shapefiles.


  37. See here for how spatial datasets can be stored in various other formats.

  38. When storing a spatial dataset, ArcGIS divides the information into separate files. All of them have the same prefix, but have different extensions. We typically say we read a shapefile, but we are really importing all these files, including the one with the .shp extension. When you read those data, you just refer to the common prefix because you are importing all the files, not just a .shp file.

  39. here

  40. See the last paragraph of chapter 7.5 of this book, this blogpost, or this.

  41. Am I the only one who gets very frustrated when a collaborator attaches 15 files for three geographic objects to an email? It could have been just three files using the GeoPackage format.