Before you start

In this chapter, we learn how to use the sf package to handle and operate on spatial datasets. The sf package uses the simple feature (sf)³⁰ class for spatial objects in R. We first learn how sf objects store and represent spatial datasets. We then move on to the following practical topics:

- read and write shapefiles and spatial data in other formats (and learn why you might want to stop using the shapefile format in favor of alternative formats)
- project and reproject spatial objects
- convert sf objects into sp objects, and vice versa
- confirm that dplyr works well with sf objects
- implement non-interactive geometric operations (those that do not involve two sf objects) on sf objects
  - create buffers
  - find the area of polygons
  - find the centroid of polygons
  - calculate the length of lines

`sf` or `sp`?

The sf package was designed to replace the sp package, which has been one of the most popular and powerful spatial packages in R for more than a decade. It has been about four years since the sf package was first registered on CRAN. A couple of years ago, many other spatial packages did not yet support sf. In this blog post, the author responded to the question of whether one should learn sp or sf, saying:

“That’s a tough question. If you have time, I would say, learn to use both. sf is pretty new, so a lot of packages that depend on spatial classes still rely on sp. So you will need to know sp if you want to do any integration with many other packages, including raster (as of March 2018).

However, in the future we should see an increasing shift toward the sf package and greater use of sf classes in other packages. I also think that sf is easier to learn to use than sp.”

The future has come, and it is no longer a tough question. I cannot think of any major spatial package that does not support sf, and sf has largely become the standard for handling vector data in R³¹. Thus, this lecture note does not cover how to use sp at all.

sf has several advantages over the sp package (Pebesma 2018).³² First, it cuts the tie that sp had with the ESRI shapefile system, which represents spatial data in a somewhat loose way. Instead, it uses simple feature access, an open standard supported by the Open Geospatial Consortium (OGC). Another important benefit is its compatibility with the tidyverse, which includes widely popular packages like ggplot2 and dplyr. Consequently, map-making with ggplot() and data wrangling with the family of dplyr functions come naturally to many R users. In contrast, sp objects store spatial information and attribute data in separate slots, which makes them ill-suited to the dplyr way of transforming data.
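To make the tidyverse point concrete, here is a minimal sketch (not part of the original notes) that builds a tiny sf object from two made-up points and applies dplyr verbs to it. The object names (pts, big), the attribute columns (name, value), and the coordinates are all hypothetical. The idea it illustrates is that an sf object is an ordinary data frame whose geometry lives in a list-column, so filter() and mutate() work on it directly and the geometry column stays attached.

```r
library(sf)
library(dplyr)

# Two hypothetical points (WGS84, EPSG:4326) with made-up attribute columns
pts <- st_sf(
  name = c("a", "b"),
  value = c(1, 5),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
)

# An sf object is also a data.frame: c("sf", "data.frame")
class(pts)

# dplyr verbs operate on the attribute columns; the geometry column is kept
big <- pts %>%
  filter(value > 2) %>%
  mutate(value2 = value * 2)

big
```

Doing the same with an sp object would instead mean reaching into its separate attribute slot, which is exactly the friction sf removes.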

Direction for replication

Datasets

All the datasets that you need to import are available here. In this chapter, file paths are set relative to my own working directory (which is hidden). To run the code without having to modify the file paths, follow these steps:

- set a folder (any folder) as the working directory using setwd()
- create a folder called “Data” inside the working directory (if you already created a “Data” folder to replicate the demonstrations in Chapter 1, skip this step)
- download the pertinent datasets from here
- place all the files from the downloaded folder in the “Data” folder
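The folder setup in the steps above can also be done from R. This is a minimal sketch using only base R; the path passed to setwd() is a hypothetical placeholder that you should replace with your own folder, which is why that line is commented out.

```r
# Point R at the folder you chose (hypothetical path; replace with your own)
# setwd("path/to/your/folder")

# Create the "Data" folder only if it does not exist yet
if (!dir.exists("Data")) {
  dir.create("Data")
}

# After placing the downloaded files in "Data", list them to confirm
list.files("Data")
```

Checking dir.exists() first avoids the warning that dir.create() issues when the folder is already there.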

Packages

Run the following code to install or load (if already installed) the pacman package, and then install or load (if already installed) the packages listed inside the pacman::p_load() function.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  sf, # vector data operations
  dplyr, # data wrangling
  data.table, # data wrangling
  tmap, # make maps
  mapview # create an interactive map
)

References

Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1).

³⁰ Yes, it is the same as the package name.
³¹ Even if there are packages that do not support sf, you can always go back and forth between sp and sf objects, which we will learn later in this chapter.
³² There are cases where sp completes the same task faster than sf. For example, see the answer to this question. However, I doubt the difference between the two is practically important, even with data larger than the test data.
