Before you start
In this chapter we learn how to use the sf package to handle and operate on spatial datasets. The sf package uses the class of simple feature (sf)[30] for spatial objects in R. We first learn how sf objects store and represent spatial datasets. We then move on to the following practical topics:
- read and write a shapefile and spatial data in other formats (and why you might not want to use the shapefile system any more, but use other alternative formats)
- project and reproject spatial objects
- convert sf objects into sp objects, and vice versa
- confirm that dplyr works well with sf objects
- implement non-interactive (does not involve two sf objects) geometric operations on sf objects
  - create buffers
  - find the area of polygons
  - find the centroid of polygons
  - calculate the length of lines
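As a small preview of the non-interactive geometric operations listed above, here is a minimal sketch. The square polygon, the line, and the choice of CRS (EPSG:32614, a projected CRS in meters) are hypothetical values made up for illustration; they are not part of the chapter's datasets.

```r
library(sf)

# Hypothetical geometries: a 1,000 m x 1,000 m square and a straight line
square <- st_polygon(list(rbind(
  c(0, 0), c(1000, 0), c(1000, 1000), c(0, 1000), c(0, 0)
)))
line <- st_linestring(rbind(c(0, 0), c(3000, 4000)))

# Wrap them in geometry columns with an assumed projected CRS (units: meters)
poly_sfc <- st_sfc(square, crs = 32614)
line_sfc <- st_sfc(line, crs = 32614)

# Create a 100-meter buffer around the polygon
buffered <- st_buffer(poly_sfc, dist = 100)

# Find the area of the polygon
poly_area <- st_area(poly_sfc)

# Find the centroid of the polygon
poly_centroid <- st_centroid(poly_sfc)

# Calculate the length of the line
line_length <- st_length(line_sfc)
```

Each of these functions takes a single sf (or sfc) object, which is what "non-interactive" means in the list above.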
sf or sp?
The sf package was designed to replace the sp package, which has been one of the most popular and powerful spatial packages in R for more than a decade. It has been about four years since the sf package was first registered on CRAN. A couple of years back, many other spatial packages did not yet support it. In this blog post the author responded to the question of whether one should learn sp or sf, saying,
“That’s a tough question. If you have time, I would say, learn to use both. sf is pretty new, so a lot of packages that depend on spatial classes still rely on sp. So you will need to know sp if you want to do any integration with many other packages, including raster (as of March 2018).
However, in the future we should see an increasing shift toward the sf package and greater use of sf classes in other packages. I also think that sf is easier to learn to use than sp.”
The future has come, and it’s not a tough question anymore. I cannot think of any major spatial package that does not support the sf package, and sf has largely become the standard for handling vector data in \(R\).[31] Thus, this lecture note does not cover how to use sp at all.
sf has several advantages over the sp package (Pebesma 2018).[32] First, it cuts the tie that sp had with the ESRI shapefile system, which has a somewhat loose way of representing spatial data. Instead, it uses simple feature access, which is an open standard supported by the Open Geospatial Consortium (OGC). Another important benefit is its compatibility with the tidyverse, which includes widely popular packages like ggplot2 and dplyr. Consequently, map-making with ggplot() and data wrangling with the dplyr family of functions come very naturally to many \(R\) users. sp objects have different slots for spatial information and attribute data, and they are not amenable to the dplyr way of data transformation.
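To make the tidyverse compatibility concrete, here is a minimal sketch that applies dplyr verbs directly to an sf object. It assumes the North Carolina shapefile that ships with the sf package (not one of this chapter's datasets); the cutoff of 10,000 births and the name sid_rate are arbitrary choices for illustration.

```r
library(sf)
library(dplyr)

# Read the demo shapefile bundled with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Ordinary dplyr verbs work on the sf object as if it were a data.frame
nc_small <- nc %>%
  filter(BIR74 > 10000) %>%             # keep counties with many births in 1974
  mutate(sid_rate = SID74 / BIR74) %>%  # add a new attribute column
  select(NAME, BIR74, sid_rate)         # the geometry column is kept automatically

class(nc_small)
```

Note that select() does not drop the geometry column even though it is not named, so the result is still an sf object that can be mapped right away.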
Directions for replication
Datasets
All the datasets that you need to import are available here. In this chapter, the path to files is set relative to my own working directory (which is hidden). To run the code without having to adjust the file paths, follow these steps:
- set a folder (any folder) as the working directory using setwd()
- create a folder called “Data” inside the folder designated as the working directory (if you have created a “Data” folder to replicate demonstrations in Chapter 1, then skip this step)
- download the pertinent datasets from here
- place all the files in the downloaded folder in the “Data” folder
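The first two steps above can be sketched in base R as follows. The folder used here (a subfolder of the session's temporary directory named spatial_project) is a placeholder of my own; replace it with whichever folder you want to work in. Downloading the datasets and placing them in the "Data" folder still has to be done by hand.

```r
# Placeholder project folder (an assumption for illustration); use any folder you like
project_dir <- file.path(tempdir(), "spatial_project")
dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)

# Step 1: set the folder as the working directory
setwd(project_dir)

# Step 2: create the "Data" folder unless it already exists
if (!dir.exists("Data")) dir.create("Data")

# After placing the downloaded files in "Data", check that they are visible
list.files("Data")
```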
Packages
Run the following code to install or load (if already installed) the pacman package, and then install or load (if already installed) the packages listed inside the pacman::p_load() function.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  sf, # vector data operations
  dplyr, # data wrangling
  data.table, # data wrangling
  tmap, # make maps
  mapview # create an interactive map
)
References
30. Yes, it is the same as the package name.
31. Even if there are packages that do not support sf, you can always go back and forth between sp and sf objects, which we will learn in Chapter @ref(conv_sp).
32. There are cases where sp is faster at completing the same task than sf. For example, see the answer to this question. But I doubt the difference between the two is practically important even with bigger data than the test data.