09-1: R as GIS: Vector Data Basics with the sf package

Before you start


Learning objectives

The objectives of this chapter is to learn how to use R as GIS, specifically how to handle vector spatial data.


Tips to make the most of the lecture notes


Interactive navigation tools

  • Click on the three horizontally stacked lines at the bottom left corner of the slide, then you will see table of contents, and you can jump to the section you want

  • Hit letter “o” on your keyboard and you will have a panel view of all the slides


Running and writing codes

  • The box area with a hint of blue as the background color is where you can write code (hereafter referred to as the “code area”).
  • Hit the “Run Code” button to execute all the code inside the code area.
  • You can evaluate (run) code selectively by highlighting the parts you want to run and hitting Command + Enter for Mac (Ctrl + Enter for Windows).
  • If you want to run the codes on your computer, you can first click on the icon with two sheets of paper stacked on top of each other (top right corner of the code chunk), which copies the code in the code area. You can then paste it onto your computer.
  • You can click on the reload button (top right corner of the code chunk, left to the copy button) to revert back to the original code.

Getting started


Prerequisites

You understand

  • What Geographic Coordinate System (GCS), Coordinate Reference System (CRS), and projection are (this is a good resource)

  • Distinctions between vector and raster data (this is a simple summary of the difference)

  • The sf package provides a simply way of storing geographic information and the attributes of the geographic units in a single dataset called simple feature (sf).

  • The sf package allows you to do almost all the spatial operations you would need for your research

    • Projection
    • Read/write to spatial datasets in various formats (including shape files)
    • Non-interactive geometrical operations
      • create buffers
      • calculate area
      • calculate distance
    • Interactive geometrical operations
      • spatially subset datasets
      • extracting values from the intersecting spatial objects

`

Read the North Carolina county boundary data:


Check the class:

Understanding the data structure of sf

  • The first line tells you this is an simple feature (sf) object with 100 features and 3 attributes (fields)

  • So, an sf object looks just like a data.frame where rows representing observation units, columns representing attributes, except for a special column named geometry

  • The geometry column stores the geographic information of the observation units (here, county)

Example

Ashe County (1st row) has area of 0.114, FIPS code of 37009, and so on. And the entry in geometry column at the first row represents the geographic information of Ashe County.

  • An element of the geometry columns is a simple feature geometry (sfg).

  • In general, sfg represents the geographic information of a single geometric feature (here, county).

  • There are different types of sfgs (POINT, LINESTRING, POLYGON, MULTIPOLYGON, etc)

  • In this example, all the sfgs are of type MULTIPOLYGON

  • A collection of multiple sfgs as a column is called simple feature geometry column (sfc), which can make a geometry column in an sf object

Let’s see what an sfg is made of.


  • Each row represents a point

    • 1st column: longitude
    • 2nd column: latitude
  • Points are stored in a matrix format

  • Connecting all the points forms a polygon

Simple Feature Geometry (sfg)

Some of the moist common types of spatial objects represented by sfg are the following:

  • POINT: area-less feature that represents a point (e.g., well, city, farmland)

  • LINESTRING: (e.g., a tributary of a river)

  • MULTILINESTRING: (e.g., river with more than one tributary)

  • POLYGON: geometry with a positive area (e.g., county, state, country)

  • MULTIPOLYGON: collection of polygons to represent a single object (e.g., countries with islands: U.S., Japan, etc)

POINT is the simplest geometry type and is represented by a vector of two numeric values. An example below shows how a POINT feature can be made from scratch:


The st_point() function creates a POINT object when supplied with a vector of two numeric values. If you check the class of the newly created object,


you can see that it’s indeed a POINT object. But, it’s also an sfg object. So, a_point is an sfg object of type POINT.

A LINESTRING object is represented by a sequence of points in a matrix:


You can turn the matrix into a LINESTRING using sf::st_linestring():


Let’s plot it.

As you can see, each pair of consecutive points in the matrix are connected by a straight line to form a line.

Just like the LINESTRING object we created earlier, a POLYGON is represented by a collection of points.

However, the first and last points in the matrix have to be the same to form a polygon


You can turn the matrix into a POLYGON using st_polygon(), which takes a matrix in a list() :


Let’s plot it.

A POLYGON can have holes in it. The first matrix of a list becomes the exterior ring, and all the subsequent matrices will be holes within the exterior ring.


Let’s plot it.

To create a MULTIPOLYGON object you create a list of lists of matrices, with each inner list representing a polygon.


You supply a list of lists of matrices to the st_multipolygon() function to make a MULTIPOLYGON object.


Each of list(p1,p2) and list(p3) represents a polygon.

Create a POINT


Create a LINESTRING


Create a POLYGON

Constructing simple feature column (sfc) and simple feature (sf)

  • sfg is an object class that represents a single spatial object.

  • We can combine multiple sfgs as a list to create a simple feature geometry list-column (sfc).



To make a simple feature geometry list-column (sfc), you can simply supply a list of sfg to the st_sfc() function as follows:


Check its class:

To create an sf object, you first add an sfc as a column to a data.frame.


At this point, it is not yet recognized as an sf by R yet.

You can register it as an sf object using st_as_sf().


As you can see sf_ex is now recognized also as an sf object.

Create an sfc using the POINT and POLYGON you made earlier.


Create an sf object using the sfc object you created in the previous exercise, where the additional variable in the sf object is id with the POINT and POLYGON assigned id = 1 and id = 2, respectively.

Reading and writing vector data

  • The vast majority of people still use ArcGIS software to handle spatial data, which has its own system of storing spatial data called shapefile system.

  • shapefile is a collection of files including

    • .shp: stores geometry information (like sfg)
    • .prj: projection information
  • Chances are that your collaborators use shapefiles.

  • There are many GIS data online that are available only as shapefiles.

  • So, it is important to learn how to read and write shapefiles

You can use sf::st_read() to read a shapefile. It reads in a shapefile and then turn the data into an sf object.

Syntax

st_read(file_path)
  • file_path: the path to the shapefile.


Example

nc_imported <- st_read("Data/nc_practice.shp") 

Here, it is reading a file named nc_practice.shp (along with other related files) is read from the Data folder.


Try yourself

  • download nc_practice.shp from here and other supporting files to where you would like on your computer
  • find and copy the path to the file
  • import the data using sf::st_read()

You can use the sf::st_write() function to write an sf object to shape files.


Syntax

st_write(sf object, file path, append = FALSE)
  • append = FALSE forces writing the data when the shape files with the same name already exists


Example

st_write(nc_imported, "Data/nc_exported.shp")

This code will export an sf object called nc_imported as nc_exported.shp (along with other supporting files) in the “Data” folder relative to the working directory.


Try yourself

  • export the sf object you read earlier using sf::st_write() using whatever name you like
  • If your collaborators are using ArcGIS and demanding that they need a shapefile for their work, sure you can write to a shapefile.

  • But, there is really no need to work with the shapefile system if you are not using ArcGIS.

  • Basically, we are using the file system just because ArcGIS is the pioneer of GIS software and many people are still using it, but not because it is the best format available to store spatial objects.

  • Indeed, there are some limitations to shape files (see here).

  • But, first and foremost, it is annoying to have many files for a single spatial object.

A format that is increasingly popular is GeoJSON.

  • Unlike the shapefile system, it produces only a single file with .geojson extension.

  • GeoJSON files can also be read into ArcGIS.


Write

To write an sf object to a GeoJSON file, you simply give the file path to the dsn option (note that you do not use the layer option unlike the shape files case).

#--- write as a gpkg file ---#
st_write(nc, dsn = "Data/nc_exported.geojson")


Read

You can use the sf::st_read() function to read a GeoJSON file like below:

nc <- st_read("Data/nc_exported.geojson")

One of the alternative data formats that is considered superior to the shapefile system is GeoPackage, which overcomes various limitations associated with shapefile.

  • Unlike the shapefile system, it produces only a single file with .gpkg extension.

  • GeoPackage files can also be read into ArcGIS.


Write

To write an sf object to a GeoPackage file, you simply give the file path to the dsn option (note that you do not use the layer option unlike the shape files case).

#--- write as a gpkg file ---#
st_write(nc_imported, dsn = "Data/nc_exported.gpkg")


Read

You can use the sf::st_read() function to read a GeoPackage file like below:

nc <- st_read("Data/nc_exported.gpkg")

Or better yet, if your collaborator uses R (or if it is only you who is going to use the data), then just save the sf object as an .rds file using saveRDS(), which can be of course read using readRDS().


Save

#--- save as an rds ---#
saveRDS(nc_imported, "Data/nc_exported.rds")


Read

#--- read an rds ---#
nc <- readRDS("Data/nc_exported.rds")


Note

The use of rds files can be particularly attractive when the dataset is large because rds files are typically more memory efficient than shape files.

Projection

You often need to project or re-project an sf using a different coordinate reference system (CRS) because you need it to have the same CRS as an sf object that you are interacting it with (spatial join) or mapping it with.

In order to check the current CRS for an sf object, you can use the sf::st_crs() function.


  • sf uses the Well Known Text format to store the coordinate reference system (CRS), which is one of many many formats to store CRS information (See here)

  • ID["EPSG", 4267] means that the EPSG code for this CRS is 4267

    • EPSG code is a CRS reference system developed by by European Petroleum Survey Group (EPSG)
    • You can find the CRS-EPSG number correspondence here.
  • When you transform an sf using a different CRS, you can use its EPSG number if the CRS has an EPSG number

    • Potential pool of CRS is infinite.
    • Only the commonly-used CRS have been assigned EPSG SRID.

You can use sf::st_transform() to apply a different projection method to an sf object.


Syntax

st_transform(sf, EPSG number or CRS in WTK)


Let’s transform (reproject) the data using NAD83 / UTM zone 14N CRS. Its EPSG number is 26914.


Let’s confirm the change in CRS:


Let’s compare the geometry column before and after the transformation (projection):


  • There is a function that sets CRS, namely sf::st_set_crs().

  • This function literally sets the CRS, but does not transform geometry accordingly unlike sf::st_transform().

  • So, doing this is a terrible mistake and the resulting sf object is no longer where it should be.

  • You often need to change the CRS of an sf object when you interact (e.g., spatial subsetting, joining, etc) it with another sf object.

  • In such a case, you can extract the CRS of the other sf object using st_crs() and use it for transformation.

  • So, you do not need to find the EPSG of the CRS of the sf object you are interacting it with.



Example


Let’s confirm the transformation:

Check the CRS of Fairway_Five.


Find the EPSG code for WGS 84, and change the CRS of Fairway_Five to WGS 84 using the EPSG code.

Quick Visualization

The easiest way to visualize an sf object is to use plot():


  • plot() create a map for each variable where the spatial units are color-differentiated based on the values of the variable

  • We will learn how to create more elaborate maps that are of publication-quality using the ggplot2 package later

Sometimes it is useful to be able to tell where certain spatial objects are and what values are associated with them on a map.

The mapView() function from the mapview package can create an interactive map where you can point to a spatial object and the associated information is revealed on the map.

Run the following codes on your computer.

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
mapview::mapView(nc)  

Turning a data.frame of points into an sf

  • Often times, you have a dataset with geographic coordinates as variables in a csv or other formats

  • It would not be recognized immediately as a spatial dataset by R when it is read into R.

  • In this case, you need to identify which variables represent the geographic coordinates from the data set, and create an sf yourself.

  • Fortunately, it is easy to do so using the sf::st_as_sf() function.

Let’s get a dataset (irrigation wells in Nebraska) to work with:


wells_ne is a data.frame and has longdd and latdd representing longitude and latitude, respectively. Note that it is NOT an sf object.


Important

  • YOU need to know the GRS/CRS of your data because you need to provide R with that information!
  • The geographic coordinates system of this data is NAD 83 (epsg=4269) for this dataset.

We can turn a dataset (e.g., data.frame, tibble, data.table) into an sf object using sf::st_as_sf().

Syntax

sf::st_as_sf(
  data.frame, 
  coords = c(
    longitude var name, 
    latitude var name
  ),
  crs = crs
)


Example

Using the LAT (latitude) and LNG (longitude) columns, turn the tibble into an sf, and then assign the CRS of WGS 84 using its EPSG code.

Conversion to and from sp objects

  • The sp package is the predecessor of the sf package (developed by the same person)

  • There are many (older) packages that only accept spatial objects defined by the sp package

    • spdep: spatial econometrics
    • GWmodel: runs geographically-weighted regression
  • In that case, it is good to know how to convert an sf object to an sp object, vice versa.

You can convert an sf object to its sp counterpart by as(sf_object, "Spatial")

You can convert an sp object to its sf counterpart by sf::st_as_sf(sp_object).

Non-spatial Transformation of sf

  • An important feature of an sf object is that it is basically a data.frame with geometric information stored as a variable (column).

  • This means that transforming an sf object works just like transforming a data.frame.

  • Basically, everything you can do to a data.frame, you can do to an sf as well.

  • dplyr verbs work well with sf

The following code selects wellid variable using dplyr::select():

Notice that geometry column will be retained after dplyr::select() even if you did not tell R to keep it above.


Of course, you can apply other dplyr verbs just like you do with a data.frame. Here, let’s apply dplyr::select(), dplyr::filter(), and dplyr::mutate() in sequence using a piping operator.


Note

You cannot do this with the spatial objects defined by the sp package

Non-interactive geometrical operations

You can use sf::st_buffer() to create buffers of the specified length around points, lines, and polygons.


Syntax

st_buffer(sf, dist = distance)
  • dist: provide the distance in the unit of the CRS (run st_crs(sf)$units to get the unit)

Let’s create a buffer of 2000 meter.


Here is what it looks like:


Yes, you see zig-zag. You can read up on this here. For now, I would recommend that you project first and then create a buffer.

Let’s first project and then create a buffer:


Here is what it looks like:

You can use st_area() to calculate the area of all the polygons in an sf object.


By default, area calculated by st_area() is units.


So, you want to convert it to a numeric variable like this if you want to subject it to numeric operations later:

You can use st_centroid() to find the centroid of each of all the polygons in an sf object.

As you can see, st_centroid() returns an sf of centroids as points. The centroids look like this:



If you want longitude (X) and latitude (Y) of the centroids, you can further apply st_coordinates().


Of course, you can easily add the XY matrix to the original sf file using cbind():

Syntax

st_distance(sf_1, sf_2)

This finds the distance between each of the points in sf_1 and each of the points in sf_2.

Get a matrix of distances whose \([i,j]\) element is the distance between the \(i\)th sfg of st_centroid(nc[1:5, ]) and \(j\)th sfg of st_centroid(nc[6:15, ]):


Note

Notice that, even though nc is unprojected, distances returned are in meter.

Sometimes you want to combine all the geometries in a single sf. For example, you may want to get the centroid of North Carolina using nc (of course, you can alternatively get sf of NC state boundary, instead of counties in this case).

You can use sf::st_union() to achieve this. Note that the returned object is sfc, not sf.


Here is what it looks like.

Exercise

Run the following code to get fairway_grid data. Then, inspect the data (e.g., by plotting) to get a good sense of what the data looks like.

First, plot fairway_grid to get a sense of what the dataset looks like:


Transform fairway_grid so that its CRS is NAD 83/UTM zone 14N (its EPSG code is 26914) and name it fairway_grid_utm.


Answer codes

Code
plot(fairway_grid) 

fairway_grid_utm <- st_transform(fairway_grid, 26914)

Create buffers around the grid polygons in fairway_grid_utm where the radius of the buffer is 10 meter, and name it fairway_grid_buffers.


Answer codes

Code
fairway_grid_buffers <- st_buffer(fairway_grid_utm, dist = 10)

Find the centroid of each of the buffer polygons you created in Exercise 2, and then name it buffers_centroids.


Answer codes

Code
buffers_centroids <- st_centroid(fairway_grid_buffers)

Calculated the distances between the centroids in buffers_centroids.


Answer codes

Code
st_distance(buffers_centroids)