09-1: R as GIS: Vector Data Basics with the sf package

Before you start

Learning objectives

The objectives of this chapter is to learn how to use R as GIS, specifically how to handle vector spatial data.

Understanding the data structure of sf
Simple Feature Geometry(sfg)
Constructing simple feature column (sfc) and simple feature (sf)
Reading and writing vector data
Projection
Quick Visualization
Turning a data.frame of points into an sf
Conversion to and from sp objects
Non-spatial Transformation of sf
Non-interactive geometrical operations

Pre-requisite (Links)

Create maps from vector data using ggplot2
Spatial interaction of vector data
- Topological relations and subset
- Spatial join
Vector and raster data interactions
- cropping
- masking
- value extraction

Tips to make the most of the lecture notes

Click on the three horizontally stacked lines at the bottom left corner of the slide, then you will see table of contents, and you can jump to the section you want
Hit letter “o” on your keyboard and you will have a panel view of all the slides

Running and writing codes

The box area with a hint of blue as the background color is where you can write code (hereafter referred to as the “code area”).
Hit the “Run Code” button to execute all the code inside the code area.
You can evaluate (run) code selectively by highlighting the parts you want to run and hitting Command + Enter for Mac (Ctrl + Enter for Windows).
If you want to run the codes on your computer, you can first click on the icon with two sheets of paper stacked on top of each other (top right corner of the code chunk), which copies the code in the code area. You can then paste it onto your computer.
You can click on the reload button (top right corner of the code chunk, left to the copy button) to revert back to the original code.

Prerequisites

You understand

What Geographic Coordinate System (GCS), Coordinate Reference System (CRS), and projection are (this is a good resource)
Distinctions between vector and raster data (this is a simple summary of the difference)

The sf package provides a simply way of storing geographic information and the attributes of the geographic units in a single dataset called simple feature (sf).
The sf package allows you to do almost all the spatial operations you would need for your research
- Projection
- Read/write to spatial datasets in various formats (including shape files)
- Non-interactive geometrical operations
  - create buffers
  - calculate area
  - calculate distance
- Interactive geometrical operations
  - spatially subset datasets
  - extracting values from the intersecting spatial objects

Read the North Carolina county boundary data:

Check the class:

Understanding the data structure of `sf`

Inside an sf object
The geometry column
Simple Feature Geometry (sfg)

The first line tells you this is an simple feature (sf) object with 100 features and 3 attributes (fields)
So, an sf object looks just like a data.frame where rows representing observation units, columns representing attributes, except for a special column named geometry
The geometry column stores the geographic information of the observation units (here, county)

Example

Ashe County (1st row) has area of 0.114, FIPS code of 37009, and so on. And the entry in geometry column at the first row represents the geographic information of Ashe County.

An element of the geometry columns is a simple feature geometry (sfg).
In general, sfg represents the geographic information of a single geometric feature (here, county).
There are different types of sfgs (POINT, LINESTRING, POLYGON, MULTIPOLYGON, etc)
In this example, all the sfgs are of type MULTIPOLYGON
A collection of multiple sfgs as a column is called simple feature geometry column (sfc), which can make a geometry column in an sf object

Let’s see what an sfg is made of.

Each row represents a point
- 1st column: longitude
- 2nd column: latitude
Points are stored in a matrix format
Connecting all the points forms a polygon

Simple Feature Geometry (`sfg`)

Different types
POINT
LINESTRING
POLYGON
POLYGON with holes
MULTIPOLYGON
Exercises

Some of the moist common types of spatial objects represented by sfg are the following:

POINT: area-less feature that represents a point (e.g., well, city, farmland)
LINESTRING: (e.g., a tributary of a river)
MULTILINESTRING: (e.g., river with more than one tributary)
POLYGON: geometry with a positive area (e.g., county, state, country)
MULTIPOLYGON: collection of polygons to represent a single object (e.g., countries with islands: U.S., Japan, etc)

POINT is the simplest geometry type and is represented by a vector of two numeric values. An example below shows how a POINT feature can be made from scratch:

The st_point() function creates a POINT object when supplied with a vector of two numeric values. If you check the class of the newly created object,

you can see that it’s indeed a POINT object. But, it’s also an sfg object. So, a_point is an sfg object of type POINT.

A LINESTRING object is represented by a sequence of points in a matrix:

You can turn the matrix into a LINESTRING using sf::st_linestring():

Let’s plot it.

As you can see, each pair of consecutive points in the matrix are connected by a straight line to form a line.

Just like the LINESTRING object we created earlier, a POLYGON is represented by a collection of points.

However, the first and last points in the matrix have to be the same to form a polygon

You can turn the matrix into a POLYGON using st_polygon(), which takes a matrix in a list() :

Let’s plot it.

A POLYGON can have holes in it. The first matrix of a list becomes the exterior ring, and all the subsequent matrices will be holes within the exterior ring.

Let’s plot it.

To create a MULTIPOLYGON object you create a list of lists of matrices, with each inner list representing a polygon.

You supply a list of lists of matrices to the st_multipolygon() function to make a MULTIPOLYGON object.

Each of list(p1,p2) and list(p3) represents a polygon.

Create a `POINT`

Create a `LINESTRING`

Create a `POLYGON`

Constructing simple feature column (`sfc`) and simple feature (`sf`)

Constructing sfc from sfgs
Constructing an sf using an sfc
Exercises

sfg is an object class that represents a single spatial object.
We can combine multiple sfgs as a list to create a simple feature geometry list-column (sfc).

To make a simple feature geometry list-column (sfc), you can simply supply a list of sfg to the st_sfc() function as follows:

Check its class:

Crate a data.frame with geometry
Register as sf

To create an sf object, you first add an sfc as a column to a data.frame.

At this point, it is not yet recognized as an sf by R yet.

You can register it as an sf object using st_as_sf().

As you can see sf_ex is now recognized also as an sf object.

Exercises 2.1
Exercises 2.2

Create an sfc using the POINT and POLYGON you made earlier.

Create an sf object using the sfc object you created in the previous exercise, where the additional variable in the sf object is id with the POINT and POLYGON assigned id = 1 and id = 2, respectively.

Reading and writing vector data

Basics
Importing a shapefile
Exporting to a shapefile
Alternative formats

The vast majority of people still use ArcGIS software to handle spatial data, which has its own system of storing spatial data called shapefile system.
shapefile is a collection of files including
- .shp: stores geometry information (like sfg)
- .prj: projection information

Chances are that your collaborators use shapefiles.
There are many GIS data online that are available only as shapefiles.
So, it is important to learn how to read and write shapefiles

You can use sf::st_read() to read a shapefile. It reads in a shapefile and then turn the data into an sf object.

Syntax

st_read(file_path)

file_path: the path to the shapefile.

Example

nc_imported <- st_read("Data/nc_practice.shp")

Here, it is reading a file named nc_practice.shp (along with other related files) is read from the Data folder.

Try yourself

download nc_practice.shp from here and other supporting files to where you would like on your computer
find and copy the path to the file
import the data using sf::st_read()

You can use the sf::st_write() function to write an sf object to shape files.

Syntax

st_write(sf object, file path, append = FALSE)

append = FALSE forces writing the data when the shape files with the same name already exists

Example

st_write(nc_imported, "Data/nc_exported.shp")

This code will export an sf object called nc_imported as nc_exported.shp (along with other supporting files) in the “Data” folder relative to the working directory.

Try yourself

export the sf object you read earlier using sf::st_write() using whatever name you like

Motivation
GeoJSON
Geopackage
R data set (rds)

If your collaborators are using ArcGIS and demanding that they need a shapefile for their work, sure you can write to a shapefile.
But, there is really no need to work with the shapefile system if you are not using ArcGIS.
Basically, we are using the file system just because ArcGIS is the pioneer of GIS software and many people are still using it, but not because it is the best format available to store spatial objects.
Indeed, there are some limitations to shape files (see here).
But, first and foremost, it is annoying to have many files for a single spatial object.

A format that is increasingly popular is GeoJSON.

Unlike the shapefile system, it produces only a single file with .geojson extension.
GeoJSON files can also be read into ArcGIS.

Write

To write an sf object to a GeoJSON file, you simply give the file path to the dsn option (note that you do not use the layer option unlike the shape files case).

#--- write as a gpkg file ---#
st_write(nc, dsn = "Data/nc_exported.geojson")

Read

You can use the sf::st_read() function to read a GeoJSON file like below:

nc <- st_read("Data/nc_exported.geojson")

One of the alternative data formats that is considered superior to the shapefile system is GeoPackage, which overcomes various limitations associated with shapefile.

Unlike the shapefile system, it produces only a single file with .gpkg extension.
GeoPackage files can also be read into ArcGIS.

Write

To write an sf object to a GeoPackage file, you simply give the file path to the dsn option (note that you do not use the layer option unlike the shape files case).

#--- write as a gpkg file ---#
st_write(nc_imported, dsn = "Data/nc_exported.gpkg")

Read

You can use the sf::st_read() function to read a GeoPackage file like below:

nc <- st_read("Data/nc_exported.gpkg")

Or better yet, if your collaborator uses R (or if it is only you who is going to use the data), then just save the sf object as an .rds file using saveRDS(), which can be of course read using readRDS().

Save

#--- save as an rds ---#
saveRDS(nc_imported, "Data/nc_exported.rds")

Read

#--- read an rds ---#
nc <- readRDS("Data/nc_exported.rds")

Note

The use of rds files can be particularly attractive when the dataset is large because rds files are typically more memory efficient than shape files.

Projection

Motivation
Check the current CRS
Project or re-project sf objects
Using the CRS of another sf
Exercises

You often need to project or re-project an sf using a different coordinate reference system (CRS) because you need it to have the same CRS as an sf object that you are interacting it with (spatial join) or mapping it with.

In order to check the current CRS for an sf object, you can use the sf::st_crs() function.

sf uses the Well Known Text format to store the coordinate reference system (CRS), which is one of many many formats to store CRS information (See here)
ID["EPSG", 4267] means that the EPSG code for this CRS is 4267
- EPSG code is a CRS reference system developed by by European Petroleum Survey Group (EPSG)
- You can find the CRS-EPSG number correspondence here.
When you transform an sf using a different CRS, you can use its EPSG number if the CRS has an EPSG number
- Potential pool of CRS is infinite.
- Only the commonly-used CRS have been assigned EPSG SRID.

How
Example
Caveat

You can use sf::st_transform() to apply a different projection method to an sf object.

Syntax

st_transform(sf, EPSG number or CRS in WTK)

Let’s transform (reproject) the data using NAD83 / UTM zone 14N CRS. Its EPSG number is 26914.

Let’s confirm the change in CRS:

Let’s compare the geometry column before and after the transformation (projection):

There is a function that sets CRS, namely sf::st_set_crs().
This function literally sets the CRS, but does not transform geometry accordingly unlike sf::st_transform().
So, doing this is a terrible mistake and the resulting sf object is no longer where it should be.

You often need to change the CRS of an sf object when you interact (e.g., spatial subsetting, joining, etc) it with another sf object.
In such a case, you can extract the CRS of the other sf object using st_crs() and use it for transformation.
So, you do not need to find the EPSG of the CRS of the sf object you are interacting it with.

Example

Let’s confirm the transformation:

Check the CRS
Transform

Check the CRS of Fairway_Five.

Find the EPSG code for WGS 84, and change the CRS of Fairway_Five to WGS 84 using the EPSG code.

Quick Visualization

Quick view using plot()
Interactive view using mapView()

The easiest way to visualize an sf object is to use plot():

plot() create a map for each variable where the spatial units are color-differentiated based on the values of the variable
We will learn how to create more elaborate maps that are of publication-quality using the ggplot2 package later

Sometimes it is useful to be able to tell where certain spatial objects are and what values are associated with them on a map.

The mapView() function from the mapview package can create an interactive map where you can point to a spatial object and the associated information is revealed on the map.

Run the following codes on your computer.

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
mapview::mapView(nc)

Turning a `data.frame` of points into an `sf`

Motivation
Preparation
Making an sf
Exercise

Often times, you have a dataset with geographic coordinates as variables in a csv or other formats
It would not be recognized immediately as a spatial dataset by R when it is read into R.
In this case, you need to identify which variables represent the geographic coordinates from the data set, and create an sf yourself.
Fortunately, it is easy to do so using the sf::st_as_sf() function.

Let’s get a dataset (irrigation wells in Nebraska) to work with:

wells_ne is a data.frame and has longdd and latdd representing longitude and latitude, respectively. Note that it is NOT an sf object.

Important

YOU need to know the GRS/CRS of your data because you need to provide R with that information!
The geographic coordinates system of this data is NAD 83 (epsg=4269) for this dataset.

We can turn a dataset (e.g., data.frame, tibble, data.table) into an sf object using sf::st_as_sf().

Syntax

sf::st_as_sf(
  data.frame, 
  coords = c(
    longitude var name, 
    latitude var name
  ),
  crs = crs
)

Example

Using the LAT (latitude) and LNG (longitude) columns, turn the tibble into an sf, and then assign the CRS of WGS 84 using its EPSG code.

Conversion to and from `sp` objects

Motivation
sf to sp
sp to sf

The sp package is the predecessor of the sf package (developed by the same person)
There are many (older) packages that only accept spatial objects defined by the sp package
- spdep: spatial econometrics
- GWmodel: runs geographically-weighted regression
In that case, it is good to know how to convert an sf object to an sp object, vice versa.

You can convert an sf object to its sp counterpart by as(sf_object, "Spatial")

You can convert an sp object to its sf counterpart by sf::st_as_sf(sp_object).

Non-spatial Transformation of `sf`

Note
dplyr verbs

An important feature of an sf object is that it is basically a data.frame with geometric information stored as a variable (column).
This means that transforming an sf object works just like transforming a data.frame.
Basically, everything you can do to a data.frame, you can do to an sf as well.
dplyr verbs work well with sf

The following code selects wellid variable using dplyr::select():

Notice that geometry column will be retained after dplyr::select() even if you did not tell R to keep it above.

Of course, you can apply other dplyr verbs just like you do with a data.frame. Here, let’s apply dplyr::select(), dplyr::filter(), and dplyr::mutate() in sequence using a piping operator.

Note

You cannot do this with the spatial objects defined by the sp package

Non-interactive geometrical operations

Create buffers
Find area
Find centroid
Calculate distance
Combine (union)

How
Example (non-projected)
Example (projected)

You can use sf::st_buffer() to create buffers of the specified length around points, lines, and polygons.

Syntax

st_buffer(sf, dist = distance)

dist: provide the distance in the unit of the CRS (run st_crs(sf)$units to get the unit)

Let’s create a buffer of 2000 meter.

Here is what it looks like:

Yes, you see zig-zag. You can read up on this here. For now, I would recommend that you project first and then create a buffer.

Let’s first project and then create a buffer:

Here is what it looks like:

Example
Caveat

You can use st_area() to calculate the area of all the polygons in an sf object.

By default, area calculated by st_area() is units.

So, you want to convert it to a numeric variable like this if you want to subject it to numeric operations later:

Example
Matrix of coordinates

You can use st_centroid() to find the centroid of each of all the polygons in an sf object.

As you can see, st_centroid() returns an sf of centroids as points. The centroids look like this:

If you want longitude (X) and latitude (Y) of the centroids, you can further apply st_coordinates().

Of course, you can easily add the XY matrix to the original sf file using cbind():

How
Example

Syntax

st_distance(sf_1, sf_2)

This finds the distance between each of the points in sf_1 and each of the points in sf_2.

Get a matrix of distances whose $[i,j]$ element is the distance between the $i$th sfg of st_centroid(nc[1:5, ]) and $j$th sfg of st_centroid(nc[6:15, ]):

Note

Notice that, even though nc is unprojected, distances returned are in meter.

Sometimes you want to combine all the geometries in a single sf. For example, you may want to get the centroid of North Carolina using nc (of course, you can alternatively get sf of NC state boundary, instead of counties in this case).

You can use sf::st_union() to achieve this. Note that the returned object is sfc, not sf.

Here is what it looks like.

Exercise

Data preparation
Exercise 1
Exercise 2
Exercise 3
Exercise 4

Run the following code to get fairway_grid data. Then, inspect the data (e.g., by plotting) to get a good sense of what the data looks like.

First, plot fairway_grid to get a sense of what the dataset looks like:

Transform fairway_grid so that its CRS is NAD 83/UTM zone 14N (its EPSG code is 26914) and name it fairway_grid_utm.

Answer codes

Code

plot(fairway_grid) 

fairway_grid_utm <- st_transform(fairway_grid, 26914)

Create buffers around the grid polygons in fairway_grid_utm where the radius of the buffer is 10 meter, and name it fairway_grid_buffers.

Answer codes

Code

fairway_grid_buffers <- st_buffer(fairway_grid_utm, dist = 10)

Find the centroid of each of the buffer polygons you created in Exercise 2, and then name it buffers_centroids.

Answer codes

Code

buffers_centroids <- st_centroid(fairway_grid_buffers)

Calculated the distances between the centroids in buffers_centroids.

Answer codes

Code

st_distance(buffers_centroids)

09-1: R as GIS: Vector Data Basics with the sf package

Before you start

Learning objectives

Table of contents

Pre-requisite (Links)

Tips to make the most of the lecture notes

Interactive navigation tools

Running and writing codes

Getting started

Understanding the data structure of `sf`

Simple Feature Geometry (`sfg`)

Create a `POINT`

Create a `LINESTRING`

Create a `POLYGON`

Constructing simple feature column (`sfc`) and simple feature (`sf`)

Reading and writing vector data

Syntax

Example

Syntax

Example

Projection

Example

Quick Visualization

Turning a `data.frame` of points into an `sf`

Syntax

Example

Conversion to and from `sp` objects

Non-spatial Transformation of `sf`

Non-interactive geometrical operations

Exercise

Answer codes

Answer codes

Answer codes

Answer codes

09-1: R as GIS: Vector Data Basics with the sf package

Before you start

Learning objectives

Table of contents

Pre-requisite (Links)

Related contents (Links)

Tips to make the most of the lecture notes

Interactive navigation tools

Running and writing codes

Getting started

Understanding the data structure of sf

Simple Feature Geometry (sfg)

Create a POINT

Create a LINESTRING

Create a POLYGON

Constructing simple feature column (sfc) and simple feature (sf)

Reading and writing vector data

Syntax

Example

Syntax

Example

Projection

Example

Quick Visualization

Turning a data.frame of points into an sf

Syntax

Example

Conversion to and from sp objects

Non-spatial Transformation of sf

Non-interactive geometrical operations

Exercise

Answer codes

Answer codes

Answer codes

Answer codes

Understanding the data structure of `sf`

Simple Feature Geometry (`sfg`)

Create a `POINT`

Create a `LINESTRING`

Create a `POLYGON`

Constructing simple feature column (`sfc`) and simple feature (`sf`)

Turning a `data.frame` of points into an `sf`

Conversion to and from `sp` objects

Non-spatial Transformation of `sf`