The objectives of this chapter is to learn how to use R as GIS, specifically how to handle vector spatial data.
sfsfg)sfc) and simple feature (sf)sfsp objectssfggplot2Click on the three horizontally stacked lines at the bottom left corner of the slide, then you will see table of contents, and you can jump to the section you want
Hit letter “o” on your keyboard and you will have a panel view of all the slides
The sf package provides a simply way of storing geographic information and the attributes of the geographic units in a single dataset called simple feature (sf).
The sf package allows you to do almost all the spatial operations you would need for your research
`
Read the North Carolina county boundary data:
Check the class:
sfThe first line tells you this is an simple feature (sf) object with 100 features and 3 attributes (fields)
So, an sf object looks just like a data.frame where rows representing observation units, columns representing attributes, except for a special column named geometry
The geometry column stores the geographic information of the observation units (here, county)
Example
Ashe County (1st row) has area of 0.114, FIPS code of 37009, and so on. And the entry in geometry column at the first row represents the geographic information of Ashe County.
An element of the geometry columns is a simple feature geometry (sfg).
In general, sfg represents the geographic information of a single geometric feature (here, county).
There are different types of sfgs (POINT, LINESTRING, POLYGON, MULTIPOLYGON, etc)
In this example, all the sfgs are of type MULTIPOLYGON
A collection of multiple sfgs as a column is called simple feature geometry column (sfc), which can make a geometry column in an sf object
Let’s see what an sfg is made of.
Each row represents a point
Points are stored in a matrix format
Connecting all the points forms a polygon
sfg)Some of the moist common types of spatial objects represented by sfg are the following:
POINT: area-less feature that represents a point (e.g., well, city, farmland)
LINESTRING: (e.g., a tributary of a river)
MULTILINESTRING: (e.g., river with more than one tributary)
POLYGON: geometry with a positive area (e.g., county, state, country)
MULTIPOLYGON: collection of polygons to represent a single object (e.g., countries with islands: U.S., Japan, etc)
POINT is the simplest geometry type and is represented by a vector of two numeric values. An example below shows how a POINT feature can be made from scratch:
The st_point() function creates a POINT object when supplied with a vector of two numeric values. If you check the class of the newly created object,
you can see that it’s indeed a POINT object. But, it’s also an sfg object. So, a_point is an sfg object of type POINT.
A LINESTRING object is represented by a sequence of points in a matrix:
You can turn the matrix into a LINESTRING using sf::st_linestring():
Let’s plot it.
As you can see, each pair of consecutive points in the matrix are connected by a straight line to form a line.
Just like the LINESTRING object we created earlier, a POLYGON is represented by a collection of points.
However, the first and last points in the matrix have to be the same to form a polygon
You can turn the matrix into a POLYGON using st_polygon(), which takes a matrix in a list() :
Let’s plot it.
A POLYGON can have holes in it. The first matrix of a list becomes the exterior ring, and all the subsequent matrices will be holes within the exterior ring.
Let’s plot it.
To create a MULTIPOLYGON object you create a list of lists of matrices, with each inner list representing a polygon.
You supply a list of lists of matrices to the st_multipolygon() function to make a MULTIPOLYGON object.
Each of list(p1,p2) and list(p3) represents a polygon.
POINTLINESTRINGPOLYGONsfc) and simple feature (sf)sfg is an object class that represents a single spatial object.
We can combine multiple sfgs as a list to create a simple feature geometry list-column (sfc).
To make a simple feature geometry list-column (sfc), you can simply supply a list of sfg to the st_sfc() function as follows:
Check its class:
To create an sf object, you first add an sfc as a column to a data.frame.
At this point, it is not yet recognized as an sf by R yet.
You can register it as an sf object using st_as_sf().
As you can see sf_ex is now recognized also as an sf object.
Create an sfc using the POINT and POLYGON you made earlier.
Create an sf object using the sfc object you created in the previous exercise, where the additional variable in the sf object is id with the POINT and POLYGON assigned id = 1 and id = 2, respectively.
The vast majority of people still use ArcGIS software to handle spatial data, which has its own system of storing spatial data called shapefile system.
shapefile is a collection of files including
.shp: stores geometry information (like sfg).prj: projection informationChances are that your collaborators use shapefiles.
There are many GIS data online that are available only as shapefiles.
So, it is important to learn how to read and write shapefiles
You can use sf::st_read() to read a shapefile. It reads in a shapefile and then turn the data into an sf object.
file_path: the path to the shapefile.Here, it is reading a file named nc_practice.shp (along with other related files) is read from the Data folder.
Try yourself
sf::st_read()You can use the sf::st_write() function to write an sf object to shape files.
append = FALSE forces writing the data when the shape files with the same name already existsThis code will export an sf object called nc_imported as nc_exported.shp (along with other supporting files) in the “Data” folder relative to the working directory.
Try yourself
sf::st_write() using whatever name you likeIf your collaborators are using ArcGIS and demanding that they need a shapefile for their work, sure you can write to a shapefile.
But, there is really no need to work with the shapefile system if you are not using ArcGIS.
Basically, we are using the file system just because ArcGIS is the pioneer of GIS software and many people are still using it, but not because it is the best format available to store spatial objects.
Indeed, there are some limitations to shape files (see here).
But, first and foremost, it is annoying to have many files for a single spatial object.
A format that is increasingly popular is GeoJSON.
Unlike the shapefile system, it produces only a single file with .geojson extension.
GeoJSON files can also be read into ArcGIS.
Write
To write an sf object to a GeoJSON file, you simply give the file path to the dsn option (note that you do not use the layer option unlike the shape files case).
Read
You can use the sf::st_read() function to read a GeoJSON file like below:
One of the alternative data formats that is considered superior to the shapefile system is GeoPackage, which overcomes various limitations associated with shapefile.
Unlike the shapefile system, it produces only a single file with .gpkg extension.
GeoPackage files can also be read into ArcGIS.
Write
To write an sf object to a GeoPackage file, you simply give the file path to the dsn option (note that you do not use the layer option unlike the shape files case).
Read
You can use the sf::st_read() function to read a GeoPackage file like below:
Or better yet, if your collaborator uses R (or if it is only you who is going to use the data), then just save the sf object as an .rds file using saveRDS(), which can be of course read using readRDS().
Save
Read
Note
The use of rds files can be particularly attractive when the dataset is large because rds files are typically more memory efficient than shape files.
You often need to project or re-project an sf using a different coordinate reference system (CRS) because you need it to have the same CRS as an sf object that you are interacting it with (spatial join) or mapping it with.
In order to check the current CRS for an sf object, you can use the sf::st_crs() function.
sf uses the Well Known Text format to store the coordinate reference system (CRS), which is one of many many formats to store CRS information (See here)
ID["EPSG", 4267] means that the EPSG code for this CRS is 4267
When you transform an sf using a different CRS, you can use its EPSG number if the CRS has an EPSG number
You can use sf::st_transform() to apply a different projection method to an sf object.
Syntax
Let’s transform (reproject) the data using NAD83 / UTM zone 14N CRS. Its EPSG number is 26914.
Let’s confirm the change in CRS:
Let’s compare the geometry column before and after the transformation (projection):
There is a function that sets CRS, namely sf::st_set_crs().
This function literally sets the CRS, but does not transform geometry accordingly unlike sf::st_transform().
So, doing this is a terrible mistake and the resulting sf object is no longer where it should be.
You often need to change the CRS of an sf object when you interact (e.g., spatial subsetting, joining, etc) it with another sf object.
In such a case, you can extract the CRS of the other sf object using st_crs() and use it for transformation.
So, you do not need to find the EPSG of the CRS of the sf object you are interacting it with.
Let’s confirm the transformation:
Check the CRS of Fairway_Five.
Find the EPSG code for WGS 84, and change the CRS of Fairway_Five to WGS 84 using the EPSG code.
The easiest way to visualize an sf object is to use plot():
plot() create a map for each variable where the spatial units are color-differentiated based on the values of the variable
We will learn how to create more elaborate maps that are of publication-quality using the ggplot2 package later
Sometimes it is useful to be able to tell where certain spatial objects are and what values are associated with them on a map.
The mapView() function from the mapview package can create an interactive map where you can point to a spatial object and the associated information is revealed on the map.
Run the following codes on your computer.
data.frame of points into an sfOften times, you have a dataset with geographic coordinates as variables in a csv or other formats
It would not be recognized immediately as a spatial dataset by R when it is read into R.
In this case, you need to identify which variables represent the geographic coordinates from the data set, and create an sf yourself.
Fortunately, it is easy to do so using the sf::st_as_sf() function.
Let’s get a dataset (irrigation wells in Nebraska) to work with:
wells_ne is a data.frame and has longdd and latdd representing longitude and latitude, respectively. Note that it is NOT an sf object.
Important
We can turn a dataset (e.g., data.frame, tibble, data.table) into an sf object using sf::st_as_sf().
Using the LAT (latitude) and LNG (longitude) columns, turn the tibble into an sf, and then assign the CRS of WGS 84 using its EPSG code.
sp objectsThe sp package is the predecessor of the sf package (developed by the same person)
There are many (older) packages that only accept spatial objects defined by the sp package
spdep: spatial econometricsGWmodel: runs geographically-weighted regressionIn that case, it is good to know how to convert an sf object to an sp object, vice versa.
You can convert an sf object to its sp counterpart by as(sf_object, "Spatial")
You can convert an sp object to its sf counterpart by sf::st_as_sf(sp_object).
sfAn important feature of an sf object is that it is basically a data.frame with geometric information stored as a variable (column).
This means that transforming an sf object works just like transforming a data.frame.
Basically, everything you can do to a data.frame, you can do to an sf as well.
dplyr verbs work well with sf
The following code selects wellid variable using dplyr::select():
Notice that geometry column will be retained after dplyr::select() even if you did not tell R to keep it above.
Of course, you can apply other dplyr verbs just like you do with a data.frame. Here, let’s apply dplyr::select(), dplyr::filter(), and dplyr::mutate() in sequence using a piping operator.
Note
You cannot do this with the spatial objects defined by the sp package
You can use sf::st_buffer() to create buffers of the specified length around points, lines, and polygons.
Syntax
dist: provide the distance in the unit of the CRS (run st_crs(sf)$units to get the unit)Let’s create a buffer of 2000 meter.
Here is what it looks like:
Yes, you see zig-zag. You can read up on this here. For now, I would recommend that you project first and then create a buffer.
Let’s first project and then create a buffer:
Here is what it looks like:
You can use st_centroid() to find the centroid of each of all the polygons in an sf object.
As you can see, st_centroid() returns an sf of centroids as points. The centroids look like this:
If you want longitude (X) and latitude (Y) of the centroids, you can further apply st_coordinates().
Of course, you can easily add the XY matrix to the original sf file using cbind():
Syntax
This finds the distance between each of the points in sf_1 and each of the points in sf_2.
Get a matrix of distances whose \([i,j]\) element is the distance between the \(i\)th sfg of st_centroid(nc[1:5, ]) and \(j\)th sfg of st_centroid(nc[6:15, ]):
Note
Notice that, even though nc is unprojected, distances returned are in meter.
Sometimes you want to combine all the geometries in a single sf. For example, you may want to get the centroid of North Carolina using nc (of course, you can alternatively get sf of NC state boundary, instead of counties in this case).
You can use sf::st_union() to achieve this. Note that the returned object is sfc, not sf.
Here is what it looks like.
Run the following code to get fairway_grid data. Then, inspect the data (e.g., by plotting) to get a good sense of what the data looks like.
First, plot fairway_grid to get a sense of what the dataset looks like:
Transform fairway_grid so that its CRS is NAD 83/UTM zone 14N (its EPSG code is 26914) and name it fairway_grid_utm.
Create buffers around the grid polygons in fairway_grid_utm where the radius of the buffer is 10 meter, and name it fairway_grid_buffers.
Find the centroid of each of the buffer polygons you created in Exercise 2, and then name it buffers_centroids.