The objectives of this chapter is to learn how to use R as GIS, specifically how to handle vector spatial data.
sf
sfg
)sfc
) and simple feature (sf
)sf
sp
objectssf
ggplot2
Click on the three horizontally stacked lines at the bottom left corner of the slide, then you will see table of contents, and you can jump to the section you want
Hit letter “o” on your keyboard and you will have a panel view of all the slides
The sf
package provides a simply way of storing geographic information and the attributes of the geographic units in a single dataset called simple feature (sf
).
The sf
package allows you to do almost all the spatial operations you would need for your research
`
Read the North Carolina county boundary data:
Check the class:
sf
The first line tells you this is an simple feature (sf
) object with 100 features and 3 attributes (fields)
So, an sf
object looks just like a data.frame
where rows representing observation units, columns representing attributes, except for a special column named geometry
The geometry
column stores the geographic information of the observation units (here, county)
Example
Ashe County (1st row) has area of 0.114, FIPS code of 37009, and so on. And the entry in geometry column at the first row represents the geographic information of Ashe County.
An element of the geometry
columns is a simple feature geometry (sfg
).
In general, sfg
represents the geographic information of a single geometric feature (here, county).
There are different types of sfg
s (POINT, LINESTRING, POLYGON, MULTIPOLYGON, etc)
In this example, all the sfg
s are of type MULTIPOLYGON
A collection of multiple sfg
s as a column is called simple feature geometry column (sfc
), which can make a geometry
column in an sf
object
Let’s see what an sfg
is made of.
Each row represents a point
Points are stored in a matrix format
Connecting all the points forms a polygon
sfg
)Some of the moist common types of spatial objects represented by sfg
are the following:
POINT
: area-less feature that represents a point (e.g., well, city, farmland)
LINESTRING
: (e.g., a tributary of a river)
MULTILINESTRING
: (e.g., river with more than one tributary)
POLYGON
: geometry with a positive area (e.g., county, state, country)
MULTIPOLYGON
: collection of polygons to represent a single object (e.g., countries with islands: U.S., Japan, etc)
POINT
is the simplest geometry type and is represented by a vector of two numeric values. An example below shows how a POINT
feature can be made from scratch:
The st_point()
function creates a POINT
object when supplied with a vector of two numeric values. If you check the class of the newly created object,
you can see that it’s indeed a POINT
object. But, it’s also an sfg
object. So, a_point
is an sfg
object of type POINT
.
A LINESTRING
object is represented by a sequence of points in a matrix:
You can turn the matrix into a LINESTRING using sf::st_linestring()
:
Let’s plot it.
As you can see, each pair of consecutive points in the matrix are connected by a straight line to form a line.
Just like the LINESTRING
object we created earlier, a POLYGON
is represented by a collection of points.
However, the first and last points in the matrix have to be the same to form a polygon
You can turn the matrix into a POLYGON using st_polygon()
, which takes a matrix in a list()
:
Let’s plot it.
A POLYGON
can have holes in it. The first matrix of a list becomes the exterior ring, and all the subsequent matrices will be holes within the exterior ring.
Let’s plot it.
To create a MULTIPOLYGON
object you create a list of lists of matrices, with each inner list representing a polygon.
You supply a list of lists of matrices to the st_multipolygon()
function to make a MULTIPOLYGON
object.
Each of list(p1,p2)
and list(p3)
represents a polygon.
POINT
LINESTRING
POLYGON
sfc
) and simple feature (sf
)sfg
is an object class that represents a single spatial object.
We can combine multiple sfg
s as a list to create a simple feature geometry list-column (sfc
).
To make a simple feature geometry list-column (sfc
), you can simply supply a list of sfg
to the st_sfc()
function as follows:
Check its class:
To create an sf
object, you first add an sfc
as a column to a data.frame
.
At this point, it is not yet recognized as an sf
by R yet.
You can register it as an sf
object using st_as_sf()
.
As you can see sf_ex
is now recognized also as an sf
object.
Create an sfc
using the POINT
and POLYGON
you made earlier.
Create an sf
object using the sfc
object you created in the previous exercise, where the additional variable in the sf
object is id
with the POINT
and POLYGON
assigned id = 1
and id = 2
, respectively.
The vast majority of people still use ArcGIS software to handle spatial data, which has its own system of storing spatial data called shapefile system.
shapefile is a collection of files including
.shp
: stores geometry information (like sfg
).prj
: projection informationChances are that your collaborators use shapefiles.
There are many GIS data online that are available only as shapefiles.
So, it is important to learn how to read and write shapefiles
You can use sf::st_read()
to read a shapefile. It reads in a shapefile and then turn the data into an sf object.
file_path
: the path to the shapefile.Here, it is reading a file named nc_practice.shp
(along with other related files) is read from the Data
folder.
Try yourself
sf::st_read()
You can use the sf::st_write()
function to write an sf
object to shape files.
append = FALSE
forces writing the data when the shape files with the same name already existsThis code will export an sf
object called nc_imported
as nc_exported.shp
(along with other supporting files) in the “Data” folder relative to the working directory.
Try yourself
sf::st_write()
using whatever name you likeIf your collaborators are using ArcGIS and demanding that they need a shapefile for their work, sure you can write to a shapefile.
But, there is really no need to work with the shapefile system if you are not using ArcGIS.
Basically, we are using the file system just because ArcGIS is the pioneer of GIS software and many people are still using it, but not because it is the best format available to store spatial objects.
Indeed, there are some limitations to shape files (see here).
But, first and foremost, it is annoying to have many files for a single spatial object.
A format that is increasingly popular is GeoJSON.
Unlike the shapefile system, it produces only a single file with .geojson extension.
GeoJSON files can also be read into ArcGIS.
Write
To write an sf
object to a GeoJSON file, you simply give the file path to the dsn
option (note that you do not use the layer
option unlike the shape files case).
Read
You can use the sf::st_read()
function to read a GeoJSON file like below:
One of the alternative data formats that is considered superior to the shapefile system is GeoPackage, which overcomes various limitations associated with shapefile.
Unlike the shapefile system, it produces only a single file with .gpkg extension.
GeoPackage files can also be read into ArcGIS.
Write
To write an sf
object to a GeoPackage file, you simply give the file path to the dsn
option (note that you do not use the layer
option unlike the shape files case).
Read
You can use the sf::st_read()
function to read a GeoPackage file like below:
Or better yet, if your collaborator uses R (or if it is only you who is going to use the data), then just save the sf
object as an .rds file using saveRDS()
, which can be of course read using readRDS()
.
Save
Read
Note
The use of rds files can be particularly attractive when the dataset is large because rds files are typically more memory efficient than shape files.
You often need to project or re-project an sf
using a different coordinate reference system (CRS) because you need it to have the same CRS as an sf
object that you are interacting it with (spatial join) or mapping it with.
In order to check the current CRS for an sf
object, you can use the sf::st_crs()
function.
sf
uses the Well Known Text format to store the coordinate reference system (CRS), which is one of many many formats to store CRS information (See here)
ID["EPSG", 4267]
means that the EPSG code for this CRS is 4267
When you transform an sf
using a different CRS, you can use its EPSG number if the CRS has an EPSG number
You can use sf::st_transform()
to apply a different projection method to an sf
object.
Syntax
Let’s transform (reproject) the data using NAD83 / UTM zone 14N
CRS. Its EPSG number is 26914
.
Let’s confirm the change in CRS:
Let’s compare the geometry column before and after the transformation (projection):
There is a function that sets CRS, namely sf::st_set_crs()
.
This function literally sets the CRS, but does not transform geometry accordingly unlike sf::st_transform()
.
So, doing this is a terrible mistake and the resulting sf
object is no longer where it should be.
You often need to change the CRS of an sf
object when you interact (e.g., spatial subsetting, joining, etc) it with another sf
object.
In such a case, you can extract the CRS of the other sf
object using st_crs()
and use it for transformation.
So, you do not need to find the EPSG of the CRS of the sf
object you are interacting it with.
Let’s confirm the transformation:
Check the CRS of Fairway_Five
.
Find the EPSG code for WGS 84, and change the CRS of Fairway_Five
to WGS 84 using the EPSG code.
The easiest way to visualize an sf
object is to use plot()
:
plot()
create a map for each variable where the spatial units are color-differentiated based on the values of the variable
We will learn how to create more elaborate maps that are of publication-quality using the ggplot2
package later
Sometimes it is useful to be able to tell where certain spatial objects are and what values are associated with them on a map.
The mapView()
function from the mapview
package can create an interactive map where you can point to a spatial object and the associated information is revealed on the map.
Run the following codes on your computer.
data.frame
of points into an sf
Often times, you have a dataset with geographic coordinates as variables in a csv or other formats
It would not be recognized immediately as a spatial dataset by R when it is read into R.
In this case, you need to identify which variables represent the geographic coordinates from the data set, and create an sf
yourself.
Fortunately, it is easy to do so using the sf::st_as_sf()
function.
Let’s get a dataset (irrigation wells in Nebraska) to work with:
wells_ne
is a data.frame
and has longdd
and latdd
representing longitude and latitude, respectively. Note that it is NOT an sf
object.
Important
We can turn a dataset (e.g., data.frame
, tibble
, data.table
) into an sf
object using sf::st_as_sf()
.
Using the LAT
(latitude) and LNG
(longitude) columns, turn the tibble
into an sf
, and then assign the CRS of WGS 84 using its EPSG code.
sp
objectsThe sp
package is the predecessor of the sf
package (developed by the same person)
There are many (older) packages that only accept spatial objects defined by the sp
package
spdep
: spatial econometricsGWmodel
: runs geographically-weighted regressionIn that case, it is good to know how to convert an sf
object to an sp
object, vice versa.
You can convert an sf
object to its sp
counterpart by as(sf_object, "Spatial")
You can convert an sp
object to its sf
counterpart by sf::st_as_sf(sp_object)
.
sf
An important feature of an sf
object is that it is basically a data.frame
with geometric information stored as a variable (column).
This means that transforming an sf
object works just like transforming a data.frame
.
Basically, everything you can do to a data.frame
, you can do to an sf
as well.
dplyr
verbs work well with sf
The following code selects wellid
variable using dplyr::select()
:
Notice that geometry
column will be retained after dplyr::select()
even if you did not tell R to keep it above.
Of course, you can apply other dplyr
verbs just like you do with a data.frame
. Here, let’s apply dplyr::select()
, dplyr::filter()
, and dplyr::mutate()
in sequence using a piping operator.
Note
You cannot do this with the spatial objects defined by the sp
package
You can use sf::st_buffer()
to create buffers of the specified length around points, lines, and polygons.
Syntax
dist
: provide the distance in the unit of the CRS (run st_crs(sf)$units
to get the unit)Let’s create a buffer of 2000 meter.
Here is what it looks like:
Yes, you see zig-zag. You can read up on this here. For now, I would recommend that you project first and then create a buffer.
Let’s first project and then create a buffer:
Here is what it looks like:
You can use st_centroid()
to find the centroid of each of all the polygons in an sf
object.
As you can see, st_centroid()
returns an sf
of centroids as points. The centroids look like this:
If you want longitude (X) and latitude (Y) of the centroids, you can further apply st_coordinates()
.
Of course, you can easily add the XY matrix to the original sf
file using cbind()
:
Syntax
This finds the distance between each of the points in sf_1
and each of the points in sf_2
.
Get a matrix of distances whose \([i,j]\) element is the distance between the \(i\)th sfg
of st_centroid(nc[1:5, ])
and \(j\)th sfg
of st_centroid(nc[6:15, ])
:
Note
Notice that, even though nc
is unprojected, distances returned are in meter.
Sometimes you want to combine all the geometries in a single sf
. For example, you may want to get the centroid of North Carolina using nc
(of course, you can alternatively get sf
of NC state boundary, instead of counties in this case).
You can use sf::st_union()
to achieve this. Note that the returned object is sfc
, not sf
.
Here is what it looks like.
Run the following code to get fairway_grid
data. Then, inspect the data (e.g., by plotting) to get a good sense of what the data looks like.
First, plot fairway_grid
to get a sense of what the dataset looks like:
Transform fairway_grid
so that its CRS is NAD 83/UTM zone 14N (its EPSG code is 26914) and name it fairway_grid_utm
.
Create buffers around the grid polygons in fairway_grid_utm
where the radius of the buffer is 10 meter, and name it fairway_grid_buffers
.
Find the centroid of each of the buffer polygons you created in Exercise 2, and then name it buffers_centroids
.