3 Spatial Interactions of Vector Data: Subsetting and Joining

Before you start

In this chapter we learn the spatial interactions of two spatial objects. We first look at the topological relations of two spatial objects (how they are spatially related with each other): specifically, st_intersects() and st_is_within_distance(). st_intersects() is particularly important as it is by far the most common topological relation economists will use and also because it is the default topological relation that sf uses for spatial subsetting and spatial joining.

We then follow with spatial subsetting: filtering spatial data by the geographic features of another spatial data. Finally, we will learn spatial joining. Spatial joining is the act of assigning attribute values from one spatial data to another spatial data based on how the two spatial datasets are spatially related (topological relation). This is the most important spatial operation for economists who want to use spatial variables in their econometric analysis. For those who have used the sp package, these operations are akin to sp::over().

Direction for replication

Datasets

All the datasets that you need to import are available here. In this chapter, the path to files is set relative to my own working directory (which is hidden). To run the codes without having to mess with paths to the files, follow these steps:

  • set a folder (any folder) as the working directory using setwd()
  • create a folder called “Data” inside the folder designated as the working directory (if you have created a “Data” folder previously, skip this step)
  • download the pertinent datasets from here
  • place all the files in the downloaded folder in the “Data” folder

Packages

Run the following code to install or load (if already installed) the pacman package, and then install or load (if already installed) the listed package inside the pacman::p_load() function.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  sf, # vector data operations
  dplyr, # data wrangling
  data.table, # data wrangling
  ggplot2, # for map creation
  tmap # for map creation
)

3.1 Topological relations

Before we learn spatial subsetting and joining, we first look at topological relations. Topological relations refer to the way multiple spatial objects are spatially related to one another. You can identify various types of spatial relations using the sf package. Our main focus is on the intersections of spatial objects, which can be found using st_intersects()61. We also briefly cover st_is_within_distance(). If you are interested in other topological relations, run ?geos_binary_pred.


We first create sf objects we are going to use for illustrations.

POINTS

#--- create points ---#
point_1 <- st_point(c(2, 2))
point_2 <- st_point(c(1, 1))
point_3 <- st_point(c(1, 3))

#--- combine the points to make a single  sf of points ---#
(
points <- list(point_1, point_2, point_3) %>% 
  st_sfc() %>% 
  st_as_sf() %>% 
  mutate(point_name = c("point 1", "point 2", "point 3"))
)
Simple feature collection with 3 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 1 ymin: 1 xmax: 2 ymax: 3
CRS:           NA
            x point_name
1 POINT (2 2)    point 1
2 POINT (1 1)    point 2
3 POINT (1 3)    point 3

LINES

#--- create points ---#
line_1 <- st_linestring(rbind(c(0, 0), c(2.5, 0.5)))
line_2 <- st_linestring(rbind(c(1.5, 0.5), c(2.5, 2)))

#--- combine the points to make a single  sf of points ---#
(
lines <- list(line_1, line_2) %>% 
  st_sfc() %>% 
  st_as_sf() %>% 
  mutate(line_name = c("line 1", "line 2"))
)
Simple feature collection with 2 features and 1 field
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 0 ymin: 0 xmax: 2.5 ymax: 2
CRS:           NA
                            x line_name
1   LINESTRING (0 0, 2.5 0.5)    line 1
2 LINESTRING (1.5 0.5, 2.5 2)    line 2

POLYGONS

#--- create polygons ---#
polygon_1 <- st_polygon(list(
  rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0)) 
))

polygon_2 <- st_polygon(list(
  rbind(c(0.5, 1.5), c(0.5, 3.5), c(2.5, 3.5), c(2.5, 1.5), c(0.5, 1.5)) 
))

polygon_3 <- st_polygon(list(
  rbind(c(0.5, 2.5), c(0.5, 3.2), c(2.3, 3.2), c(2, 2), c(0.5, 2.5)) 
))

#--- combine the polygons to make an sf of polygons ---#
(
polygons <- list(polygon_1, polygon_2, polygon_3) %>% 
  st_sfc() %>% 
  st_as_sf() %>% 
  mutate(polygon_name = c("polygon 1", "polygon 2", "polygon 3"))
)
Simple feature collection with 3 features and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 0 ymin: 0 xmax: 2.5 ymax: 3.5
CRS:           NA
                               x polygon_name
1 POLYGON ((0 0, 2 0, 2 2, 0 ...    polygon 1
2 POLYGON ((0.5 1.5, 0.5 3.5,...    polygon 2
3 POLYGON ((0.5 2.5, 0.5 3.2,...    polygon 3

Figure 3.1 shows how they look:

ggplot() +
  geom_sf(data = polygons, aes(fill = polygon_name), alpha = 0.3) +
  scale_fill_discrete(name = "Polygons") +
  geom_sf(data = lines, aes(color = line_name)) +
  scale_color_discrete(name = "Lines") + 
  geom_sf(data = points, aes(shape = point_name), size = 3) +
  scale_shape_discrete(name = "Points")  
Visualization of the points, lines, and polygons

Figure 3.1: Visualization of the points, lines, and polygons

3.1.1 st_intersects()

This function identifies which sfg object in an sf (or sfc) intersects with sfg object(s) in another sf. For example, you can use the function to identify which well is located within which county. st_intersects() is the most commonly used topological relation. It is important to understand what it does as it is the default topological relation used when performing spatial subsetting and joining, which we will cover later.


points and polygons

st_intersects(points, polygons)
Sparse geometry binary predicate list of length 3, where the predicate
was `intersects'
 1: 1, 2, 3
 2: 1
 3: 2, 3

As you can see, the output is a list of which polygon(s) each of the points intersect with. The numbers 1, 2, and 3 in the first row mean that 1st (polygon 1), 2nd (polygon 2), and 3rd (polygon 3) objects of the polygons intersect with the first point (point 1) of the points object. The fact that point 1 is considered to be intersecting with polygon 2 means that the area inside the border is considered a part of the polygon (of course).

If you would like the results of st_intersects() in a matrix form with boolean values filling the matrix, you can add sparse = FALSE option.

st_intersects(points, polygons, sparse = FALSE)
      [,1]  [,2]  [,3]
[1,]  TRUE  TRUE  TRUE
[2,]  TRUE FALSE FALSE
[3,] FALSE  TRUE  TRUE

lines and polygons

st_intersects(lines, polygons)
Sparse geometry binary predicate list of length 2, where the predicate
was `intersects'
 1: 1
 2: 1, 2

The output is a list of which polygon(s) each of the lines intersect with.


polygons and polygons

For polygons vs polygons interaction, st_intersects() identifies any polygons that either touches (even at a point like polygons 1 and 3) or share some area.

st_intersects(polygons, polygons)
Sparse geometry binary predicate list of length 3, where the predicate
was `intersects'
 1: 1, 2, 3
 2: 1, 2, 3
 3: 1, 2, 3

3.1.2 st_is_within_distance()

This function identifies whether two spatial objects are within the distance you specify as the name suggests62.

Let’s first create two sets of points.

set.seed(38424738)

points_set_1 <- lapply(1:5, function(x) st_point(runif(2))) %>% 
  st_sfc() %>% st_as_sf() %>% 
  mutate(id = 1:nrow(.))

points_set_2 <- lapply(1:5, function(x) st_point(runif(2))) %>% 
  st_sfc() %>% st_as_sf() %>% 
  mutate(id = 1:nrow(.))

Here is how they are spatially distributed (Figure 3.2). Instead of circles of points, their corresponding id (or equivalently row number here) values are displayed.

ggplot() +
  geom_sf_text(data = points_set_1, aes(label = id), color = "red") +
  geom_sf_text(data = points_set_2, aes(label = id), color = "blue") 
The locations of the set of points

Figure 3.2: The locations of the set of points

We want to know which of the blue points (points_set_2) are located within 0.2 from each of the red points (points_set_1). The following figure (Figure 3.3) gives us the answer visually.

#--- create 0.2 buffers around points in points_set_1 ---#
buffer_1 <- st_buffer(points_set_1, dist = 0.2)

ggplot() +
  geom_sf(data = buffer_1, color = "red", fill = NA) +
  geom_sf_text(data = points_set_1, aes(label = id), color = "red") +
  geom_sf_text(data = points_set_2, aes(label = id), color = "blue") 
The blue points within 0.2 radius of the red points

Figure 3.3: The blue points within 0.2 radius of the red points

Confirm your visual inspection results with the outcome of the following code using st_is_within_distance() function.

st_is_within_distance(points_set_1, points_set_2, dist = 0.2)
Sparse geometry binary predicate list of length 5, where the predicate
was `is_within_distance'
 1: 1
 2: (empty)
 3: (empty)
 4: (empty)
 5: 3

3.2 Spatial Subsetting (or Flagging)

Spatial subsetting refers to operations that narrow down the geographic scope of a spatial object (source data) based on another spatial object (target data). We illustrate spatial subsetting using Kansas county borders, the boundary of the High-Plains Aquifer (HPA), and agricultural irrigation wells in Kansas.

First, let’s import all the files we will use in this section.

#--- Kansas county borders ---#
KS_counties <- readRDS("Data/KS_county_borders.rds")

#--- HPA boundary ---#
hpa <- st_read(dsn = "Data", layer = "hp_bound2010") %>%
  .[1, ] %>%
  st_transform(st_crs(KS_counties))

#--- all the irrigation wells in KS ---#
KS_wells <- readRDS("Data/Kansas_wells.rds") %>%
  st_transform(st_crs(KS_counties))

#--- US railroad ---#
rail_roads <- st_read(dsn = "Data", layer = "tl_2015_us_rails") %>%
  st_transform(st_crs(KS_counties)) 

3.2.1 polygons (source) vs polygons (target)

The following map (Figure 3.4) shows the Kansas portion of the HPA and Kansas counties.63

#--- add US counties layer ---#
tm_shape(KS_counties) +
  tm_polygons() +
#--- add High-Plains Aquifer layer ---#
tm_shape(hpa) +
  tm_fill(col = "blue", alpha = 0.3)
Kansas portion of High-Plains Aquifer and Kansas counties

Figure 3.4: Kansas portion of High-Plains Aquifer and Kansas counties

The goal here is to select only the counties that intersect with the HPA boundary. When subsetting a data.frame by specifying the row numbers you would like to select, you can do

#--- this does not run ---#
data.frame[vector of row numbers, ]

Spatial subsetting of sf objects works in a similar syntax:

#--- this does not run ---#
sf_1[sf_2, ]

where you are subsetting sf_1 based on sf_2. Instead of row numbers, you provide another sf object in place. The following code spatially subsets Kansas counties based on the HPA boundary.

counties_in_hpa <- KS_counties[hpa, ]

See the results below in Figure 3.5.

#--- add US counties layer ---#
tm_shape(counties_in_hpa) +
  tm_polygons() +
#--- add High-Plains Aquifer layer ---#
tm_shape(hpa) +
  tm_fill(col = "blue", alpha = 0.3)
The results of spatially subsetting Kansas counties based on HPA boundary

Figure 3.5: The results of spatially subsetting Kansas counties based on HPA boundary

You can see that only the counties that intersect with the HPA boundary remained. This is because when you use the above syntax of sf_1[sf_2, ], the default underlying topological relations is st_intersects(). So, if an object in sf_1 intersects with any of the objects in sf_2 even slightly, then it will remain after subsetting.

You can specify the spatial operation to be used as an option as in

#--- this does not run ---#
sf_1[sf_2, op = topological_relation_type] 

For example, if you only want counties that are completely within the HPA boundary, you can do the following (the map of the results in Figure 3.6):

counties_within_hpa <- KS_counties[hpa, , op = st_within]
#--- add US counties layer ---#
tm_shape(counties_within_hpa) +
  tm_polygons() +
#--- add High-Plains Aquifer layer ---#
tm_shape(hpa) +
  tm_fill(col = "blue", alpha = 0.3)
Kansas counties that are completely within the HPA boundary

Figure 3.6: Kansas counties that are completely within the HPA boundary


Sometimes, you just want to flag whether two spatial objects intersect or not, instead of dropping non-overlapping observations. In that case, you can use st_intersects().

#--- check the intersections of HPA and counties  ---#
intersects_hpa <- st_intersects(KS_counties, hpa, sparse = FALSE)

#--- take a look ---#
head(intersects_hpa)
      [,1]
[1,] FALSE
[2,]  TRUE
[3,] FALSE
[4,]  TRUE
[5,]  TRUE
[6,]  TRUE
#--- assign the index as a variable ---#
KS_counties <- mutate(KS_counties, intersects_hpa  = intersects_hpa)

#--- take a look ---#
dplyr::select(KS_counties, COUNTYFP, intersects_hpa)
Simple feature collection with 105 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -102.0517 ymin: 36.99308 xmax: -94.59193 ymax: 40.00308
Geodetic CRS:  NAD83
First 10 features:
   COUNTYFP intersects_hpa                       geometry
1       133          FALSE MULTIPOLYGON (((-95.5255 37...
2       075           TRUE MULTIPOLYGON (((-102.0446 3...
3       123          FALSE MULTIPOLYGON (((-98.48738 3...
4       189           TRUE MULTIPOLYGON (((-101.5566 3...
5       155           TRUE MULTIPOLYGON (((-98.47279 3...
6       129           TRUE MULTIPOLYGON (((-102.0419 3...
7       073          FALSE MULTIPOLYGON (((-96.52278 3...
8       023           TRUE MULTIPOLYGON (((-102.0517 4...
9       089           TRUE MULTIPOLYGON (((-98.50445 4...
10      059          FALSE MULTIPOLYGON (((-95.50827 3...

3.2.2 points (source) vs polygons (target)

The following map (Figure 3.7) shows the Kansas portion of the HPA and all the irrigation wells in Kansas.

tm_shape(KS_wells) +
  tm_symbols(size = 0.1) +
tm_shape(hpa) +
  tm_polygons(col = "blue", alpha = 0.1) 
A map of Kansas irrigation wells and HPA

Figure 3.7: A map of Kansas irrigation wells and HPA

We can select only wells that reside within the HPA boundary using the same syntax as the above example.

KS_wells_in_hpa <- KS_wells[hpa, ]

As you can see in Figure 3.8 below, only the wells that are inside (or intersect with) the HPA remained because the default topological relation is st_intersects().

tm_shape(KS_wells_in_hpa) +
  tm_symbols(size = 0.1) +
tm_shape(hpa) +
  tm_polygons(col = "blue", alpha = 0.1) 
A map of Kansas irrigation wells and HPA

Figure 3.8: A map of Kansas irrigation wells and HPA


If you just want to flag wells that intersects with HPA instead of dropping the non-intersecting wells, use st_intersects():

(
KS_wells <- mutate(KS_wells, in_hpa = st_intersects(KS_wells, hpa, sparse = FALSE))
)
Simple feature collection with 37647 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -102.0495 ymin: 36.99552 xmax: -94.62089 ymax: 40.00199
Geodetic CRS:  NAD83
First 10 features:
   site    af_used                   geometry in_hpa
1     1 232.099948 POINT (-100.4423 37.52046)   TRUE
2     3  13.183940 POINT (-100.7118 39.91526)   TRUE
3     5  99.187052 POINT (-99.15168 38.48849)   TRUE
4     7   0.000000 POINT (-101.8995 38.78077)   TRUE
5     8 145.520499  POINT (-100.7122 38.0731)   TRUE
6     9   3.614535 POINT (-97.70265 39.04055)  FALSE
7    11 188.423543 POINT (-101.7114 39.55035)   TRUE
8    12  77.335960 POINT (-95.97031 39.16121)  FALSE
9    15   0.000000 POINT (-98.30759 38.26787)   TRUE
10   17 167.819034 POINT (-100.2785 37.71539)   TRUE

3.2.3 lines (source) vs polygons (target)

The following map (Figure 3.9) shows the Kansas counties and U.S. railroads.

tm_shape(rail_roads) +
  tm_lines(col = "blue") +
tm_shape(KS_counties) +
  tm_polygons(alpha = 0)  +
  tm_layout(frame = FALSE) 
U.S. railroads and Kansas county boundaries

Figure 3.9: U.S. railroads and Kansas county boundaries

We can select only railroads that intersect with Kansas.

railroads_KS <- rail_roads[KS_counties, ]

As you can see in Figure 3.10 below, only the railroads that intersect with Kansas were selected. Note the lines that go beyond the Kansas boundary are also selected. Remember, the default is st_intersect(). If you would like the lines beyond the state boundary to be cut out but the intersecting parts of those lines to remain, use st_intersection(), which is explained in Chapter 3.4.1.

tm_shape(railroads_KS) +
  tm_lines(col = "blue") +
tm_shape(KS_counties) +
  tm_polygons(alpha = 0)  +
  tm_layout(frame = FALSE) 
Railroads that intersect Kansas county boundaries

Figure 3.10: Railroads that intersect Kansas county boundaries


Unlike the previous two cases, multiple objects (lines) are checked against multiple objects (polygons) for intersection64. Therefore, we cannot use the strategy we took above of returning a vector of TRUE or FALSE using the sparse = TRUE option. Here, we count the number of intersecting counties and then assign TRUE if the number is greater than 0.

#--- check the number of intersecting KS counties ---#
int_mat <- st_intersects(rail_roads, KS_counties) %>% 
  lapply(length) %>% 
  unlist() 

#--- railroads ---#
rail_roads <- mutate(rail_roads, intersect_ks  = int_mat > 0)

#--- take a look ---#
dplyr::select(rail_roads, LINEARID, intersect_ks)
Simple feature collection with 180958 features and 2 fields
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: -165.4011 ymin: 17.95174 xmax: -65.74931 ymax: 65.00006
Geodetic CRS:  NAD83
First 10 features:
      LINEARID intersect_ks                       geometry
1  11020239500        FALSE MULTILINESTRING ((-79.47058...
2  11020239501        FALSE MULTILINESTRING ((-79.46687...
3  11020239502        FALSE MULTILINESTRING ((-79.6682 ...
4  11020239503        FALSE MULTILINESTRING ((-79.46687...
5  11020239504        FALSE MULTILINESTRING ((-79.74031...
6  11020239575        FALSE MULTILINESTRING ((-79.43695...
7  11020239576        FALSE MULTILINESTRING ((-79.47852...
8  11020239577        FALSE MULTILINESTRING ((-79.43695...
9  11020239589        FALSE MULTILINESTRING ((-79.38736...
10 11020239591        FALSE MULTILINESTRING ((-79.53848...

3.2.4 polygons (source) vs points (target)

The following map (Figure 3.11) shows the Kansas counties and irrigation wells in Kansas that overlie HPA.

tm_shape(KS_counties) +
  tm_polygons(alpha = 0) +
tm_shape(KS_wells_in_hpa) +
  tm_symbols(size = 0.1) +
  tm_layout(frame = FALSE)
Kansas county boundaries and wells that overlie the HPA

Figure 3.11: Kansas county boundaries and wells that overlie the HPA

We can select only counties that intersect with at least one well.

KS_counties_intersected <- KS_counties[KS_wells_in_hpa, ]  

As you can see in Figure 3.12 below, only the counties that intersect with at least one well remained.

tm_shape(KS_counties) +
  tm_polygons(col = NA) +
tm_shape(KS_counties_intersected) +
  tm_polygons(col ="blue", alpha = 0.6) +
  tm_layout(frame = FALSE)
Counties that have at least one well

Figure 3.12: Counties that have at least one well


To flag counties that have at least one well, use st_intersects() as follows:

int_mat <- st_intersects(KS_counties, KS_wells_in_hpa) %>% 
  lapply(length) %>% 
  unlist()

#--- railroads ---#
KS_counties <- mutate(KS_counties, intersect_wells  = int_mat > 0)

#--- take a look ---#
dplyr::select(KS_counties, NAME, COUNTYFP, intersect_wells)
Simple feature collection with 105 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -102.0517 ymin: 36.99308 xmax: -94.59193 ymax: 40.00308
Geodetic CRS:  NAD83
First 10 features:
        NAME COUNTYFP intersect_wells                       geometry
1     Neosho      133           FALSE MULTIPOLYGON (((-95.5255 37...
2   Hamilton      075            TRUE MULTIPOLYGON (((-102.0446 3...
3   Mitchell      123           FALSE MULTIPOLYGON (((-98.48738 3...
4    Stevens      189            TRUE MULTIPOLYGON (((-101.5566 3...
5       Reno      155            TRUE MULTIPOLYGON (((-98.47279 3...
6     Morton      129            TRUE MULTIPOLYGON (((-102.0419 3...
7  Greenwood      073           FALSE MULTIPOLYGON (((-96.52278 3...
8   Cheyenne      023            TRUE MULTIPOLYGON (((-102.0517 4...
9     Jewell      089            TRUE MULTIPOLYGON (((-98.50445 4...
10  Franklin      059           FALSE MULTIPOLYGON (((-95.50827 3...

3.2.5 Subsetting to a geographic extent (bounding box)

We can use st_crop() to crop spatial objects to a spatial bounding box (extent) of a spatial object. The bounding box of an sf is a rectangle represented by the minimum and maximum of x and y that encompass/contain all the spatial objects in the sf. You can use st_bbox() to find the bounding box of an sf object. Let’s get the bounding box of KS_wells_in_hpa (irrigation wells in Kansas that overlie HPA).

#--- get the bounding box of KS_wells ---#
(
bbox_KS_wells_in_hpa <- st_bbox(KS_wells_in_hpa)  
)
      xmin       ymin       xmax       ymax 
-102.04953   36.99552  -97.33193   40.00199 
#--- check the class ---#
class(bbox_KS_wells_in_hpa)
[1] "bbox"

Visualizing the bounding box (Figure 3.13):

tm_shape(KS_counties) +
  tm_polygons(alpha = 0) +
tm_shape(KS_wells_in_hpa) +
  tm_symbols(size = 0.1) +
tm_shape(st_as_sfc(bbox_KS_wells_in_hpa)) +
  tm_borders(col = "red") +
  tm_layout(frame = NA)
The bounding box of the irrigation wells in Kansas that overlie HPA

Figure 3.13: The bounding box of the irrigation wells in Kansas that overlie HPA

When you use a bounding box to crop an sf objects, you can consider the bounding box as a single polygon.

Let’s crop KS_counties using the bbox of the irrigation wells.

KS_cropped <- st_crop(KS_counties, bbox_KS_wells_in_hpa)  

Here is what the cropped data looks like (Figure 3.14):

tm_shape(KS_counties) +
  tm_polygons(col = NA) +
tm_shape(KS_cropped) +
  tm_polygons(col ="blue", alpha = 0.6) +
  tm_layout(frame = NA)
The bounding box of the irrigation wells in Kansas that overlie HPA

Figure 3.14: The bounding box of the irrigation wells in Kansas that overlie HPA

As you can see, the st_crop() operation cut some counties at the right edge of the bounding box. So, st_crop() is invasive. If you do not like this to happen and want the complete original counties that have at least one well, you can use the subset approach using [, ] after converting the bounding box to an sfc as follows:

KS_complete_counties <- KS_counties[st_as_sfc(bbox_KS_wells_in_hpa), ]  

Here is what the subsetted Kansas county data looks like (Figure 3.15):

tm_shape(KS_counties) +
  tm_polygons(col = NA) +
tm_shape(KS_complete_counties) +
  tm_polygons(col ="blue", alpha = 0.6) +
  tm_layout(frame = NA)
The bounding box of the irrigation wells in Kansas that overlie HPA

Figure 3.15: The bounding box of the irrigation wells in Kansas that overlie HPA

Notice the difference between the result of this operation and the case where we used KS_wells_in_hpa directly to subset KS_counties as shown in Figure 3.12. The current approach includes counties that do not have any irrigation wells inside them.

3.3 Spatial Join

By spatial join, we mean spatial operations that involve all of the following:

  • overlay one spatial layer (target layer) onto another spatial layer (source layer)
  • for each of the observation in the target layer
    • identify which objects in the source layer it geographically intersects (or a different topological relation) with
    • extract values associated with the intersecting objects in the source layer (and summarize if necessary),
    • assign the extracted value to the object in the target layer

For economists, this is probably the most common motivation for using GIS software, with the ultimate goal being to include the spatially joined variables as covariates in regression analysis.

We can classify spatial join into four categories by the type of the underlying spatial objects:

  • vector-vector: vector data (target) against vector data (source)
  • vector-raster: vector data (target) against raster data (source)
  • raster-vector: raster data (target) against vector data (source)
  • raster-raster: raster data (target) against raster data (source)

Among the four, our focus here is the first case. The second case will be discussed in Chapter 5. We will not cover the third and fourth cases in this course because it is almost always the case that our target data is a vector data (e.g., city or farm fields as points, political boundaries as polygons, etc).

Category 1 can be further broken down into different sub categories depending on the type of spatial object (point, line, and polygon). Here, we will ignore any spatial joins that involve lines. This is because objects represented by lines are rarely observation units in econometric analysis nor the source data from which we will extract values.65 Here is the list of the types of spatial joins we will learn.

  1. points (target) against polygons (source)
  2. polygons (target) against points (source)
  3. polygons (target) against polygons (source)

3.3.1 Case 1: points (target) vs polygons (source)

Case 1, for each of the observations (points) in the target data, finds which polygon in the source data it intersects, and then assign the value associated with the polygon to the point66. In order to achieve this, we can use the st_join() function, whose syntax is as follows:

#--- this does not run ---#
st_join(target_sf, source_sf)

Similar to spatial subsetting, the default topological relation is st_intersects()67.

We use the Kansas irrigation well data (points) and Kansas county boundary data (polygons) for a demonstration. Our goal is to assign the county-level corn price information from the Kansas county data to wells. First let me create and add a fake county-level corn price variable to the Kansas county data.

KS_corn_price <- KS_counties %>%  
  mutate(corn_price = seq(3.2, 3.9, length = nrow(.))) %>% 
  dplyr::select(COUNTYFP, corn_price)

Here is the map of Kansas counties color-differentiated by fake corn price (Figure 3.16):

tm_shape(KS_corn_price) + 
  tm_polygons(col = "corn_price") +
  tm_layout(frame = FALSE, legend.outside = TRUE)
Map of county-level fake corn price

Figure 3.16: Map of county-level fake corn price

For this particular context, the following code will do the job:

#--- spatial join ---#
(
KS_wells_County <- st_join(KS_wells, KS_corn_price)
)
Simple feature collection with 37647 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -102.0495 ymin: 36.99552 xmax: -94.62089 ymax: 40.00199
Geodetic CRS:  NAD83
First 10 features:
   site    af_used in_hpa COUNTYFP corn_price                   geometry
1     1 232.099948   TRUE      069   3.556731 POINT (-100.4423 37.52046)
2     3  13.183940   TRUE      039   3.449038 POINT (-100.7118 39.91526)
3     5  99.187052   TRUE      165   3.287500 POINT (-99.15168 38.48849)
4     7   0.000000   TRUE      199   3.644231 POINT (-101.8995 38.78077)
5     8 145.520499   TRUE      055   3.832692  POINT (-100.7122 38.0731)
6     9   3.614535  FALSE      143   3.799038 POINT (-97.70265 39.04055)
7    11 188.423543   TRUE      181   3.590385 POINT (-101.7114 39.55035)
8    12  77.335960  FALSE      177   3.550000 POINT (-95.97031 39.16121)
9    15   0.000000   TRUE      159   3.610577 POINT (-98.30759 38.26787)
10   17 167.819034   TRUE      069   3.556731 POINT (-100.2785 37.71539)

You can see from Figure 3.17 below that all the wells inside the same county have the same corn price value.

tm_shape(KS_counties) +
  tm_polygons() +
tm_shape(KS_wells_County) +
  tm_symbols(col = "corn_price", size = 0.1) +
  tm_layout(frame = FALSE, legend.outside = TRUE)
Map of wells color-differentiated by corn price

Figure 3.17: Map of wells color-differentiated by corn price

3.3.2 Case 2: polygons (target) vs points (source)

Case 2, for each of the observations (polygons) in the target data, find which observations (points) in the source data it intersects, and then assign the values associated with the points to the polygon. We use the same function: st_join()68.

Suppose you are now interested in county-level analysis and you would like to get county-level total groundwater pumping. The target data is KS_counties, and the source data is KS_wells.

#--- spatial join ---#
KS_County_wells <- st_join(KS_counties, KS_wells)

#--- take a look ---#
dplyr::select(KS_County_wells, COUNTYFP, site, af_used)
Simple feature collection with 37652 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -102.0517 ymin: 36.99308 xmax: -94.59193 ymax: 40.00308
Geodetic CRS:  NAD83
First 10 features:
    COUNTYFP  site   af_used                       geometry
1        133 53861  17.01790 MULTIPOLYGON (((-95.5255 37...
1.1      133 70592   0.00000 MULTIPOLYGON (((-95.5255 37...
2        075   328 394.04513 MULTIPOLYGON (((-102.0446 3...
2.1      075   336  80.65036 MULTIPOLYGON (((-102.0446 3...
2.2      075   436 568.25359 MULTIPOLYGON (((-102.0446 3...
2.3      075  1007 215.80416 MULTIPOLYGON (((-102.0446 3...
2.4      075  1170   0.00000 MULTIPOLYGON (((-102.0446 3...
2.5      075  1192  77.39120 MULTIPOLYGON (((-102.0446 3...
2.6      075  1249   0.00000 MULTIPOLYGON (((-102.0446 3...
2.7      075  1300 320.22612 MULTIPOLYGON (((-102.0446 3...

As you can see in the resulting dataset, all the unique polygon - point intersecting combinations comprise the observations. For each of the polygons, you will have as many observations as the number of wells that intersect with the polygon. Once you join the two layers, you can find statistics by polygon (county here). Since we want groundwater extraction by county, the following does the job.

KS_County_wells %>% 
  group_by(COUNTYFP) %>% 
  summarize(af_used = sum(af_used, na.rm = TRUE)) 
Simple feature collection with 105 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -102.0517 ymin: 36.99308 xmax: -94.59193 ymax: 40.00308
Geodetic CRS:  NAD83
# A tibble: 105 × 3
   COUNTYFP af_used                                                     geometry
   <fct>      <dbl>                                           <MULTIPOLYGON [°]>
 1 001           0  (((-95.51931 37.82026, -95.51897 38.03823, -95.07788 38.037…
 2 003           0  (((-95.50833 38.39028, -95.06583 38.38994, -95.07788 38.037…
 3 005         771. (((-95.56413 39.65287, -95.33974 39.65298, -95.11519 39.652…
 4 007        4972. (((-99.0126 37.47042, -98.46466 37.47101, -98.46493 37.3841…
 5 009       61083. (((-99.03297 38.69676, -98.48611 38.69688, -98.47991 38.681…
 6 011           0  (((-95.08808 37.73248, -95.07969 37.8198, -95.07788 38.0377…
 7 013         480. (((-95.78811 40.00047, -95.78457 40.00046, -95.3399 40.0000…
 8 015         343. (((-97.15248 37.91273, -97.15291 38.0877, -96.84077 38.0856…
 9 017           0  (((-96.83765 38.34864, -96.81951 38.52245, -96.35378 38.521…
10 019           0  (((-96.52487 37.30273, -95.9644 37.29923, -95.96427 36.9992…
# ℹ 95 more rows

Of course, it is just as easy to get other types of statistics by simply modifying the summarize() part.

However, this two-step process can actually be done in one step using aggregate(), in which you specify how you want to aggregate with the FUN option as follows:

#--- mean ---#
aggregate(KS_wells, KS_counties, FUN = mean)
Simple feature collection with 105 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -102.0517 ymin: 36.99308 xmax: -94.59193 ymax: 40.00308
Geodetic CRS:  NAD83
First 10 features:
       site    af_used    in_hpa                       geometry
1  62226.50   8.508950 0.0000000 MULTIPOLYGON (((-95.5255 37...
2  35184.64 176.390742 0.4481793 MULTIPOLYGON (((-102.0446 3...
3  40086.82  35.465123 0.0000000 MULTIPOLYGON (((-98.48738 3...
4  40179.41 285.672916 1.0000000 MULTIPOLYGON (((-101.5566 3...
5  51249.39  46.048048 0.9743783 MULTIPOLYGON (((-98.47279 3...
6  33033.13 202.612377 1.0000000 MULTIPOLYGON (((-102.0419 3...
7  29840.40   0.000000 0.0000000 MULTIPOLYGON (((-96.52278 3...
8  28235.82  94.585634 0.9736842 MULTIPOLYGON (((-102.0517 4...
9  36180.06  44.033911 0.3000000 MULTIPOLYGON (((-98.50445 4...
10 40016.00   1.142775 0.0000000 MULTIPOLYGON (((-95.50827 3...
#--- sum ---#
aggregate(KS_wells, KS_counties, FUN = sum)
Simple feature collection with 105 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -102.0517 ymin: 36.99308 xmax: -94.59193 ymax: 40.00308
Geodetic CRS:  NAD83
First 10 features:
       site      af_used in_hpa                       geometry
1    124453 1.701790e+01      0 MULTIPOLYGON (((-95.5255 37...
2  12560917 6.297149e+04    160 MULTIPOLYGON (((-102.0446 3...
3   1964254 1.737791e+03      0 MULTIPOLYGON (((-98.48738 3...
4  42389277 3.013849e+05   1055 MULTIPOLYGON (((-101.5566 3...
5  68007942 6.110576e+04   1293 MULTIPOLYGON (((-98.47279 3...
6  15756801 9.664610e+04    477 MULTIPOLYGON (((-102.0419 3...
7    149202 0.000000e+00      0 MULTIPOLYGON (((-96.52278 3...
8  17167377 5.750807e+04    592 MULTIPOLYGON (((-102.0517 4...
9   1809003 2.201696e+03     15 MULTIPOLYGON (((-98.50445 4...
10   160064 4.571102e+00      0 MULTIPOLYGON (((-95.50827 3...

Notice that the mean() function was applied to all the columns in KS_wells, including site id number. So, you might want to select variables you want to join before you apply the aggregate() function like this:

aggregate(dplyr::select(KS_wells, af_used), KS_counties, FUN = mean)
Simple feature collection with 105 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -102.0517 ymin: 36.99308 xmax: -94.59193 ymax: 40.00308
Geodetic CRS:  NAD83
First 10 features:
      af_used                       geometry
1    8.508950 MULTIPOLYGON (((-95.5255 37...
2  176.390742 MULTIPOLYGON (((-102.0446 3...
3   35.465123 MULTIPOLYGON (((-98.48738 3...
4  285.672916 MULTIPOLYGON (((-101.5566 3...
5   46.048048 MULTIPOLYGON (((-98.47279 3...
6  202.612377 MULTIPOLYGON (((-102.0419 3...
7    0.000000 MULTIPOLYGON (((-96.52278 3...
8   94.585634 MULTIPOLYGON (((-102.0517 4...
9   44.033911 MULTIPOLYGON (((-98.50445 4...
10   1.142775 MULTIPOLYGON (((-95.50827 3...

3.3.3 Case 3: polygons (target) vs polygons (source)

For this case, st_join(target_sf, source_sf) will return all the unique intersecting polygon-polygon combinations with the information of the polygon from source_sf attached.

We will use county-level corn acres in Iowa in 2018 from USDA NASS69 and Hydrologic Units70 Our objective here is to find corn acres by HUC units based on the county-level corn acres data71.

We first import the Iowa corn acre data:

#--- IA boundary ---#
IA_corn <- readRDS("Data/IA_corn.rds")

#--- take a look ---#
IA_corn
Simple feature collection with 93 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 203228.6 ymin: 4470941 xmax: 736832.9 ymax: 4822687
Projected CRS: NAD83 / UTM zone 15N
First 10 features:
   county_code year  acres                       geometry
1          083 2018 183500 MULTIPOLYGON (((458997 4711...
2          141 2018 167000 MULTIPOLYGON (((267700.8 47...
3          081 2018 184500 MULTIPOLYGON (((421231.2 47...
4          019 2018 189500 MULTIPOLYGON (((575285.6 47...
5          023 2018 165500 MULTIPOLYGON (((497947.5 47...
6          195 2018 111500 MULTIPOLYGON (((459791.6 48...
7          063 2018 110500 MULTIPOLYGON (((345214.3 48...
8          027 2018 183000 MULTIPOLYGON (((327408.5 46...
9          121 2018  70000 MULTIPOLYGON (((396378.1 45...
10         077 2018 107000 MULTIPOLYGON (((355180.1 46...

Here is the map of Iowa counties color-differentiated by corn acres (Figure 3.18):

#--- here is the map ---#
tm_shape(IA_corn) +
  tm_polygons(col = "acres") +
  tm_layout(frame = FALSE, legend.outside = TRUE)
Map of Iowa counties color-differentiated by corn planted acreage

Figure 3.18: Map of Iowa counties color-differentiated by corn planted acreage

Now import the HUC units data:

#--- import HUC units ---#
HUC_IA <- 
  st_read("Data/huc250k.shp") %>% 
  dplyr::select(HUC_CODE) %>% 
  #--- reproject to the CRS of IA ---#
  st_transform(st_crs(IA_corn)) %>% 
  #--- select HUC units that overlaps with IA ---#
  .[IA_corn, ]

Here is the map of HUC units (Figure 3.19):

tm_shape(HUC_IA) +
  tm_polygons() +
  tm_layout(frame = FALSE, legend.outside = TRUE)
Map of HUC units that intersect with Iowa state boundary

Figure 3.19: Map of HUC units that intersect with Iowa state boundary

Here is a map of Iowa counties with HUC units superimposed on top (Figure 3.20):

tm_shape(IA_corn) +
  tm_polygons(col = "acres") +
tm_shape(HUC_IA) +
  tm_polygons(alpha = 0) +
  tm_layout(frame = FALSE, legend.outside = TRUE)
Map of HUC units superimposed on the counties in Iowas

Figure 3.20: Map of HUC units superimposed on the counties in Iowas

Spatial joining will produce the following.

(
HUC_joined <- st_join(HUC_IA, IA_corn)
)
Simple feature collection with 349 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 154941.7 ymin: 4346327 xmax: 773299.8 ymax: 4907735
Projected CRS: NAD83 / UTM zone 15N
First 10 features:
      HUC_CODE county_code year  acres                       geometry
608   10170203         149 2018 226500 POLYGON ((235551.4 4907513,...
608.1 10170203         167 2018 249000 POLYGON ((235551.4 4907513,...
608.2 10170203         193 2018 201000 POLYGON ((235551.4 4907513,...
608.3 10170203         119 2018 184500 POLYGON ((235551.4 4907513,...
621   07020009         063 2018 110500 POLYGON ((408580.7 4880798,...
621.1 07020009         109 2018 304000 POLYGON ((408580.7 4880798,...
621.2 07020009         189 2018 120000 POLYGON ((408580.7 4880798,...
627   10170204         141 2018 167000 POLYGON ((248115.2 4891652,...
627.1 10170204         143 2018 116000 POLYGON ((248115.2 4891652,...
627.2 10170204         167 2018 249000 POLYGON ((248115.2 4891652,...

Each of the intersecting HUC-county combinations becomes an observation with its resulting geometry the same as the geometry of the HUC unit. To see this, let’s take a look at one of the HUC units.

The HUC unit with HUC_CODE ==10170203 intersects with four County.

#--- get the HUC unit with `HUC_CODE ==10170203`  ---#
(
temp_HUC_county <- filter(HUC_joined, HUC_CODE == 10170203)
)
Simple feature collection with 4 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 154941.7 ymin: 4709628 xmax: 248115.2 ymax: 4907735
Projected CRS: NAD83 / UTM zone 15N
      HUC_CODE county_code year  acres                       geometry
608   10170203         149 2018 226500 POLYGON ((235551.4 4907513,...
608.1 10170203         167 2018 249000 POLYGON ((235551.4 4907513,...
608.2 10170203         193 2018 201000 POLYGON ((235551.4 4907513,...
608.3 10170203         119 2018 184500 POLYGON ((235551.4 4907513,...

Figure 3.21 shows the map of the four observations.

tm_shape(temp_HUC_county) +
  tm_polygons() +
  tm_layout(frame = FALSE)
Map of the HUC unit

Figure 3.21: Map of the HUC unit

So, all of the four observations have identical geometry, which is the geometry of the HUC unit, meaning that the st_join() did not leave the information about the nature of the intersection of the HUC unit and the four counties. Again, remember that the default option is st_intersects(), which checks whether spatial objects intersect or not, nothing more. If you are just calculating the simple average of corn acres ignoring the degree of spatial overlaps, this is just fine. However, if you would like to calculate area-weighted average, you do not have sufficient information. You will see how to find area-weighted average in the next subsection.

3.4 Spatial Intersection (cropping join)

Sometimes you face the need to crop spatial objects by polygon boundaries. For example, we found the total length of the railroads inside of each county in Demonstration 4 in Chapter ?? by cutting off the parts of the railroads that extend beyond the boundary of counties. Also, we just saw that area-weighted averages cannot be found using st_join() because it does not provide information about how much area of each HUC unit is intersecting with each of its intersecting counties. If we can get the geometry of the intersecting part of the HUC unit and the county, then we can calculate its area, which in turn allows us to find area-weighted averages of joined attributes. For these purposes, we can use sf::st_intersection(). Below, we first illustrate how st_intersection() works for lines-polygons and polygons-polygons intersections (Note that we use the data we generated in Chapter 3.1). Intersections that involve points using st_intersection() is the same as using st_join() because points are length-less and area-less (nothing to cut). Thus, it is not discussed here.

3.4.1 st_intersection()

While st_intersects() returns the indices of intersecting objects, st_intersection() returns intersecting spatial objects with the non-intersecting parts of the sf objects cut out. Moreover, attribute values of the source sf will be merged to its intersecting sfg in the target sf. We will see how it works for lines-polygons and polygons-polygons cases using the toy examples we used to explain how st_intersects() work. Here is the figure of the lines and polygons (Figure 3.22):

ggplot() +
  geom_sf(data = polygons, aes(fill = polygon_name), alpha = 0.3) +
  scale_fill_discrete(name = "Polygons") +
  geom_sf(data = lines, aes(color = line_name)) +
  scale_color_discrete(name = "Lines") 
Visualization of the points, lines, and polygons

Figure 3.22: Visualization of the points, lines, and polygons


lines and polygons

The following code gets the intersection of the lines and the polygons.

(
intersections_lp <- st_intersection(lines, polygons) %>% 
  mutate(int_name = paste0(line_name, "-", polygon_name))
)
Simple feature collection with 3 features and 3 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 0 ymin: 0 xmax: 2.5 ymax: 2
CRS:           NA
    line_name polygon_name                              x         int_name
1      line 1    polygon 1        LINESTRING (0 0, 2 0.4) line 1-polygon 1
2      line 2    polygon 1   LINESTRING (1.5 0.5, 2 1.25) line 2-polygon 1
2.1    line 2    polygon 2 LINESTRING (2.166667 1.5, 2... line 2-polygon 2

As you can see in the output, each instance of the intersections of the lines and polygons become an observation (line 1-polygon 1, line 2-polygon 1, and line 2-polygon 2). The part of the lines that did not intersect with a polygon is cut out and does not remain in the returned sf. To see this, see Figure 3.23 below:

ggplot() +
  #--- here are all the original polygons  ---#
  geom_sf(data = polygons, aes(fill = polygon_name), alpha = 0.3) +
  #--- here is what is returned after st_intersection ---#
  geom_sf(data = intersections_lp, aes(color = int_name), size = 1.5)
The outcome of the intersections of the lines and polygons

Figure 3.23: The outcome of the intersections of the lines and polygons

This further allows us to calculate the length of the part of the lines that are completely contained in polygons, just like we did in Chapter ??. Note also that the attribute (polygon_name) of the source sf (the polygons) are merged to their intersecting lines. Therefore, st_intersection() is transforming the original geometries while joining attributes (this is why I call this cropping join).


polygons and polygons

The following code gets the intersection of polygon 1 and polygon 3 with polygon 2.

(
intersections_pp <- st_intersection(polygons[c(1,3), ], polygons[2, ]) %>% 
  mutate(int_name = paste0(polygon_name, "-", polygon_name.1))
)
Simple feature collection with 2 features and 3 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 0.5 ymin: 1.5 xmax: 2.3 ymax: 3.2
CRS:           NA
  polygon_name polygon_name.1                              x
1    polygon 1      polygon 2 POLYGON ((2 2, 2 1.5, 0.5 1...
3    polygon 3      polygon 2 POLYGON ((0.5 3.2, 2.3 3.2,...
             int_name
1 polygon 1-polygon 2
3 polygon 3-polygon 2

As you can see in Figure 3.24, each instance of the intersections of polygons 1 and 3 against polygon 2 becomes an observation (polygon 1-polygon 2 and polygon 3-polygon 2). Just like the lines-polygons case, the non-intersecting part of polygons 1 and 3 are cut out and do not remain in the returned sf. We will see later that st_intersection() can be used to find area-weighted values from the intersecting polygons with help from st_area().

ggplot() +
  #--- here are all the original polygons  ---#
  geom_sf(data = polygons, aes(fill = polygon_name), alpha = 0.3) +
  #--- here is what is returned after st_intersection ---#
  geom_sf(data = intersections_pp, aes(fill = int_name))
The outcome of the intersections of polygon 2 and polygons 1 and 3

Figure 3.24: The outcome of the intersections of polygon 2 and polygons 1 and 3

3.4.2 Area-weighted average

Let’s now get back to the example of HUC units and county-level corn acres data we saw in Chapter 3.3. We would like to find area-weighted average of corn acres instead of the simple average of corn acres.

Using st_intersection(), for each of the HUC polygons, we find the intersecting counties, and then divide it into parts based on the boundary of the intersecting polygons.

(
HUC_intersections <- st_intersection(HUC_IA, IA_corn) %>% 
  mutate(huc_county = paste0(HUC_CODE, "-", county_code))
)
Simple feature collection with 349 features and 5 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 203228.6 ymin: 4470941 xmax: 736832.9 ymax: 4822687
Projected CRS: NAD83 / UTM zone 15N
First 10 features:
      HUC_CODE county_code year  acres                       geometry
732   07080207         083 2018 183500 POLYGON ((482923.9 4711643,...
803   07080205         083 2018 183500 POLYGON ((499725.6 4696873,...
826   07080105         083 2018 183500 POLYGON ((461853.2 4682925,...
627   10170204         141 2018 167000 POLYGON ((269364.9 4793311,...
687   10230003         141 2018 167000 POLYGON ((271504.3 4754685,...
731   10230002         141 2018 167000 POLYGON ((267684.6 4790972,...
683   07100003         081 2018 184500 POLYGON ((435951.2 4789348,...
686   07080203         081 2018 184500 MULTIPOLYGON (((459303.2 47...
732.1 07080207         081 2018 184500 POLYGON ((429573.1 4779788,...
747   07100005         081 2018 184500 POLYGON ((421044.8 4772268,...
        huc_county
732   07080207-083
803   07080205-083
826   07080105-083
627   10170204-141
687   10230003-141
731   10230002-141
683   07100003-081
686   07080203-081
732.1 07080207-081
747   07100005-081

The key difference from the st_join() example is that each observation of the returned data is a unique HUC-county intersection. Figure 3.25 below is a map of all the intersections of the HUC unit with HUC_CODE ==10170203 and the four intersecting counties.

tm_shape(filter(HUC_intersections, HUC_CODE == "10170203")) + 
  tm_polygons(col = "huc_county") +
  tm_layout(frame = FALSE)
Intersections of a HUC unit and Iowa counties

Figure 3.25: Intersections of a HUC unit and Iowa counties

Note also that the attributes of county data are joined as you can see acres in the output above. As I said earlier, st_intersection() is a spatial kind of spatial join where the resulting observations are the intersections of the target and source sf objects.

In order to find the area-weighted average of corn acres, you can use st_area() first to calculate the area of the intersections, and then find the area-weighted average as follows:

(
HUC_aw_acres <- HUC_intersections %>% 
  #--- get area ---#
  mutate(area = as.numeric(st_area(.))) %>% 
  #--- get area-weight by HUC unit ---#
  group_by(HUC_CODE) %>% 
  mutate(weight = area / sum(area)) %>% 
  #--- calculate area-weighted corn acreage by HUC unit ---#
  summarize(aw_acres = sum(weight * acres))
)
Simple feature collection with 55 features and 2 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 203228.6 ymin: 4470941 xmax: 736832.9 ymax: 4822687
Projected CRS: NAD83 / UTM zone 15N
# A tibble: 55 × 3
   HUC_CODE aw_acres                                                    geometry
   <chr>       <dbl>                                              <GEOMETRY [m]>
 1 07020009  251185. POLYGON ((421060.3 4797550, 420940.3 4797420, 420860.3 479…
 2 07040008  165000  POLYGON ((602922.4 4817166, 602862.4 4816996, 602772.4 481…
 3 07060001  105234. MULTIPOLYGON (((649333.8 4761761, 648933.7 4761621, 648793…
 4 07060002  140201. MULTIPOLYGON (((593780.7 4816946, 594000.7 4816865, 594380…
 5 07060003  149000  MULTIPOLYGON (((692011.6 4712966, 691981.6 4712956, 691881…
 6 07060004  162123. POLYGON ((652046.5 4718813, 651916.4 4718783, 651786.4 471…
 7 07060005  142428. POLYGON ((734140.4 4642126, 733620.4 4641926, 733520.3 464…
 8 07060006  159635. POLYGON ((673976.9 4651911, 673950.7 4651910, 673887.8 465…
 9 07080101  115572. POLYGON ((666760.8 4558401, 666630.9 4558362, 666510.8 455…
10 07080102  160008. POLYGON ((576151.1 4706650, 575891 4706810, 575741 4706890…
# ℹ 45 more rows