5 Spatial Interactions of Vector and Raster Data

Before you start

In this chapter we learn the spatial interactions of a vector and raster dataset. We first look at how to crop (spatially subset) a raster dataset based on the geographic extent of a vector dataset. We then cover how to extract values from raster data for points and polygons. To be precise, here is what we mean by raster data extraction and what it does for points and polygons data:

  • Points: For each of the points, find which raster cell it is located within, and assign the value of the cell to the point.

  • Polygons: For each of the polygons, identify all the raster cells that intersect with the polygon, and assign a vector of the cell values to the polygon

This is probably the most important operation economists run on raster datasets.

We will show how we can use terra::extract() for both cases. But, we will also see that for polygons, exact_extract() from the exactextractr package is often considerably faster than terra::extract().

Finally, you will see conversions between Raster\(^*\) (raster package) objects and SpatRaster object (terra package) because of the incompatibility of object classes across the key packages. I believe that these hassles will go away soon when they start supporting each other.

Direction for replication

Datasets

All the datasets that you need to import are available here. In this chapter, the path to files is set relative to my own working directory (which is hidden). To run the codes without having to mess with paths to the files, follow these steps:

  • set a folder (any folder) as the working directory using setwd()
  • create a folder called “Data” inside the folder designated as the working directory (if you have created a “Data” folder previously, skip this step)
  • download the pertinent datasets from here
  • place all the files in the downloaded folder in the “Data” folder

Packages

Run the following code to install or load (if already installed) the pacman package, and then install or load (if already installed) the listed package inside the pacman::p_load() function.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  terra, # handle raster data
  raster, # handle raster data
  exactextractr, # fast extractions
  sf, # vector data operations
  dplyr, # data wrangling
  tidyr, # data wrangling
  data.table, # data wrangling
  prism, # download PRISM data
  tictoc, # timing codes
  tigris, # to get county sf
  tmap # for mapping
)

5.1 Cropping to the Area of Interest

Here we use PRISM maximum temperature (tmax) data as a raster dataset and Kansas county boundaries as a vector dataset.

Let’s download the tmax data for July 1, 2018 (Figure 5.1).

#--- set the path to the folder to which you save the downloaded PRISM data ---#
# This code sets the current working directory as the designated folder
options(prism.path = "Data")

#--- download PRISM precipitation data ---#
get_prism_dailys(
  type = "tmax",
  date = "2018-07-01",
  keepZip = FALSE
)

#--- the file name of the PRISM data just downloaded ---#
prism_file <- "Data/PRISM_tmax_stable_4kmD2_20180701_bil/PRISM_tmax_stable_4kmD2_20180701_bil.bil"

#--- read in the prism data ---#
prism_tmax_0701_sr <- rast(prism_file)
Map of PRISM tmax data on July 1, 2018

Figure 5.1: Map of PRISM tmax data on July 1, 2018

We now get Kansas county border data from the tigris package (Figure 5.2) as sf.

#--- Kansas boundary (sf) ---#
KS_county_sf <-
  #--- get Kansas county boundary ---
  tigris::counties(state = "Kansas", cb = TRUE) %>%
  #--- sp to sf ---#
  st_as_sf() %>%
  #--- transform using the CRS of the PRISM tmax data  ---#
  st_transform(terra::crs(prism_tmax_0701_sr))
Kansas county boundaries

Figure 5.2: Kansas county boundaries


Sometimes, it is convenient to crop a raster layer to the specific area of interest so that you do not have to carry around unnecessary parts of the raster layer. Moreover, it takes less time to extract values from a raster layer when the size of the raster layer is smaller. You can crop a raster layer by using terra::crop(). It works like this:

#--- syntax (this does not run) ---#
terra::crop(SpatRaster, sf)

So, in this case, this does the job.

#--- crop the entire PRISM to its KS portion---#
prism_tmax_0701_KS_sr <- terra::crop(prism_tmax_0701_sr, KS_county_sf)

The figure below (Figure 5.3) shows the PRISM tmax raster data cropped to the geographic extent of Kansas. Notice that the cropped raster layer extends beyond the outer boundary of Kansas state boundary (it is a bit hard to see, but look at the upper right corner).

PRISM tmax raster data cropped to the geographic extent of Kansas

Figure 5.3: PRISM tmax raster data cropped to the geographic extent of Kansas

5.2 Extracting Values from Raster Layers for Vector Data

In this section, we will learn how to extract information from raster layers for spatial units represented as vector data (points and polygons). For the demonstrations in this section, we use the following datasets:

  • Raster: PRISM tmax data cropped to Kansas state border for 07/01/2018 (obtained in 5.1) and 07/02/2018 (downloaded below)
  • Polygons: Kansas county boundaries (obtained in 5.1)
  • Points: Irrigation wells in Kansas (imported below)

5.2.1 Simple visual illustrations of raster data extraction

Extracting to Points

Figure 5.4 shows visually what we mean by “extract raster values to points.”

Visual illustration of raster data extraction for points data

Figure 5.4: Visual illustration of raster data extraction for points data

In the figure, we have grids (cells) and each grid holds a value (presented at the center). There are three points. We will find which grid the points fall inside and get the associated values and assign them to the points. In this example, Points 1, 2, and 3 will have 50, 4, 54, respectively,

Extracting to Polygons

Figure 5.5 shows visually what we mean by “extract raster values to polygons.”

Visual illustration of raster data extraction for polygons data

Figure 5.5: Visual illustration of raster data extraction for polygons data

There is a polygon overlaid on top of the cells along with their centroids represented by black dots. Extracting raster values to a polygon means finding raster cells that intersect with the polygons and get the value of all those cells and assigns them to the polygon. As you can see some cells are completely inside the polygon, while others are only partially overlapping with the polygon. Depending on the function you use and its options, you regard different cells as being spatially related to the polygons. For example, by default, terra::extract() will extract only the cells whose centroid is inside the polygon. But, you can add an option to include the cells that are only partially intersected with the polygon. In such a case, you can also get the fraction of the cell what is overlapped with the polygon, which enables us to find area-weighted values later. We will discuss the details these below.

PRISM tmax data for 07/02/2018

#--- download PRISM precipitation data ---#
get_prism_dailys(
  type = "tmax",
  date = "2018-07-02",
  keepZip = FALSE
)

#--- the file name of the PRISM data just downloaded ---#
prism_file <- "Data/PRISM_tmax_stable_4kmD2_20180702_bil/PRISM_tmax_stable_4kmD2_20180702_bil.bil"

#--- read in the prism data and crop it to Kansas state border ---#
prism_tmax_0702_KS_sr <- rast(prism_file) %>%
  terra::crop(KS_county_sf)

Irrigation wells in Kansas:

#--- read in the KS points data ---#
(
  KS_wells <- readRDS("Data/Chap_5_wells_KS.rds")
)
Simple feature collection with 37647 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -102.0495 ymin: 36.99552 xmax: -94.62089 ymax: 40.00199
Geodetic CRS:  NAD83
First 10 features:
   well_id                   geometry
1        1 POINT (-100.4423 37.52046)
2        3 POINT (-100.7118 39.91526)
3        5 POINT (-99.15168 38.48849)
4        7 POINT (-101.8995 38.78077)
5        8  POINT (-100.7122 38.0731)
6        9 POINT (-97.70265 39.04055)
7       11 POINT (-101.7114 39.55035)
8       12 POINT (-95.97031 39.16121)
9       15 POINT (-98.30759 38.26787)
10      17 POINT (-100.2785 37.71539)

Here is how the wells are spatially distributed over the PRISM grids and Kansas county borders (Figure 5.6):

Map of Kansas county borders, irrigation wells, and PRISM tmax

Figure 5.6: Map of Kansas county borders, irrigation wells, and PRISM tmax

5.2.2 Extracting to Points

You can extract values from raster layers to points using terra::extract(). terra::extract() finds which raster cell each of the points is located within and assigns the value of the cell to the point.

#--- syntax (this does not run) ---#
terra::extract(raster, points)

Let’s extract tmax values from the PRISM tmax layer (prism_tmax_0701_KS_sr) to the irrigation wells:

tmax_from_prism <- terra::extract(prism_tmax_0701_KS_sr, KS_wells)

Oh, okay. So, the problem is that the points data (KS_wells) is an sf object, and terra::extract() does not like it (I thought this would be fixed by now …). Let’s turn KS_wells into an SpatVect object and then apply terra::extract():

tmax_from_prism <- terra::extract(prism_tmax_0701_KS_sr, vect(KS_wells))

#--- take a look ---#
head(tmax_from_prism)
  ID PRISM_tmax_stable_4kmD2_20180701_bil
1  1                               34.241
2  2                               29.288
3  3                               32.585
4  4                               30.104
5  5                               34.232
6  6                               35.168

The resulting object is a data.frame, where the ID variable represents the order of the observations in the points data and the second column represents that values extracted for the points from the raster cells. So, you can assign the extracted values as a new variable of the points data as follows:

KS_wells$tmax_07_01 <- tmax_from_prism[, -1]

Extracting values from a multi-layer SpatRaster works the same way. Here, we combine prism_tmax_0701_KS_sr and prism_tmax_0702_KS_sr to create a multi-layer SpatRaster and then extract values from them.

#--- create a multi-layer SpatRaster ---#
prism_tmax_stack <- c(prism_tmax_0701_KS_sr, prism_tmax_0702_KS_sr)

#--- extract tmax values ---#
tmax_from_prism_stack <- terra::extract(prism_tmax_stack, vect(KS_wells))

#--- take a look ---#
head(tmax_from_prism_stack)
  ID PRISM_tmax_stable_4kmD2_20180701_bil PRISM_tmax_stable_4kmD2_20180702_bil
1  1                               34.241                               30.544
2  2                               29.288                               29.569
3  3                               32.585                               29.866
4  4                               30.104                               29.819
5  5                               34.232                               30.481
6  6                               35.168                               30.640

We now have two columns that hold values extracted from the raster cells: the 2nd column for the 1st raster layer and the 3rd column for the 2nd raster layer in prism_tmax_stack.

Side note:

Interestingly, terra::extract() can work with sf as long as the raster object is of type Raster\(^*\). In the code below, prism_tmax_stack is converted to RasterStack before terra::extract() does its job and it works just fine.

tmax_from_prism <- terra::extract(as(prism_tmax_stack, "Raster"), KS_wells)
#--- take a look ---#
head(tmax_from_prism)
  ID PRISM_tmax_stable_4kmD2_20180701_bil
1  1                               34.241
2  2                               29.288
3  3                               32.585
4  4                               30.104
5  5                               34.232
6  6                               35.168
#--- check the class ---#
class(tmax_from_prism)
[1] "data.frame"

As you can see, the resulting object is a matrix instead of a data.frame. Now, let’s see if we can use a SpatVect instead of an sf.

tmax_from_prism <-
  terra::extract(
    as(prism_tmax_stack, "Raster"),
    vect(KS_wells)
  )
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'extract' for signature '"RasterBrick", "SpatVector"'

So, it does not work. It turns out that terra::extract() works for the combinations of SpatRaster-SpatVect or Raster\(^*\)-sf/sp.

5.2.3 Extracting to Polygons (terra way)

You can use terra::extract() for extracting raster cell values for polygons as well. For each of the polygons, it will identify all the raster cells whose center lies inside the polygon and assign the vector of values of the cells to the polygon. Let’s first convert KS_county_sf (an sf object) to a SpatVector.

#--- Kansas boundary (SpatVector) ---#
KS_county_sv <- vect(KS_county_sf)

Now, let’s extract tmax values for each of the KS counties.

#--- extract values from the raster for each county ---#
tmax_by_county <- terra::extract(prism_tmax_0701_KS_sr, KS_county_sv)
#--- check the class ---#
class(tmax_by_county)
[1] "data.frame"
#--- take a look ---#
head(tmax_by_county)
  ID PRISM_tmax_stable_4kmD2_20180701_bil
1  1                               34.228
2  1                               34.222
3  1                               34.256
4  1                               34.268
5  1                               34.262
6  1                               34.477
#--- take a look ---#
tail(tmax_by_county)
       ID PRISM_tmax_stable_4kmD2_20180701_bil
12840 105                               34.185
12841 105                               34.180
12842 105                               34.241
12843 105                               34.381
12844 105                               34.295
12845 105                               34.267

So, terra::extract() returns a data.frame, where ID values represent the corresponding row number in the polygons data. For example, the observations with ID == n are for the nth polygon. Using this information, you can easily merge the extraction results to the polygons data. Suppose you are interested in the mean of the tmax values of the intersecting cells for each of the polygons, then you can do the following:

#--- get mean tmax ---#
mean_tmax <-
  tmax_by_county %>%
  group_by(ID) %>%
  summarize(tmax = mean(PRISM_tmax_stable_4kmD2_20180701_bil))

(
  KS_county_sf <-
    #--- back to sf ---#
    st_as_sf(KS_county_sv) %>%
    #--- define ID ---#
    mutate(ID := seq_len(nrow(.))) %>%
    #--- merge by ID ---#
    left_join(., mean_tmax, by = "ID")
)
Simple feature collection with 105 features and 14 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -102.0517 ymin: 36.99302 xmax: -94.58841 ymax: 40.00316
Geodetic CRS:  NAD83
First 10 features:
   STATEFP COUNTYFP COUNTYNS       AFFGEOID GEOID         NAME
1       20      123 00485026 0500000US20123 20123     Mitchell
2       20      149 00485038 0500000US20149 20149 Pottawatomie
3       20      203 00485062 0500000US20203 20203      Wichita
4       20      151 00485039 0500000US20151 20151        Pratt
5       20      043 00484991 0500000US20043 20043     Doniphan
6       20      145 00485036 0500000US20145 20145       Pawnee
7       20      119 00485024 0500000US20119 20119        Meade
8       20      071 00485002 0500000US20071 20071      Greeley
9       20      001 00484970 0500000US20001 20001        Allen
10      20      169 00485047 0500000US20169 20169       Saline
              NAMELSAD STUSPS STATE_NAME LSAD      ALAND   AWATER ID     tmax
1      Mitchell County     KS     Kansas   06 1817632931 44979980  1 34.21421
2  Pottawatomie County     KS     Kansas   06 2177507162 54149295  2 32.58354
3       Wichita County     KS     Kansas   06 1861078197    61987  3 34.63195
4         Pratt County     KS     Kansas   06 1903740950  1771939  4 33.40829
5      Doniphan County     KS     Kansas   06 1019105691 12424174  5 34.01768
6        Pawnee County     KS     Kansas   06 1953531647   992041  6 35.54426
7         Meade County     KS     Kansas   06 2533222397  3258258  7 34.47948
8       Greeley County     KS     Kansas   06 2016057907        0  8 33.55797
9         Allen County     KS     Kansas   06 1295764235 13021929  9 34.77049
10       Saline County     KS     Kansas   06 1865429339  2760902 10 33.56413
                         geometry
1  POLYGON ((-98.49007 39.2416...
2  POLYGON ((-96.72833 39.4271...
3  POLYGON ((-101.5673 38.4666...
4  POLYGON ((-99.01535 37.6438...
5  POLYGON ((-95.3399 40, -95....
6  POLYGON ((-99.58479 38.3493...
7  POLYGON ((-100.6526 37.4022...
8  POLYGON ((-102.0453 38.5054...
9  POLYGON ((-95.52522 37.7473...
10 POLYGON ((-97.92865 38.8835...

Instead of finding the mean after applying terra::extract() as done above, you can do that within terra::extract() using the fun option.

#--- extract values from the raster for each county ---#
tmax_by_county <-
  terra::extract(
    prism_tmax_0701_KS_sr,
    KS_county_sv,
    fun = mean
  )
#--- take a look ---#
head(tmax_by_county)
  ID PRISM_tmax_stable_4kmD2_20180701_bil
1  1                             34.21421
2  2                             32.58354
3  3                             34.63195
4  4                             33.40829
5  5                             34.01768
6  6                             35.54426

You can apply other summary functions like min(), max(), sum().


Extracting values from a multi-layer raster data works exactly the same way except that data processing after the value extraction is slightly more complicated.

#--- extract from a multi-layer raster object ---#
tmax_by_county_from_stack <-
  terra::extract(
    prism_tmax_stack,
    KS_county_sv
  )

#--- take a look ---#
head(tmax_by_county_from_stack)
  ID PRISM_tmax_stable_4kmD2_20180701_bil PRISM_tmax_stable_4kmD2_20180702_bil
1  1                               33.290                               29.831
2  1                               33.305                               29.730
3  1                               33.299                               29.811
4  1                               33.257                               29.819
5  1                               33.181                               29.750
6  1                               33.235                               29.757

Similar to the single-layer case, the resulting object is a data.frame and you have two columns corresponding to each of the layers in the two-layer raster object.


Sometimes, you would like to have how much of the raster cells are intersecting with the intersecting polygon to find area-weighted summary later. In that case, you can add exact = TRUE option to terra::extract().

#--- extract from a multi-layer raster object ---#
tmax_by_county_from_stack <-
  terra::extract(
    prism_tmax_stack,
    KS_county_sv,
    exact = TRUE
  )

#--- take a look ---#
head(tmax_by_county_from_stack)
  ID PRISM_tmax_stable_4kmD2_20180701_bil PRISM_tmax_stable_4kmD2_20180702_bil
1  1                               33.348                               29.844
2  1                               33.259                               29.709
3  1                               33.248                               29.590
4  1                               33.222                               29.560
5  1                               33.215                               29.670
6  1                               33.147                               29.599
    fraction
1 0.02375324
2 0.11818764
3 0.11906455
4 0.11701276
5 0.11494852
6 0.11659929

As you can see, now you have fraction column to the resulting data.frame and you can find area-weighted summary of the extracted values like this:

tmax_by_county_from_stack %>%
  group_by(ID) %>%
  summarize(
    tmax_0701 = sum(fraction * PRISM_tmax_stable_4kmD2_20180701_bil) / sum(fraction),
    tmax_0702 = sum(fraction * PRISM_tmax_stable_4kmD2_20180702_bil) / sum(fraction)
  )
# A tibble: 105 × 3
      ID tmax_0701 tmax_0702
   <dbl>     <dbl>     <dbl>
 1     1      33.5      30.1
 2     2      35.7      28.8
 3     3      32.6      30.6
 4     4      32.9      29.0
 5     5      34.8      28.3
 6     6      33.0      30.0
 7     7      34.2      30.9
 8     8      31.7      30.1
 9     9      34.3      29.4
10    10      34.4      30.5
# ℹ 95 more rows

5.2.4 Extracting to Polygons (exactextractr way)

exact_extract() function from the exactextractr package is a faster alternative than terra::extract() for a large raster data as we confirm later (exact_extract() does not work with points data at the moment).79 exact_extract() also provides a coverage fraction value for each of the cell-polygon intersections. The syntax of exact_extract() is very much similar to terra::extract().

#--- syntax (this does not run) ---#
exact_extract(raster, polygons)

exact_extract() can accept both SpatRaster and Raster\(^*\) objects as the raster object. However, while it accepts sf as the polygons object, but it does not accept SpatVect.

Let’s get tmax values from the PRISM raster layer for Kansas county polygons, the following does the job:

library("exactextractr")

#--- extract values from the raster for each county ---#
tmax_by_county <-
  exact_extract(
    prism_tmax_0701_KS_sr,
    KS_county_sf,
    #--- this is for not displaying progress bar ---#
    progress = FALSE
  )

The resulting object is a list of data.frames where ith element of the list is for ith polygon of the polygons sf object.

#--- take a look at the first 6 rows of the first two list elements ---#
tmax_by_county[1:2] %>% lapply(function(x) head(x))
[[1]]
   value coverage_fraction
1 33.348        0.02374719
2 33.259        0.11816581
3 33.248        0.11903353
4 33.222        0.11700913
5 33.215        0.11494034
6 33.147        0.11656883

[[2]]
   value coverage_fraction
1 35.492        0.04248790
2 35.552        0.08910853
3 35.638        0.08570296
4 35.668        0.08580886
5 35.655        0.08667702
6 35.543        0.08706726

For each element of the list, you see value and coverage_fraction. value is the tmax value of the intersecting raster cells, and coverage_fraction is the fraction of the intersecting area relative to the full raster grid, which can help find coverage-weighted summary of the extracted values (like fraction variable when you use terra::extract() with exact = TRUE).

We can take advantage of dplyr::bind_rows() to combine the list of the datasets into a single data.frame. In doing so, you can use .id option to create a new identifier column that links each row to its original data.

(
  #--- combine ---#
  tmax_combined <- bind_rows(tmax_by_county, .id = "id") %>%
    as_tibble()
)
# A tibble: 15,149 × 3
   id    value coverage_fraction
   <chr> <dbl>             <dbl>
 1 1      33.3            0.0237
 2 1      33.3            0.118 
 3 1      33.2            0.119 
 4 1      33.2            0.117 
 5 1      33.2            0.115 
 6 1      33.1            0.117 
 7 1      33.2            0.117 
 8 1      33.2            0.116 
 9 1      33.1            0.112 
10 1      33.1            0.110 
# ℹ 15,139 more rows

data.table users can use rbindlist() with the idcol option like below:

rbindlist(tmax_by_county, idcol = "id")
        id  value coverage_fraction
    1:   1 33.348        0.02374719
    2:   1 33.259        0.11816581
    3:   1 33.248        0.11903353
    4:   1 33.222        0.11700913
    5:   1 33.215        0.11494034
   ---                             
15145: 105 35.028        0.83252323
15146: 105 35.059        0.83167112
15147: 105 35.111        0.83026636
15148: 105 35.093        0.82874411
15149: 105 35.053        0.28677887

Note that id value of i represents ith element of the list, which in turn corresponds to ith polygon in the polygons sf data. Let’s summarize the data by id and then merge it back to the polygons sf data. Here, we calculate coverage-weighted mean of tmax.

#--- weighted mean ---#
(
  tmax_by_id <- tmax_combined %>%
    #--- convert from character to numeric  ---#
    mutate(id = as.numeric(id)) %>%
    #--- group summary ---#
    group_by(id) %>%
    summarise(tmax_aw = sum(value * coverage_fraction) / sum(coverage_fraction))
)
# A tibble: 105 × 2
      id tmax_aw
   <dbl>   <dbl>
 1     1    33.5
 2     2    35.7
 3     3    32.6
 4     4    32.9
 5     5    34.8
 6     6    33.0
 7     7    34.2
 8     8    31.7
 9     9    34.3
10    10    34.4
# ℹ 95 more rows
#--- merge ---#
KS_county_sf %>%
  mutate(id := seq_len(nrow(.))) %>%
  left_join(., tmax_by_id, by = "id") %>%
  dplyr::select(id, tmax_aw)
Simple feature collection with 105 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -102.0517 ymin: 36.99302 xmax: -94.58841 ymax: 40.00316
Geodetic CRS:  NAD83
First 10 features:
   id  tmax_aw                       geometry
1   1 33.53676 POLYGON ((-98.49007 39.2416...
2   2 35.67419 POLYGON ((-96.72833 39.4271...
3   3 32.59908 POLYGON ((-101.5673 38.4666...
4   4 32.92020 POLYGON ((-99.01535 37.6438...
5   5 34.80511 POLYGON ((-95.3399 40, -95....
6   6 32.98675 POLYGON ((-99.58479 38.3493...
7   7 34.16909 POLYGON ((-100.6526 37.4022...
8   8 31.68179 POLYGON ((-102.0453 38.5054...
9   9 34.32246 POLYGON ((-95.52522 37.7473...
10 10 34.37591 POLYGON ((-97.92865 38.8835...

Extracting values from RasterStack works in exactly the same manner as RasterLayer.

tmax_by_county_stack <-
  exact_extract(
    prism_tmax_stack,
    KS_county_sf,
    progress = F
  )

#--- take a look at the first 6 lines of the first element---#
tmax_by_county_stack[[1]] %>% head()
  PRISM_tmax_stable_4kmD2_20180701_bil PRISM_tmax_stable_4kmD2_20180702_bil
1                               34.235                               29.632
2                               34.228                               29.562
3                               34.222                               29.557
4                               34.256                               29.662
5                               34.268                               29.704
6                               34.262                               29.750
  coverage_fraction
1      0.0007206142
2      0.7114530802
3      0.7162888646
4      0.7163895965
5      0.7166024446
6      0.7175790071

As you can see above, exact_extract() appends additional columns for additional layers.

#--- combine them ---#
tmax_all_combined <- bind_rows(tmax_by_county_stack, .id = "id")

#--- take a look ---#
head(tmax_all_combined)
  id PRISM_tmax_stable_4kmD2_20180701_bil PRISM_tmax_stable_4kmD2_20180702_bil
1  1                               34.235                               29.632
2  1                               34.228                               29.562
3  1                               34.222                               29.557
4  1                               34.256                               29.662
5  1                               34.268                               29.704
6  1                               34.262                               29.750
  coverage_fraction
1      0.0007206142
2      0.7114530802
3      0.7162888646
4      0.7163895965
5      0.7166024446
6      0.7175790071

In order to find the coverage-weighted tmax by date, you can first pivot it to a long format using dplyr::pivot_longer().

#--- pivot to a longer format ---#
(
  tmax_long <- pivot_longer(
    tmax_all_combined,
    -c(id, coverage_fraction),
    names_to = "date",
    values_to = "tmax"
  )
)
# A tibble: 30,298 × 4
   id    coverage_fraction date                                  tmax
   <chr>             <dbl> <chr>                                <dbl>
 1 1              0.000721 PRISM_tmax_stable_4kmD2_20180701_bil  34.2
 2 1              0.000721 PRISM_tmax_stable_4kmD2_20180702_bil  29.6
 3 1              0.711    PRISM_tmax_stable_4kmD2_20180701_bil  34.2
 4 1              0.711    PRISM_tmax_stable_4kmD2_20180702_bil  29.6
 5 1              0.716    PRISM_tmax_stable_4kmD2_20180701_bil  34.2
 6 1              0.716    PRISM_tmax_stable_4kmD2_20180702_bil  29.6
 7 1              0.716    PRISM_tmax_stable_4kmD2_20180701_bil  34.3
 8 1              0.716    PRISM_tmax_stable_4kmD2_20180702_bil  29.7
 9 1              0.717    PRISM_tmax_stable_4kmD2_20180701_bil  34.3
10 1              0.717    PRISM_tmax_stable_4kmD2_20180702_bil  29.7
# ℹ 30,288 more rows

And then find coverage-weighted tmax by date:

(
  tmax_long %>%
    group_by(id, date) %>%
    summarize(tmax = sum(tmax * coverage_fraction) / sum(coverage_fraction))
)
# A tibble: 210 × 3
# Groups:   id [105]
   id    date                                  tmax
   <chr> <chr>                                <dbl>
 1 1     PRISM_tmax_stable_4kmD2_20180701_bil  34.2
 2 1     PRISM_tmax_stable_4kmD2_20180702_bil  29.7
 3 10    PRISM_tmax_stable_4kmD2_20180701_bil  33.6
 4 10    PRISM_tmax_stable_4kmD2_20180702_bil  31.2
 5 100   PRISM_tmax_stable_4kmD2_20180701_bil  35.2
 6 100   PRISM_tmax_stable_4kmD2_20180702_bil  29.8
 7 101   PRISM_tmax_stable_4kmD2_20180701_bil  29.9
 8 101   PRISM_tmax_stable_4kmD2_20180702_bil  30.0
 9 102   PRISM_tmax_stable_4kmD2_20180701_bil  33.3
10 102   PRISM_tmax_stable_4kmD2_20180702_bil  30.2
# ℹ 200 more rows

For data.table users, this does the same:

(
  tmax_all_combined %>%
    data.table() %>%
    melt(id.var = c("id", "coverage_fraction")) %>%
    .[, .(tmax = sum(value * coverage_fraction) / sum(coverage_fraction)), by = .(id, variable)]
)
      id                             variable     tmax
  1:   1 PRISM_tmax_stable_4kmD2_20180701_bil 34.21148
  2:   2 PRISM_tmax_stable_4kmD2_20180701_bil 32.59907
  3:   3 PRISM_tmax_stable_4kmD2_20180701_bil 34.62680
  4:   4 PRISM_tmax_stable_4kmD2_20180701_bil 33.41405
  5:   5 PRISM_tmax_stable_4kmD2_20180701_bil 34.02482
 ---                                                  
206: 101 PRISM_tmax_stable_4kmD2_20180702_bil 30.01027
207: 102 PRISM_tmax_stable_4kmD2_20180702_bil 30.23891
208: 103 PRISM_tmax_stable_4kmD2_20180702_bil 30.16537
209: 104 PRISM_tmax_stable_4kmD2_20180702_bil 30.54567
210: 105 PRISM_tmax_stable_4kmD2_20180702_bil 30.85847

5.3 Extraction speed comparison

Here we compare the extraction speed of raster::extract(), terra::extract(), and exact_extract().

5.3.1 Points: terra::extract() and raster::extract()

terra::extract() uses C++ as the backend. Therefore, it is considerably faster than raster::extract().

#--- terra ---#
KS_wells_sv <- vect(KS_wells)
tic()
temp <- terra::extract(prism_tmax_0701_KS_sr, KS_wells_sv)
toc()
0.028 sec elapsed
#--- raster ---#
prism_tmax_0701_KS_lr <- as(prism_tmax_0701_KS_sr, "Raster")
tic()
temp <- raster::extract(prism_tmax_0701_KS_lr, KS_wells)
toc()
0.11 sec elapsed

As you can see, terra::extract() is much faster. The time differential between the two packages can be substantial as the raster data becomes larger. Until recently, raster::extract() was much much slower. However, recent updates to the package made significant improvement in extraction speed.

5.3.2 Polygons: exact_extract(), terra::extract(), and raster::extract()

terra::extract() is faster than exact_extract() for a relatively small raster data. Let’s time them and see the difference.

#--- terra::extract ---#
tic()
terra_extract_temp <-
  terra::extract(
    prism_tmax_0701_KS_sr,
    KS_county_sv
  )
toc()
0.03 sec elapsed
#--- exact_extract ---#
tic()
exact_extract_temp <-
  exact_extract(
    prism_tmax_0701_KS_sr,
    KS_county_sf,
    progress = FALSE
  )
toc()
0.083 sec elapsed
#--- raster::extract ---#
tic()
raster_extract_temp <-
  raster::extract(
    prism_tmax_0701_KS_lr,
    KS_county_sf
  )
toc()
1.032 sec elapsed

As you can see, raster::extract() is the slowest. terra::extract() is faster than exact_extract(). However, once the raster data becomes larger (or spatially finer), then exact_extact() starts to shine.


Let’s disaggregate the prism data by a factor of 10 to create a much larger raster data.80

#--- disaggregate ---#
(
  prism_tmax_0701_KS_sr_10 <- terra::disagg(prism_tmax_0701_KS_sr, fact = 10)
)
class       : SpatRaster 
dimensions  : 730, 1790, 1  (nrow, ncol, nlyr)
resolution  : 0.004166667, 0.004166667  (x, y)
extent      : -102.0625, -94.60417, 36.97917, 40.02083  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83 
source(s)   : memory
name        : PRISM_tmax_stable_4kmD2_20180701_bil 
min value   :                               27.711 
max value   :                               37.656 

The disaggregated PRISM data now has 10 times more rows and columns.

#--- original ---#
dim(prism_tmax_0701_KS_sr)
[1]  73 179   1
#--- disaggregated ---#
dim(prism_tmax_0701_KS_sr_10)
[1]  730 1790    1

Now, let’s compare terra::extrct() and exact_extrct() using the disaggregated data.

#--- terra extract ---#
tic()
terra_extract_temp <-
  terra::extract(
    prism_tmax_0701_KS_sr_10,
    KS_county_sv
  )
toc()
2.064 sec elapsed
#--- exact extract ---#
tic()
exact_extract_temp <-
  exact_extract(
    prism_tmax_0701_KS_sr_10,
    KS_county_sf,
    progress = FALSE
  )
toc()
0.164 sec elapsed

As you can see, exact_extract() is considerably faster. The difference in time becomes even more pronounced as the size of the raster data becomes larger and the number of polygons are greater. The time difference of several seconds seem nothing, but imagine processing PRISM files for the entire US over 20 years, then you would appreciate the speed of exact_extract().

5.3.3 Single-layer vs multi-layer

Pretend that you have five dates of PRISM tmax data (here we repeat the same file five times) and would like to extract values from all of them. Extracting values from a multi-layer raster objects (RasterStack for raster package) takes less time than extracting values from the individual layers one at a time. This can be observed below.


terra::extract()

#--- extract from 5 layers one at a time ---#
tic()
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
toc()
11.424 sec elapsed
#--- extract from a 5-layer SpatRaster ---#
tic()
prism_tmax_ml_5 <-
  c(
    prism_tmax_0701_KS_sr_10,
    prism_tmax_0701_KS_sr_10,
    prism_tmax_0701_KS_sr_10,
    prism_tmax_0701_KS_sr_10,
    prism_tmax_0701_KS_sr_10
  )
temp <- terra::extract(prism_tmax_ml_5, KS_county_sv)
toc()
3.082 sec elapsed

exact_extract()

#--- extract from 5 layers one at a time ---#
tic()
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
toc()
0.88 sec elapsed
#--- extract from from a 5-layer SpatRaster ---#
tic()
prism_tmax_stack_5 <-
  c(
    prism_tmax_0701_KS_sr_10,
    prism_tmax_0701_KS_sr_10,
    prism_tmax_0701_KS_sr_10,
    prism_tmax_0701_KS_sr_10,
    prism_tmax_0701_KS_sr_10
  )
temp <-
  exact_extract(
    prism_tmax_stack_5,
    KS_county_sf,
    progress = FALSE
  )
toc()
0.269 sec elapsed

The reduction in computation time for both methods makes sense. Since both layers have exactly the same geographic extent and resolution, finding the polygons-cells correspondence is done once and then it can be used repeatedly across the layers for the multi-layer SparRaster and RasterStack. This clearly suggests that when you are processing many layers of the same spatial resolution and extent, you should first stack them and then extract values at the same time instead of processing them one by one as long as your memory allows you to do so.