5.3 Extraction speed comparison

Here we compare the extraction speed of raster::extract(), terra::extract(), and exact_extract().

5.3.1 Points: terra::extract() and raster::extract()

exact_extract() uses C++ as the backend. Therefore, it is considerably faster than raster::extract().

#--- terra ---#
KS_wells_sv <- vect(KS_wells)
tic()
temp <- terra::extract(prism_tmax_0701_KS_sr, KS_wells_sv)
toc()
0.062 sec elapsed
#--- raster ---#
prism_tmax_0701_KS_lr <- as(prism_tmax_0701_KS_sr, "Raster")
tic()
temp <- raster::extract(prism_tmax_0701_KS_lr, KS_wells)
toc()
0.388 sec elapsed

As you can see, terra::extract() is much faster. The time differential between the two packages can be substantial as the raster data becomes larger.

5.3.2 Polygons: exact_extract(), terra::extract(), and raster::extract()

terra::extract() is faster than exact_extract() for a relatively small raster data. Let’s time them and see the difference.

#--- terra::extract ---#
tic()
terra_extract_temp <- 
  terra::extract(
    prism_tmax_0701_KS_sr, 
    KS_county_sv
  )  
toc()
0.073 sec elapsed
#--- exact_extract ---#
tic()
exact_extract_temp <- 
  exact_extract(
    prism_tmax_0701_KS_sr, 
    KS_county_sf, 
    progress = FALSE
  )  
toc()
0.275 sec elapsed
#--- raster::extract ---#
tic()
raster_extract_temp <- 
  raster::extract(
    prism_tmax_0701_KS_lr, 
    KS_county_sf
  )  
toc()
2.859 sec elapsed

As you can see, raster::extract() is by far the slowest. terra::extract() is faster than exact_extract(). However, once the raster data becomes larger (or spatially finer), then exact_extact() starts to shine.


Let’s disaggregate the prism data by a factor of 10 to create a much larger raster data.75

#--- disaggregate ---#
(
prism_tmax_0701_KS_sr_10 <- terra::disagg(prism_tmax_0701_KS_sr, fact = 10)
)
class       : SpatRaster 
dimensions  : 730, 1790, 1  (nrow, ncol, nlyr)
resolution  : 0.004166667, 0.004166667  (x, y)
extent      : -102.0625, -94.60417, 36.97917, 40.02083  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83 
source      : memory 
name        : PRISM_tmax_stable_4kmD2_20180701_bil 
min value   :                               27.711 
max value   :                               37.656 

The disaggregated PRISM data now has 10 times more rows and columns.

#--- original ---#
dim(prism_tmax_0701_KS_sr)  
[1]  73 179   1
#--- disaggregated ---#
dim(prism_tmax_0701_KS_sr_10)  
[1]  730 1790    1

Now, let’s compare terra::extrct() and exact_extrct() using the disaggregated data.

#--- terra extract ---#
tic()
terra_extract_temp <- 
  terra::extract(
    prism_tmax_0701_KS_sr_10, 
    KS_county_sv
  )  
toc()
4.608 sec elapsed
#--- exact extract ---#
tic()
exact_extract_temp <- 
  exact_extract(
    prism_tmax_0701_KS_sr_10, 
    KS_county_sf, 
    progress = FALSE
  )  
toc()
0.348 sec elapsed

As you can see, exact_extract() is considerably faster. The difference in time becomes even more pronounced as the size of the raster data becomes larger and the number of polygons are greater. The time difference of several seconds seem nothing, but imagine processing PRISM files for the entire US over 20 years, then you would appreciate the speed of exact_extract().

5.3.3 Single-layer vs multi-layer

Pretend that you have five dates of PRISM tmax data (here we repeat the same file five times) and would like to extract values from all of them. Extracting values from a multi-layer raster objects (RasterStack for raster package) takes less time than extracting values from the individual layers one at a time. This can be observed below.


terra::extract()

#--- extract from 5 layers one at a time ---#
tic()
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
toc()
24.179 sec elapsed
#--- extract from a 5-layer SpatRaster ---#
tic()
prism_tmax_ml_5 <- 
  c(
    prism_tmax_0701_KS_sr_10, 
    prism_tmax_0701_KS_sr_10, 
    prism_tmax_0701_KS_sr_10, 
    prism_tmax_0701_KS_sr_10, 
    prism_tmax_0701_KS_sr_10
  )
temp <- terra::extract(prism_tmax_ml_5, KS_county_sv)
toc()
5.861 sec elapsed

exact_extract()

#--- extract from 5 layers one at a time ---#
tic()
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
toc()
1.551 sec elapsed
#--- extract from from a 5-layer SpatRaster ---#
tic()
prism_tmax_stack_5 <- 
  c(
    prism_tmax_0701_KS_sr_10, 
    prism_tmax_0701_KS_sr_10, 
    prism_tmax_0701_KS_sr_10, 
    prism_tmax_0701_KS_sr_10, 
    prism_tmax_0701_KS_sr_10
  )
temp <- 
  exact_extract(
    prism_tmax_stack_5, 
    KS_county_sf, 
    progress = FALSE
  )
toc()
0.503 sec elapsed

The reduction in computation time for both methods makes sense. Since both layers have exactly the same geographic extent and resolution, finding the polygons-cells correspondence is done once and then it can be used repeatedly across the layers for the multi-layer SparRaster and RasterStack. This clearly suggests that when you are processing many layers of the same spatial resolution and extent, you should first stack them and then extract values at the same time instead of processing them one by one as long as your memory allows you to do so.


  1. We did not introduce this function as it is very rare that you need this function in research projects.↩︎