5.3 Extraction speed comparison
Here we compare the extraction speed of raster::extract()
, terra::extract()
, and exact_extract()
.
5.3.1 Points: terra::extract()
and raster::extract()
exact_extract()
uses C++ as the backend. Therefore, it is considerably faster than raster::extract()
.
#--- terra ---#
<- vect(KS_wells)
KS_wells_sv tic()
<- terra::extract(prism_tmax_0701_KS_sr, KS_wells_sv)
temp toc()
0.062 sec elapsed
#--- raster ---#
<- as(prism_tmax_0701_KS_sr, "Raster")
prism_tmax_0701_KS_lr tic()
<- raster::extract(prism_tmax_0701_KS_lr, KS_wells)
temp toc()
0.388 sec elapsed
As you can see, terra::extract()
is much faster. The time differential between the two packages can be substantial as the raster data becomes larger.
5.3.2 Polygons: exact_extract()
, terra::extract()
, and raster::extract()
terra::extract()
is faster than exact_extract()
for a relatively small raster data. Let’s time them and see the difference.
#--- terra::extract ---#
tic()
<-
terra_extract_temp ::extract(
terra
prism_tmax_0701_KS_sr,
KS_county_sv
) toc()
0.073 sec elapsed
#--- exact_extract ---#
tic()
<-
exact_extract_temp exact_extract(
prism_tmax_0701_KS_sr,
KS_county_sf, progress = FALSE
) toc()
0.275 sec elapsed
#--- raster::extract ---#
tic()
<-
raster_extract_temp ::extract(
raster
prism_tmax_0701_KS_lr,
KS_county_sf
) toc()
2.859 sec elapsed
As you can see, raster::extract()
is by far the slowest. terra::extract()
is faster than exact_extract()
. However, once the raster data becomes larger (or spatially finer), then exact_extact()
starts to shine.
Let’s disaggregate the prism data by a factor of 10 to create a much larger raster data.75
#--- disaggregate ---#
(<- terra::disagg(prism_tmax_0701_KS_sr, fact = 10)
prism_tmax_0701_KS_sr_10 )
class : SpatRaster
dimensions : 730, 1790, 1 (nrow, ncol, nlyr)
resolution : 0.004166667, 0.004166667 (x, y)
extent : -102.0625, -94.60417, 36.97917, 40.02083 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83
source : memory
name : PRISM_tmax_stable_4kmD2_20180701_bil
min value : 27.711
max value : 37.656
The disaggregated PRISM data now has 10 times more rows and columns.
#--- original ---#
dim(prism_tmax_0701_KS_sr)
[1] 73 179 1
#--- disaggregated ---#
dim(prism_tmax_0701_KS_sr_10)
[1] 730 1790 1
Now, let’s compare terra::extrct()
and exact_extrct()
using the disaggregated data.
#--- terra extract ---#
tic()
<-
terra_extract_temp ::extract(
terra
prism_tmax_0701_KS_sr_10,
KS_county_sv
) toc()
4.608 sec elapsed
#--- exact extract ---#
tic()
<-
exact_extract_temp exact_extract(
prism_tmax_0701_KS_sr_10,
KS_county_sf, progress = FALSE
) toc()
0.348 sec elapsed
As you can see, exact_extract()
is considerably faster. The difference in time becomes even more pronounced as the size of the raster data becomes larger and the number of polygons are greater. The time difference of several seconds seem nothing, but imagine processing PRISM files for the entire US over 20 years, then you would appreciate the speed of exact_extract()
.
5.3.3 Single-layer vs multi-layer
Pretend that you have five dates of PRISM tmax data (here we repeat the same file five times) and would like to extract values from all of them. Extracting values from a multi-layer raster objects (RasterStack
for raster
package) takes less time than extracting values from the individual layers one at a time. This can be observed below.
terra::extract()
#--- extract from 5 layers one at a time ---#
tic()
<- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp toc()
24.179 sec elapsed
#--- extract from a 5-layer SpatRaster ---#
tic()
<-
prism_tmax_ml_5 c(
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10
)<- terra::extract(prism_tmax_ml_5, KS_county_sv)
temp toc()
5.861 sec elapsed
exact_extract()
#--- extract from 5 layers one at a time ---#
tic()
<- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_sr_10, KS_county_sf, progress = FALSE)
temp toc()
1.551 sec elapsed
#--- extract from from a 5-layer SpatRaster ---#
tic()
<-
prism_tmax_stack_5 c(
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10
)<-
temp exact_extract(
prism_tmax_stack_5,
KS_county_sf, progress = FALSE
)toc()
0.503 sec elapsed
The reduction in computation time for both methods makes sense. Since both layers have exactly the same geographic extent and resolution, finding the polygons-cells correspondence is done once and then it can be used repeatedly across the layers for the multi-layer SparRaster
and RasterStack
. This clearly suggests that when you are processing many layers of the same spatial resolution and extent, you should first stack them and then extract values at the same time instead of processing them one by one as long as your memory allows you to do so.
We did not introduce this function as it is very rare that you need this function in research projects.↩︎