5.3 Extraction speed comparison
Here we compare the extraction speed of raster::extract()
, terra::extract()
, and exact_extract()
.
5.3.1 Points: terra::extract()
and raster::extract()
exact_extract()
uses C++ as the backend. Therefore, it is considerably faster than raster::extract()
.
0.007 sec elapsed
0.323 sec elapsed
As you can see, terra::extract()
is much faster. The time differential between the two packages can be substantial as the raster data becomes larger.
5.3.2 Polygons: exact_extract()
, terra::extract()
, and raster::extract()
terra::extract()
is faster than exact_extract()
for a relatively small raster data. Let’s time them and see the difference.
library(tictoc)
#--- terra extract ---#
tic()
terra_extract_temp <- terra::extract(prism_tmax_0701_KS_sr, KS_county_sv, progress = FALSE)
toc()
0.046 sec elapsed
#--- exact extract ---#
tic()
exact_extract_temp <- exact_extract(prism_tmax_0701_KS_rl, KS_county_sf, progress = FALSE)
toc()
0.363 sec elapsed
#--- raster::extract ---#
tic()
raster_extract_temp <- raster::extract(prism_tmax_0701_KS_rl, KS_county_sf)
toc()
2.862 sec elapsed
As you can see, raster::extract()
is by far the slowest. terra::extract()
is faster than exact_extract()
. However, once the raster data becomes larger (or spatially finer), then exact_extact()
starts to shine.
Let’s disaggregate the prism data by a factor of 10 to create a much larger raster data.85
#--- disaggregate ---#
(
prism_tmax_0701_KS_sr_10 <- terra::disaggregate(prism_tmax_0701_KS_sr, fact = 10)
)
class : SpatRaster
dimensions : 730, 1790, 1 (nrow, ncol, nlyr)
resolution : 0.004166667, 0.004166667 (x, y)
extent : -102.0625, -94.60417, 36.97917, 40.02083 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=NAD83 +no_defs
data source : memory
names : PRISM_tmax_stable_4kmD2_20180701_bil
min values : 27.711
max values : 37.656
#--- convert the disaggregated PRISM data to RasterLayer ---#
prism_tmax_0701_KS_rl_10 <- raster(prism_tmax_0701_KS_sr_10)
The disaggregated PRISM data now has 10 times more rows and columns (see below).
[1] 73 179 1
[1] 730 1790 1
Now, let’s compare terra::extrct()
and exact_extrct()
using the disaggregated data.
#--- terra extract ---#
tic()
terra_extract_temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
toc()
3.741 sec elapsed
#--- exact extract ---#
tic()
exact_extract_temp <- exact_extract(prism_tmax_0701_KS_rl_10, KS_county_sf, progress = FALSE)
toc()
0.572 sec elapsed
As you can see, exact_extract()
is considerably faster. The difference in time becomes even more pronounced as the size of the raster data becomes larger and the number of polygons are greater. The time difference of several seconds seem nothing, but imagine processing PRISM files for the entire US over 20 years, then you would appreciate the speed of exact_extract()
.
5.3.3 Single-layer vs multi-layer
Pretend that you have five dates of PRISM tmax data (here we repeat the same file five times) and would like to extract values from all of them. Extracting values from a multi-layer raster objects (RasterStack
for raster
package) takes less time than extracting values from the individual layers one at a time. This can be observed below.
terra::extract()
#--- extract from 5 layers one at a time ---#
tic()
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
temp <- terra::extract(prism_tmax_0701_KS_sr_10, KS_county_sv)
toc()
18.987 sec elapsed
#--- extract from a 5-layer stack ---#
prism_tmax_ml_5 <- c(
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10,
prism_tmax_0701_KS_sr_10
)
tic()
temp <- terra::extract(prism_tmax_ml_5, KS_county_sv)
toc()
4.388 sec elapsed
exact_extract()
#--- extract from 5 layers one at a time ---#
tic()
temp <- exact_extract(prism_tmax_0701_KS_rl_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_rl_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_rl_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_rl_10, KS_county_sf, progress = FALSE)
temp <- exact_extract(prism_tmax_0701_KS_rl_10, KS_county_sf, progress = FALSE)
toc()
2.868 sec elapsed
#--- extract from from a 5-layer stack ---#
prism_tmax_stack_5 <- stack(
prism_tmax_0701_KS_rl_10,
prism_tmax_0701_KS_rl_10,
prism_tmax_0701_KS_rl_10,
prism_tmax_0701_KS_rl_10,
prism_tmax_0701_KS_rl_10
)
tic()
temp <- exact_extract(prism_tmax_stack_5, KS_county_sf, progress = FALSE)
toc()
0.882 sec elapsed
The reduction in computation time for both methods makes sense. Since both layers have exactly the same geographic extent and resolution, finding the polygons-cells correspondence is done once and then it can be used repeatedly across the layers for the multi-layer SparRaster
and RasterStack
. This clearly suggests that when you are processing many layers of the same spatial resolution and extent, you should first stack them and then extract values at the same time instead of processing them one by one as long as your memory allows you to do so.
We did not introduce this function as it is very rare that you need this function in research projects.↩︎