Before you start

In this chapter, we learn how to parallelize raster data extraction for polygon data. We do not cover parallelizing raster data extraction for point data because it is already very fast, so repeated extraction for points is unlikely to be a bottleneck in your work. We start with parallelizing data extraction from single-layer raster data and then move on to the multi-layer case.

There are different ways of parallelizing the same extraction process. We will compare several parallelization approaches in terms of their speed and memory footprint. You will learn that how you parallelize matters: a naive parallelization can actually increase the time of raster data extraction, while a clever parallelization approach can save you hours or even days (depending on the size of the extraction job, of course).

We will use the future.apply and parallel packages for parallelization. Basic knowledge of parallelization using these packages is assumed. If you are not familiar with looping via lapply() or with parallelization via mclapply() (Mac and Linux users only) or future_lapply() (all platforms, including Windows), see Chapter A first.
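As a quick refresher on the three looping functions mentioned above, here is a minimal sketch that runs the same toy task sequentially, with mclapply(), and with future_lapply(). The function slow_square(), the input vector, and the choice of two workers are illustrative assumptions and are not part of this chapter's extraction code.

```r
library(parallel)
library(future.apply)

# Hypothetical stand-in for one unit of work (e.g., one extraction job)
slow_square <- function(x) {
  Sys.sleep(0.1) # pretend this takes a while
  x^2
}

inputs <- 1:8

# Sequential baseline
res_seq <- lapply(inputs, slow_square)

# Forked parallelization: mc.cores > 1 is not supported on Windows,
# so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_mc <- mclapply(inputs, slow_square, mc.cores = n_cores)

# future-based parallelization: works on Windows as well
future::plan(future::multisession, workers = 2)
res_future <- future_lapply(inputs, slow_square)
future::plan(future::sequential) # release the background workers

# All three approaches should return the same list
identical(res_seq, res_mc)
identical(res_seq, res_future)
```

The only things that change across the three calls are the looping function and how the workers are set up; the function applied to each element stays the same.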

Directions for replication

Datasets

All the datasets that you need to import are available here. In this chapter, file paths are set relative to my own working directory (which is hidden). To run the code without having to adjust the file paths, follow these steps:

set a folder (any folder) as the working directory using setwd()
create a folder called “Data” inside the folder designated as the working directory (if you have created a “Data” folder previously, skip this step)
download the pertinent datasets from here
place all the files in the downloaded folder in the “Data” folder
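The folder setup in the steps above can be sketched in R as follows. The location project_dir is a hypothetical placeholder (a temporary folder is used here only so the snippet runs anywhere); substitute any folder you like. Downloading the datasets still has to be done by hand.

```r
# Hypothetical project folder; replace with a folder of your choice
project_dir <- file.path(tempdir(), "raster_parallel")
dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)

# Step 1: set the working directory
setwd(project_dir)

# Step 2: create "Data" inside it, unless it already exists
if (!dir.exists("Data")) {
  dir.create("Data")
}

# Steps 3 and 4 are manual: download the datasets and move them into "Data".
# Afterwards, this lists what the code in this chapter will be able to find.
list.files("Data")
```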

Warning: the folder includes a series of daily PRISM datasets stored by month for 10 years. They amount to 12.75 GB of data.

Packages

Run the following code to install the pacman package if it is not already installed, and then to install (if necessary) and load the packages listed inside the pacman::p_load() function.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  parallel, # for parallelization
  future.apply, # for parallelization
  terra, # handle raster data
  raster, # handle raster data
  exactextractr, # fast extractions
  sf, # vector data operations
  dplyr, # data wrangling
  data.table, # data wrangling
  prism # download PRISM data
)