Before you start
In this chapter, we learn how to parallelize raster data extraction for polygon data. We do not cover parallelizing extraction for point data because it is already very fast, so repeated extractions for points are unlikely to be a bottleneck in your work. We start with parallelizing extraction from a single-layer raster dataset and then move on to the multi-layer case.
There are different ways of parallelizing the same extraction process. We will compare several parallelization approaches in terms of their speed and memory footprint. You will learn that how you parallelize matters: a naive parallelization can actually increase the time of raster data extraction, while a clever approach can save you hours or even days (depending on the size of the extraction job, of course).
We will use the future.apply and parallel packages for parallelization. Basic knowledge of parallelization using these packages is assumed. If you are not familiar with looping using lapply() and with parallelization using mclapply() (Mac and Linux users only) or future_lapply() (all platforms, including Windows), see Chapter A first.
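As a quick refresher, the three looping functions named above can be compared side by side. This is only a minimal sketch: slow_square() is a made-up toy task standing in for one extraction job, and the choice of two workers is arbitrary.

```r
library(parallel)
library(future.apply) # also attaches the future package, which provides plan()

# Hypothetical toy task standing in for one extraction job
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

# Sequential baseline
res_seq <- lapply(1:8, slow_square)

# Forked workers with mclapply(); forking is unavailable on Windows,
# where mc.cores must stay at 1
num_cores <- if (.Platform$OS.type == "windows") 1 else 2
res_mc <- mclapply(1:8, slow_square, mc.cores = num_cores)

# Background R sessions with future_lapply(); works on all platforms
plan(multisession, workers = 2)
res_future <- future_lapply(1:8, slow_square)
plan(sequential) # release the background workers
```

All three calls return a list in the same order as the input, so swapping one for another does not require changing the code that consumes the results.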
Directions for replication
Datasets
All the datasets that you need to import are available here. In this chapter, file paths are set relative to my own working directory (which is hidden). To run the code without having to adjust the file paths, follow these steps:
- set a folder (any folder) as the working directory using setwd()
- create a folder called “Data” inside the folder designated as the working directory (if you have created a “Data” folder previously, skip this step)
- download the pertinent datasets from here
- place all the files in the downloaded folder in the “Data” folder
Warning: the folder includes a series of daily PRISM datasets stored by month for 10 years. They amount to \(12.75\) GB of data.
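The first two steps above can be scripted. The sketch below assumes a hypothetical project folder created under a temporary directory; replace project_dir with any folder of your choice. Downloading the datasets and moving the files remain manual steps.

```r
# Hypothetical project folder (an assumption for illustration only)
project_dir <- file.path(tempdir(), "my_project")
dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)

# Step 1: designate it as the working directory
setwd(project_dir)

# Step 2: create the "Data" folder unless it already exists
if (!dir.exists("Data")) dir.create("Data")

# Steps 3 and 4 are manual: download the datasets and place the files in "Data".
# Once that is done, this lists the files the chapter's code will read.
list.files("Data")
```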
Packages
Run the following code to install or load (if already installed) the pacman package, and then install or load (if already installed) the packages listed inside the pacman::p_load() function.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  parallel, # for parallelization
  future.apply, # for parallelization
  terra, # handle raster data
  raster, # handle raster data
  exactextractr, # fast extractions
  sf, # vector data operations
  dplyr, # data wrangling
  data.table, # data wrangling
  prism # download PRISM data
)