4 Raster Data Handling

Before you start

In this chapter, we will learn how to use the raster and terra package to handle raster data. The raster package has been the most popular and commonly used package for raster data handling. However, we are in the period of transitioning from the raster package to the terra package. The terra package has been under active development to replace the raster package (see the most up-to-date version of the package here). terra is written in C++ and thus is faster than the raster package in many raster data operations. The raster and terra packages share the same function name for many of the raster operations. Key differences will be discussed and will become clear later.

For economists, raster data extraction for vector data will be by far the most common use case of raster data and also the most time-consuming part of the whole raster data handling experience. Therefore, we will introduce only the essential knowledge of raster data operation required to effectively implement the task of extracting values, which will be covered extensively in Chapter 5. For example, we do not cover raster arithmetic, focal operations, or aggregation. Those who are interested in a fuller treatment of the raster or terra package are referred to Spatial Data Science with R and “terra” or Chapters 3, 4, and 5 of Geocomputation with R, respectively.

Even though the terra package is a replacement of the raster package and it has been out on CRAN for more than several years, we still learn the raster object classes defined by the raster package and how to switch between the raster and terra object classes. This is because other useful packages for us economists were written to work with the raster object classes and have still not been adapted to support terra object classes at the moment.

Finally, you might benefit from learning the stars package for raster data operations (covered in Chapter 7), particularly if you often work with raster data with the temporal dimension (e.g., PRISM, Daymet). It provides a data model that makes working with raster data with temporal dimensions easier. It also allows you to apply dplyr verbs for data wrangling.

Direction for replication

Datasets

All the datasets that you need to import are available here. In this chapter, the path to files is set relative to my own working directory (which is hidden). To run the codes without having to mess with paths to the files, follow these steps:

  • set a folder (any folder) as the working directory using setwd()
  • create a folder called “Data” inside the folder designated as the working directory (if you have created a “Data” folder previously, skip this step)
  • download the pertinent datasets from here
  • place all the files in the downloaded folder in the “Data” folder

Packages

Run the following code to install or load (if already installed) the pacman package, and then install or load (if already installed) the listed package inside the pacman::p_load() function.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  terra, # handle raster data
  raster, # handle raster data
  cdlTools, # download CDL data
  mapview, # create interactive maps
  dplyr, # data wrangling
  sf # vector data handling
)

4.1 Raster data object classes

4.1.1 raster package: RasterLayer, RasterStack, and RasterBrick

Let’s start with taking a look at raster data. We will use the CDL data for Iowa in 2015.

class      : RasterLayer 
dimensions : 11671, 17795, 207685445  (nrow, ncol, ncell)
resolution : 30, 30  (x, y)
extent     : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
crs        : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
source     : IA_cdl_2015.tif 
names      : IA_cdl_2015 
values     : 0, 229  (min, max)

Evaluating the imported raster object provides you with information about the raster data, such as dimensions (number of cells, number of columns, number of cells), spatial resolution (30 meter by 30 meter for this raster data), extent, CRS and the minimum and maximum values recorded in this raster layer. The class of the downloaded data is RasterLayer, which is a raster data class defined by the raster package. A RasterLayer consists of only one layer, meaning that only a single variable is associated with the cells (here it is land use category code in integer).


You can stack multiple raster layers of the same spatial resolution and extent to create a RasterStack using raster::stack() or RasterBrick using raster::brick(). Often times, processing a multi-layer object has computational advantages over processing multiple single-layer one by one72.

To create a RasterStack and RasterBrick, let’s load the CDL data for IA in 2016 and stack it with the 2015 data.

class      : RasterStack 
dimensions : 11671, 17795, 207685445, 2  (nrow, ncol, ncell, nlayers)
resolution : 30, 30  (x, y)
extent     : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
crs        : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
names      : IA_cdl_2015, IA_cdl_2016 
min values :           0,           0 
max values :         229,         241 

IA_cdl_stack is of class RasterStack, and it has two layers of variables: CDL for 2015 and 2016. You can make it a RasterBrick using raster::brick():

#--- stack the two ---#
IA_cdl_brick <- brick(IA_cdl_stack)

#--- or this works as well ---#
# IA_cdl_brick <- brick(IA_cdl_2015, IA_cdl_2016)

#--- take a look ---#
IA_cdl_brick

You probably noticed that it took some time to create the RasterBrick object73. While spatial operations on RasterBrick are supposedly faster than RasterStack, the time to create a RasterBrick object itself is often long enough to kill the speed advantage entirely74. Often, the three raster object types are collectively referred to as Raster\(^*\) objects for shorthand in the documentation of the raster and other related packages.

4.1.2 terra package: SpatRaster

terra package has only one object class for raster data, SpatRaster and no distinctions between one-layer and multi-layer rasters is necessary. Let’s first convert a RasterLayer to a SpatRaster using terra::rast() function.

#--- convert to a SpatRaster ---#
IA_cdl_2015_sr <- rast(IA_cdl_2015)

#--- take a look ---#
IA_cdl_2015_sr
class       : SpatRaster 
dimensions  : 11671, 17795, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
source      : IA_cdl_2015.tif 
color table : 1 
name        : IA_cdl_2015 
min value   :           0 
max value   :         229 

You can see that the number of layers (nlyr in dimensions) is \(1\) because the original object is a RasterLayer, which by definition has only one layer. Now, let’s convert a RasterStack to a SpatRaster using terra::rast().

#--- convert to a SpatRaster ---#
IA_cdl_stack_sr <- rast(IA_cdl_stack)

#--- take a look ---#
IA_cdl_stack_sr
class       : SpatRaster 
dimensions  : 11671, 17795, 2  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
sources     : IA_cdl_2015.tif  
              IA_cdl_2016.tif  
color table : 1, 2 
names       : IA_cdl_2015, IA_cdl_2016 
min values  :           0,           0 
max values  :         229,         241 

Again, it is a SpatRaster, and you now see that the number of layers is 2. We just confirmed that terra has only one class for raster data whether it is single-layer or multiple-layer ones.

In order to make multi-layer SpatRaster from multiple single-layer SpatRaster you can just use c() like below:

#--- create a single-layer SpatRaster ---#
IA_cdl_2016_sr <- rast(IA_cdl_2016)

#--- concatenate ---#
(
  IA_cdl_ml_sr <- c(IA_cdl_2015_sr, IA_cdl_2016_sr)
)
(
  IA_cdl_ml_sr <- c(IA_cdl_2015_sr, IA_cdl_2016_sr)
)
class       : SpatRaster 
dimensions  : 11671, 17795, 2  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
sources     : IA_cdl_2015.tif  
              IA_cdl_2016.tif  
color table : 1, 2 
names       : IA_cdl_2015, IA_cdl_2016 
min values  :           0,           0 
max values  :         229,         241 

4.1.3 Converting a SpatRaster object to a Raster\(^*\) object.

You can convert a SpatRaster object to a Raster\(^*\) object using raster(), stack(), and brick(). Keep in mind that if you use rater() even though SpatRaster has multiple layers, the resulting RasterLayer object has only the first of the multiple layers.

#--- RasterLayer (only 1st layer) ---#
IA_cdl_stack_sr %>% raster()
class      : RasterLayer 
dimensions : 11671, 17795, 207685445  (nrow, ncol, ncell)
resolution : 30, 30  (x, y)
extent     : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
crs        : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
source     : IA_cdl_2015.tif 
names      : IA_cdl_2015 
values     : 0, 229  (min, max)
#--- RasterLayer ---#
IA_cdl_stack_sr %>% stack()
class      : RasterStack 
dimensions : 11671, 17795, 207685445, 2  (nrow, ncol, ncell, nlayers)
resolution : 30, 30  (x, y)
extent     : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
crs        : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
names      : IA_cdl_2015, IA_cdl_2016 
min values :           0,           0 
max values :         229,         241 
#--- RasterLayer (this takes some time) ---#
IA_cdl_stack_sr %>% brick()
class      : RasterStack 
dimensions : 11671, 17795, 207685445, 2  (nrow, ncol, ncell, nlayers)
resolution : 30, 30  (x, y)
extent     : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
crs        : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
names      : IA_cdl_2015, IA_cdl_2016 
min values :           0,           0 
max values :         229,         241 

Or, you can just use as(SpatRast, "Raster") like below:

as(IA_cdl_stack_sr, "Raster")
class      : RasterStack 
dimensions : 11671, 17795, 207685445, 2  (nrow, ncol, ncell, nlayers)
resolution : 30, 30  (x, y)
extent     : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
crs        : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
names      : IA_cdl_2015, IA_cdl_2016 
min values :           0,           0 
max values :         229,         241 

This works for any Raster\(^*\) object and you do not have to pick the right function like above.

4.1.4 Vector data in the terra package

terra package has its own class for vector data, called SpatVector. While we do not use any of the vector data functionality provided by the terra package, we learn how to convert an sf object to SpatVector because some of the terra functions do not support sf as of now (this will likely be resolved very soon). We will see some use cases of this conversion in the next chapter when we learn raster value extractions for vector data using terra::extract().

As an example, let’s use Illinois county border data.

Simple feature collection with 102 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -91.51308 ymin: 36.9703 xmax: -87.01993 ymax: 42.50848
Geodetic CRS:  NAD83
First 10 features:
    STATEFP COUNTYFP COUNTYNS GEOID    NAME       NAMELSAD LSAD CLASSFP MTFCC
86       17      067 00424235 17067 Hancock Hancock County   06      H1 G4020
93       17      025 00424214 17025    Clay    Clay County   06      H1 G4020
132      17      185 00424293 17185  Wabash  Wabash County   06      H1 G4020
149      17      113 01784833 17113  McLean  McLean County   06      H1 G4020
159      17      005 00424204 17005    Bond    Bond County   06      H1 G4020
160      17      009 00424206 17009   Brown   Brown County   06      H1 G4020
214      17      083 00424243 17083  Jersey  Jersey County   06      H1 G4020
255      17      147 00424275 17147   Piatt   Piatt County   06      H1 G4020
267      17      151 00424277 17151    Pope    Pope County   06      H1 G4020
304      17      011 00424207 17011  Bureau  Bureau County   06      H1 G4020
    CSAFP CBSAFP METDIVFP FUNCSTAT      ALAND   AWATER    INTPTLAT     INTPTLON
86    161  22800     <NA>        A 2055798688 53563362 +40.4013180 -091.1688008
93   <NA>   <NA>     <NA>        A 1213022841  3064720 +38.7468187 -088.4823254
132  <NA>   <NA>     <NA>        A  578403995 10973569 +38.4458209 -087.8391674
149   145  14010     <NA>        A 3064597287  7804856 +40.4945594 -088.8445391
159   476  41180     <NA>        A  985065096  6462629 +38.8859240 -089.4365916
160  <NA>   <NA>     <NA>        A  791828626  4144346 +39.9620694 -090.7503095
214   476  41180     <NA>        A  957415216 20333974 +39.0801945 -090.3613850
255  <NA>  16580     <NA>        A 1137492089   754122 +40.0090327 -088.5923546
267  <NA>   <NA>     <NA>        A  955362037 14662240 +37.4171687 -088.5423737
304   176  36837     <NA>        A 2250884551 11523914 +41.4013043 -089.5283772
                          geometry
86  MULTIPOLYGON (((-91.37421 4...
93  MULTIPOLYGON (((-88.69517 3...
132 MULTIPOLYGON (((-87.9446 38...
149 MULTIPOLYGON (((-89.2665 40...
159 MULTIPOLYGON (((-89.36179 3...
160 MULTIPOLYGON (((-90.91469 4...
214 MULTIPOLYGON (((-90.59216 3...
255 MULTIPOLYGON (((-88.74516 4...
267 MULTIPOLYGON (((-88.70963 3...
304 MULTIPOLYGON (((-89.85691 4...

You can convert an sf object to SpatVector object using terra::vect().

(
  IL_county_sv <- vect(IL_county)
)
 class       : SpatVector 
 geometry    : polygons 
 dimensions  : 102, 17  (geometries, attributes)
 extent      : -91.51308, -87.01993, 36.9703, 42.50848  (xmin, xmax, ymin, ymax)
 coord. ref. : lon/lat NAD83 (EPSG:4269) 
 names       : STATEFP COUNTYFP COUNTYNS GEOID    NAME       NAMELSAD  LSAD
 type        :   <chr>    <chr>    <chr> <chr>   <chr>          <chr> <chr>
 values      :      17      067 00424235 17067 Hancock Hancock County    06
                    17      025 00424214 17025    Clay    Clay County    06
                    17      185 00424293 17185  Wabash  Wabash County    06
 CLASSFP MTFCC CSAFP (and 7 more)
   <chr> <chr> <chr>             
      H1 G4020   161             
      H1 G4020    NA             
      H1 G4020    NA             

4.2 Read and write a raster data file

Sometimes we can download raster data of interest as we saw in Section 3.1. But, most of the time you need to read raster data stored as a file. Raster data files can come in numerous different formats. For example, PRPISM comes in the Band Interleaved by Line (BIL) format, some of the Daymet data comes in netCDF format. Other popular formats include GeoTiff, SAGA, ENVI, and many others.

4.2.1 Read raster file(s)

You use terra::rast() to read raster data of many common formats, and it should be almost always the case that the raster data you got can be read using this function. Here, we read a GeoTiff file (a file with .tif extension):

(
  IA_cdl_2015_sr <- rast("Data/IA_cdl_2015.tif")
)
class       : SpatRaster 
dimensions  : 11671, 17795, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
source      : IA_cdl_2015.tif 
color table : 1 
name        : IA_cdl_2015 
min value   :           0 
max value   :         229 

One important thing to note here is that the cell values of the raster data are actually not in memory when you “read” raster data from a file. You basically just established a connection to the file. This helps to reduce the memory footprint of raster data handling. You can check this by raster::inMemory() function for Raster\(^*\) objects, but the same function has not been implemented for terra yet.

You can read multiple single-layer raster datasets of the same spatial extent and resolution at the same time to have a multi-layer SpatRaster object. Here, we import two single-layer raster datasets (IA_cdl_2015.tif and IA_cdl_2016.tif) to create a two-layer SpatRaster object.

#--- the list of path to the files ---#
files_list <- c("Data/IA_cdl_2015.tif", "Data/IA_cdl_2016.tif")

#--- read the two at the same time ---#
(
  multi_layer_sr <- rast(files_list)
)
class       : SpatRaster 
dimensions  : 11671, 17795, 2  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
sources     : IA_cdl_2015.tif  
              IA_cdl_2016.tif  
color table : 1, 2 
names       : IA_cdl_2015, IA_cdl_2016 
min values  :           0,           0 
max values  :         229,         241 

Of course, this only works because the two datasets have the identical spatial extent and resolution. There are, however, no restrictions on what variable each of the raster layers represent. For example, you can combine PRISM temperature and precipitation raster layers.

4.2.2 Write raster files

You can write a SpatRaster object using terra::writeRaster().

terra::writeRaster(IA_cdl_2015_sr, "Data/IA_cdl_stack.tif", filetype = "GTiff", overwrite = TRUE)

The above code saves IA_cdl_2015_sr (a SpatRaster object) as a GeoTiff file.75 The filetype option can be dropped as writeRaster() infers the filetype from the extension of the file name. The overwrite = TRUE option is necessary if a file with the same name already exists and you are overwriting it. This is one of the many areas terra is better than raster. raster::writeRaster() can be frustratingly slow for a large Raster\(^*\) object. terra::writeRaster() is much faster.

You can also save a multi-layer SpatRaster object just like you save a single-layer SpatRaster object.

terra::writeRaster(IA_cdl_stack_sr, "Data/IA_cdl_stack.tif", filetype = "GTiff", overwrite = TRUE)

The saved file is a multi-band raster datasets. So, if you have many raster files of the same spatial extent and resolution, you can “stack” them on R and then export it to a single multi-band raster datasets, which cleans up your data folder.

4.3 Extract information from raster data object

4.3.1 Get CRS

You often need to extract the CRS of a raster object before you interact it with vector data (e.g., extracting values from a raster layer to vector data, or cropping a raster layer to the spatial extent of vector data), which can be done using terra::crs():

terra::crs(IA_cdl_2015_sr)
[1] "PROJCRS[\"unnamed\",\n    BASEGEOGCRS[\"NAD83\",\n        DATUM[\"North American Datum 1983\",\n            ELLIPSOID[\"GRS 1980\",6378137,298.257222101004,\n                LENGTHUNIT[\"metre\",1]]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        ID[\"EPSG\",4269]],\n    CONVERSION[\"Albers Equal Area\",\n        METHOD[\"Albers Equal Area\",\n            ID[\"EPSG\",9822]],\n        PARAMETER[\"Latitude of false origin\",23,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8821]],\n        PARAMETER[\"Longitude of false origin\",-96,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8822]],\n        PARAMETER[\"Latitude of 1st standard parallel\",29.5,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8823]],\n        PARAMETER[\"Latitude of 2nd standard parallel\",45.5,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8824]],\n        PARAMETER[\"Easting at false origin\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8826]],\n        PARAMETER[\"Northing at false origin\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8827]]],\n    CS[Cartesian,2],\n        AXIS[\"easting\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]],\n        AXIS[\"northing\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]]]"

4.3.2 Subset

You can access specific layers in a multi-layer raster object by indexing:

#--- index ---#
IA_cdl_stack_sr[[2]] # (originally IA_cdl_2016.tif)
class       : SpatRaster 
dimensions  : 11671, 17795, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : -52095, 481755, 1938165, 2288295  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
source      : IA_cdl_2016.tif 
color table : 1 
name        : IA_cdl_2016 
min value   :           0 
max value   :         241 

4.3.3 Get cell values

You can access the values stored in a SpatRaster object using the terra::values() function:

#--- terra::values ---#
values_from_rs <- terra::values(IA_cdl_stack_sr)

#--- take a look ---#
head(values_from_rs)
     Layer_1 Layer_1
[1,]       0       0
[2,]       0       0
[3,]       0       0
[4,]       0       0
[5,]       0       0
[6,]       0       0

The returned values come in a matrix form of two columns because we are getting values from a two-layer SpatRaster object (one column for each layer). In general, terra::values() returns a \(X\) by \(n\) matrix, where \(X\) is the number of cells and \(n\) is the number of layers.

4.4 Turning a raster object into a data.frame

When creating maps using ggplot2() from a SpatRaster or Raster\(^*\) object, it is necessary to first convert it to a data.frame (see Chapter 8.2). You can use the as.data.frame() function with xy = TRUE option to construct a data.frame where each row represents a single cell that has cell values for each layer and its coordinates (the center of the cell).

#--- converting to a data.frame ---#
IA_cdl_df <- as.data.frame(IA_cdl_stack_sr, xy = TRUE) # this works with Raster* objects as well

#--- take a look ---#
head(IA_cdl_df)
       x       y Layer_1 Layer_1.1
1 -52080 2288280       0         0
2 -52050 2288280       0         0
3 -52020 2288280       0         0
4 -51990 2288280       0         0
5 -51960 2288280       0         0
6 -51930 2288280       0         0

Note: I have seen cases where a raster object is converted to a data.frame and then to ‘sf’ for the purpose of interacting it with polygons using st_join() to extract and assign cell values for the polygons (see Chapter 3 for this type of operation). This is, however, not generally recommended for two reasons. First of all, it is much slower than using functions that work on a raster object and polygons directly to extract values for the polygons, such as terra::extract() or exactextractr::exact_extract(), which are introduced in Chapter 5. This is primarily because converting a raster object to a data.frame() is very slow. Second, once a raster data object is converted to a point sf data, one can no longer weight cell values according to the degree of their overlaps with the target polygons. The only case I have found the conversion beneficial (or simply necessary) is to create a map using ggplot() from a raster object (the tmap package can work directly with raster objects.). While the desire to work with a data.frame is understandable as we are very comfortable with working on a data.frame, it is very likely that the conversion to a data.frame is not only unnecessary, but also inefficient76.

4.5 Quick visualization

To have a quick visualization of the data values of SpatRaster objects, you can simply use plot():

plot(IA_cdl_2015_sr)

For a more elaborate map using raster data, see Chapter 8.2.

4.6 Working with netCDFs

It is worth talking about how to read a netCDFs format – a multidimensional file format that resembles a raster stack/brick. A netCDF file contains data with a specific structure: a two-dimensional spatial grid (e.g., longitude and latitude) and a third dimension which is usually date or time. This structure is convenient for weather data measured on a consistent grid over time. One such dataset is called gridMET which maintains a gridded dataset of weather variables at 4km resolution. Let’s download the daily precipitation data for 2018 using downloader::download()77. We set the destination file name (what to call the file and where we want it to be), and the mode to wb for a binary download.

#--- download gridMET precipitation 2018 ---#
downloader::download(
  url = str_c("http://www.northwestknowledge.net/metdata/data/pr_2018.nc"),
  destfile = "Data/pr_2018.nc",
  mode = "wb"
)

This code should have stored the data as pr_2018.nc in the Data folder. You can read a netCDF file using terra::rast().

(
  pr_2018_gm <- rast("Data/pr_2018.nc")
)
class       : SpatRaster 
dimensions  : 585, 1386, 365  (nrow, ncol, nlyr)
resolution  : 0.04166667, 0.04166667  (x, y)
extent      : -124.7875, -67.0375, 25.04583, 49.42083  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : pr_2018.nc 
varname     : precipitation_amount (pr) 
names       : preci~43099, preci~43100, preci~43101, preci~43102, preci~43103, preci~43104, ... 
unit        :          mm,          mm,          mm,          mm,          mm,          mm, ... 

You can see that it has 365 layers: one layer per day in 2018. Let’s now look at layer names:

head(names(pr_2018_gm))
[1] "precipitation_amount_day=43099" "precipitation_amount_day=43100"
[3] "precipitation_amount_day=43101" "precipitation_amount_day=43102"
[5] "precipitation_amount_day=43103" "precipitation_amount_day=43104"

Since we have 365 layers and the number at the end of the layer names increase by 1, you would think that nth layer represents nth day of 2018. In this case, you are correct. However, it is always a good practice to confirm what each layer represents without assuming anything. Now, let’s use the ncdf4 package, which is built specifically to handle netCDF4 objects.

(
  pr_2018_nc <- ncdf4::nc_open("Data/pr_2018.nc")
)
File Data/pr_2018.nc (NC_FORMAT_NETCDF4):

     1 variables (excluding dimension variables):
        unsigned short precipitation_amount[lon,lat,day]   (Chunking: [231,98,61])  (Compression: level 9)
            _FillValue: 32767
            units: mm
            description: Daily Accumulated Precipitation
            long_name: pr
            standard_name: pr
            missing_value: 32767
            dimensions: lon lat time
            grid_mapping: crs
            coordinate_system: WGS84,EPSG:4326
            scale_factor: 0.1
            add_offset: 0
            coordinates: lon lat
            _Unsigned: true

     4 dimensions:
        lon  Size:1386 
            units: degrees_east
            description: longitude
            long_name: longitude
            standard_name: longitude
            axis: X
        lat  Size:585 
            units: degrees_north
            description: latitude
            long_name: latitude
            standard_name: latitude
            axis: Y
        day  Size:365 
            description: days since 1900-01-01
            units: days since 1900-01-01 00:00:00
            long_name: time
            standard_name: time
            calendar: gregorian
        crs  Size:1 
            grid_mapping_name: latitude_longitude
            longitude_of_prime_meridian: 0
            semi_major_axis: 6378137
            long_name: WGS 84
            inverse_flattening: 298.257223563
            GeoTransform: -124.7666666333333 0.041666666666666 0  49.400000000000000 -0.041666666666666
            spatial_ref: GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]

    19 global attributes:
        geospatial_bounds_crs: EPSG:4326
        Conventions: CF-1.6
        geospatial_bounds: POLYGON((-124.7666666333333 49.400000000000000, -124.7666666333333 25.066666666666666, -67.058333300000015 25.066666666666666, -67.058333300000015 49.400000000000000, -124.7666666333333 49.400000000000000))
        geospatial_lat_min: 25.066666666666666
        geospatial_lat_max: 49.40000000000000
        geospatial_lon_min: -124.7666666333333
        geospatial_lon_max: -67.058333300000015
        geospatial_lon_resolution: 0.041666666666666
        geospatial_lat_resolution: 0.041666666666666
        geospatial_lat_units: decimal_degrees north
        geospatial_lon_units: decimal_degrees east
        coordinate_system: EPSG:4326
        author: John Abatzoglou - University of Idaho, jabatzoglou@uidaho.edu
        date: 03 May 2021
        note1: The projection information for this file is: GCS WGS 1984.
        note2: Citation: Abatzoglou, J.T., 2013, Development of gridded surface meteorological data for ecological applications and modeling, International Journal of Climatology, DOI: 10.1002/joc.3413
        note3: Data in slices after last_permanent_slice (1-based) are considered provisional and subject to change with subsequent updates
        note4: Data in slices after last_provisional_slice (1-based) are considered early and subject to change with subsequent updates
        note5: Days correspond approximately to calendar days ending at midnight, Mountain Standard Time (7 UTC the next calendar day)

As you can see from the output, there is tons of information that we did not see when we read the data using rast(), which includes the explanation of the third dimension (day) of this raster object. It turned out that the numerical values at the end of layer names in the SpatRaster object are days since 1900-01-01. So, the first layer (named precipitation_amount_day=43099) represents:

ymd("1900-01-01") + 43099
[1] "2018-01-01"

Actually, if we use raster::brick(), instead of terra::rast(), then we can see the naming convention of the layers:

(
  pr_2018_b <- brick("Data/pr_2018.nc")
)
class      : RasterBrick 
dimensions : 585, 1386, 810810, 365  (nrow, ncol, ncell, nlayers)
resolution : 0.04166667, 0.04166667  (x, y)
extent     : -124.7875, -67.0375, 25.04583, 49.42083  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : pr_2018.nc 
names      : X43099, X43100, X43101, X43102, X43103, X43104, X43105, X43106, X43107, X43108, X43109, X43110, X43111, X43112, X43113, ... 
day (days since 1900-01-01 00:00:00): 43099, 43463 (min, max)
varname    : precipitation_amount 

SpatRaster or RasterBrick objects are easier to work with as many useful functions accept them as inputs, but not the ncdf4 object. Personally, I first scrutinize a netCDFs file using nc_open() and then import it as a SpatRaster or RasterBrick object78. Recovering the dates for the layers is particularly important as we often wrangle the resulting data based on date (e.g., subset the data so that you have only April to September). An example of date recovery can be seen in Chapter 9.5.

Very detailed description of how to work with ncdf4 object is provided here. However, I would say that economists are unlikely to benefit much from it. As I have stated several times already, we (economists) rarely need to manipulate raster data itself. Our observation units are almost always points (e.g., farms, houses) or polygons (e.g., county boundary) and we just need to associate those units with the values held in raster data (e.g., precipitation). We can simply use raster objects as they are and supply them to functions that take care of the business of raster value extraction, which is covered in the next chapter (Chapter 5).