4.6 Working with netCDFs

It is worth talking about how to read a netCDFs format – a multidimensional file format that resembles a raster stack/brick. A netCDF file contains data with a specific structure: a two-dimensional spatial grid (e.g., longitude and latitude) and a third dimension which is usually date or time. This structure is convenient for weather data measured on a consistent grid over time. One such dataset is called gridMET which maintains a gridded dataset of weather variables at 4km resolution. Let’s download the daily precipitation data for 2018 using downloader::download()72. We set the destination file name (what to call the file and where we want it to be), and the mode to wb for a binary download.

#--- download gridMET precipitation 2018 ---#
downloader::download(
  url = str_c("http://www.northwestknowledge.net/metdata/data/pr_2018.nc"),
  destfile = "Data/pr_2018.nc",
  mode = "wb"
)

This code should have stored the data as pr_2018.nc in the Data folder. You can read a netCDF file using terra::rast().

(
  pr_2018_gm <- rast("Data/pr_2018.nc")
)
class       : SpatRaster 
dimensions  : 585, 1386, 365  (nrow, ncol, nlyr)
resolution  : 0.04166667, 0.04166667  (x, y)
extent      : -124.7875, -67.0375, 25.04583, 49.42083  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : pr_2018.nc 
varname     : precipitation_amount (pr) 
names       : preci~43099, preci~43100, preci~43101, preci~43102, preci~43103, preci~43104, ... 
unit        :          mm,          mm,          mm,          mm,          mm,          mm, ... 

You can see that it has 365 layers: one layer per day in 2018. Let’s now look at layer names:

head(names(pr_2018_gm))
[1] "precipitation_amount_day=43099" "precipitation_amount_day=43100"
[3] "precipitation_amount_day=43101" "precipitation_amount_day=43102"
[5] "precipitation_amount_day=43103" "precipitation_amount_day=43104"

Since we have 365 layers and the number at the end of the layer names increase by 1, you would think that nth layer represents nth day of 2018. In this case, you are correct. However, it is always a good practice to confirm what each layer represents without assuming anything. Now, let’s use the ncdf4 package, which is built specifically to handle netCDF4 objects.

(
  pr_2018_nc <- ncdf4::nc_open("Data/pr_2018.nc")
)
File Data/pr_2018.nc (NC_FORMAT_NETCDF4):

     1 variables (excluding dimension variables):
        unsigned short precipitation_amount[lon,lat,day]   (Chunking: [231,98,61])  (Compression: level 9)
            _FillValue: 32767
            units: mm
            description: Daily Accumulated Precipitation
            long_name: pr
            standard_name: pr
            missing_value: 32767
            dimensions: lon lat time
            grid_mapping: crs
            coordinate_system: WGS84,EPSG:4326
            scale_factor: 0.1
            add_offset: 0
            coordinates: lon lat
            _Unsigned: true

     4 dimensions:
        lon  Size:1386 
            units: degrees_east
            description: longitude
            long_name: longitude
            standard_name: longitude
            axis: X
        lat  Size:585 
            units: degrees_north
            description: latitude
            long_name: latitude
            standard_name: latitude
            axis: Y
        day  Size:365 
            description: days since 1900-01-01
            units: days since 1900-01-01 00:00:00
            long_name: time
            standard_name: time
            calendar: gregorian
        crs  Size:1 
            grid_mapping_name: latitude_longitude
            longitude_of_prime_meridian: 0
            semi_major_axis: 6378137
            long_name: WGS 84
            inverse_flattening: 298.257223563
            GeoTransform: -124.7666666333333 0.041666666666666 0  49.400000000000000 -0.041666666666666
            spatial_ref: GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]

    19 global attributes:
        geospatial_bounds_crs: EPSG:4326
        Conventions: CF-1.6
        geospatial_bounds: POLYGON((-124.7666666333333 49.400000000000000, -124.7666666333333 25.066666666666666, -67.058333300000015 25.066666666666666, -67.058333300000015 49.400000000000000, -124.7666666333333 49.400000000000000))
        geospatial_lat_min: 25.066666666666666
        geospatial_lat_max: 49.40000000000000
        geospatial_lon_min: -124.7666666333333
        geospatial_lon_max: -67.058333300000015
        geospatial_lon_resolution: 0.041666666666666
        geospatial_lat_resolution: 0.041666666666666
        geospatial_lat_units: decimal_degrees north
        geospatial_lon_units: decimal_degrees east
        coordinate_system: EPSG:4326
        author: John Abatzoglou - University of Idaho, jabatzoglou@uidaho.edu
        date: 03 May 2021
        note1: The projection information for this file is: GCS WGS 1984.
        note2: Citation: Abatzoglou, J.T., 2013, Development of gridded surface meteorological data for ecological applications and modeling, International Journal of Climatology, DOI: 10.1002/joc.3413
        note3: Data in slices after last_permanent_slice (1-based) are considered provisional and subject to change with subsequent updates
        note4: Data in slices after last_provisional_slice (1-based) are considered early and subject to change with subsequent updates
        note5: Days correspond approximately to calendar days ending at midnight, Mountain Standard Time (7 UTC the next calendar day)

As you can see from the output, there is tons of information that we did not see when we read the data using rast(), which includes the explanation of the third dimension (day) of this raster object. It turned out that the numerical values at the end of layer names in the SpatRaster object are days since 1900-01-01. So, the first layer (named precipitation_amount_day=43099) represents:

ymd("1900-01-01") + 43099
[1] "2018-01-01"

Actually, if we use raster::brick(), instead of terra::rast(), then we can see the naming convention of the layers:

(
  pr_2018_b <- brick("Data/pr_2018.nc")
)
class      : RasterBrick 
dimensions : 585, 1386, 810810, 365  (nrow, ncol, ncell, nlayers)
resolution : 0.04166667, 0.04166667  (x, y)
extent     : -124.7875, -67.0375, 25.04583, 49.42083  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : pr_2018.nc 
names      : X43099, X43100, X43101, X43102, X43103, X43104, X43105, X43106, X43107, X43108, X43109, X43110, X43111, X43112, X43113, ... 
day (days since 1900-01-01 00:00:00): 43099, 43463 (min, max)
varname    : precipitation_amount 

SpatRaster or RasterBrick objects are easier to work with as many useful functions accept them as inputs, but not the ncdf4 object. Personally, I first scrutinize a netCDFs file using nc_open() and then import it as a SpatRaster or RasterBrick object73. Recovering the dates for the layers is particularly important as we often wrangle the resulting data based on date (e.g., subset the data so that you have only April to September). An example of date recovery can be seen in Chapter 9.5.

Very detailed description of how to work with ncdf4 object is provided here. However, I would say that economists are unlikely to benefit much from it. As I have stated several times already, we (economists) rarely need to manipulate raster data itself. Our observation units are almost always points (e.g., farms, houses) or polygons (e.g., county boundary) and we just need to associate those units with the values held in raster data (e.g., precipitation). We can simply use raster objects as they are and supply them to functions that take care of the business of raster value extraction, which is covered in the next chapter (Chapter 5).


  1. gridMET data is also available in the Google Earth Engine Data Catalog, which can be accessed with the R library rgee↩︎

  2. Even though RasterBrick provides the description of how layers are named, I think it is a good practice to see the full description in the ncdf4 object.↩︎