4.6 Working with netCDFs
It is worth talking about how to read a netCDFs format – a multidimensional file format that resembles a raster stack/brick. A netCDF file contains data with a specific structure: a two-dimensional spatial grid (e.g., longitude and latitude) and a third dimension which is usually date or time. This structure is convenient for weather data measured on a consistent grid over time. One such dataset is called gridMET which maintains a gridded dataset of weather variables at 4km resolution. Let’s download the daily precipitation data for 2018 using downloader::download()
72. We set the destination file name (what to call the file and where we want it to be), and the mode to wb
for a binary download.
#--- download gridMET precipitation 2018 ---#
::download(
downloaderurl = str_c("http://www.northwestknowledge.net/metdata/data/pr_2018.nc"),
destfile = "Data/pr_2018.nc",
mode = "wb"
)
This code should have stored the data as pr_2018.nc in the Data folder. You can read a netCDF file using terra::rast()
.
(<- rast("Data/pr_2018.nc")
pr_2018_gm )
class : SpatRaster
dimensions : 585, 1386, 365 (nrow, ncol, nlyr)
resolution : 0.04166667, 0.04166667 (x, y)
extent : -124.7875, -67.0375, 25.04583, 49.42083 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source : pr_2018.nc
varname : precipitation_amount (pr)
names : preci~43099, preci~43100, preci~43101, preci~43102, preci~43103, preci~43104, ...
unit : mm, mm, mm, mm, mm, mm, ...
You can see that it has 365 layers: one layer per day in 2018. Let’s now look at layer names:
head(names(pr_2018_gm))
[1] "precipitation_amount_day=43099" "precipitation_amount_day=43100"
[3] "precipitation_amount_day=43101" "precipitation_amount_day=43102"
[5] "precipitation_amount_day=43103" "precipitation_amount_day=43104"
Since we have 365 layers and the number at the end of the layer names increase by 1, you would think that nth layer represents nth day of 2018. In this case, you are correct. However, it is always a good practice to confirm what each layer represents without assuming anything. Now, let’s use the ncdf4
package, which is built specifically to handle netCDF4 objects.
(<- ncdf4::nc_open("Data/pr_2018.nc")
pr_2018_nc )
File Data/pr_2018.nc (NC_FORMAT_NETCDF4):
1 variables (excluding dimension variables):
unsigned short precipitation_amount[lon,lat,day] (Chunking: [231,98,61]) (Compression: level 9)
_FillValue: 32767
units: mm
description: Daily Accumulated Precipitation
long_name: pr
standard_name: pr
missing_value: 32767
dimensions: lon lat time
grid_mapping: crs
coordinate_system: WGS84,EPSG:4326
scale_factor: 0.1
add_offset: 0
coordinates: lon lat
_Unsigned: true
4 dimensions:
lon Size:1386
units: degrees_east
description: longitude
long_name: longitude
standard_name: longitude
axis: X
lat Size:585
units: degrees_north
description: latitude
long_name: latitude
standard_name: latitude
axis: Y
day Size:365
description: days since 1900-01-01
units: days since 1900-01-01 00:00:00
long_name: time
standard_name: time
calendar: gregorian
crs Size:1
grid_mapping_name: latitude_longitude
longitude_of_prime_meridian: 0
semi_major_axis: 6378137
long_name: WGS 84
inverse_flattening: 298.257223563
GeoTransform: -124.7666666333333 0.041666666666666 0 49.400000000000000 -0.041666666666666
spatial_ref: GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]
19 global attributes:
geospatial_bounds_crs: EPSG:4326
Conventions: CF-1.6
geospatial_bounds: POLYGON((-124.7666666333333 49.400000000000000, -124.7666666333333 25.066666666666666, -67.058333300000015 25.066666666666666, -67.058333300000015 49.400000000000000, -124.7666666333333 49.400000000000000))
geospatial_lat_min: 25.066666666666666
geospatial_lat_max: 49.40000000000000
geospatial_lon_min: -124.7666666333333
geospatial_lon_max: -67.058333300000015
geospatial_lon_resolution: 0.041666666666666
geospatial_lat_resolution: 0.041666666666666
geospatial_lat_units: decimal_degrees north
geospatial_lon_units: decimal_degrees east
coordinate_system: EPSG:4326
author: John Abatzoglou - University of Idaho, jabatzoglou@uidaho.edu
date: 03 May 2021
note1: The projection information for this file is: GCS WGS 1984.
note2: Citation: Abatzoglou, J.T., 2013, Development of gridded surface meteorological data for ecological applications and modeling, International Journal of Climatology, DOI: 10.1002/joc.3413
note3: Data in slices after last_permanent_slice (1-based) are considered provisional and subject to change with subsequent updates
note4: Data in slices after last_provisional_slice (1-based) are considered early and subject to change with subsequent updates
note5: Days correspond approximately to calendar days ending at midnight, Mountain Standard Time (7 UTC the next calendar day)
As you can see from the output, there is tons of information that we did not see when we read the data using rast()
, which includes the explanation of the third dimension (day) of this raster object. It turned out that the numerical values at the end of layer names in the SpatRaster
object are days since 1900-01-01. So, the first layer (named precipitation_amount_day=43099) represents:
ymd("1900-01-01") + 43099
[1] "2018-01-01"
Actually, if we use raster::brick()
, instead of terra::rast()
, then we can see the naming convention of the layers:
(<- brick("Data/pr_2018.nc")
pr_2018_b )
class : RasterBrick
dimensions : 585, 1386, 810810, 365 (nrow, ncol, ncell, nlayers)
resolution : 0.04166667, 0.04166667 (x, y)
extent : -124.7875, -67.0375, 25.04583, 49.42083 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs
source : pr_2018.nc
names : X43099, X43100, X43101, X43102, X43103, X43104, X43105, X43106, X43107, X43108, X43109, X43110, X43111, X43112, X43113, ...
day (days since 1900-01-01 00:00:00): 43099, 43463 (min, max)
varname : precipitation_amount
SpatRaster
or RasterBrick
objects are easier to work with as many useful functions accept them as inputs, but not the ncdf4
object. Personally, I first scrutinize a netCDFs file using nc_open()
and then import it as a SpatRaster
or RasterBrick
object73. Recovering the dates for the layers is particularly important as we often wrangle the resulting data based on date (e.g., subset the data so that you have only April to September). An example of date recovery can be seen in Chapter 9.5.
Very detailed description of how to work with ncdf4
object is provided here. However, I would say that economists are unlikely to benefit much from it. As I have stated several times already, we (economists) rarely need to manipulate raster data itself. Our observation units are almost always points (e.g., farms, houses) or polygons (e.g., county boundary) and we just need to associate those units with the values held in raster data (e.g., precipitation). We can simply use raster objects as they are and supply them to functions that take care of the business of raster value extraction, which is covered in the next chapter (Chapter 5).
gridMET data is also available in the Google Earth Engine Data Catalog, which can be accessed with the R library
rgee
↩︎Even though
RasterBrick
provides the description of how layers are named, I think it is a good practice to see the full description in thencdf4
object.↩︎