09-6: R as GIS: Download Spatial Datasets using R

Before you start


Learning objectives

Learn how to download publicly available agriculture-related data from within R.


Tips to make the most of the lecture notes


Interactive navigation tools

  • Click on the three horizontally stacked lines at the bottom left corner of the slide, then you will see table of contents, and you can jump to the section you want

  • Hit letter “o” on your keyboard and you will have a panel view of all the slides


Running and writing codes

  • The box area with a hint of blue as the background color is where you can write code (hereafter referred to as the “code area”).
  • Hit the “Run Code” button to execute all the code inside the code area.
  • You can evaluate (run) code selectively by highlighting the parts you want to run and hitting Command + Enter for Mac (Ctrl + Enter for Windows).
  • If you want to run the codes on your computer, you can first click on the icon with two sheets of paper stacked on top of each other (top right corner of the code chunk), which copies the code in the code area. You can then paste it onto your computer.
  • You can click on the reload button (top right corner of the code chunk, left to the copy button) to revert back to the original code.

U.S. county and state boundary

  • U.S. county and state boundary data are commonly used in many scientific studies.

  • The tigris package is one of the packages that let you download them from within R.

  • It lets you download much more than just county and state boundaries. See other type of data here

  • You can use tigris::states() to download the state boundary data as an sf object.

  • By default, the most detailed boundary data is downloaded, which can be quite large

    • creating map using ggplot() can take significantly more time
    • the map can be quite large in size
  • By adding cb = TRUE, you will get generalized (less detailed) boundary data, which is usually sufficient.

states_sf <- tigris::states(cb = TRUE, progress_bar = FALSE)


ggplot(states_sf) +
  geom_sf() +
  theme_void()

  • You can use tigris::counties() to download the county boundary data as an sf object.

  • You can specify states by the state option.

IL_IN_county <- 
  tigris::counties(
    state = c("IL", "IN"), 
    cb = TRUE, 
    progress_bar = FALSE
  )


ggplot() +
  geom_sf(data = IL_IN_county) +
  geom_sf(
    data = dplyr::filter(states_sf, NAME %in% c("Illinois", "Indiana")),
    fill = NA,
    color = "blue",
    linewidth = 1
  ) +
  theme_void()

USDA-NASS

  • USDA NASS Quick Stats provides wealth of agriculture-related datasets such as harvested acres or irrigated acres by crop at different spatial resolutions (e.g., state, county) from both survey and census.

  • We use the tidyUSDA package to download dat from USDA NASS Quick Stat.

  • A nice thing about tidyUSDA is that it gives you an option to download data as an sf object, which means you can immediately visualize the data or spatially interact it with other spatial objects.

  • First thing you want to do is to get an API key from this website.
    • click on obtain an API key
    • save the API key somewhere
  • You use this API key every time you download data from R using tidyUSDA.

You can download data using tidyUSDA::getQuickstat().


Syntax

getQuickstat(
    key,
    program,
    data_item,
    geographic_level,
    state,
    year,
    geometry
  )


  • key: API key
  • program: either “Survey” or “Census”
  • data_item: name of the variable to download
  • geographic_level: set the level of geographical unit (“County”, “State”)
  • state: vector of states
  • year: vector of years in character
  • geometry: if TRUE, then the downloaded data will be sf with geometry included. If false, a data.frame without geometry is returned.


Note

  • There are many other options. Run ?getQuickstat to see all the options.
  • The above options should cover most of your use cases.

Sometimes, you know what you would like to download, but do not the name of the variable for it. In such a case, you can first get a list of all the data item names with this:


You can then narrow down the list using keywords. Suppose you are interested in getting irrigated grain corn yield measured in bu/acre.


You can now copy the first entry of the results and paste it for data_item option.

The code below download county-level irrigated grain corn yield (bu/acre) in Illinois and Nebraska from 2000 through 2005.

(
IL_NE_ir_corn_yield <-
  tidyUSDA::getQuickstat(
    key = nass_api_key, # you need to replace it with your API key
    program = "SURVEY",
    data_item = "CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE",
    geographic_level = "COUNTY",
    state = c("ILLINOIS", "NEBRASKA"),
    year = as.character(2000:2005),
    geometry = TRUE
  )
)
Simple feature collection with 546 features and 57 fields (with 6 geometries empty)
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -104.0535 ymin: 39.99993 xmax: -95.30829 ymax: 43.00171
Geodetic CRS:  NAD83
First 10 features:
   GEOID year      ALAND unit_desc
1  31007 2005 1932231568 BU / ACRE
2  31007 2004 1932231568 BU / ACRE
3  31007 2003 1932231568 BU / ACRE
4  31007 2002 1932231568 BU / ACRE
5  31007 2001 1932231568 BU / ACRE
6  31007 2000 1932231568 BU / ACRE
7  31013 2005 2784618473 BU / ACRE
8  31013 2004 2784618473 BU / ACRE
9  31013 2003 2784618473 BU / ACRE
10 31013 2002 2784618473 BU / ACRE
                                              short_desc Value region_desc
1  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   165            
2  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   160            
3  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   142            
4  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   127            
5  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   143            
6  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   125            
7  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   177            
8  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   151            
9  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   168            
10 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   147            
   prodn_practice_desc county_code statisticcat_desc state_alpha asd_code
1            IRRIGATED         007             YIELD          NE       10
2            IRRIGATED         007             YIELD          NE       10
3            IRRIGATED         007             YIELD          NE       10
4            IRRIGATED         007             YIELD          NE       10
5            IRRIGATED         007             YIELD          NE       10
6            IRRIGATED         007             YIELD          NE       10
7            IRRIGATED         013             YIELD          NE       10
8            IRRIGATED         013             YIELD          NE       10
9            IRRIGATED         013             YIELD          NE       10
10           IRRIGATED         013             YIELD          NE       10
    country_name zip_5 agg_level_desc state_ansi CV (%) county_name
1  UNITED STATES               COUNTY         31             BANNER
2  UNITED STATES               COUNTY         31             BANNER
3  UNITED STATES               COUNTY         31             BANNER
4  UNITED STATES               COUNTY         31             BANNER
5  UNITED STATES               COUNTY         31             BANNER
6  UNITED STATES               COUNTY         31             BANNER
7  UNITED STATES               COUNTY         31          BOX BUTTE
8  UNITED STATES               COUNTY         31          BOX BUTTE
9  UNITED STATES               COUNTY         31          BOX BUTTE
10 UNITED STATES               COUNTY         31          BOX BUTTE
   watershed_code reference_period_desc domaincat_desc country_code week_ending
1        00000000                  YEAR  NOT SPECIFIED         9000            
2        00000000                  YEAR  NOT SPECIFIED         9000            
3        00000000                  YEAR  NOT SPECIFIED         9000            
4        00000000                  YEAR  NOT SPECIFIED         9000            
5        00000000                  YEAR  NOT SPECIFIED         9000            
6        00000000                  YEAR  NOT SPECIFIED         9000            
7        00000000                  YEAR  NOT SPECIFIED         9000            
8        00000000                  YEAR  NOT SPECIFIED         9000            
9        00000000                  YEAR  NOT SPECIFIED         9000            
10       00000000                  YEAR  NOT SPECIFIED         9000            
    class_desc state_name state_fips_code watershed_desc congr_district_code
1  ALL CLASSES   NEBRASKA              31                                   
2  ALL CLASSES   NEBRASKA              31                                   
3  ALL CLASSES   NEBRASKA              31                                   
4  ALL CLASSES   NEBRASKA              31                                   
5  ALL CLASSES   NEBRASKA              31                                   
6  ALL CLASSES   NEBRASKA              31                                   
7  ALL CLASSES   NEBRASKA              31                                   
8  ALL CLASSES   NEBRASKA              31                                   
9  ALL CLASSES   NEBRASKA              31                                   
10 ALL CLASSES   NEBRASKA              31                                   
                    location_desc               load_time end_code source_desc
1     NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000       00      SURVEY
2     NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000       00      SURVEY
3     NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000       00      SURVEY
4     NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000       00      SURVEY
5     NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000       00      SURVEY
6     NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000       00      SURVEY
7  NEBRASKA, NORTHWEST, BOX BUTTE 2012-01-01 00:00:00.000       00      SURVEY
8  NEBRASKA, NORTHWEST, BOX BUTTE 2012-01-01 00:00:00.000       00      SURVEY
9  NEBRASKA, NORTHWEST, BOX BUTTE 2012-01-01 00:00:00.000       00      SURVEY
10 NEBRASKA, NORTHWEST, BOX BUTTE 2012-01-01 00:00:00.000       00      SURVEY
   county_ansi freq_desc sector_desc  group_desc util_practice_desc begin_code
1          007    ANNUAL       CROPS FIELD CROPS              GRAIN         00
2          007    ANNUAL       CROPS FIELD CROPS              GRAIN         00
3          007    ANNUAL       CROPS FIELD CROPS              GRAIN         00
4          007    ANNUAL       CROPS FIELD CROPS              GRAIN         00
5          007    ANNUAL       CROPS FIELD CROPS              GRAIN         00
6          007    ANNUAL       CROPS FIELD CROPS              GRAIN         00
7          013    ANNUAL       CROPS FIELD CROPS              GRAIN         00
8          013    ANNUAL       CROPS FIELD CROPS              GRAIN         00
9          013    ANNUAL       CROPS FIELD CROPS              GRAIN         00
10         013    ANNUAL       CROPS FIELD CROPS              GRAIN         00
    asd_desc domain_desc commodity_desc COUNTYKEY STATEFP COUNTYFP COUNTYNS
1  NORTHWEST       TOTAL           CORN     31007      31      007 00835826
2  NORTHWEST       TOTAL           CORN     31007      31      007 00835826
3  NORTHWEST       TOTAL           CORN     31007      31      007 00835826
4  NORTHWEST       TOTAL           CORN     31007      31      007 00835826
5  NORTHWEST       TOTAL           CORN     31007      31      007 00835826
6  NORTHWEST       TOTAL           CORN     31007      31      007 00835826
7  NORTHWEST       TOTAL           CORN     31013      31      013 00835991
8  NORTHWEST       TOTAL           CORN     31013      31      013 00835991
9  NORTHWEST       TOTAL           CORN     31013      31      013 00835991
10 NORTHWEST       TOTAL           CORN     31013      31      013 00835991
        NAME         NAMELSAD LSAD CLASSFP MTFCC CSAFP CBSAFP METDIVFP FUNCSTAT
1     Banner    Banner County   06      H1 G4020  <NA>   <NA>     <NA>        A
2     Banner    Banner County   06      H1 G4020  <NA>   <NA>     <NA>        A
3     Banner    Banner County   06      H1 G4020  <NA>   <NA>     <NA>        A
4     Banner    Banner County   06      H1 G4020  <NA>   <NA>     <NA>        A
5     Banner    Banner County   06      H1 G4020  <NA>   <NA>     <NA>        A
6     Banner    Banner County   06      H1 G4020  <NA>   <NA>     <NA>        A
7  Box Butte Box Butte County   06      H1 G4020  <NA>   <NA>     <NA>        A
8  Box Butte Box Butte County   06      H1 G4020  <NA>   <NA>     <NA>        A
9  Box Butte Box Butte County   06      H1 G4020  <NA>   <NA>     <NA>        A
10 Box Butte Box Butte County   06      H1 G4020  <NA>   <NA>     <NA>        A
    AWATER    INTPTLAT     INTPTLON                       geometry
1   392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
2   392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
3   392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
4   392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
5   392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
6   392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
7  7150304 +42.2103804 -103.0817795 MULTIPOLYGON (((-103.4444 4...
8  7150304 +42.2103804 -103.0817795 MULTIPOLYGON (((-103.4444 4...
9  7150304 +42.2103804 -103.0817795 MULTIPOLYGON (((-103.4444 4...
10 7150304 +42.2103804 -103.0817795 MULTIPOLYGON (((-103.4444 4...

As you saw earlier, it has 58 columns, most of which are not necessary.

Here is the list of only variables you will probably need:

IL_NE_ir_corn_yield %>%
  dplyr::select(
    year, county_name, county_code, state_name,
    state_fips_code, short_desc, Value
  )
Simple feature collection with 546 features and 7 fields (with 6 geometries empty)
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -104.0535 ymin: 39.99993 xmax: -95.30829 ymax: 43.00171
Geodetic CRS:  NAD83
First 10 features:
   year county_name county_code state_name state_fips_code
1  2005      BANNER         007   NEBRASKA              31
2  2004      BANNER         007   NEBRASKA              31
3  2003      BANNER         007   NEBRASKA              31
4  2002      BANNER         007   NEBRASKA              31
5  2001      BANNER         007   NEBRASKA              31
6  2000      BANNER         007   NEBRASKA              31
7  2005   BOX BUTTE         013   NEBRASKA              31
8  2004   BOX BUTTE         013   NEBRASKA              31
9  2003   BOX BUTTE         013   NEBRASKA              31
10 2002   BOX BUTTE         013   NEBRASKA              31
                                              short_desc Value
1  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   165
2  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   160
3  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   142
4  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   127
5  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   143
6  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   125
7  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   177
8  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   151
9  CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   168
10 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE   147
                         geometry
1  MULTIPOLYGON (((-104.0529 4...
2  MULTIPOLYGON (((-104.0529 4...
3  MULTIPOLYGON (((-104.0529 4...
4  MULTIPOLYGON (((-104.0529 4...
5  MULTIPOLYGON (((-104.0529 4...
6  MULTIPOLYGON (((-104.0529 4...
7  MULTIPOLYGON (((-103.4444 4...
8  MULTIPOLYGON (((-103.4444 4...
9  MULTIPOLYGON (((-103.4444 4...
10 MULTIPOLYGON (((-103.4444 4...


Note

  • The value of the variable of your interest is stored in Value column.

You cannot retrieve more than 50,000 (the limit is set by QuickStat) rows of data. The query below requests much more than 50,000 observations, and fail. In this case, you need to narrow the search and chop the task into smaller tasks.

Replace nass_api_key with your own API key and run the ode on your computer.

many_states_corn <-
  getQuickstat(
    key = nass_api_key,
    program = "SURVEY",
    commodity = "CORN",
    geographic_level = "COUNTY",
    state = c("ILLINOIS", "COLORADO", "NEBRASKA", "IOWA", "KANSAS"),
    year = as.character(1995:2018),
    geometry = TRUE
  ) 

A query returns an error when there is no observation that satisfy your query criteria. For example, even though “CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE” does exists as a data_item, there is no entry for the statistic in Illinois in 2018. Therefore, the following query fails.

many_states_corn <-
  getQuickstat(
    key = key_get("usda_nass_qs_api"),
    program = "SURVEY",
    data_item = "CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE",
    geographic_level = "COUNTY",
    state = "ILLINOIS",
    year = "2018",
    geometry = TRUE
  ) 
  • As mentioned here, there is a limit to how much data you can download with one query.
  • When your target dataset exceeds the limit, you can break it up into pieces and download them repeatedly using loop.

There are two dimensions that seems easy to loop over: state and year. Here, let’s loop over year.

We first create a sequence of years to loop over one by one:

#--- for the sake of shorter run time, we use 2015:2018 here ---#
year_list <- as.character(2015:2018)


We now download data year by year using for loop:

lapply(
  year_list, # list of objects to loop over
  \(x) {
    getQuickstat(
      key = nass_api_key,
      program = "SURVEY",
      commodity = "CORN",
      geographic_level = "COUNTY",
      state = c("ILLINOIS", "COLORADO", "NEBRASKA", "IOWA", "KANSAS"),
      year = x, # use the year 
      geometry = TRUE
    )
  }
) %>%
# combine a list of sf into a single sf
dplyr::bind_rows() 

You are interested in getting soybean harvested acres data. Search for the data_item name for this variable from tidyUSDA::allDataItem


Answer

Code
tidyUSDA::allDataItem %>%
  #--- find data items that include SOY ---#
  grep(pattern = "SOY", ., value = TRUE) %>%
  grep(pattern = "HARVESTED", ., value = TRUE) 

Now, using the data_item name you got earlier, download the county-level data for Colorado and Kansas from 1990 through 1994 as an sf obejct.


Answer

Code
KS_CO_soy_hacres <-
  getQuickstat(
    key = nass_api_key,
    program = "SURVEY",
    data_item = "SOYBEANS - ACRES HARVESTED",
    geographic_level = "COUNTY",
    state = c("KANSAS", "COLORADO"),
    year = as.character(1990:1994),
    geometry = TRUE
  ) %>%
  dplyr::select(
    year, county_name, county_code, state_name,
    state_fips_code, short_desc, Value
  )

Create a map of soybean harvested acres faceted by year.


Answer

Code
ggplot() +
  geom_sf(data = KS_CO_soy_hacres, aes(fill = Value)) +
  facet_wrap(. ~ year) +
  theme_void()

PRISM

PRISM dataset provide model-based estimates of daily precipitation, maximum temperature, and minimum temperature for the U.S. at the 4km by 4km spatial resolution.

You can use get_prism_dailys() from the prism package to download PRISM data.


Syntax

prism::get_prism_dailys(
  type = variable type,
  minDate = starting date as character,
  maxDate = ending date as character,
  keepZip = TRUE or FALSE
) 
  • type: you can select from “ppt” (precipitation), “tmean” (mean temperature), “tmin” (minimum temperature), and “tmax” (maximum temperature).
  • minDate: starting date specified in format YYYY-MM-DD
  • maxDate: end date specified in format YYYY-MM-DD
  • keepZip: if FALSE, the zipped folders of the downloaded files will not be kept; otherwise, they will be kept.


Before you download PRISM data using the function, it is recommended that you set the path to folder in which the downloaded PRISM files will be stored using.

options(prism.path = "path")

First set the path:

library(prism)
options(prism.path = "Lectures/Data/PRISM")


Now, download:

prism::get_prism_dailys(
  type = "ppt",
  minDate = "2024-01-01",
  maxDate = "2024-01-05",
  keepZip = FALSE
)

This will create a single folder for each day of the specified date range. Inside of the folders, you will see bunch of files with the same name except extensions.

As you have seen, we would have many files to open unless the specified date range is very short. In such case, you should take advantage of a simple for loop.

First, the following code gives you the name of all the PRISM files with .bill extension.

(
prism_files_list <- 
  list.files("Data/PRISM", recursive = TRUE, full.names = TRUE) %>%
  .[grep("\\.bil", .)] %>%
  .[!grepl("aux", .)]
)
[1] "Data/PRISM/PRISM_ppt_provisional_4kmD2_20240101_bil/PRISM_ppt_provisional_4kmD2_20240101_bil.bil"
[2] "Data/PRISM/PRISM_ppt_provisional_4kmD2_20240102_bil/PRISM_ppt_provisional_4kmD2_20240102_bil.bil"
[3] "Data/PRISM/PRISM_ppt_provisional_4kmD2_20240103_bil/PRISM_ppt_provisional_4kmD2_20240103_bil.bil"
[4] "Data/PRISM/PRISM_ppt_provisional_4kmD2_20240104_bil/PRISM_ppt_provisional_4kmD2_20240104_bil.bil"
[5] "Data/PRISM/PRISM_ppt_provisional_4kmD2_20240105_bil/PRISM_ppt_provisional_4kmD2_20240105_bil.bil"
[6] "Data/PRISM/PRISM_tmax_stable_4kmD2_20120801_bil/PRISM_tmax_stable_4kmD2_20120801_bil.bil"        
[7] "Data/PRISM/tmax/PRISM_tmax_stable_4kmD2_20230601_bil/PRISM_tmax_stable_4kmD2_20230601_bil.bil"   
[8] "Data/PRISM/tmax/PRISM_tmax_stable_4kmD2_20230602_bil/PRISM_tmax_stable_4kmD2_20230602_bil.bil"   
[9] "Data/PRISM/tmax/PRISM_tmax_stable_4kmD2_20230603_bil/PRISM_tmax_stable_4kmD2_20230603_bil.bil"   

Just replace "Data/PRISM" with your folder path to the PRISM files.

We can now read them using terra::rast() like below:

terra::rast(prism_files_list) 
class       : SpatRaster 
dimensions  : 621, 1405, 9  (nrow, ncol, nlyr)
resolution  : 0.04166667, 0.04166667  (x, y)
extent      : -125.0208, -66.47917, 24.0625, 49.9375  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83 
sources     : PRISM_ppt_provisional_4kmD2_20240101_bil.bil  
              PRISM_ppt_provisional_4kmD2_20240102_bil.bil  
              PRISM_ppt_provisional_4kmD2_20240103_bil.bil  
              ... and 6 more source(s)
names       : PRISM~1_bil, PRISM~2_bil, PRISM~3_bil, PRISM~4_bil, PRISM~5_bil, PRISM~1_bil, ... 
min values  :       0.000,      0.0000,      0.0000,      0.0000,      0.0000,       7.408, ... 
max values  :      42.917,     10.2093,     87.7497,     37.0813,     90.0984,      46.303, ... 

Download PRISM maximum temperature data from “06-01-2023” to “06-03-2023”.


Answer


Code
#--- set the path (you need to change the path) ---# 
options(prism.path = "Data/PRISM/tmax")

#--- download ---#
prism::get_prism_dailys(
  type = "tmax",
  minDate = "2023-06-01",
  maxDate = "2023-06-03",
  keepZip = FALSE
)

Read all the maximum temperature data files you just downloaded using terra::rast().


Answer

Code
prism_files_list <-
  list.files("Data/PRISM/tmax", recursive = TRUE, full.names = TRUE) %>%
  .[grep("\\.bil", .)] %>%
  .[!grepl("aux", .)]

prism_max_temp <- terra::rast(prism_files_list)

Using the SpatRaster object, create a faceted map of maximum temperature.


Answer

Code
ggplot() +
  geom_spatraster(data = prism_max_temp) +
  facet_wrap(~lyr)

Crop Data Layer

  • The Cropland Data Layer (CDL) is a data product produced by the National Agricultural Statistics Service of U.S. Department of Agriculture.

  • CDL provides geo-referenced, high accuracy, 30 (after 2007) or 56 (in 2006 and 2007) meter resolution, crop-specific cropland land cover information for up to 48 contiguous states in the U.S. from 1997 to the present.

  • This data product has been extensively used in agricultural research. CropScape is an interactive Web CDL exploring system, and it was developed to query, visualize, disseminate, and analyze CDL data geospatially through standard geospatial web services in a publicly accessible on-line environment (Han et al., 2012).

  • This section shows how to use the CropScapeR package (Chen 2020) to download and explore the CDL data.

  • The package implements some of the most useful geospatial processing services provided by the CropScape, and it allows users to efficiently process the CDL data within the R environment.

  • Specifically, the CropScapeR package provides four functions that implement different kinds of geospatial processing services provided by the CropScape.

  • GetCDLData() in particular is the most important function as it lets you download the raw CDL data.

  • The other functions provide the users with the CDL data summarized or transformed in particular manners that may suit the need of some users.

GetCDLData() allows us to obtain CDL data for any Area of Interest (AOI) in a given year. It requires three parameters to make a valid data request:

  • aoi: Area of Interest (AOI).
  • year: Year of the data to request.
  • type: Type of AOI.

The following AOI-type combinations are accepted:

  • any spatial object as an sf or sfc object - type = "b"
  • county (defined by a 5-digit county FIPS code) - type = "f"
  • state (defined by a 2-digit state FIPS code) - type = "f"
  • bounding box (defined by four corner points) - type = "b"
  • polygon area (defined by at least three coordinates) - type = "ps"
  • single point (defined by a coordinate) - type = "p"

Important

The downloaded raster data is RasterLayer defined by the raster package, not SpatRaster by the terra package.

  • Suppose you are interested in getting CDL data for the entire Nebraska.

  • In this case we can use the state FIP code for NE (31) for aoi and specify type to be "f" (Note that this would take some time if you run it.).

  • This can take a while. Since the spatial resolution is 30m, the CDL data covering the entire IL would have lots of cells and thus memory-intensive.

cdl_NE <-
  CropScapeR::GetCDLData(
    aoi = 31,
    year = "2018",
    type = "f"
  )

In this example, we are interested in obtaining CDL data for the following four counties in Illinois: Champaign, Vermilion, Ford, and Iroquois.

Let’s first get the county boundary data for them:

IL_county <- tigris::counties(state = "IL", cb = TRUE, progress_bar = FALSE)

IL_4_county <- dplyr::filter(IL_county, NAME %in% c("Champaign", "Vermilion", "Ford", "Iroquois"))


Here is where they are:

When you provide aoi using an sf object, CDL data for the bounding box of the sf will be downloaded. So, you should pick "b" as the type.

GetCDLData(
  aoi = IL_4_county,
  year = "2018",
  type = "b"
)

Using the tigris package, download the county boundary for Iowa, and then filter it to keep only the Sioux county. Name the sf file sioux_county.


Answer

Code
sioux_county <-
  tigris::counties(state = "IA", cb = TRUE) %>%
  dplyr::filter(NAME == "Sioux")

Using the CropScapeR::GetCDLData(), download the 2022 CDL data covering the Sioux County. Then, convert it to a SpatRaster object. Name the final product sioux_cdl_2022.


Answer

Code
sioux_cdl_2022 <-
  GetCDLData(
    aoi = sioux_county,
    year = "2022",
    type = "b"
  ) %>%
  terra::rast()

Mask the CDL data you just downloaded using sioux_county using terra::mask().


Answer

Code
sioux_masked <- terra::mask(sioux_cdl_2022, st_transform(sioux_county, crs(sioux_cdl_2022)))

Before creating a map from the downloaded CDL layer data, let’s aggregate the data by factor of 10 using terra::aggregate(). Call it sioux_aggregated.


Answer

Code
sioux_aggregated <- terra::aggregate(sioux_masked, fact = 10)

Create a map of land use type using the aggregate Sioux county CDL data using tidyterra::geom_spatraster().


Answer

Code
ggplot() +
  geom_spatraster(data = sioux_aggregated)