Learn how to download publicly available agriculture-related data from within R.
Click on the three horizontally stacked lines at the bottom left corner of the slide, then you will see table of contents, and you can jump to the section you want
Hit letter “o” on your keyboard and you will have a panel view of all the slides
U.S. county and state boundary data are commonly used in many scientific studies.
The tigris
package is one of the packages that let you download them from within R.
It lets you download much more than just county and state boundaries. See other type of data here
You can use tigris::states()
to download the state boundary data as an sf
object.
By default, the most detailed boundary data is downloaded, which can be quite large
ggplot()
can take significantly more timeBy adding cb = TRUE
, you will get generalized (less detailed) boundary data, which is usually sufficient.
You can use tigris::counties()
to download the county boundary data as an sf
object.
You can specify states by the state
option.
USDA NASS Quick Stats provides wealth of agriculture-related datasets such as harvested acres or irrigated acres by crop at different spatial resolutions (e.g., state, county) from both survey and census.
We use the tidyUSDA
package to download dat from USDA NASS Quick Stat.
A nice thing about tidyUSDA
is that it gives you an option to download data as an sf
object, which means you can immediately visualize the data or spatially interact it with other spatial objects.
tidyUSDA
.You can download data using tidyUSDA::getQuickstat()
.
Syntax
key
: API keyprogram
: either “Survey” or “Census”data_item
: name of the variable to downloadgeographic_level
: set the level of geographical unit (“County”, “State”)state
: vector of statesyear
: vector of years in charactergeometry
: if TRUE, then the downloaded data will be sf
with geometry included. If false, a data.frame
without geometry is returned.Note
?getQuickstat
to see all the options.Sometimes, you know what you would like to download, but do not the name of the variable for it. In such a case, you can first get a list of all the data item names with this:
You can then narrow down the list using keywords. Suppose you are interested in getting irrigated grain corn yield measured in bu/acre.
You can now copy the first entry of the results and paste it for data_item
option.
The code below download county-level irrigated grain corn yield (bu/acre) in Illinois and Nebraska from 2000 through 2005.
(
IL_NE_ir_corn_yield <-
tidyUSDA::getQuickstat(
key = nass_api_key, # you need to replace it with your API key
program = "SURVEY",
data_item = "CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE",
geographic_level = "COUNTY",
state = c("ILLINOIS", "NEBRASKA"),
year = as.character(2000:2005),
geometry = TRUE
)
)
Simple feature collection with 546 features and 57 fields (with 6 geometries empty)
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -104.0535 ymin: 39.99993 xmax: -95.30829 ymax: 43.00171
Geodetic CRS: NAD83
First 10 features:
GEOID year ALAND unit_desc
1 31007 2005 1932231568 BU / ACRE
2 31007 2004 1932231568 BU / ACRE
3 31007 2003 1932231568 BU / ACRE
4 31007 2002 1932231568 BU / ACRE
5 31007 2001 1932231568 BU / ACRE
6 31007 2000 1932231568 BU / ACRE
7 31013 2005 2784618473 BU / ACRE
8 31013 2004 2784618473 BU / ACRE
9 31013 2003 2784618473 BU / ACRE
10 31013 2002 2784618473 BU / ACRE
short_desc Value region_desc
1 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 165
2 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 160
3 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 142
4 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 127
5 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 143
6 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 125
7 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 177
8 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 151
9 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 168
10 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 147
prodn_practice_desc county_code statisticcat_desc state_alpha asd_code
1 IRRIGATED 007 YIELD NE 10
2 IRRIGATED 007 YIELD NE 10
3 IRRIGATED 007 YIELD NE 10
4 IRRIGATED 007 YIELD NE 10
5 IRRIGATED 007 YIELD NE 10
6 IRRIGATED 007 YIELD NE 10
7 IRRIGATED 013 YIELD NE 10
8 IRRIGATED 013 YIELD NE 10
9 IRRIGATED 013 YIELD NE 10
10 IRRIGATED 013 YIELD NE 10
country_name zip_5 agg_level_desc state_ansi CV (%) county_name
1 UNITED STATES COUNTY 31 BANNER
2 UNITED STATES COUNTY 31 BANNER
3 UNITED STATES COUNTY 31 BANNER
4 UNITED STATES COUNTY 31 BANNER
5 UNITED STATES COUNTY 31 BANNER
6 UNITED STATES COUNTY 31 BANNER
7 UNITED STATES COUNTY 31 BOX BUTTE
8 UNITED STATES COUNTY 31 BOX BUTTE
9 UNITED STATES COUNTY 31 BOX BUTTE
10 UNITED STATES COUNTY 31 BOX BUTTE
watershed_code reference_period_desc domaincat_desc country_code week_ending
1 00000000 YEAR NOT SPECIFIED 9000
2 00000000 YEAR NOT SPECIFIED 9000
3 00000000 YEAR NOT SPECIFIED 9000
4 00000000 YEAR NOT SPECIFIED 9000
5 00000000 YEAR NOT SPECIFIED 9000
6 00000000 YEAR NOT SPECIFIED 9000
7 00000000 YEAR NOT SPECIFIED 9000
8 00000000 YEAR NOT SPECIFIED 9000
9 00000000 YEAR NOT SPECIFIED 9000
10 00000000 YEAR NOT SPECIFIED 9000
class_desc state_name state_fips_code watershed_desc congr_district_code
1 ALL CLASSES NEBRASKA 31
2 ALL CLASSES NEBRASKA 31
3 ALL CLASSES NEBRASKA 31
4 ALL CLASSES NEBRASKA 31
5 ALL CLASSES NEBRASKA 31
6 ALL CLASSES NEBRASKA 31
7 ALL CLASSES NEBRASKA 31
8 ALL CLASSES NEBRASKA 31
9 ALL CLASSES NEBRASKA 31
10 ALL CLASSES NEBRASKA 31
location_desc load_time end_code source_desc
1 NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000 00 SURVEY
2 NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000 00 SURVEY
3 NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000 00 SURVEY
4 NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000 00 SURVEY
5 NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000 00 SURVEY
6 NEBRASKA, NORTHWEST, BANNER 2012-01-01 00:00:00.000 00 SURVEY
7 NEBRASKA, NORTHWEST, BOX BUTTE 2012-01-01 00:00:00.000 00 SURVEY
8 NEBRASKA, NORTHWEST, BOX BUTTE 2012-01-01 00:00:00.000 00 SURVEY
9 NEBRASKA, NORTHWEST, BOX BUTTE 2012-01-01 00:00:00.000 00 SURVEY
10 NEBRASKA, NORTHWEST, BOX BUTTE 2012-01-01 00:00:00.000 00 SURVEY
county_ansi freq_desc sector_desc group_desc util_practice_desc begin_code
1 007 ANNUAL CROPS FIELD CROPS GRAIN 00
2 007 ANNUAL CROPS FIELD CROPS GRAIN 00
3 007 ANNUAL CROPS FIELD CROPS GRAIN 00
4 007 ANNUAL CROPS FIELD CROPS GRAIN 00
5 007 ANNUAL CROPS FIELD CROPS GRAIN 00
6 007 ANNUAL CROPS FIELD CROPS GRAIN 00
7 013 ANNUAL CROPS FIELD CROPS GRAIN 00
8 013 ANNUAL CROPS FIELD CROPS GRAIN 00
9 013 ANNUAL CROPS FIELD CROPS GRAIN 00
10 013 ANNUAL CROPS FIELD CROPS GRAIN 00
asd_desc domain_desc commodity_desc COUNTYKEY STATEFP COUNTYFP COUNTYNS
1 NORTHWEST TOTAL CORN 31007 31 007 00835826
2 NORTHWEST TOTAL CORN 31007 31 007 00835826
3 NORTHWEST TOTAL CORN 31007 31 007 00835826
4 NORTHWEST TOTAL CORN 31007 31 007 00835826
5 NORTHWEST TOTAL CORN 31007 31 007 00835826
6 NORTHWEST TOTAL CORN 31007 31 007 00835826
7 NORTHWEST TOTAL CORN 31013 31 013 00835991
8 NORTHWEST TOTAL CORN 31013 31 013 00835991
9 NORTHWEST TOTAL CORN 31013 31 013 00835991
10 NORTHWEST TOTAL CORN 31013 31 013 00835991
NAME NAMELSAD LSAD CLASSFP MTFCC CSAFP CBSAFP METDIVFP FUNCSTAT
1 Banner Banner County 06 H1 G4020 <NA> <NA> <NA> A
2 Banner Banner County 06 H1 G4020 <NA> <NA> <NA> A
3 Banner Banner County 06 H1 G4020 <NA> <NA> <NA> A
4 Banner Banner County 06 H1 G4020 <NA> <NA> <NA> A
5 Banner Banner County 06 H1 G4020 <NA> <NA> <NA> A
6 Banner Banner County 06 H1 G4020 <NA> <NA> <NA> A
7 Box Butte Box Butte County 06 H1 G4020 <NA> <NA> <NA> A
8 Box Butte Box Butte County 06 H1 G4020 <NA> <NA> <NA> A
9 Box Butte Box Butte County 06 H1 G4020 <NA> <NA> <NA> A
10 Box Butte Box Butte County 06 H1 G4020 <NA> <NA> <NA> A
AWATER INTPTLAT INTPTLON geometry
1 392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
2 392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
3 392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
4 392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
5 392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
6 392073 +41.5397495 -103.7262626 MULTIPOLYGON (((-104.0529 4...
7 7150304 +42.2103804 -103.0817795 MULTIPOLYGON (((-103.4444 4...
8 7150304 +42.2103804 -103.0817795 MULTIPOLYGON (((-103.4444 4...
9 7150304 +42.2103804 -103.0817795 MULTIPOLYGON (((-103.4444 4...
10 7150304 +42.2103804 -103.0817795 MULTIPOLYGON (((-103.4444 4...
As you saw earlier, it has 58 columns, most of which are not necessary.
Here is the list of only variables you will probably need:
IL_NE_ir_corn_yield %>%
dplyr::select(
year, county_name, county_code, state_name,
state_fips_code, short_desc, Value
)
Simple feature collection with 546 features and 7 fields (with 6 geometries empty)
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -104.0535 ymin: 39.99993 xmax: -95.30829 ymax: 43.00171
Geodetic CRS: NAD83
First 10 features:
year county_name county_code state_name state_fips_code
1 2005 BANNER 007 NEBRASKA 31
2 2004 BANNER 007 NEBRASKA 31
3 2003 BANNER 007 NEBRASKA 31
4 2002 BANNER 007 NEBRASKA 31
5 2001 BANNER 007 NEBRASKA 31
6 2000 BANNER 007 NEBRASKA 31
7 2005 BOX BUTTE 013 NEBRASKA 31
8 2004 BOX BUTTE 013 NEBRASKA 31
9 2003 BOX BUTTE 013 NEBRASKA 31
10 2002 BOX BUTTE 013 NEBRASKA 31
short_desc Value
1 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 165
2 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 160
3 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 142
4 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 127
5 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 143
6 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 125
7 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 177
8 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 151
9 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 168
10 CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE 147
geometry
1 MULTIPOLYGON (((-104.0529 4...
2 MULTIPOLYGON (((-104.0529 4...
3 MULTIPOLYGON (((-104.0529 4...
4 MULTIPOLYGON (((-104.0529 4...
5 MULTIPOLYGON (((-104.0529 4...
6 MULTIPOLYGON (((-104.0529 4...
7 MULTIPOLYGON (((-103.4444 4...
8 MULTIPOLYGON (((-103.4444 4...
9 MULTIPOLYGON (((-103.4444 4...
10 MULTIPOLYGON (((-103.4444 4...
Note
Value
column.You cannot retrieve more than 50,000 (the limit is set by QuickStat) rows of data. The query below requests much more than 50,000 observations, and fail. In this case, you need to narrow the search and chop the task into smaller tasks.
Replace nass_api_key
with your own API key and run the ode on your computer.
A query returns an error when there is no observation that satisfy your query criteria. For example, even though “CORN, GRAIN, IRRIGATED - YIELD, MEASURED IN BU / ACRE” does exists as a data_item
, there is no entry for the statistic in Illinois in 2018. Therefore, the following query fails.
There are two dimensions that seems easy to loop over: state
and year
. Here, let’s loop over year.
We first create a sequence of years to loop over one by one:
We now download data year by year using for loop:
lapply(
year_list, # list of objects to loop over
\(x) {
getQuickstat(
key = nass_api_key,
program = "SURVEY",
commodity = "CORN",
geographic_level = "COUNTY",
state = c("ILLINOIS", "COLORADO", "NEBRASKA", "IOWA", "KANSAS"),
year = x, # use the year
geometry = TRUE
)
}
) %>%
# combine a list of sf into a single sf
dplyr::bind_rows()
You are interested in getting soybean harvested acres data. Search for the data_item
name for this variable from tidyUSDA::allDataItem
Answer
Now, using the data_item
name you got earlier, download the county-level data for Colorado and Kansas from 1990 through 1994 as an sf
obejct.
Answer
KS_CO_soy_hacres <-
getQuickstat(
key = nass_api_key,
program = "SURVEY",
data_item = "SOYBEANS - ACRES HARVESTED",
geographic_level = "COUNTY",
state = c("KANSAS", "COLORADO"),
year = as.character(1990:1994),
geometry = TRUE
) %>%
dplyr::select(
year, county_name, county_code, state_name,
state_fips_code, short_desc, Value
)
PRISM dataset provide model-based estimates of daily precipitation, maximum temperature, and minimum temperature for the U.S. at the 4km by 4km spatial resolution.
You can use get_prism_dailys()
from the prism
package to download PRISM data.
Syntax
type
: you can select from “ppt” (precipitation), “tmean” (mean temperature), “tmin” (minimum temperature), and “tmax” (maximum temperature).minDate
: starting date specified in format YYYY-MM-DDmaxDate
: end date specified in format YYYY-MM-DDkeepZip
: if FALSE
, the zipped folders of the downloaded files will not be kept; otherwise, they will be kept.Before you download PRISM data using the function, it is recommended that you set the path to folder in which the downloaded PRISM files will be stored using.
First set the path:
Now, download:
This will create a single folder for each day of the specified date range. Inside of the folders, you will see bunch of files with the same name except extensions.
As you have seen, we would have many files to open unless the specified date range is very short. In such case, you should take advantage of a simple for loop.
First, the following code gives you the name of all the PRISM files with .bill extension.
(
prism_files_list <-
list.files("Data/PRISM", recursive = TRUE, full.names = TRUE) %>%
.[grep("\\.bil", .)] %>%
.[!grepl("aux", .)]
)
[1] "Data/PRISM/PRISM_ppt_provisional_4kmD2_20240101_bil/PRISM_ppt_provisional_4kmD2_20240101_bil.bil"
[2] "Data/PRISM/PRISM_ppt_provisional_4kmD2_20240102_bil/PRISM_ppt_provisional_4kmD2_20240102_bil.bil"
[3] "Data/PRISM/PRISM_ppt_provisional_4kmD2_20240103_bil/PRISM_ppt_provisional_4kmD2_20240103_bil.bil"
[4] "Data/PRISM/PRISM_ppt_provisional_4kmD2_20240104_bil/PRISM_ppt_provisional_4kmD2_20240104_bil.bil"
[5] "Data/PRISM/PRISM_ppt_provisional_4kmD2_20240105_bil/PRISM_ppt_provisional_4kmD2_20240105_bil.bil"
[6] "Data/PRISM/PRISM_tmax_stable_4kmD2_20120801_bil/PRISM_tmax_stable_4kmD2_20120801_bil.bil"
[7] "Data/PRISM/tmax/PRISM_tmax_stable_4kmD2_20230601_bil/PRISM_tmax_stable_4kmD2_20230601_bil.bil"
[8] "Data/PRISM/tmax/PRISM_tmax_stable_4kmD2_20230602_bil/PRISM_tmax_stable_4kmD2_20230602_bil.bil"
[9] "Data/PRISM/tmax/PRISM_tmax_stable_4kmD2_20230603_bil/PRISM_tmax_stable_4kmD2_20230603_bil.bil"
Just replace "Data/PRISM"
with your folder path to the PRISM files.
We can now read them using terra::rast()
like below:
class : SpatRaster
dimensions : 621, 1405, 9 (nrow, ncol, nlyr)
resolution : 0.04166667, 0.04166667 (x, y)
extent : -125.0208, -66.47917, 24.0625, 49.9375 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83
sources : PRISM_ppt_provisional_4kmD2_20240101_bil.bil
PRISM_ppt_provisional_4kmD2_20240102_bil.bil
PRISM_ppt_provisional_4kmD2_20240103_bil.bil
... and 6 more source(s)
names : PRISM~1_bil, PRISM~2_bil, PRISM~3_bil, PRISM~4_bil, PRISM~5_bil, PRISM~1_bil, ...
min values : 0.000, 0.0000, 0.0000, 0.0000, 0.0000, 7.408, ...
max values : 42.917, 10.2093, 87.7497, 37.0813, 90.0984, 46.303, ...
Download PRISM maximum temperature data from “06-01-2023” to “06-03-2023”.
Answer
Read all the maximum temperature data files you just downloaded using terra::rast()
.
Answer
The Cropland Data Layer (CDL) is a data product produced by the National Agricultural Statistics Service of U.S. Department of Agriculture.
CDL provides geo-referenced, high accuracy, 30 (after 2007) or 56 (in 2006 and 2007) meter resolution, crop-specific cropland land cover information for up to 48 contiguous states in the U.S. from 1997 to the present.
This data product has been extensively used in agricultural research. CropScape is an interactive Web CDL exploring system, and it was developed to query, visualize, disseminate, and analyze CDL data geospatially through standard geospatial web services in a publicly accessible on-line environment (Han et al., 2012).
This section shows how to use the CropScapeR package (Chen 2020) to download and explore the CDL data.
The package implements some of the most useful geospatial processing services provided by the CropScape, and it allows users to efficiently process the CDL data within the R environment.
Specifically, the CropScapeR package provides four functions that implement different kinds of geospatial processing services provided by the CropScape.
GetCDLData()
in particular is the most important function as it lets you download the raw CDL data.
The other functions provide the users with the CDL data summarized or transformed in particular manners that may suit the need of some users.
GetCDLData()
allows us to obtain CDL data for any Area of Interest (AOI) in a given year. It requires three parameters to make a valid data request:
aoi
: Area of Interest (AOI).year
: Year of the data to request.type
: Type of AOI.The following AOI-type combinations are accepted:
type = "b"
type = "f"
type = "f"
type = "b"
type = "ps"
type = "p"
Important
The downloaded raster data is RasterLayer
defined by the raster
package, not SpatRaster
by the terra
package.
Suppose you are interested in getting CDL data for the entire Nebraska.
In this case we can use the state FIP code for NE (31) for aoi
and specify type
to be "f"
(Note that this would take some time if you run it.).
This can take a while. Since the spatial resolution is 30m, the CDL data covering the entire IL would have lots of cells and thus memory-intensive.
In this example, we are interested in obtaining CDL data for the following four counties in Illinois: Champaign, Vermilion, Ford, and Iroquois.
Let’s first get the county boundary data for them:
Here is where they are:
Using the tigris
package, download the county boundary for Iowa, and then filter it to keep only the Sioux county. Name the sf
file sioux_county
.
Answer
Using the CropScapeR::GetCDLData()
, download the 2022 CDL data covering the Sioux County. Then, convert it to a SpatRaster
object. Name the final product sioux_cdl_2022
.
Answer
Mask the CDL data you just downloaded using sioux_county
using terra::mask()
.
Answer
Before creating a map from the downloaded CDL layer data, let’s aggregate the data by factor of 10 using terra::aggregate()
. Call it sioux_aggregated
.
Answer