Preface

Why R as GIS for Economists?

R has extensive capabilities as GIS software. In my opinion, \(99\%\) of your spatial data processing needs as an economist will be satisfied by R. However, there are several popular options for GIS tasks other than R:

  • Python
  • ArcGIS
  • QGIS

Here, I compare them briefly and discuss why R is a good option.

R vs Python

Both R and Python depend heavily on the open-source libraries GDAL and GEOS for their core GIS operations (GDAL for reading and writing spatial data, and GEOS for geometric operations such as intersecting two spatial layers).1 So, when you run GIS tasks in R or Python, you basically tell R or Python what you want to do; they pass the job to these libraries, let them do the work, and return the results to you. This means that R and Python do not differ much in their GIS capabilities, because both rely on the same open-source libraries for many GIS tasks. When GDAL and GEOS get better, R and Python get better too (with a short lag, thanks to the developers who bring those updates to R and Python). Both also have good spatial visualization tools. Moreover, both R and Python can communicate with QGIS and ArcGIS (as long as you have them installed, of course) and use their functionality from within R and Python via bridging packages: RQGIS (R) and PyQGIS (Python) for QGIS, and R-ArcGIS (R) and ArcPy (Python) for ArcGIS.2 So, if you are more familiar with Python than R, go ahead and use Python. From now on, my discussion assumes that you are going with R; otherwise, you would not be reading the rest of the book anyway.
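If you have the sf package installed, you can see this dependency for yourself: sf reports the versions of the external libraries it was built against. The sketch below assumes sf is installed and simply prints the GEOS and GDAL entries of sf::sf_extSoftVersion(); the exact version numbers will differ from machine to machine.

```r
# Which versions of the external geospatial libraries is sf linked to?
# (a minimal check; requires the sf package)
if (requireNamespace("sf", quietly = TRUE)) {
  ext_versions <- sf::sf_extSoftVersion()
  # GEOS handles geometric operations; GDAL handles reading/writing files
  print(ext_versions[c("GEOS", "GDAL")])
} else {
  message("The sf package is not installed; run install.packages(\"sf\") first.")
}
```

When you update GDAL or GEOS and reinstall sf, these numbers change, which is the "R gets better when GDAL and GEOS get better" point in practice.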

R vs ArcGIS or QGIS

ArcGIS is commercial software, and it is quite expensive (you may be able to get a significant discount if you are a student at or work for a university). QGIS, on the other hand, is open source and free. It has developed significantly over the past decade, and I would say it is just as competitive as ArcGIS. QGIS also uses open-source geospatial software such as GDAL and GEOS, among others (e.g., SAGA and GRASS GIS). Both ArcGIS and QGIS have a graphical interface that helps you implement various GIS tasks, unlike R, which requires programming.

Now, since R can use ArcGIS and QGIS through the bridging packages, the more precise question is whether you should program GIS tasks in R (possibly using the bridging packages) or implement them manually through the graphical interface of ArcGIS or QGIS. The answer is to program GIS tasks in R. First, manual GIS operations are hard to repeat. In the course of a project, you often need to redo the same GIS task because the underlying datasets have changed. If you have programmed the process in R, you just rerun the same code and get the desired results. If you have not, you need to go through many clicks on the graphical interface all over again, possibly while trying to remember how you did it last time.3 Second, and more importantly, manual operations do not scale. It has become much more common to process many large spatial datasets. Imagine performing the same operations on \(1,000\) files, or even \(50\) files, through a graphical interface. Do you know what is good at doing the same task over and over again without complaining? A computer. Just let it do what it likes to do. You have better things to do.
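The scalability argument can be made concrete with a toy sketch. The 50 CSV files below are hypothetical stand-ins for spatial datasets, and process_file() is a hypothetical placeholder for whatever GIS operation you actually need; the point is only that the same few lines handle 50 files or 1,000 files with no extra effort.

```r
# Create a fresh temporary folder holding 50 small hypothetical data files
toy_dir <- tempfile("toy_files_")
dir.create(toy_dir)

for (i in 1:50) {
  write.csv(
    data.frame(x = runif(10), y = runif(10)),
    file.path(toy_dir, sprintf("file_%02d.csv", i)),
    row.names = FALSE
  )
}

# Hypothetical processing step: in a real project this could read a
# spatial file and, say, intersect it with another layer
process_file <- function(path) {
  dat <- read.csv(path)
  data.frame(file = basename(path), mean_x = mean(dat$x))
}

# Apply the same operation to every file and stack the results
files <- list.files(toy_dir, pattern = "\\.csv$", full.names = TRUE)
results <- do.call(rbind, lapply(files, process_file))
nrow(results)
## [1] 50
```

If the underlying files change, rerunning this script reproduces the results; Appendix A covers loops and parallel computation in more detail.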

Finally, should you learn ArcGIS or QGIS in addition to (or before) R? I am doubtful. As economists, the GIS tasks we need to do are not super convoluted most of the time. Let \(\Omega_R\) and \(\Omega_{AQ}\) denote the sets of GIS tasks that R and ArcGIS/QGIS can implement, respectively, and let \(\Omega_E\) denote the set of GIS tasks economists need to implement. Then \(\Omega_E \subseteq \Omega_R\) \(99\%\) (or maybe \(95\%\) to be safe) of the time, which implies \(\Omega_E \cap (\Omega_{AQ}\setminus\Omega_R) = \emptyset\): the tasks economists need almost never fall among those that only ArcGIS or QGIS can do. Personally, I have never had to rely on either ArcGIS or QGIS for my research projects since I learned how to use R as GIS.

One of the things ArcGIS and QGIS can do but R cannot (\(\Omega_{AQ}\setminus\Omega_R\)) is create spatial objects by hand through a graphical user interface, such as drawing polygons and lines. Another area where R lags behind ArcGIS and QGIS is 3D data visualization. However, neither of these is essential for economists at the moment. Finally, it is sometimes easier and faster to make a map, especially a complicated one, using ArcGIS or QGIS.4

Using R as GIS, however, comes with a learning curve for those who have never used R, because it requires basic knowledge of R and of programming in general. By contrast, the GUI-based use of ArcGIS and QGIS has a very low start-up cost. If you have used R for other purposes like data wrangling and regression analysis, you have already climbed (or almost climbed) the hill and are ready to learn how to use R as GIS.

Summary

  • You have never used any GIS software, but are very comfortable with R?

Learn how to use R as GIS first. If you find out you really cannot complete the GIS tasks you would like to do using R, then turn to other options.

  • You have never used any GIS software or R?

This is tough. If you expect a significant amount of GIS work, learning R basics and how to use R as GIS is a good investment of your time.

  • You have used ArcGIS or QGIS and do not like them because they crash often?

Why don’t you try R?5 You may realize you actually do not need them.

  • You have used ArcGIS or QGIS before and are very comfortable with them, but you need to program repetitive GIS tasks?

Learn R and maybe take advantage of R-ArcGIS or RQGIS, which this book does not cover.

  • You know for sure that you need to run only a simple GIS task once and never have to do any GIS tasks ever again?

Stop reading and ask one of your friends to do the job. Pay him/her \(\$20\) per hour, which is way below the opportunity cost of setting up either ArcGIS or QGIS and learning to do that simple task.

How is this book different from other online books and resources?

We are seeing an explosion of online (and free) resources that teach how to use R for spatial data processing.6 Here is an incomplete list of such resources:

Thanks to all these resources, it has become much easier to teach yourself R for GIS work than it was 10 years ago, when I first started using R for GIS. Even though I have not read through all of these resources carefully, I am pretty sure every topic in this book can also be found somewhere in them (except the demonstrations). So, you may wonder why on earth you would benefit from reading this book. It all boils down to search costs. Researchers in different disciplines require different sets of spatial data skills. The available resources are typically very general and cover many topics, some of which economists are unlikely to use. It is particularly hard for those without much GIS experience to tell whether a particular skill is essential, so they may spend a lot of time learning something that is not really useful. The value of this book lies in its deliberate incomprehensiveness: it packages only the materials that satisfy the needs of most economists and cuts out many topics that are likely to be of limited use to them.

For those looking for a more comprehensive treatment of spatial data handling and processing in one book, I personally like Geocomputation with R a lot. Increasingly, R package developers also create websites dedicated to their packages, where you can often find vignettes (tutorials), like Simple Features for R.

Topics covered in this book

The book starts with the very basics of spatial data handling (e.g., importing and exporting spatial datasets) and moves on to more practical spatial data operations (e.g., spatial joins) that are useful for research projects. Some parts of this book are still under development. Right now, Chapters 1 through 8, parts of Chapter 9, and Appendix A are available.

  • Chapter 1: Demonstrations of R as GIS
    • groundwater pumping and groundwater level
    • precision agriculture
    • land use and weather
    • corn planted acreage and railroads
    • groundwater pumping and weather
    • slave trade and economic development in Africa
    • terrain ruggedness and economic development in Africa
    • Tsetse fly and economic development in Africa
  • Chapter 2: The basics of vector data handling using the sf package
    • spatial data structure in sf
    • import and export vector data
    • (re)projection of spatial datasets
    • single-layer geometrical operations (e.g., create buffers, find centroids)
    • other miscellaneous basic operations
  • Chapter 3: Spatial interactions of vector datasets
    • understand topological relations of multiple sf objects
    • spatially subset a layer based on another layer
    • extracting values from one layer to another layer
  • Chapter 4: The basics of raster data handling using the raster and terra packages
    • understand the object classes defined by the terra and raster packages
    • import and export raster data
    • stack raster data
    • quick plotting
    • handle netCDF files
  • Chapter 5: Spatial interactions of vector and raster datasets
    • cropping a raster layer to the geographic extent of a vector layer
    • extracting values from a raster layer to a vector layer
  • Chapter 6: Speed things up
    • make raster data extraction faster by parallelization
  • Chapter 7: Spatiotemporal raster data handling with the stars package
  • Chapter 8: Creating Maps using the ggplot2 package
    • use the ggplot2 package to create maps
  • Chapter 9: Download and process publicly available spatial datasets (partially available)
    • USDA NASS Quick Stats (tidyUSDA) - available
    • PRISM (prism) - available
    • Daymet (daymetr) - available
    • gridMET - available
    • Cropland Data Layer (CropScapeR) - available
    • USGS (dataRetrieval) - under construction
    • Sentinel 2 (sen2r) - under construction
    • Census (tidycensus) - under construction
  • Appendix A: Loop and parallel computation
  • Appendix B: Cheatsheet - under construction

As you can see above, this book does not spend any time on the very basics of GIS concepts. Before you start reading the book, you should know at least the following (it's not much):

  • What Geographic Coordinate System (GCS), Coordinate Reference System (CRS), and projection are (this is a good resource)
  • Distinctions between vector and raster data (this is a simple summary of the difference)
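To check your footing on the first point, here is a small sketch using sf (assumed to be installed). It compares a geographic CRS (EPSG:4326, WGS 84 longitude/latitude in degrees) with a projected one (EPSG:32614, WGS 84 / UTM zone 14N, in meters) and reprojects one arbitrary point chosen purely for illustration.

```r
# Geographic vs. projected CRS (a minimal sketch; requires the sf package)
if (requireNamespace("sf", quietly = TRUE)) {
  gcs <- sf::st_crs(4326)   # WGS 84: geographic, longitude/latitude in degrees
  utm <- sf::st_crs(32614)  # WGS 84 / UTM zone 14N: projected, units in meters

  # st_is_longlat() tells you whether a CRS is geographic
  print(sf::st_is_longlat(gcs))
  print(sf::st_is_longlat(utm))

  # An arbitrary point (longitude, latitude) inside UTM zone 14
  pt <- sf::st_sfc(sf::st_point(c(-96.7, 40.8)), crs = gcs)

  # The same location expressed in projected coordinates (meters)
  print(sf::st_transform(pt, utm))
}
```

If the output makes sense to you, you have enough background on CRS and projection for this book.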

This book is about spatial data processing and does not provide detailed explanations of non-spatial R operations; it assumes some basic knowledge of R. In particular, the dplyr and data.table packages are used extensively for data wrangling. For data wrangling using the tidyverse (a collection of packages including dplyr), see R for Data Science. For data.table, this is a good resource.
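As a quick self-check of the assumed background, here is the same grouped summary written in both styles. It uses R's built-in mtcars data (chosen here purely for illustration) to compute mean fuel efficiency by number of cylinders, and assumes both dplyr and data.table are installed. If either version looks unfamiliar, the resources above are worth a look before continuing.

```r
# Same grouped summary with dplyr and with data.table
if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("data.table", quietly = TRUE)) {
  library(dplyr)
  library(data.table)

  # dplyr: mean mpg by number of cylinders
  mpg_dplyr <- mtcars %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg)) %>%
    arrange(cyl)

  # data.table: the same summary, then ordered by cyl
  mtcars_dt <- as.data.table(mtcars)
  mpg_dt <- mtcars_dt[, .(mean_mpg = mean(mpg)), by = cyl][order(cyl)]

  print(mpg_dplyr)
  print(mpg_dt)
}
```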

Finally, this book does not cover spatial statistics or spatial econometrics at all. This book is about spatial data processing. Spatial analysis is something you do after you have processed spatial data.

Conventions of the book and some notes

Here are some notes on the conventions of this book, for R beginners and for those who are not used to reading rmarkdown-generated HTML documents.

Texts in gray boxes

They are one of the following:

  • objects defined in R during demonstrations
  • R functions
  • R packages

When it is a function, I always put parentheses at the end, like this: st_read(). Sometimes, I combine a package and a function into one, like this: sf::st_read(). This means the function st_read() from the sf package.
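The package::function() form is ordinary R syntax, not just a typographical convention: it calls a function from a package without attaching the whole package with library(). The sketch below shows that runif() and stats::runif() are the same function, and, if sf is installed, reads the North Carolina shapefile that ships with sf using sf::st_read().

```r
# stats is attached by default, so runif() and stats::runif() are identical
set.seed(123)
x_short <- runif(3)
set.seed(123)
x_long <- stats::runif(3)  # explicit package::function form
identical(x_short, x_long)
## [1] TRUE

# sf::st_read() works even without calling library(sf) first
if (requireNamespace("sf", quietly = TRUE)) {
  nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
}
```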

Colored Boxes

Code is in blue boxes, and outcomes are in red boxes.

Code:

runif(5)

Outcomes:

## [1] 0.7999254 0.5457961 0.1648999 0.8990746 0.2197834

Parentheses around code

Sometimes you will see code enclosed in parentheses like this:

(
  a <- runif(5)
)
## [1] 0.83101244 0.44637269 0.34451835 0.06503508 0.31936829

The parentheses print the contents of the newly created object (here a) without my having to type the object's name again on a separate line. I use this to signal that we will be looking inside the object that was just created.

By contrast, the following prints nothing:

a <- runif(5)
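Why this works: in R, an assignment such as a <- runif(5) returns its value invisibly, and wrapping an expression in parentheses makes the value visible again. The base function withVisible() lets you confirm this directly; b is just a throwaway object for the demonstration.

```r
# Plain assignment returns its value invisibly, so nothing is printed
withVisible(b <- 1:3)$visible
## [1] FALSE

# Parentheses make the same value visible, so it is printed
withVisible((b <- 1:3))$visible
## [1] TRUE

# Hence these two lines show the same output
(b <- 1:3)
b <- 1:3; b
```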

Footnotes

Footnotes appear at the bottom of the page. You can easily get to a footnote by clicking on the footnote number. You can also go back to the main narrative where the footnote number is by clicking on the curved arrow at the end of the footnote. So, don’t worry about having to scroll all the way up to where you were after reading footnotes.

Session Information

Here is the session information when compiling the book:

## R version 4.2.1 Patched (2022-06-23 r82516)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.4
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] rstudioapi_0.15.0 magrittr_2.0.3    knitr_1.39        xml2_1.3.3       
##  [5] downlit_0.4.2     R6_2.5.1          rlang_1.1.1       fastmap_1.1.0    
##  [9] stringr_1.5.0     tools_4.2.1       xfun_0.31         cli_3.6.0        
## [13] jquerylib_0.1.4   withr_2.5.0       htmltools_0.5.5   yaml_2.3.5       
## [17] digest_0.6.29     lifecycle_1.0.3   bookdown_0.27     vctrs_0.5.2      
## [21] sass_0.4.1        fs_1.5.2          memoise_2.0.1     glue_1.6.2       
## [25] cachem_1.0.6      evaluate_0.15     rmarkdown_2.16    stringi_1.7.12   
## [29] compiler_4.2.1    bslib_0.3.1       jsonlite_1.8.7
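If your results differ from those in the book, a useful first step is to compare your setup with the one above. Base R can print the same kind of information for your own session; checking the sf version is included as an example and assumes sf is installed.

```r
# Print your own session information for comparison with the book's
sessionInfo()

# Check the version of a single package, e.g. sf (if installed)
if (requireNamespace("sf", quietly = TRUE)) {
  print(packageVersion("sf"))
}
```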