01-1: Introduction to R

Learning Objectives

  • become familiar with programming
  • become capable of using R software to conduct research independently
    • manipulate data
    • visualize data
    • report results
    • manage spatial data

Table of contents

  1. Introduction to R and RStudio
  2. Various object types
  3. Functions and packages
  4. Some fundamentals on vector, matrix, list, and data.frame

Introduction to R and RStudio


R

  • a very popular statistical programming language used in academia and industry
  • started out as software for doing statistics, designed by statisticians
  • is open-source and free
  • has evolved, and continues to evolve, rapidly through contributions from its users
  • provides state-of-the-art statistical methods (e.g., machine learning algorithms), often written by the developers of the methods themselves
  • supports geographic information system (GIS) work
  • supports big data handling and analysis

RStudio

  • R's own graphical user interface is terrible
  • RStudio is by far the most popular graphical user interface for R

R User Interface

Install R and RStudio

Introduction to RStudio

Four panes

  • R script (upper left)
  • Console (lower left)
  • Environment (upper right)
  • Files, plots, packages, and help (lower right)

Small tips

  • Appearance
  • Pane Layout

Getting started with R and RStudio


Objectives

Learn how to

  • do basic mathematical operations
  • define objects in R
  • work with different object types
  • use RStudio along the way

Basic element types (atomic mode)

  • integer: e.g., 1L, 3L (the L suffix marks an integer)
  • numeric (double): e.g., 1, 1.3
  • complex: e.g., 1 + 2i
  • logical (boolean): TRUE or FALSE
  • character: text such as "abc" (numerical operations not allowed)

Basic arithmetic: R as a calculator
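Here are a few operations you might type into the console to use R as a calculator. The numbers are arbitrary and chosen only for illustration:

```r
# Addition, subtraction, multiplication, and division
1 + 2       # 3
10 - 3      # 7
4 * 5       # 20
7 / 2       # 3.5

# Exponentiation and square root
2^3         # 8
sqrt(16)    # 4

# Parentheses control the order of operations
(1 + 2) * 3 # 9

# Integer division and remainder (modulo)
7 %/% 2     # 3
7 %% 2      # 1
```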


RStudio Tip

You can run the selected code by hitting

  • Mac: Command + Enter
  • Windows: Control + Enter

Logical values and operators
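This snippet shows the logical values TRUE and FALSE, together with the comparison and logical operators that produce or combine them. The specific numbers are only illustrative:

```r
# Logical values are written in capitals
TRUE
FALSE

# Comparison operators return logical values
3 > 2          # TRUE
3 == 2         # FALSE
3 != 2         # TRUE
3 <= 3         # TRUE

# Logical operators: & (and), | (or), ! (not)
TRUE & FALSE   # FALSE
TRUE | FALSE   # TRUE
!TRUE          # FALSE

# In arithmetic, TRUE counts as 1 and FALSE as 0
TRUE + TRUE    # 2
```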

Character

Contents enclosed by double (or single) quotation marks will be recognized as characters.
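In the snippet below, double and single quotes both create character values. A number inside quotes is also a character, not a numeric value:

```r
"Hello"
'world'
class("Hello")   # "character"

# Quotes turn a number into a character
class("3")       # "character"
class(3)         # "numeric"
```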


You cannot do arithmetic, such as addition, on characters.
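Here is what happens when you try. The sketch wraps the failing call in tryCatch() so that a script keeps running and stores the error message. In the console you would normally just see the error printed:

```r
# Adding two characters fails; tryCatch() captures the error message
result <- tryCatch(
  "1" + "2",
  error = function(e) conditionMessage(e)
)
result   # "non-numeric argument to binary operator"

# Converting to numeric first makes the addition work
as.numeric("1") + as.numeric("2")   # 3
```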


We will learn string manipulations later using the stringr package.

Assigning contents to an object

  • You can assign contents (numbers, characters, logical values, etc.) to an object in R and reuse it later, using either <- or =.
object_name <- contents
object_name = contents


  • It does not really matter which of <- or = you use. Pick whichever makes sense to you (though <- is often recommended), but be consistent.
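This example assigns a few objects. The names a, b, and is_ok are hypothetical, chosen only for illustration:

```r
# Assign a number, a character, and a logical value to objects
a <- 3
b = "apple"      # = also works, but stick to one style in a script
is_ok <- TRUE

# Reuse the objects later
a * 2            # 6
```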


Notice that these objects now appear in the Environment tab of RStudio.

Note

You can insert the assignment operator (<-) by hitting

  • Mac: Option + -
  • Windows: Alt + -

Once objects are created, you can evaluate them on the console to see what is inside:
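To evaluate an object, type its name. Here a is a hypothetical object name:

```r
a <- 3
a          # prints [1] 3

# print() does the same thing explicitly
print(a)
```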


Note

I will often ask you to evaluate an R object. That just means looking inside the object to see what it contains.

Several things to remember about assignment:

  • If you assign contents to a name that is already in use, the existing object with that name is overwritten.

  • Object names cannot start with a number.

  • You cannot use a reserved word (e.g., if, TRUE, function) as the name of an object (run ?Reserved in the console for the complete list).
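The sketch below illustrates all three rules. Typing an invalid name directly would stop a script with a syntax error. Here the invalid code is passed to parse() inside tryCatch(), so the error is captured instead. Object names are hypothetical:

```r
# Overwriting: the second assignment replaces the first
a <- 3
a <- "three"
a                      # "three"

# A name starting with a number is a syntax error
err_num <- tryCatch(parse(text = "1a <- 3"), error = function(e) e)
conditionMessage(err_num)

# Numbers are fine after the first character
a1 <- 3

# A reserved word cannot be used as a name
err_reserved <- tryCatch(parse(text = "if <- 3"), error = function(e) e)
conditionMessage(err_reserved)
```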

Various object types


Objects

  • R is an object-oriented programming (OOP) language, which basically means:

“Everything is an object and everything has a name.”

  • R has many different object types (classes)

    • vector
    • matrix
    • data.frame
    • list
    • function

Definition

A vector is a class of object that consists of elements of the same kind (all of its elements must be of one type). You use c() to create a vector.

Example
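The example below creates vectors of three different types with c(). The object names and values are illustrative only:

```r
num_vec  <- c(1, 2, 3)                     # numeric vector
char_vec <- c("corn", "soybean", "wheat")  # character vector
lgl_vec  <- c(TRUE, FALSE, TRUE)           # logical vector

class(num_vec)    # "numeric"
class(char_vec)   # "character"
class(lgl_vec)    # "logical"
```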


Different modes?

What if we mix elements of different modes?
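Because a vector can hold only one type, c() converts (coerces) mixed elements to a common type. The snippet shows two such cases with illustrative values:

```r
# Numbers mixed with a character: everything becomes character
mixed <- c(1, 2, "a")
mixed          # "1" "2" "a"
class(mixed)   # "character"

# Numbers mixed with logicals: TRUE/FALSE become 1/0
c(1, TRUE, FALSE)   # 1 1 0
```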


All the numeric values are converted to characters.

Definition

A list is a class of object that consists of elements of mixed types.

Example
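Here, list() holds elements of different types without converting them. The names my_list and farm, and their values, are hypothetical:

```r
# Elements keep their own types
my_list <- list(10, "corn", TRUE)
my_list

# Elements can also be given names
farm <- list(acres = 100, crop = "corn", irrigated = TRUE)
str(farm)
```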

  • A list is very flexible. It can hold basically any type of R object as its elements.


  • We will see more complex examples later.
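As a small preview of that flexibility, this list holds a vector, a matrix, a data.frame, and a function all at once. All names are hypothetical:

```r
big_list <- list(
  a_vector   = c(1, 2, 3),
  a_matrix   = matrix(1:4, nrow = 2),
  a_df       = data.frame(x = 1:2, y = c("a", "b")),
  a_function = mean
)

# str() summarizes what each element is
str(big_list)
```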

Definition

A matrix is a class of object that consists of elements of the same kind (all of its elements must be of one type) stored in a two-dimensional array.

Examples
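The snippet builds matrices with matrix(). By default, R fills the values column by column, and byrow = TRUE switches to filling row by row. Values are illustrative:

```r
mat <- matrix(1:6, nrow = 2)
mat
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Fill row by row instead
matrix(1:6, nrow = 2, byrow = TRUE)

dim(mat)   # 2 3 (rows, columns)

# Mixed types are coerced to one type, as with vectors
matrix(c(1, "a", 2, "b"), nrow = 2)
```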


data.frame is like a matrix (or a list of columns)
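In a data.frame, each column is a vector. Different columns can have different types, but all columns must have the same length. The column names and values here are made up for illustration:

```r
df <- data.frame(
  crop  = c("corn", "soybean", "wheat"),
  acres = c(120, 80, 45)
)
df
nrow(df)      # 3
ncol(df)      # 2

# Under the hood, a data.frame is a list of columns
is.list(df)   # TRUE
```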



There are other kinds of objects that are similar to "data.frame":

  • tibble
  • data.table

We will talk about some of them later.

It is critical to recognize the class of the objects:

  • the same function does different things depending on the class of the object to which the function is applied
  • some functions work on some object classes, but not on others

Many of the errors you will encounter while working in R have something to do with applying functions to objects they are not applicable to!

Use the class(), typeof(), and str() functions to learn more about what kind of objects you are dealing with:
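The example applies these functions to a small data.frame. It is named yield_data to match the View() call below, but its columns and values are made up for illustration:

```r
# Hypothetical data, for illustration only
yield_data <- data.frame(
  field  = c("A", "B", "C"),
  yield  = c(180.5, 195.2, 170.1),
  n_rate = c(150L, 180L, 120L)
)

class(yield_data)           # "data.frame"
typeof(yield_data)          # "list"
class(yield_data$yield)     # "numeric"
typeof(yield_data$n_rate)   # "integer"
str(yield_data)             # column names, types, and first values
```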

You could also use the View() function for visual inspection:

View(yield_data)

Function and package


Function

A function takes R objects (vector, data.frame, etc), processes them, and returns R objects


Example:

min() takes a vector of values as an argument and returns the minimum of all the values in the vector
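Here is min() applied to a vector, along with its counterpart max(). The object name values is hypothetical:

```r
values <- c(5, 2, 9, 4)
min(values)       # 2
max(values)       # 9

# A vector created with c() can also be passed directly
min(c(3, 1, 2))   # 1
```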

  • Functions (both base and user-written) are what make R compelling to use as major statistical and programming software!

  • Indeed, this course is pretty much all about learning useful functions that make your life easier

  • We will learn lots of functions that are made available through user-written packages

  • create a sequence of values: seq()

  • repeat values: rep()

  • sum values: sum()

  • find the length of a vector: length()
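The snippet shows one call to each of these base R functions with arbitrary inputs. The by, times, and each arguments are standard arguments of seq() and rep():

```r
# Sequence from 1 to 10 in steps of 3
seq(1, 10, by = 3)          # 1 4 7 10
1:5                         # shortcut for steps of 1: 1 2 3 4 5

# Repeat values
rep(0, times = 3)           # 0 0 0
rep(c(1, 2), times = 2)     # 1 2 1 2
rep(c(1, 2), each = 2)      # 1 1 2 2

# Sum of values
sum(c(1, 2, 3))             # 6

# Number of elements in a vector
length(seq(1, 10, by = 3))  # 4
```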

Exercises

  1. generate a vector (call it \(x\)) that starts from 1 and increases by 2 up to 99

  2. calculate the sample mean of \(x\)

\(\frac{1}{n}\sum_{i=1}^n x_i\), where \(n\) is the number of elements in \(x\)

  3. calculate the sample variance of \(x\)

\(\frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^2\), where \(\bar{x}\) is the sample mean
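Here is one possible solution sketch that follows the formulas directly. Note that R's built-in var() divides by n - 1 rather than n, so it has to be rescaled to match the formula above:

```r
# 1. Sequence from 1 to 99 in steps of 2
x <- seq(1, 99, by = 2)
n <- length(x)                   # 50

# 2. Sample mean: (1/n) * sum of x_i
x_bar <- sum(x) / n              # 50

# 3. Variance with the 1/n denominator
var_x <- sum((x - x_bar)^2) / n  # 833

# Cross-check with built-ins: mean() matches directly,
# var() needs rescaling because it divides by n - 1
mean(x)
var(x) * (n - 1) / n
```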

Package

A drawer in your work space (R environment) that has specialized tools (functions) to complete tasks.


Example packages:

  • dplyr (data wrangling)
  • data.table (data wrangling)
  • ggplot2 (data visualization)
  • sf (spatial vector data handling)
  • raster (spatial raster data handling)
  • stars (spatiotemporal data handling)
  • Before you use tools (functions) in the drawer (package), you need to buy (install) it first. You can install a package using the following syntax:
install.packages("package name")


  • For example,
install.packages("ggplot2")


  • Then, you need to bring the drawer (package) into your working space (R environment) by using the library() function:
library(ggplot2)


  • Now, you can start using specialized tools (functions) in the drawer (package)!!

Working with R (or any computer programs)

  • You are the architect who has the blueprint of the final product, but cannot build the specific pieces yourself

  • You work with one worker (R) who can build specific pieces perfectly, without any error, if given the right tools and instructions

  • This worker is weird. If you do not give the right tools, or your instructions are wrong, he/she will speak up and tell you there has been an error. He/she will not try to figure out on his/her own how things could have been done differently.

  • Your job is to provide the right tools and instructions to the worker (R), and to correct your instructions when you find out you made a mistake (debugging)

A bit more on vector, matrix, list, and data.frame


Vector

Let’s define two vectors to work with

Note

Vector arithmetic operations happen element by element!
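The two vectors below, x and y, are illustrative. Each operation pairs up the elements in the same positions. A single number is applied to every element:

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

x + y    # 5 7 9
x * y    # 4 10 18
y / x    # 4.0 2.5 2.0
x^2      # 1 4 9

# A single number applies to each element
x + 10   # 11 12 13
```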

To access element(s) of a vector, you use [] like below:


You can access multiple elements of a vector at once by passing a vector of positions inside [].
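The example shows single and multiple element access on an illustrative vector. It also shows negative indices, which drop elements, and logical conditions:

```r
x <- c(10, 20, 30, 40, 50)

x[2]          # 20
x[c(1, 3)]    # 10 30
x[2:4]        # 20 30 40

# Negative positions drop elements
x[-1]         # 20 30 40 50

# A logical condition keeps the elements where it is TRUE
x[x > 25]     # 30 40 50
```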

Matrix

To access element(s) of a matrix, you use [] just like we did for a vector, but now with two arguments inside [], separated by a comma:

matrix[indices for rows, indices for columns]  


Examples
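Here the [rows, columns] pattern is applied to an illustrative 2 x 3 matrix. Leaving one side of the comma empty selects everything along that dimension:

```r
mat <- matrix(1:6, nrow = 2)   # rows: 1 3 5 and 2 4 6

mat[1, 2]         # row 1, column 2: 3
mat[2, ]          # all of row 2: 2 4 6
mat[, 3]          # all of column 3: 5 6
mat[1, c(1, 3)]   # row 1, columns 1 and 3: 1 5
```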

List

To access element(s) of a list, use the [[]] operator for a single element and [] for multiple elements.

Example: single element
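In this example, [[ ]] returns the element itself, while [ ] with one position returns a list of length 1 that contains the element. The list is illustrative:

```r
my_list <- list(10, "corn", c(1, 2, 3))

my_list[[3]]          # the numeric vector 1 2 3
class(my_list[[3]])   # "numeric"

# Compare: single brackets keep the list wrapper
class(my_list[3])     # "list"
```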


Example: multiple elements
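With a vector of positions, [ ] returns a smaller list that contains the selected elements. The list is illustrative:

```r
my_list <- list(10, "corn", c(1, 2, 3))

sub_list <- my_list[c(1, 2)]
sub_list
length(sub_list)   # 2
class(sub_list)    # "list"
```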

You can also use $ operator to access a single element of a list as long as the element has a name.
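The $ operator works on a list with named elements. The names and values here are hypothetical. Using [[ ]] with the name in quotes gives the same result:

```r
farm <- list(acres = 100, crop = "corn", irrigated = TRUE)

farm$crop        # "corn"
farm[["crop"]]   # same element, accessed by name
```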

data.frame

data.frame (and its relatives)

  • is the most common object type we use.
  • is a special kind of list whose elements are vectors of the same length, which gives it a matrix-like structure
  • shares properties of both the matrix and the list

Accessing parts of a data.frame works like accessing elements of a matrix or list.

Examples
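The example uses an illustrative data.frame. Matrix-style [rows, columns] indexing and list-style access to columns both work:

```r
df <- data.frame(
  crop  = c("corn", "soybean", "wheat"),
  acres = c(120, 80, 45)
)

# Matrix-style: [rows, columns]
df[2, ]               # second row
df[, "acres"]         # acres column as a vector: 120 80 45
df[df$acres > 50, ]   # rows where acres exceeds 50

# List-style: columns are the list elements
df$crop               # "corn" "soybean" "wheat"
df[["acres"]]         # 120 80 45
```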


We will spend lots of time on how to wrangle data in data.frames using the tidyverse packages!

Next class: Quarto