06-1: How to write R codes, manage Projects, work with RStudio

Tips to make the most of the lecture notes

Interactive navigation tools
Running and writing codes

Click on the three horizontally stacked lines at the bottom left corner of the slide to open the table of contents, from which you can jump to the section you want
Press the letter “o” on your keyboard to get a panel view of all the slides

The box area with a hint of blue as the background color is where you can write code (hereafter referred to as the “code area”).
Hit the “Run Code” button to execute all the code inside the code area.
You can evaluate (run) code selectively by highlighting the parts you want to run and hitting Command + Enter for Mac (Ctrl + Enter for Windows).
If you want to run the code on your own computer, first click on the icon with two sheets of paper stacked on top of each other (top right corner of the code chunk), which copies the code in the code area. You can then paste it into R or RStudio on your computer.
You can click on the reload button (top right corner of the code chunk, to the left of the copy button) to revert to the original code.

Learning objectives

Learn how to organize your project: folders, code, data files, etc.
Learn how to organize R code
Learn how to use various RStudio tips for efficient programming

Reproducibility

What is it?
How
High v.s. low quality
Why high-quality reproducible research
Check list

First of all, you may have heard of “reproducibility” and “replicability.” While they are sometimes used interchangeably, they mean different things. Here are commonly used definitions of the two terms (Cacioppo et al. 2015).

Definition: Reproducibility

A research study is reproducible if anybody (including the author of the study) can generate exactly the same results by using the same materials (e.g., data) and procedures used in the study.

Definition: Replicability

A research study is replicable if other teams reach the same conclusion by applying the same procedure to different materials (e.g., data).

This lecture focuses only on reproducibility and does not deal with replicability.

Minimum Requirement

Every single action taken during the entire research process is documented in a way that anybody can follow to implement the same actions (no hidden actions) to produce exactly the same results.

Note that this does not necessarily mean every single action needs to be computer-programmed and automated. Even if you manually delete rows of data in Excel (highly discouraged), your research is still reproducible as long as the action is recorded and the original data (before the rows were deleted) are provided, because anybody can then repeat the same action.
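For comparison, the same kind of row deletion can be written as code, which documents the action and makes it repeatable by construction. A minimal sketch in R; the data frame, its column names, and the use of -999 as a missing-value code are all hypothetical.

```r
# Hypothetical raw data: -999 is assumed to be a missing-value code
raw_data <- data.frame(
  plot_id = 1:5,
  yield = c(180, 175, -999, 190, 185)
)

# The deletion is written down as code, so anyone can repeat it exactly
clean_data <- raw_data[raw_data$yield != -999, ]

# raw_data is left intact; only the new object has fewer rows
nrow(raw_data)   # 5
nrow(clean_data) # 4
```

Because `raw_data` is never modified, the original data remain available alongside the record of what was removed.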

A project that satisfies the minimum requirement is reproducible, but it is of low quality if it is too costly or time-consuming to reproduce. A high-quality reproducible project exhibits the following characteristics:

Organized Project Folder:

It maintains a well-structured and organized project folder, making it easy to locate the files you are looking for.

Streamlined Automation:

Workflows are automated with well-annotated computer programs, which simplifies reproduction and makes the workflow clear.

Comprehensive Documentation:

Thorough documentation of the data and of how to reproduce the results ensures transparency and saves the time otherwise spent interpreting the data and explaining how to reproduce the results.

The main beneficiaries of reproducible research include:

You (Y)
Members of your team (M)
The scientific community (S)

Here are the benefits of high-quality reproducible research with their beneficiaries:

Scientific Integrity and Error Prevention (S)
Educational Value (MS)
Repeatability (YM)
Transferability (YM)
Reducing Errors (YM)

The project has an organized folder system, and all the files (code, manuscript, journal articles in PDF) are placed where they should be
Data is clearly documented
All the actions (data processing, analysis, figure and table creation) are computer-programmed without any manual procedures (e.g., deleting rows of a CSV file on Microsoft Excel)
The computer programs are well annotated and organized
An instruction to reproduce (what computer programs to run in what order) is provided
There are no unnecessary files in the project folder

How to organize your project

Motivation
RStudio project
Folder organization

You should have a single dedicated folder for each research project. This will help you avoid

confusion between objects with the same or similar names (accidentally using one you did not intend to use)
wasting memory by holding objects in the global environment that are irrelevant to the project you are working on

We can initiate an RStudio project with a dedicated folder from within RStudio.

Step 1
Step 2
Step 3

At the top right corner of RStudio, navigate through:

Project (None) -> New Project… -> New Directory -> New Project

type in a directory name
select the directory in which the project folder (directory) is going to be created
hit the Create Project button

You will be automatically taken to a new R session inside the newly created RStudio project.

The folder you just created contains a single file with the .Rproj extension (here, it is test.Rproj). It holds information about the project, but you do not have to touch it.

Here is a recommended folder organization. You should modify/add folders as you see fit.

Code: all the code goes here
- DataPrep: code that prepares the data (download, clean, merge, etc.)
- Analysis: code that runs the analysis and creates figures and tables
Data
- Raw: place the raw datasets here
- Processed: save the intermediate datasets here
Literature: journal articles and other relevant documents
Results: results (regression output, figures, tables)
Writing: qmd, Word, and LaTeX files
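If you prefer to create these folders with code rather than by clicking, here is a minimal sketch in base R. It is not part of the RStudio project wizard. `project_root` is a hypothetical location (a temporary folder here, so the example runs anywhere); inside an RStudio project you would typically use "." (the project folder itself).

```r
# Hypothetical project root; inside an RStudio project, use "." instead
project_root <- file.path(tempdir(), "my-project")

# The recommended folders, written as paths relative to the project root
folders <- c(
  file.path("Code", "DataPrep"),
  file.path("Code", "Analysis"),
  file.path("Data", "Raw"),
  file.path("Data", "Processed"),
  "Literature",
  "Results",
  "Writing"
)

for (folder in folders) {
  # recursive = TRUE also creates parent folders such as Code/ and Data/;
  # showWarnings = FALSE keeps quiet if a folder already exists
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

print(list.dirs(project_root, full.names = FALSE))
```

Running the loop again is harmless: existing folders are left as they are.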

Recommendation

Use a qmd file instead of an R file whenever you write code

It is much easier to write comments in a qmd file than in an R file
You can better organize your code with markdown section headers (e.g., #, ##)
If R crashes at a certain chunk and you have to restart R and rerun all the code up to the problematic chunk, use the Run All Chunks Above option (click on the triangle to the right of the Run button and select the option, or hit Option + Command + P on Mac, Ctrl + Alt + P on Windows)
Easily move between sections and subsections using the navigator at the bottom left corner of the source pane

Recommended Structure

Objective statement
- state objectives
- input: state input files and datasets
- output: state output files and datasets
Setup
- set the working directory (if necessary)
- load packages
- load functions
Actions
- Action 1
- Action 2
- ...
- Action n
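Here is a minimal sketch of an R script laid out this way. In a qmd file, the comment banners would become markdown section headers (#, ##) and each action would sit in its own chunk. The objective, and the use of the built-in mtcars dataset as a stand-in for raw data, are purely illustrative.

```r
# ===== Objective statement =====
# Objective: compare average fuel efficiency across cylinder counts
# Input:  mtcars (built into R; stands in for a file in Data/Raw)
# Output: mpg_by_cyl (in a real project, saved to Data/Processed or Results)

# ===== Setup =====
# setwd(...)   # only if necessary; usually not needed in an RStudio project
# library(...) # load packages
# source(...)  # load your own functions

# ===== Action 1: keep only the variables needed =====
car_data <- mtcars[, c("mpg", "cyl")]

# ===== Action 2: average mpg by number of cylinders =====
mpg_by_cyl <- aggregate(mpg ~ cyl, data = car_data, FUN = mean)
print(mpg_by_cyl)
```

Keeping the input and output lines at the top current means a reader can tell what a file needs and produces without reading its actions.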

Note

Update the “Objective statement” as you go, because the objectives, inputs, and outputs are likely to change.

R code readability
styler package

You can write R code however you like, but following a style guide that is accepted by many R users makes your code more readable, both to you and to others who might read it. Several popular styles exist for formatting R code, including the tidyverse style.

Examples

Here are some examples of the tidyverse style:
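A few commonly cited rules from the tidyverse style guide, illustrated with the built-in mtcars dataset: `<-` for assignment, spaces around operators and after commas, lowercase snake_case object names, and one argument per line for calls too long to fit on one line. This is a short, non-exhaustive sample, not the full guide.

```r
# Harder to read (shown as a comment):
# avgMPG=mean(mtcars$mpg,na.rm=TRUE)

# tidyverse style: <- for assignment, spaces around operators and after
# commas, lowercase snake_case names
avg_mpg <- mean(mtcars$mpg, na.rm = TRUE)

# A call too long for one line: one argument per line,
# with the closing ) on its own line
mpg_summary <- data.frame(
  mean_mpg = mean(mtcars$mpg),
  median_mpg = median(mtcars$mpg),
  max_mpg = max(mtcars$mpg)
)
```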

The styler package can help you partially follow the tidyverse coding style.

Once the package is installed, you can highlight lines of code and hit Cmd + Shift + A on Mac (Ctrl + Shift + A on Windows) to reformat them to conform to the tidyverse style.

Alternatively, you can click on Addins in the middle of the toolbar at the top and select Style selection.
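You can also call styler from R code directly. A minimal sketch, assuming styler is installed; `style_text()` restyles a character string of code, using the tidyverse style by default. The messy input string is made up for illustration.

```r
# Assumes styler is installed: install.packages("styler")
messy_code <- "x<-c(1,2,3)"

if (requireNamespace("styler", quietly = TRUE)) {
  # style_text() applies the tidyverse style by default
  tidy_code <- styler::style_text(messy_code)
  print(tidy_code)
}
```

For whole files or folders, styler also provides `style_file()` and `style_dir()`; note that these rewrite the files in place, so commit or back up your work first.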

Rules 1

place all the raw datasets (and nothing else) in a designated folder inside the Data/Raw folder
never overwrite them: only read them and keep them intact

Rules 2

write R code to process (transform, merge, etc.) the raw data and save all of it in the Code/DataPrep folder
save intermediate R objects or datasets in the Data/Processed folder
do not mix code and datasets in a single folder
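A minimal sketch of Rules 1 and 2 in base R. The project root is a temporary folder so the example runs anywhere, the built-in mtcars dataset stands in for a downloaded raw file, and all file names are hypothetical.

```r
# Hypothetical project root (a temporary folder so the sketch runs anywhere);
# in an RStudio project you would use paths relative to the project folder
project_root <- file.path(tempdir(), "rules-demo")
raw_dir <- file.path(project_root, "Data", "Raw")
processed_dir <- file.path(project_root, "Data", "Processed")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(processed_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for a raw dataset you downloaded; written only if it is missing
raw_file <- file.path(raw_dir, "cars.csv")
if (!file.exists(raw_file)) write.csv(mtcars, raw_file, row.names = FALSE)

# --- would live in Code/DataPrep/01-1-prepare-cars.R (hypothetical name) ---
# Rule 1: only READ the raw file; never write back to Data/Raw
raw_data <- read.csv(raw_file)

# Rule 2: process the data ...
cars_4cyl <- raw_data[raw_data$cyl == 4, c("mpg", "wt", "cyl")]

# ... and save the intermediate dataset to Data/Processed
saveRDS(cars_4cyl, file.path(processed_dir, "cars-4cyl.rds"))
```

`saveRDS()` keeps the R object exactly as it is; a CSV written with `write.csv()` would work too if other tools need to read the file.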

Rules 3

write R code to do the analysis and save it in the Code/Analysis folder
save the results/outputs (regression tables, figures, tables) in the Results folder
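A minimal sketch of Rule 3 in base R: read a processed dataset, run a regression, and save a table and a figure to Results. The temporary project root, the use of mtcars in place of a real processed dataset, and the file names are all hypothetical.

```r
# Hypothetical project root (a temporary folder so the sketch runs anywhere)
project_root <- file.path(tempdir(), "analysis-demo")
processed_dir <- file.path(project_root, "Data", "Processed")
results_dir <- file.path(project_root, "Results")
dir.create(processed_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for a dataset saved earlier by a DataPrep script
saveRDS(mtcars, file.path(processed_dir, "cars.rds"))

# --- would live in Code/Analysis/02-1-regression-analysis.R ---
cars_data <- readRDS(file.path(processed_dir, "cars.rds"))
model <- lm(mpg ~ wt, data = cars_data)

# Save a regression table to Results
coef_table <- as.data.frame(summary(model)$coefficients)
write.csv(coef_table, file.path(results_dir, "regression-mpg-wt.csv"))

# Save a figure to Results
pdf(file.path(results_dir, "mpg-vs-wt.pdf"))
plot(mpg ~ wt, data = cars_data)
abline(model)
invisible(dev.off())
```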

Rules 4

if you are using qmd to write a journal article or report, put the files in the Writing folder (the same goes for Word files)
refer to figures and tables in the Results folder to integrate them in the output document

Recommendation

Name files so that you will later know what purpose each one serves
Add numbers as prefixes to indicate the order in which the files should be run

Example

Data Collection and Preparation (in Code/DataPrep)
- 01-1-download-weather-data.R
- 01-2-download-political-boundary-data.R
- 01-3-summarize-data.R
- 01-4-merge-data.R
Data Analysis and Results Preparation (in Code/Analysis)
- 02-1-regression-analysis.R
- 02-2-gen-figures-tables.R
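With numeric prefixes, running a folder of scripts in order can itself be scripted. A minimal sketch, assuming each script can be run on its own with `source()`; the placeholder scripts below (created in a temporary folder) only record their own names, so the run order can be checked. `sort(method = "radix")` sorts character vectors in C-locale order, so the result does not depend on your system's language settings.

```r
# Hypothetical project root (a temporary folder so the sketch runs anywhere)
project_root <- file.path(tempdir(), "run-order-demo")
prep_dir <- file.path(project_root, "Code", "DataPrep")
dir.create(prep_dir, recursive = TRUE, showWarnings = FALSE)

# Placeholder scripts, deliberately created out of order; each one only
# appends its own file name to run_log so we can see the run order
script_names <- c(
  "01-3-summarize-data.R",
  "01-1-download-weather-data.R",
  "01-2-download-political-boundary-data.R"
)
for (f in script_names) {
  writeLines(
    sprintf('run_log <- c(run_log, "%s")', f),
    file.path(prep_dir, f)
  )
}

# Run every .R script in a folder in the order given by its numeric prefix
run_in_order <- function(dir) {
  scripts <- list.files(dir, pattern = "\\.R$", full.names = TRUE)
  scripts <- sort(scripts, method = "radix") # C-locale order
  env <- new.env()
  env$run_log <- character(0)
  for (s in scripts) source(s, local = env)
  env$run_log
}

run_log <- run_in_order(prep_dir)
print(run_log)
```

One caveat: text sorting compares character by character, so "01-10" sorts before "01-2". If a group grows past nine scripts, zero-pad the second number (01-01, 01-02, ..., 01-10).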

Example project

Let’s take a look at an example project that is designed to be reproducible. First, clone this repository. We will then look at how the project is organized and developed.

RStudio tips

Code snippets

What is it?
Examples
How to add snippets
Context-specificity
Variables
More examples

Code snippets map a short sequence of letters and symbols to another, longer and more complicated sequence of letters and symbols.

Syntax

snippet (combination of letters to invoke)
  (what you want to print)

Important

You need a tab before (what you want to print)

piping operator
assignment operator

snippet pi
  %>%

Once you add this, you can type “pi” and hit tab (and hit enter if there are other competing shortcuts) to have %>% printed.

snippet as
  <-

Once you add this, you can type “as” and hit tab (and hit enter if there are other competing shortcuts) to have <- printed.

Go to Tools -> Global Options -> Code -> Edit Snippets, select the R tab, and add your snippets.

Try yourself

Add the following to the R tab of the snippet editor

snippet as
  <-

Then type “as” and hit tab inside an R code chunk

What is it?
Example

Suppose you are working on a Quarto document.

You are in the R context if your cursor is in an R code chunk
You are in the Markdown context if your cursor is outside of an R code chunk

Snippets defined in the R (Markdown) tab only work in the R (Markdown) context.

This snippet lets you create an R code chunk by typing “rmc”. Place it in the Markdown tab of the snippet editor and hit Shift + Tab to invoke it.

snippet rmc
  ```{r}
  ```

Confirm that this snippet does not work in the R context (inside an R code chunk).

snippet rmc
  ```{r ${1:chunk_title}}
    ${2:chunk_content}
  ```

$ marks a field in a snippet, as in ${1:chunk_title}: the number sets the order in which the cursor jumps between fields as you hit tab, and the text after the colon is a placeholder that you type over.

ggplot (line plot)

snippet gl
  ggplot(data = ${1:dataset}) +
  geom_line(aes(y = ${2:y}, x = ${3:x}))

ggplot (density plot)

snippet gd
  ggplot(data = ${1:dataset}) +
  geom_density(aes(x = ${2:x}))
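For reference, here is what the gl and gd snippets could expand to once their fields are filled in. A sketch assuming ggplot2 is installed; the datasets and variables (ggplot2's economics data and the built-in mtcars) are only illustrative choices.

```r
# Assumes ggplot2 is installed: install.packages("ggplot2")
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # "gl" with dataset = economics, y = unemploy, x = date
  p_line <- ggplot(data = economics) +
    geom_line(aes(y = unemploy, x = date))

  # "gd" with dataset = mtcars, x = mpg
  p_density <- ggplot(data = mtcars) +
    geom_density(aes(x = mpg))
}
```

Type `p_line` or `p_density` at the console to draw the plots.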
