5  Introduction

This chapter shows how to use a set of R packages to generate (mostly) publication-ready articles in WORD and PDF in an automated manner (except for typing the narrative, of course).

Attention

If you are already proficient in LaTeX for manuscript writing, I strongly advise that you read Section 5.5 first. It explains why the benefit of using Rmarkdown or Quarto over LaTeX may be much smaller than you think when your intended file format is PDF; in that case, it may be wise not to invest your time in learning Rmarkdown and Quarto for manuscript writing. However, they become immensely valuable when you must produce content in WORD.

5.1 Prerequisites for Engaging with This Chapter

For a fruitful journey through this book, it is presumed that readers:

  1. Understand Rmarkdown Fundamentals: Familiarity with the basics of Rmarkdown is crucial. If you’re new or need a refresher, consider diving into an excellent introduction to Rmarkdown before proceeding.

  2. Are Versed in LaTeX for Equations: A comfort level with crafting mathematical equations using LaTeX (or a similar syntax) is essential. This skill is invaluable for presenting formulas and equations in a clear, professional manner.

  3. Possess Proficiency with Citation Managers: As a scholar, managing references efficiently is non-negotiable. For the scope of this book, knowledge of at least one citation manager is vital. Examples include:

    • BibTeX/BibLaTeX: Popular tools for creating and managing bibliographies.
    • EndNote: A widely-used reference management software that integrates with Word and other word processing software.

With these prerequisites, you’ll be optimally poised to extract the most value from the content, discussions, and examples provided in this book.

5.2 What can you do with Rmarkdown/Quarto?

Within a single Rmarkdown file, you can seamlessly integrate numerous functionalities to produce a comprehensive document. Here’s a rundown:

  1. R Code Integration:
    • Execute R code directly within your document to:
      • Conduct statistical analyses or modeling.
      • Generate figures, leveraging libraries like ggplot2.
      • Craft tables using packages such as modelsummary and flextable.
  2. Embedding R Objects:
    • Embed generated R objects like data sets, tables, or figures directly into the output document. This ensures your results are always up-to-date with the latest data and analyses.
  3. Cross-referencing:
    • Utilize specific syntax to cross-reference figures and tables, ensuring readers can effortlessly locate related content.
  4. Citations & References:
    • Incorporate citations directly into your text and automatically generate a reference list.
    • Utilize the Citation Style Language to format citations and the reference list according to a myriad of established academic and professional standards.
  5. Equations with LaTeX Syntax:
    • Use LaTeX (or a similar syntax) to write intricate mathematical equations. Be mindful that the exact syntax can vary slightly depending on your output choice (PDF vs. WORD).
  6. Document Structuring with Markdown:
    • Markdown syntax provides an intuitive way to structure your document. Features include:
      • Defining various heading levels like sections, subsections, and subsubsections.
      • Adding footnotes for additional context or explanations.
  7. Complete Formatting:
    • If your target is a PDF, Rmarkdown/Quarto allows you to thoroughly format the output. This means once you knit your document, there’s no additional formatting required on the output file. However, while WORD output is generally well-formatted, you might occasionally need minor adjustments due to the inherent differences between the two formats.
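To make the rundown above concrete, here is a minimal sketch of what a Quarto source file combining these pieces might look like. The title, the bibliography file name (`references.bib`), the citation key (`doe2020`), and the chunk label (`fig-mpg`) are all hypothetical placeholders, and the example uses the built-in `mtcars` data purely for illustration. The cross-reference syntax shown (`@fig-mpg`) is Quarto's; Rmarkdown with bookdown output formats uses `\@ref(fig:label)` instead.

````markdown
---
title: "A hypothetical manuscript"
format: docx                    # switch to pdf for PDF output
bibliography: references.bib    # hypothetical file name
---

# Results

```{r}
#| label: fig-mpg
#| fig-cap: "Fuel economy against vehicle weight"
#| echo: false
plot(mtcars$wt, mtcars$mpg)
```

@fig-mpg plots the data used to estimate the model below,
following earlier work [@doe2020].^[This is a footnote.]

$$
mpg_i = \beta_0 + \beta_1 wt_i + \varepsilon_i
$$
````

Rendering this file would require a `references.bib` file that actually contains an entry with the key used in the citation.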

5.3 Why Rmarkdown/Quarto

Rmarkdown/Quarto revolutionizes the way we handle data-driven documents by eliminating:

  • The drudgery of manually copying and pasting tables and figures.

  • The chore of manual formatting for tables and figures.

  • The hassle of updating data-driven numbers (like coefficient estimates or summary statistics) within the text.

Rewinding 17 years, during my graduate student days, Rmarkdown was yet to be born, although Sweave was available. As a greenhorn in academia who had not yet mastered LaTeX, I had an incredibly tedious workflow for generating econometrics reports and drafting manuscripts:

  • Execute R code to produce tables (be it regression results, summary stats, etc.), then screenshot these tables.

  • Run R code to craft plots and save them as images.

  • Manually import these tables and figures into a WORD document.

  • Manually format tables within WORD.

Any alteration in the underlying data or parameters necessitated redoing this entire process from scratch. If you have ever been part of a project aiming for a published paper, you’d know that such changes aren’t rare but rather frequent.

Enter the knitr and Rmarkdown packages, and the landscape transformed. These tools allow for a seamless blend of narratives and R code within a single document. When you knit (or compile) this file, everything required manifests in the output, be it in WORD, PDF, or HTML format.

Today’s Rmarkdown system has matured to the point where crafting an academic manuscript in both PDF and WORD is not just feasible but efficient. While the latter (WORD) might be a tad clunkier, it’s certainly doable. This book guides you through the intricacies of this process.

5.4 WORD or PDF?

When writing a paper, the choice between generating a PDF or a WORD document often hinges on various factors (I really hope that journals start accepting HTML files).

If I were the sole author and the target journal accepts LaTeX files post-acceptance, I would gravitate towards producing a PDF. This is especially true if there is a readily available template to shape the output PDF in line with the journal’s specifications.

However, circumstances might arise that steer you towards WORD:

  • Journal Specifications: Numerous journals exclusively accept submissions in WORD format. Adhering to their submission guidelines is paramount.

  • Collaborative Dynamics: Even if your chosen journal is open to PDF submissions, coauthors might have a preference for WORD because this format can be more accessible and familiar, allowing for easier edits and revisions.

In essence, while PDF might offer sleek formatting, especially with LaTeX, the practicalities of collaboration and specific journal requirements might necessitate a WORD document.

Aside

A common scenario where opting for Word as the output file format proves beneficial is when you are writing a grant proposal. In my experience, grant proposal development typically does not involve the use of LaTeX (except for instances like a small grant proposal I essentially authored independently). This preference is largely due to the ease of providing feedback and collaborative editing in Word, especially when multiple collaborators are involved. If you are the PI and would rather not write in Word but need to generate content in Word for others, you can explore the Rmarkdown-Word option, as discussed in Chapter 6.

5.5 What if you are proficient in Latex?

The advantage of using Rmarkdown or Quarto over LaTeX is not large if you are already proficient in LaTeX and your output file type is PDF. (If it is WORD, then that is an entirely different story; see the discussion below.) LaTeX itself is capable of inserting externally created tables (written in a .tex file) and figures (in various formats). This means that you can generate a table and save it as a .tex file using R, or create a figure and save it as an image using R, which can then be imported into your LaTeX manuscript. This is a perfectly valid approach for producing a reproducible manuscript.
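As a rough illustration of this external workflow, the sketch below fits a regression on the built-in `mtcars` data, writes the coefficient table to a .tex file, and saves a figure as a PDF. The choice of `knitr::kable()` for the table is my assumption for the sake of a short example (any package that emits LaTeX would do), and the file names `reg_table.tex` and `mpg_wt.pdf` are hypothetical.

```r
# A minimal sketch, assuming the knitr package is installed.
library(knitr)

# Example regression on a built-in data set
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table rendered as LaTeX code and written to a .tex file
# (hypothetical file name)
coef_table <- round(summary(fit)$coefficients, 3)
latex_code <- as.character(kable(coef_table, format = "latex", booktabs = TRUE))
writeLines(latex_code, "reg_table.tex")

# Figure saved as a PDF (hypothetical file name)
pdf("mpg_wt.pdf", width = 6, height = 4)
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
dev.off()
```

On the LaTeX side, the table could then be pulled in with `\input{reg_table.tex}` (with the booktabs package loaded, since `booktabs = TRUE` is used above) and the figure with `\includegraphics{mpg_wt.pdf}`. Rerunning the R script after a data change refreshes both files without touching the manuscript.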

While Rmarkdown and Quarto enable the creation of figures and tables directly within the main manuscript file using R scripts, this does not inherently make it more reproducible than creating tables and figures externally with R and importing them later. Both approaches can be equally streamlined.

One of the differences between the two approaches lies in the ability to reference R objects within the manuscript file. For instance, you might want to discuss your regression results by referring to a coefficient of a variable of interest. With Rmarkdown and Quarto, you can programmatically reference this coefficient, eliminating the need to manually type the number. This can be highly advantageous, especially since research results often evolve during a project. As long as you refer to the number programmatically, you will not need to manually update it whenever there are changes. Consider carefully whether this advantage is important enough to justify learning a new way of compiling a manuscript using Rmarkdown or Quarto.
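A minimal sketch of such a programmatic reference is shown below, using the built-in `mtcars` data as a stand-in for your own. The object names `fit` and `b_wt` are hypothetical. The chunk computes the coefficient without showing anything in the output, and the inline expression in the sentence is replaced by the current value each time the document is compiled.

````markdown
```{r}
#| include: false
fit <- lm(mpg ~ wt, data = mtcars)
b_wt <- round(coef(fit)[["wt"]], 2)  # coefficient on weight, rounded
```

A one-unit increase in weight is associated with a change of `r b_wt`
in miles per gallon.
````

If the data or the model specification changes, the number in the sentence updates on the next compile with no manual editing.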

However, if your intended file format is Word, as required by some journals, then Rmarkdown and Quarto become immensely valuable. They allow you to compose content in a LaTeX-like style while generating a Word document. LaTeX offers no easy way to programmatically insert tables or figures into a Word document, whereas Rmarkdown and Quarto handle this seamlessly. In this context, they function as a pseudo-LaTeX.

Aside

Moreover, Rmarkdown and Quarto truly shine, especially when creating educational materials involving coding. Their ability to effortlessly present code and its outcomes is of exceptional value. However, in cases like ours, where the manuscript does not include code display, this particular advantage may not be as relevant.

5.6 Markdown Basics

This section introduces certain markdown syntaxes that will be particularly useful, or even necessary, when using the Rmarkdown and Quarto systems to create submission-ready journal manuscripts. A more comprehensive guide to markdown syntax can be found here. This chapter intentionally omits syntaxes that are more suited for HTML output.

5.6.1 Section, subsection, subsubsections, …

You can define sections, subsections, and subsubsections by using #, ##, and ### respectively at the beginning of a line.

# This becomes section title

## This becomes subsection title

### This becomes subsubsection title

5.6.2 List

You can create an unordered list by simply using either +, -, or * in front of each item. For example,

+ item 1
+ item 2
+ item 3

would print as below in the output file.

  • item 1
  • item 2
  • item 3

You can create a nested list by indenting the sub-items. For example,

+ item 1
  - item 1.1
  - item 1.2
+ item 2
+ item 3

would print as below in the output file.

  • item 1
    • item 1.1
    • item 1.2
  • item 2
  • item 3

You can create an ordered list by numbering the items. For example,

1. item 1
2. item 2
3. item 3

would print as below in the output file.

  1. item 1
  2. item 2
  3. item 3

5.6.3 Emphasis

  • bold font: **this becomes bold** (or __this becomes bold__)
  • italic font: *this becomes italic*

5.6.4 Footnote

You can add a footnote using ^[] like this:

regular texts^[this is a footnote]

Footnotes are automatically numbered.


5.6.5 Tables

You can create a table like below.

| Header 1 | Header 2 |
| -------- | -------- |
| Cell 1A  | Cell 1B  |
| Cell 2A  | Cell 2B  |

Creating tables with markdown syntax is generally not recommended. Unless the table is simple and unlikely to change, create it with an R package instead, so that it updates automatically when the underlying data change.
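For comparison, here is a minimal sketch of building a small table from data rather than typing it by hand. It assumes the knitr package is installed and uses `knitr::kable()` with the built-in `mtcars` data; the column labels are my own choices for illustration, and other table packages would work just as well.

```r
# A minimal sketch, assuming the knitr package is installed.
library(knitr)

# Mean fuel economy by number of cylinders
summary_table <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Placed in a code chunk, this prints a table in the output document
kable(summary_table, col.names = c("Cylinders", "Mean MPG"), digits = 1)
```

Because the table is generated from `summary_table`, any change to the underlying data flows through to the output the next time the document is compiled.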

5.6.6 Escape Characters

If you want to display a literal character that usually has a special meaning in Markdown, you can escape it with a backslash (\):

\* This is not italicized. \*