08-2: Make Regression and Summary Tables with modelsummary

Tips to make the most of the lecture notes

  • Click on the three horizontally stacked lines at the bottom left corner of the slide to see the table of contents, from which you can jump to the section you want

  • Hit letter “o” on your keyboard and you will have a panel view of all the slides

  • The box area with a hint of blue as the background color is where you can write code (hereafter referred to as the “code area”).
  • Hit the “Run Code” button to execute all the code inside the code area.
  • You can evaluate (run) code selectively by highlighting the parts you want to run and hitting Command + Enter for Mac (Ctrl + Enter for Windows).
  • If you want to run the code on your computer, first click on the icon with two sheets of paper stacked on top of each other (top right corner of the code chunk), which copies the code in the code area. You can then paste it on your computer.
  • You can click on the reload button (top right corner of the code chunk, to the left of the copy button) to revert to the original code.

Create regression tables


Create regression tables with the modelsummary package

We use county_yield throughout this lecture.

First install the r.spatial.workshop.datasets package.

#--- install the r.spatial.workshop.datasets package ---#
install.packages("r.spatial.workshop.datasets", repos = c("https://tmieno2.r-universe.dev", "https://cran.r-project.org"))


Then, get the data:

#--- get the data ---#
data(county_yield, package = "r.spatial.workshop.datasets")

county_yield <- dplyr::select(county_yield, - geometry)
county_yield
# A tibble: 1,956 × 10
   corn_yield soy_yield  year county_code state_name d0_5_9 d1_5_9 d2_5_9 d3_5_9
        <dbl>     <dbl> <int> <chr>       <chr>       <dbl>  <dbl>  <dbl>  <dbl>
 1       123       42    2000 053         Kansas       2.49  2.87   0.134   0   
 2       188.      NA    2017 095         Kansas       8.72  0      0       0   
 3       169.      58.4  2016 095         Kansas       1     0      0       0   
 4       198.      NA    2015 095         Kansas       1.76  1.21   2.09    0   
 5       152.      NA    2012 095         Kansas       6.28  1.47   9.54    4.46
 6       170       42    2007 095         Kansas       0     0      0       0   
 7       193       49    2005 095         Kansas       4.32  0      0       0   
 8       173       47    2003 095         Kansas       2.29  5.16   4.46    1.09
 9       165       40    2002 095         Kansas       3.71  1.48   1.90    0   
10       171       52    2001 095         Kansas       9.88  0.188  0       0   
# ℹ 1,946 more rows
# ℹ 1 more variable: d4_5_9 <dbl>


Variable Definitions

  • soy_yield: soybean yield (bu/acre)
  • corn_yield: corn yield (bu/acre)
  • d0_5_9: ratio of weeks under drought severity of 0 from May to September
  • d1_5_9: ~ drought severity of 1 from May to September
  • d2_5_9: ~ drought severity of 2 from May to September
  • d3_5_9: ~ drought severity of 3 from May to September
  • d4_5_9: ~ drought severity of 4 from May to September

Let’s first run regressions which we are going to report in tables.

model_1_corn <- lm(corn_yield ~ d1_5_9 + d2_5_9, data = county_yield)
model_2_corn <- lm(corn_yield ~ d1_5_9 + d2_5_9 + d3_5_9 + d4_5_9, data = county_yield)
model_1_soy <- lm(soy_yield ~ d1_5_9 + d2_5_9, data = county_yield)
model_2_soy <- lm(soy_yield ~ d1_5_9 + d2_5_9 + d3_5_9 + d4_5_9, data = county_yield)


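Get heteroskedasticity-robust (White-Huber) variance-covariance matrices for the regressions, using vcovHC() from the sandwich package:

vcov_1_corn <- sandwich::vcovHC(model_1_corn)
vcov_2_corn <- sandwich::vcovHC(model_2_corn)
vcov_1_soy <- sandwich::vcovHC(model_1_soy)
vcov_2_soy <- sandwich::vcovHC(model_2_soy)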

You can supply a list of regression results to modelsummary::msummary() to create a default regression table.

modelsummary::msummary(
  list(
    model_1_corn,
    model_2_corn,
    model_1_soy,
    model_2_soy
  )
)
              (1)         (2)         (3)         (4)
(Intercept)   181.978     183.882     56.049      56.202
              (0.678)     (0.690)     (0.288)     (0.295)
d1_5_9        -0.216      -0.367      -0.062      -0.069
              (0.135)     (0.133)     (0.055)     (0.055)
d2_5_9        -1.081      -0.836      -0.327      -0.298
              (0.124)     (0.129)     (0.053)     (0.055)
d3_5_9                    -0.754                  -0.173
                          (0.158)                 (0.090)
d4_5_9                    -2.194                  -0.137
                          (0.320)                 (0.213)
Num.Obs.      1956        1956        1100        1100
R2            0.050       0.099       0.047       0.052
R2 Adj.       0.049       0.097       0.046       0.049
AIC           17806.4     17708.0     7475.7      7474.2
BIC           17828.8     17741.4     7495.8      7504.2
Log.Lik.      -8899.218   -8847.985   -3733.873   -3731.078
F             51.768      53.480      27.207      15.043
RMSE          22.89       22.30       7.21        7.19

modelsummary::msummary() offers multiple options to modify the default regression table to your liking:

  • title: put a title to the table
  • stars: place significance symbols (and modify the symbol placement rules)
  • coef_map: change the order and label of variable names
  • notes: add footnotes
  • fmt: change the format of numbers
  • statistic: type of statistics you display along with coefficient estimates
  • gof_map: define which model statistics to display
  • gof_omit: define which model statistics to omit from the default selection of model statistics
  • add_rows: add rows of arbitrary contents to the table
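
As a rough illustration of a few of the options in this list (fmt, statistic, title, and notes), here is a minimal sketch. It uses the built-in mtcars data rather than county_yield so that it runs on its own; the model, the column label, the title, and the note text are all made up for illustration.

```r
# A small model on built-in data (an assumption for illustration;
# the lecture itself uses county_yield)
mod <- lm(mpg ~ wt + hp, data = mtcars)

modelsummary::msummary(
  list("MPG" = mod),
  fmt = 2,                  # round numbers to two decimal places
  statistic = "conf.int",   # show confidence intervals instead of standard errors
  title = "Fuel economy regression (illustrative)",
  notes = "Data: mtcars."
)
```

Each option changes only one aspect of the table, so they can be combined freely with the options discussed next.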

Add stars = TRUE in modelsummary::msummary() to add significance markers.

To customize the significance levels and markers, supply a named numeric vector: the values are the significance thresholds and the names are the corresponding markers.


Example:

#--- create a named vector ---#
stars_label <- c("+" = 0.1, "&+" = 0.05, "+*+" = 0.01)

#--- create a table ---#
modelsummary::msummary(model_1_corn, stars = stars_label)
              (1)
(Intercept)   181.978+*+
              (0.678)
d1_5_9        -0.216
              (0.135)
d2_5_9        -1.081+*+
              (0.124)
Num.Obs.      1956
R2            0.050
R2 Adj.       0.049
AIC           17806.4
BIC           17828.8
Log.Lik.      -8899.218
F             51.768
RMSE          22.89
+ p < 0.1, &+ p < 0.05, +*+ p < 0.01
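
For reference, the more conventional asterisk markers can be written the same way. This is a small sketch on the built-in mtcars data (an assumption made so the snippet runs on its own); output = "data.frame" is used only so that the result can be inspected as a plain data.frame.

```r
# Illustrative model on built-in data (not the lecture's county_yield)
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Names are the markers, values are the p-value thresholds
stars_conventional <- c("*" = 0.1, "**" = 0.05, "***" = 0.01)

tab <- modelsummary::msummary(
  mod,
  stars = stars_conventional,
  output = "data.frame"   # return a data.frame instead of a rendered table
)
tab
```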

coef_map allows you to reorder coefficient rows and change their labels.

As with the stars option, you supply a named vector, where the names are the existing labels and the corresponding elements are the new labels.

In the table, the coefficient rows appear in the order in which they are listed in the named vector.


#--- define a coef_map vector ---#
coef_map_vec <- c(
  "d1_5_9" = "DI: category 1", 
  "d2_5_9" = "DI: category 2", 
  "d3_5_9" = "DI: category 3", 
  "d4_5_9" = "DI: category 4", 
  "(Intercept)" = "Constant"
) 

#--- create a table ---#
modelsummary::msummary(
  list(model_2_corn, model_2_soy), 
  coef_map = coef_map_vec
)
                 (1)         (2)
DI: category 1   -0.367      -0.069
                 (0.133)     (0.055)
DI: category 2   -0.836      -0.298
                 (0.129)     (0.055)
DI: category 3   -0.754      -0.173
                 (0.158)     (0.090)
DI: category 4   -2.194      -0.137
                 (0.320)     (0.213)
Constant         183.882     56.202
                 (0.690)     (0.295)
Num.Obs.         1956        1100
R2               0.099       0.052
R2 Adj.          0.097       0.049
AIC              17708.0     7474.2
BIC              17741.4     7504.2
Log.Lik.         -8847.985   -3731.078
F                53.480      15.043
RMSE             22.30       7.19
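
One behavior worth knowing: coefficients that are not listed in coef_map are left out of the table. A minimal sketch on the built-in mtcars data (used here only so the snippet is self-contained; the labels are made up):

```r
# Illustrative model on built-in data (not the lecture's county_yield)
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Only the intercept and wt are listed, so the hp row is dropped;
# the intercept is listed first, so it appears first
tab <- modelsummary::msummary(
  mod,
  coef_map = c("(Intercept)" = "Constant", "wt" = "Weight"),
  output = "data.frame"   # return a data.frame so the rows can be inspected
)
tab
```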

coef_omit lets you omit coefficient rows from the table.

You supply a regular expression as a string, and coefficient rows whose names match the pattern will be omitted. Use | to combine several patterns (e.g., "d3|d4").


Example

modelsummary::msummary(
  list(model_2_corn, model_2_soy), 
  coef_omit ="d2"
)


d2 matches with d2_5_9, and rows associated with the coefficients on d2_5_9 are removed.

              (1)         (2)
(Intercept)   183.882     56.202
              (0.690)     (0.295)
d1_5_9        -0.367      -0.069
              (0.133)     (0.055)
d3_5_9        -0.754      -0.173
              (0.158)     (0.090)
d4_5_9        -2.194      -0.137
              (0.320)     (0.213)
Num.Obs.      1956        1100
R2            0.099       0.052
R2 Adj.       0.097       0.049
AIC           17708.0     7474.2
BIC           17741.4     7504.2
Log.Lik.      -8847.985   -3731.078
F             53.480      15.043
RMSE          22.30       7.19
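
Because the pattern is matched as a regular expression, several rows can be dropped with one string. The sketch below uses the built-in mtcars data (an assumption so it runs on its own) and drops the intercept and hp rows in a single call.

```r
# Illustrative model on built-in data (not the lecture's county_yield)
mod <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# "Intercept|^hp$" matches "(Intercept)" and exactly "hp"
tab <- modelsummary::msummary(
  mod,
  coef_omit = "Intercept|^hp$",
  output = "data.frame"   # return a data.frame so the rows can be inspected
)
tab
```

Anchoring the pattern with ^ and $ avoids accidentally dropping other variables whose names merely contain the same letters.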

gof_omit lets you omit model statistics like \(R^2\) from the default selections.

You supply a regular expression as a string, and statistics whose names match the pattern will be omitted.


Example

modelsummary::msummary(
  list(model_2_corn, model_2_soy), 
  gof_omit ="IC|Adj"
)

IC matches with AIC and BIC, and Adj matches with R2 Adj

              (1)         (2)
(Intercept)   183.882     56.202
              (0.690)     (0.295)
d1_5_9        -0.367      -0.069
              (0.133)     (0.055)
d2_5_9        -0.836      -0.298
              (0.129)     (0.055)
d3_5_9        -0.754      -0.173
              (0.158)     (0.090)
d4_5_9        -2.194      -0.137
              (0.320)     (0.213)
Num.Obs.      1956        1100
R2            0.099       0.052
Log.Lik.      -8847.985   -3731.078
F             53.480      15.043
RMSE          22.30       7.19
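
gof_map works in the opposite direction: instead of naming what to drop, it names what to keep. A minimal sketch, assuming the statistic names "nobs" and "r.squared" and using the built-in mtcars data so that the snippet is self-contained:

```r
# Illustrative model on built-in data (not the lecture's county_yield)
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Keep only the number of observations and R2 at the bottom of the table
tab <- modelsummary::msummary(
  mod,
  gof_map = c("nobs", "r.squared"),
  output = "data.frame"   # return a data.frame so the rows can be inspected
)
tab
```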

add_rows can be used to insert arbitrary rows into a table. Adding rows with add_rows is a two-step process:

  • Create a data.frame (or tibble) to insert. Its first column holds the row labels, followed by one column per model.
#--- create a table (data.frame) to insert ---#
(
rows <- data.frame(
  c1 = c("FE: County", "FE: Year"),
  c2 = c("Yes", "Yes"),
  c3 = c("No", "No")
  )
)
          c1  c2 c3
1 FE: County Yes No
2   FE: Year Yes No


  • Tell where to insert the rows by setting the "position" attribute: attr(data.frame, "position") <- row numbers.
#--- tell where to insert ---#
attr(rows, "position") <- c(3, 4)

#--- create a table with rows inserted ---#
modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy), 
  gof_omit = "IC|Adj",
  coef_omit = "d",
  add_rows = rows
)
              Model1      Model2
(Intercept)   183.882     56.202
              (0.690)     (0.295)
FE: County    Yes         No
FE: Year      Yes         No
Num.Obs.      1956        1100
R2            0.099       0.052
Log.Lik.      -8847.985   -3731.078
F             53.480      15.043
RMSE          22.30       7.19

It is often the case that we replace the default variance-covariance matrix with a robust one for valid statistical testing.

You can achieve this using the vcov option (called statistic_override in older versions of modelsummary). You give a list of variance-covariance matrices in the order their corresponding regression results appear in the table.


Syntax:

vcov = list(vcov_1, vcov_2, ...)

Default:

modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy), 
  gof_omit = "IC|R",
  coef_omit = "d3|d4"
  #--- no vcov supplied ---#
)  
              Model1      Model2
(Intercept)   183.882     56.202
              (0.690)     (0.295)
d1_5_9        -0.367      -0.069
              (0.133)     (0.055)
d2_5_9        -0.836      -0.298
              (0.129)     (0.055)
Num.Obs.      1956        1100
Log.Lik.      -8847.985   -3731.078
F             53.480      15.043

VCOV swapped:

modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy), 
  gof_omit = "IC|R",
  coef_omit = "d3|d4",
  vcov = list(vcov_2_corn, vcov_2_soy)
)  

The coefficient estimates are the same as in the default table, while the standard errors in parentheses are now computed from the supplied robust matrices.
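
To see what swapping the variance-covariance matrix does, the standard errors can be compared directly. This sketch uses the built-in mtcars data and sandwich::vcovHC() (both chosen here for illustration), and passes the robust matrix to the table through the vcov argument of modelsummary::msummary(), which is the argument name used in current versions of the package.

```r
# Illustrative model on built-in data (not the lecture's county_yield)
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Classical standard errors: square roots of the diagonal of the default matrix
se_classical <- sqrt(diag(vcov(mod)))

# Heteroskedasticity-robust standard errors (vcovHC() defaults to type "HC3")
vcov_robust <- sandwich::vcovHC(mod)
se_robust <- sqrt(diag(vcov_robust))

# Side by side: the estimates are shared, only the standard errors differ
cbind(estimate = coef(mod), se_classical, se_robust)

# Report the robust standard errors in the regression table
modelsummary::msummary(list(mod), vcov = list(vcov_robust))
```

If the standard errors in the table do not change after supplying a robust matrix, that is a sign the argument was not picked up.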

You can save the table to a file by providing a file name to the output option.

The supported file types are:

  • .html
  • .tex
  • .md
  • .txt
  • .docx, .pptx
  • .png
  • .jpg


Example:

The .docx option may be particularly useful for those who want to put finishing touches on the table manually in Word:

modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy),
  output = "reg_results_table.docx"
)
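
A minimal sketch of writing a table to a file and checking that it was created. The model on the built-in mtcars data, the temporary directory, and the file name are assumptions made for illustration; the file type is picked from the file extension.

```r
# Illustrative model on built-in data (not the lecture's county_yield)
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Write to a temporary location so the working directory is left untouched
out_file <- file.path(tempdir(), "reg_results_table.md")

modelsummary::msummary(list(Model1 = mod), output = out_file)

# TRUE if the table was written
file.exists(out_file)
```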

Further modify regression tables with other packages

Using the output option in modelsummary::msummary(), you can turn the regression table into R objects that are readily modifiable by the gt, kableExtra, and flextable packages.


Example: flextable

#--- create a regression table and turn it into a flextable ---#
reg_table_ft <- list(model_1_corn, model_1_soy) %>% 
  modelsummary::msummary(output = "flextable")

#--- check the class ---#
class(reg_table_ft)
[1] "flextable"


Example: gt

#--- create a regression table and turn it into a gt_tbl ---#
reg_table_gt <- list(model_1_corn, model_1_soy) %>% 
  modelsummary::msummary(output = "gt")

#--- check the class ---#
class(reg_table_gt)
[1] "gt_tbl" "list"  

The regression table created using modelsummary::msummary() with output = "flextable" is a flextable object.

So, we can use our knowledge of the flextable package to further modify the regression table.

For the details of how to use the flextable package, see the flextable lecture notes.

Here I will just give you an example of the use of flextable operations.


Example

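list(
  "Corn 1" = model_1_corn, 
  "Corn 2" = model_2_corn, 
  "Soy 1" = model_1_soy, 
  "Soy 2" = model_2_soy
) %>% 
modelsummary::msummary(
  output = "flextable",
  gof_omit = "IC|Adj"
) %>%  
#--- bold the d4_5_9 estimates (body row 9) in the "Corn 2" and "Soy 2" columns ---#
flextable::bold(i = 9, j = c(3, 5), bold = TRUE) %>% 
#--- color the d1_5_9 estimate (body row 3) in the "Corn 1" column red ---#
flextable::color(i = 3, j = 2, color = "red")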

              Corn 1      Corn 2      Soy 1       Soy 2
(Intercept)   181.978     183.882     56.049      56.202
              (0.678)     (0.690)     (0.288)     (0.295)
d1_5_9        -0.216      -0.367      -0.062      -0.069
              (0.135)     (0.133)     (0.055)     (0.055)
d2_5_9        -1.081      -0.836      -0.327      -0.298
              (0.124)     (0.129)     (0.053)     (0.055)
d3_5_9                    -0.754                  -0.173
                          (0.158)                 (0.090)
d4_5_9                    -2.194                  -0.137
                          (0.320)                 (0.213)
Num.Obs.      1956        1956        1100        1100
R2            0.050       0.099       0.047       0.052
Log.Lik.      -8899.218   -8847.985   -3733.873   -3731.078
F             51.768      53.480      27.207      15.043
RMSE          22.89       22.30       7.21        7.19

With output = "gt", the regression table is a gt_tbl object, so we can use our knowledge of the gt package to modify the regression table.

For the details of how to use the gt package, see the gt package documentation. Here I will just give you an example of the use of gt operations.

Example

list(
  "Corn 1" = model_1_corn, 
  "Corn 2" = model_2_corn, 
  "Soy 1" = model_1_soy, 
  "Soy 2" = model_2_soy
) %>% 
  modelsummary::msummary(
    output = "gt",
    gof_omit = "IC|Adj"
  ) %>%  
  #--- put "Corn 1" and "Corn 2" under a common "Corn" header ---#
  gt::tab_spanner(
    label = "Corn",
    columns = c("Corn 1", "Corn 2")
  ) %>%
  #--- color the text in body rows 7 and 8 (the d3_5_9 rows) red ---#
  gt::tab_style(
    style = gt::cell_text(color = "red"),
    locations = gt::cells_body(rows = 7:8)
  )
              Corn
              Corn 1      Corn 2      Soy 1       Soy 2
(Intercept)   181.978     183.882     56.049      56.202
              (0.678)     (0.690)     (0.288)     (0.295)
d1_5_9        -0.216      -0.367      -0.062      -0.069
              (0.135)     (0.133)     (0.055)     (0.055)
d2_5_9        -1.081      -0.836      -0.327      -0.298
              (0.124)     (0.129)     (0.053)     (0.055)
d3_5_9                    -0.754                  -0.173
                          (0.158)                 (0.090)
d4_5_9                    -2.194                  -0.137
                          (0.320)                 (0.213)
Num.Obs.      1956        1956        1100        1100
R2            0.050       0.099       0.047       0.052
Log.Lik.      -8899.218   -8847.985   -3733.873   -3731.078
F             51.768      53.480      27.207      15.043
RMSE          22.89       22.30       7.21        7.19

Create summary tables


Example table

county_yield %>% 
  dplyr::filter(year %in% 2010:2012) %>% 
  modelsummary::datasummary(
    (Year = factor(year)) * (
      (`Corn Yield (bu/acre)` = corn_yield) + 
      (`Soy Yield (bu/acre)` = soy_yield) + 
      (`DI: category 4` = d4_5_9)
    ) ~ 
    state_name * (Mean + SD) ,
    data = .
  )  
Colorado Kansas Nebraska
Year Mean SD Mean SD Mean SD
2010 Corn Yield (bu/acre) 196.08 12.96 182.38 17.12 182.37 14.80
Soy Yield (bu/acre) 58.79 4.30
DI: category 4 0.00 0.00 0.00 0.00 0.00 0.00
2011 Corn Yield (bu/acre) 186.25 12.76 160.56 29.69 178.32 16.00
Soy Yield (bu/acre) 60.35 5.39
DI: category 4 0.00 0.00 1.52 3.33 0.00 0.00
2012 Corn Yield (bu/acre) 160.50 31.69 161.33 17.44 185.91 18.44
Soy Yield (bu/acre) 59.80 5.21
DI: category 4 1.79 1.60 6.16 3.59 3.05 2.65

modelsummary::datasummary()

Syntax:

modelsummary::datasummary(formula, data = dataset)

formula has two sides separated by ~ just like formula for regression.

Variables and statistics on the left-hand side become rows, and those on the right-hand side become columns.

Example

modelsummary::datasummary(
  corn_yield ~ Mean,
  data = county_yield
)
             Mean
corn_yield   178.25


Switching the order changes the structure of the resulting table:

modelsummary::datasummary(
  Mean ~ corn_yield,
  data = county_yield
)
       corn_yield
Mean   178.25

The modelsummary package offers multiple summary functions of its own:

  • Mean
  • SD
  • Min
  • Max
  • P0, P25, P50, P75, P100
  • Histogram

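These functions use na.rm = TRUE by default, so they still return a value when the variable has missing values, whereas their base-R counterparts (mean(), sd(), etc.) return NA.

For example, compare these two. corn_yield has no missing values, so both give the same number here: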

modelsummary::datasummary(
  corn_yield ~ Mean,
  data = county_yield
)
             Mean
corn_yield   178.25


modelsummary::datasummary(
  #--- mean from the base package ---#
  corn_yield ~ mean,
  data = county_yield
)
             mean
corn_yield   178.25
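
The role of na.rm only becomes visible when a variable actually contains missing values. A small base-R sketch with a made-up vector:

```r
# A made-up vector with one missing value
x <- c(10, NA, 30)

# Base mean() does not drop missing values by default, so the result is NA
mean(x)

# With na.rm = TRUE the missing value is dropped first: (10 + 30) / 2 = 20
mean(x, na.rm = TRUE)
```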

You can use a user-defined function that takes a vector of values and returns a single value.

Example:

#--- define a function ---#
MinMax <- function(x){
  paste0('[', min(x, na.rm = TRUE), ', ', max(x, na.rm = TRUE), ']')
} 

#--- use it ---#
modelsummary::datasummary(corn_yield ~ MinMax, data = county_yield) 
             MinMax
corn_yield   [0, 234.3]
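
Any function that maps a vector to a single value can be used in the same way. As a further sketch, here is a helper that counts missing values; the name NMiss and the toy data are made up for this example.

```r
library(modelsummary)

# Hypothetical helper: number of missing values in a vector
NMiss <- function(x) sum(is.na(x))

# Made-up data with one missing yield
toy <- data.frame(yield = c(150, NA, 170, 180))

# Called directly, the helper returns a single number
NMiss(toy$yield)

# Used as a statistic next to the built-in Mean
datasummary(yield ~ NMiss + Mean, data = toy)
```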

You can add more variables and statistics using +.

Example:

modelsummary::datasummary(
  corn_yield + soy_yield + d0_5_9 + d1_5_9
  ~ Mean + SD+ MinMax + Histogram, 
  data = county_yield
)
             Mean     SD      MinMax         Histogram
corn_yield   178.25   23.50   [0, 234.3]     ▁▄▇▆▁
soy_yield    54.95    7.39    [15, 74.3]     ▁▄▇▆▃▁
d0_5_9       3.92     3.94    [0, 21.3569]   ▇▃▃▂▁
d1_5_9       3.15     4.15    [0, 21.4838]   ▇▁▁▁▁

For each of the variables on the left-hand side, each of the statistics on the right-hand side is calculated and displayed.

You can use All() to create a summary table for all the numeric variables in the dataset.

At the moment, All() does not work on tibbles. So, if your dataset is a tibble, convert it to a data.frame on the fly, as in the code below:

Example:

modelsummary::datasummary(
  All(data.frame(county_yield)) 
  ~ Mean + SD, 
  data = county_yield
)
Mean SD
corn_yield 178.25 23.50
soy_yield 54.95 7.39
year 2007.38 5.22
d0_5_9 3.92 3.94
d1_5_9 3.15 4.15
d2_5_9 2.82 4.51
d3_5_9 1.60 3.61
d4_5_9 0.41 1.69

More on datasummary()

You can nest categorical variables with *, meaning you can get summary statistics for each value of the categorical variable (like group_by() %>% summarize()).


Syntax

#--- single stat ---#
variable ~ category_variable * stat  

#--- multiple stats ---#
variable ~ category_variable * (stat 1 + stat 2 + ...)  


Examples:

modelsummary::datasummary(
  corn_yield + soy_yield + d0_5_9 + d1_5_9
  ~ state_name * (Mean + SD) + MinMax,
  data = county_yield
)
             Colorado          Kansas            Nebraska
             Mean     SD       Mean     SD       Mean     SD       MinMax
corn_yield   168.26   30.64    173.06   24.32    181.65   21.32    [0, 234.3]
soy_yield                      50.74    7.34     55.80    7.11     [15, 74.3]
d0_5_9       4.23     4.67     3.69     3.81     3.97     3.89     [0, 21.3569]
d1_5_9       2.66     3.52     2.96     4.19     3.28     4.20     [0, 21.4838]

For each value of state_name (Nebraska, Colorado, Kansas), Mean and SD are shown for each of the variables on the left-hand side. But, MinMax is for the entire sample.
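
For readers who think in terms of group_by() %>% summarize(), the sketch below computes group means by hand and then asks datasummary() for the same numbers. It uses the built-in mtcars data with cyl as the categorical variable (an assumption for illustration) and base tapply() for the by-group calculation.

```r
library(modelsummary)

# Group means computed by hand: mean mpg for each value of cyl
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
group_means

# The same quantities arranged by datasummary(): one Mean column per cyl value
datasummary(mpg ~ factor(cyl) * Mean, data = mtcars)
```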

You can nest with multiple categorical variables by multiplying stats with multiple categorical variables.

Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield + soy_yield + d0_5_9 + d1_5_9
    ~ factor(year) * state_name * (Mean + SD) + MinMax,
    data = .
  )
2011 2012
Kansas Nebraska Kansas Nebraska
Mean SD Mean SD Mean SD Mean SD MinMax
corn_yield 160.56 29.69 178.32 16.00 161.33 17.44 185.91 18.44 [100, 217]
soy_yield 60.35 5.39 59.80 5.21 [48, 70.3]
d0_5_9 3.52 3.18 2.86 2.01 2.15 1.25 3.11 1.34 [0, 8.7386]
d1_5_9 5.05 3.28 0.01 0.05 2.62 1.17 2.74 1.39 [0, 10.1494]

For each of the unique combinations of state_name (Nebraska, Kansas) and year (2011, 2012), Mean and SD are shown for each of the variables on the left-hand side. But, MinMax is for the entire sample.

By default variable and statistics names are used as the labels in the table.

You can provide labels by the following syntax: (label = variable/stat)


Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    (`Corn Yield (bu/acre)` = corn_yield)
    ~ state_name * (Mean + (Std.Dev. = SD)),
    data = .
  )
                       Kansas              Nebraska
                       Mean     Std.Dev.   Mean     Std.Dev.
Corn Yield (bu/acre)   160.99   23.31      181.95   17.56
  • corn_yield is labeled as Corn Yield (bu/acre)
  • SD is labeled as Std.Dev.

Note: when you have spaces in the label, surround the label with backticks.

If you do not like this way of changing labels, you can always use the gt package.

You can pass optional arguments to the stats functions with: stat * Arguments(options)


Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield 
    ~ state_name * (mean + sd) * Arguments(na.rm = TRUE) +
      quantile * Arguments(probs = 0.1, na.rm = TRUE),
    data = .
  )
             Kansas            Nebraska
             mean     sd       mean     sd       quantile
corn_yield   160.99   23.31    181.95   17.56    148.52


  • (mean + sd) * Arguments(na.rm = TRUE) adds the na.rm = TRUE option to mean() and sd()
  • quantile * Arguments(probs = 0.1, na.rm = TRUE) adds probs = 0.1 and na.rm = TRUE to quantile()

You can add a title and footnotes to the table with the title and notes options.

Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield 
    ~ state_name * (mean + sd) * Arguments(na.rm = TRUE) + 
      quantile * Arguments(probs = 0.1, na.rm = TRUE),  
    data = .,
    title = "A title",
    notes = c("first note", "second note")
  )
A title
             Kansas            Nebraska
             mean     sd       mean     sd       quantile
corn_yield   160.99   23.31    181.95   17.56    148.52
first note
second note

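You can use the align option to align columns. The available alignments are:

  • l: left
  • r: right
  • c: center

In align, you provide a string of these letters, one per column (e.g., "lrcl" for a table with four columns).

The nth letter corresponds to the nth column, where the first column is the one holding the row labels.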

Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield 
    ~ state_name * (`This is M E A N` = mean) * Arguments(na.rm = TRUE) + 
      (`This is Q U A N T I L E` = quantile) * Arguments(probs = 0.1, na.rm = TRUE),  
    data = .,
    align = "lrlc"
  )
             Kansas            Nebraska
             This is M E A N   This is M E A N   This is Q U A N T I L E
corn_yield   160.99            181.95            148.52

You can use the output option to either export the table as a file or save it as an R object that you can further modify.

This works exactly the same way as the modelsummary::msummary() function.

Convenience functions

If your data was generated through randomized experiments (or you are using natural experiments), then datasummary_balance() can be very useful as it can generate a variable balance table.


Syntax:

modelsummary::datasummary_balance(variables to summarize ~ treatment dummy)
  • variables to summarize: list of variables to summarize
  • treatment dummy: a dummy variable that indicates whether an observation is in the treated or control group


Example:

county_yield %>% 
  dplyr::filter(state_name %in% c("Nebraska", "Kansas")) %>% 
  dplyr::select(c(state_name, where(is.numeric))) %>% 
  dplyr::select(- year) %>% 
  modelsummary::datasummary_balance(
    All(data.frame(.)) ~ state_name,
    data = .
  )
             Kansas (N=534)       Nebraska (N=1268)
             Mean    Std. Dev.    Mean    Std. Dev.
corn_yield   173.1   24.3         181.7   21.3
soy_yield    50.7    7.3          55.8    7.1
d0_5_9       3.7     3.8          4.0     3.9
d1_5_9       3.0     4.2          3.3     4.2
d2_5_9       2.6     4.0          2.8     4.6
d3_5_9       1.6     3.4          1.5     3.5
d4_5_9       0.7     2.4          0.3     1.3
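
When every remaining variable in the data should be summarized, the left-hand side of the formula can be left empty. A minimal sketch using the built-in mtcars data, treating the am column as a stand-in treatment dummy (an assumption made purely for illustration):

```r
library(modelsummary)

# Keep a few numeric variables plus the stand-in treatment dummy (am is 0 or 1)
dat <- mtcars[, c("mpg", "hp", "wt", "am")]

# One block of summary columns per value of am
datasummary_balance(~ am, data = dat)
```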

You can create a correlation table with datasummary_correlation().

county_yield %>% 
  dplyr::filter(state_name %in% c("Nebraska", "Kansas")) %>% 
  dplyr::select(c(state_name, where(is.numeric))) %>% 
  dplyr::select(- year) %>% 
  modelsummary::datasummary_correlation()
             corn_yield   soy_yield   d0_5_9   d1_5_9   d2_5_9   d3_5_9   d4_5_9
corn_yield   1            .           .        .        .        .        .
soy_yield    .71          1           .        .        .        .        .
d0_5_9       .13          .04         1        .        .        .        .
d1_5_9       -.13         -.12        .05      1        .        .        .
d2_5_9       -.24         -.21        -.28     .38      1        .        .
d3_5_9       -.20         -.12        -.30     -.02     .29      1        .
d4_5_9       -.22         -.04        -.18     -.04     .02      .34      1