08-2: Make Regression and Summary Tables with modelsummary


Create regression tables


Create regression tables with the modelsummary package

We use county_yield throughout this lecture.

First install the r.spatial.workshop.datasets package.

#--- install the r.spatial.workshop.datasets package ---#
install.packages("r.spatial.workshop.datasets", repos = c("https://tmieno2.r-universe.dev", "https://cran.r-project.org"))


Then, get the data:

#--- get the data ---#
data(county_yield, package = "r.spatial.workshop.datasets")

county_yield <- dplyr::select(county_yield, - geometry)
county_yield
# A tibble: 1,956 × 10
   corn_yield soy_yield  year county_code state_name d0_5_9 d1_5_9 d2_5_9 d3_5_9
        <dbl>     <dbl> <int> <chr>       <chr>       <dbl>  <dbl>  <dbl>  <dbl>
 1       123       42    2000 053         Kansas       2.49  2.87   0.134   0   
 2       188.      NA    2017 095         Kansas       8.72  0      0       0   
 3       169.      58.4  2016 095         Kansas       1     0      0       0   
 4       198.      NA    2015 095         Kansas       1.76  1.21   2.09    0   
 5       152.      NA    2012 095         Kansas       6.28  1.47   9.54    4.46
 6       170       42    2007 095         Kansas       0     0      0       0   
 7       193       49    2005 095         Kansas       4.32  0      0       0   
 8       173       47    2003 095         Kansas       2.29  5.16   4.46    1.09
 9       165       40    2002 095         Kansas       3.71  1.48   1.90    0   
10       171       52    2001 095         Kansas       9.88  0.188  0       0   
# ℹ 1,946 more rows
# ℹ 1 more variable: d4_5_9 <dbl>


Variable Definitions

  • soy_yield: soybean yield (bu/acre)
  • corn_yield: corn yield (bu/acre)
  • d0_5_9: ratio of weeks under drought severity of 0 from May to September
  • d1_5_9: ~ drought severity of 1 from May to September
  • d2_5_9: ~ drought severity of 2 from May to September
  • d3_5_9: ~ drought severity of 3 from May to September
  • d4_5_9: ~ drought severity of 4 from May to September

Let’s first run the regressions that we are going to report in the tables.

model_1_corn <- lm(corn_yield ~ d1_5_9 + d2_5_9, data = county_yield)
model_2_corn <- lm(corn_yield ~ d1_5_9 + d2_5_9 + d3_5_9 + d4_5_9, data = county_yield)
model_1_soy <- lm(soy_yield ~ d1_5_9 + d2_5_9, data = county_yield)
model_2_soy <- lm(soy_yield ~ d1_5_9 + d2_5_9 + d3_5_9 + d4_5_9, data = county_yield)


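Get heteroskedasticity-robust (White-Huber) variance-covariance matrices for the regressions with vcovHC() from the sandwich package. Note that vcovHC() defaults to the HC3 variant; set type = "HC0" for the original White estimator.

vcov_1_corn <- sandwich::vcovHC(model_1_corn)
vcov_2_corn <- sandwich::vcovHC(model_2_corn)
vcov_1_soy <- sandwich::vcovHC(model_1_soy)
vcov_2_soy <- sandwich::vcovHC(model_2_soy)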

You can supply a list of regression results to modelsummary::msummary() to create a default regression table.

modelsummary::msummary(
  list(
    model_1_corn,
    model_2_corn,
    model_1_soy,
    model_2_soy
  )
)
             (1)        (2)        (3)        (4)
(Intercept)  181.978    183.882    56.049     56.202
             (0.678)    (0.690)    (0.288)    (0.295)
d1_5_9       -0.216     -0.367     -0.062     -0.069
             (0.135)    (0.133)    (0.055)    (0.055)
d2_5_9       -1.081     -0.836     -0.327     -0.298
             (0.124)    (0.129)    (0.053)    (0.055)
d3_5_9                  -0.754                -0.173
                        (0.158)               (0.090)
d4_5_9                  -2.194                -0.137
                        (0.320)               (0.213)
Num.Obs.     1956       1956       1100       1100
R2           0.050      0.099      0.047      0.052
R2 Adj.      0.049      0.097      0.046      0.049
AIC          17806.4    17708.0    7475.7     7474.2
BIC          17828.8    17741.4    7495.8     7504.2
Log.Lik.     -8899.218  -8847.985  -3733.873  -3731.078
F            51.768     53.480     27.207     15.043
RMSE         22.89      22.30      7.21       7.19

modelsummary::msummary() offers multiple options to modify the default regression table to your liking:

  • title: put a title to the table
  • stars: place significance symbols (and modify the symbol placement rules)
  • coef_map: change the order and label of variable names
  • notes: add footnotes
  • fmt: change the format of numbers
  • statistic: type of statistics you display along with coefficient estimates
  • gof_map: define which model statistics to display
  • gof_omit: define which model statistics to omit from the default selection of model statistics
  • add_rows: add rows of arbitrary contents to the table
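
Some of these options (title, notes, fmt, statistic) are not demonstrated individually in these notes. The following is a minimal sketch of how they can be combined. It assumes the built-in mtcars data rather than county_yield, and the model, title, and note text are hypothetical and chosen only for illustration.

```r
# Hypothetical model on the built-in mtcars data (not the lecture's data)
model_mpg <- lm(mpg ~ hp + wt, data = mtcars)

reg_table <- modelsummary::msummary(
  list("MPG" = model_mpg),
  title = "Fuel economy regression",  # table title
  notes = "Built-in mtcars data.",    # footnote
  fmt = 2,                            # two decimal places
  statistic = "conf.int",             # confidence intervals instead of standard errors
  gof_omit = "IC|Log|RMSE"            # drop AIC, BIC, Log.Lik., and RMSE
)
reg_table
```

Each option is independent of the others, so you can drop any of them from the call and keep the rest.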

Add stars = TRUE in modelsummary::msummary() to add significance markers.

You can modify significance levels and markers by supplying a named vector with its elements being the significance levels and their corresponding names being the significance markers.


Example:

#--- create a named vector ---#
stars_label <- c("+" = 0.1, "&+" = 0.05, "+*+" = 0.01)

#--- create a table ---#
modelsummary::msummary(model_1_corn, stars = stars_label)
             (1)
(Intercept)  181.978+*+
             (0.678)
d1_5_9       -0.216
             (0.135)
d2_5_9       -1.081+*+
             (0.124)
Num.Obs.     1956
R2           0.050
R2 Adj.      0.049
AIC          17806.4
BIC          17828.8
Log.Lik.     -8899.218
F            51.768
RMSE         22.89
+ p < 0.1, &+ p < 0.05, +*+ p < 0.01

coef_map allows you to reorder coefficient rows and change their labels.

As with the stars option, you supply a named vector whose names are the existing labels and whose elements are the new labels.

In the table, the coefficient rows appear in the order in which they are listed in the named vector. Coefficients that are not listed in the vector are dropped from the table.


#--- define a coef_map vector ---#
coef_map_vec <- c(
  "d1_5_9" = "DI: category 1", 
  "d2_5_9" = "DI: category 2", 
  "d3_5_9" = "DI: category 3", 
  "d4_5_9" = "DI: category 4", 
  "(Intercept)" = "Constant"
) 

#--- create a table ---#
modelsummary::msummary(
  list(model_2_corn, model_2_soy), 
  coef_map = coef_map_vec
)
(1) (2)
DI: category 1 -0.367 -0.069
(0.133) (0.055)
DI: category 2 -0.836 -0.298
(0.129) (0.055)
DI: category 3 -0.754 -0.173
(0.158) (0.090)
DI: category 4 -2.194 -0.137
(0.320) (0.213)
Constant 183.882 56.202
(0.690) (0.295)
Num.Obs. 1956 1100
R2 0.099 0.052
R2 Adj. 0.097 0.049
AIC 17708.0 7474.2
BIC 17741.4 7504.2
Log.Lik. -8847.985 -3731.078
F 53.480 15.043
RMSE 22.30 7.19

coef_omit lets you omit coefficient rows from the default selections.

You supply a string that is treated as a regular expression, and coefficient rows whose names match the pattern are omitted.


Example

modelsummary::msummary(
  list(model_2_corn, model_2_soy), 
  coef_omit ="d2"
)


d2 matches with d2_5_9, and rows associated with the coefficients on d2_5_9 are removed.

(1) (2)
(Intercept) 183.882 56.202
(0.690) (0.295)
d1_5_9 -0.367 -0.069
(0.133) (0.055)
d3_5_9 -0.754 -0.173
(0.158) (0.090)
d4_5_9 -2.194 -0.137
(0.320) (0.213)
Num.Obs. 1956 1100
R2 0.099 0.052
R2 Adj. 0.097 0.049
AIC 17708.0 7474.2
BIC 17741.4 7504.2
Log.Lik. -8847.985 -3731.078
F 53.480 15.043
RMSE 22.30 7.19

gof_omit lets you omit model statistics like \(R^2\) from the default selections.

You supply a string that is treated as a regular expression, and statistics whose names match the pattern are omitted.


Example

modelsummary::msummary(
  list(model_2_corn, model_2_soy), 
  gof_omit ="IC|Adj"
)

IC matches with AIC and BIC, and Adj matches with R2 Adj

(1) (2)
(Intercept) 183.882 56.202
(0.690) (0.295)
d1_5_9 -0.367 -0.069
(0.133) (0.055)
d2_5_9 -0.836 -0.298
(0.129) (0.055)
d3_5_9 -0.754 -0.173
(0.158) (0.090)
d4_5_9 -2.194 -0.137
(0.320) (0.213)
Num.Obs. 1956 1100
R2 0.099 0.052
Log.Lik. -8847.985 -3731.078
F 53.480 15.043
RMSE 22.30 7.19
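
To see why a pattern removes the rows it does, it can help to test the regular expression against the statistic labels directly. The sketch below uses only base R and the labels shown in the default table above; it illustrates the pattern matching itself and is not a description of how modelsummary implements gof_omit internally.

```r
# Labels of the model statistics shown in the default table
gof_names <- c("Num.Obs.", "R2", "R2 Adj.", "AIC", "BIC", "Log.Lik.", "F", "RMSE")

# "IC|Adj" matches AIC, BIC, and R2 Adj., so the remaining labels are kept
kept_ic_adj <- gof_names[!grepl("IC|Adj", gof_names)]
kept_ic_adj

# A short pattern can match more than intended: "R" also matches RMSE
kept_ic_r <- gof_names[!grepl("IC|R", gof_names)]
kept_ic_r
```

Checking a pattern this way before using it avoids accidentally dropping a statistic you meant to keep.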

The add_rows option inserts arbitrary rows into a table. Using it is a two-step process:

  • Create a data.frame (or tibble) that holds the rows to insert
#--- create a table (data.frame) to insert ---#
(
rows <- data.frame(
  c1 = c("FE: County", "FE: Year"),
  c2 = c("Yes", "Yes"),
  c3 = c("No", "No")
  )
)
          c1  c2 c3
1 FE: County Yes No
2   FE: Year Yes No


  • Specify the row positions at which the data.frame is inserted with attr(rows, "position") <- row numbers.
#--- tell where to insert ---#
attr(rows, "position") <- c(3, 4)

#--- create a table with rows inserted ---#
modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy), 
  gof_omit = "IC|Adj",
  coef_omit = "d",
  add_rows = rows
)
             Model1     Model2
(Intercept)  183.882    56.202
             (0.690)    (0.295)
FE: County   Yes        No
FE: Year     Yes        No
Num.Obs.     1956       1100
R2           0.099      0.052
Log.Lik.     -8847.985  -3731.078
F            53.480     15.043
RMSE         22.30      7.19

We often replace the default variance-covariance matrix with a robust one so that statistical tests remain valid under heteroskedasticity.

You can do this with the vcov option (called statistic_override in older versions of modelsummary). Supply a list of variance-covariance matrices in the order in which their corresponding regression results appear in the table.


Syntax:

vcov = list(vcov_1, vcov_2, ...)

Default:

modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy), 
  gof_omit = "IC|R",
  coef_omit = "d3|d4"
  #--- no vcov supplied ---#
)  
             Model1     Model2
(Intercept)  183.882    56.202
             (0.690)    (0.295)
d1_5_9       -0.367     -0.069
             (0.133)    (0.055)
d2_5_9       -0.836     -0.298
             (0.129)    (0.055)
Num.Obs.     1956       1100
Log.Lik.     -8847.985  -3731.078
F            53.480     15.043

VCOV swapped:

modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy), 
  gof_omit = "IC|R",
  coef_omit = "d3|d4",
  vcov = list(vcov_2_corn, vcov_2_soy)
)  

The coefficient estimates are the same as in the default table. Only the standard errors in parentheses change, because they are now computed from the supplied robust matrices.
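
To check what swapping the variance-covariance matrix changes, you can compare the standard errors implied by the two matrices directly. This is a minimal sketch that assumes the sandwich package is installed and uses a hypothetical model on the built-in mtcars data rather than the lecture's models.

```r
# Hypothetical model on the built-in mtcars data (not the lecture's data)
model_mpg <- lm(mpg ~ hp + wt, data = mtcars)

# Standard errors are the square roots of the diagonal of a vcov matrix
se_default <- sqrt(diag(vcov(model_mpg)))
se_robust  <- sqrt(diag(sandwich::vcovHC(model_mpg)))

# Side-by-side comparison, one row per coefficient
cbind(default = se_default, robust = se_robust)
```

The second column corresponds to the numbers you would expect in parentheses once the robust matrix is supplied to the table.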

You can save the table to a file by providing a file name to the output option.

The supported file types are:

  • .html
  • .tex
  • .md
  • .txt
  • .docx, .pptx
  • .png
  • .jpg


Example:

The .docx option may be particularly useful for those who want to put finishing touches on the table manually in Microsoft Word:

modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy),
  output = "reg_results_table.docx"
)

Further modify regression tables with other packages

Using the output option in modelsummary::msummary(), you can turn the regression table into R objects that are readily modifiable by the gt, kableExtra, and flextable packages.


Example: flextable

#--- create a regression table and turn it into a flextable ---#
reg_table_ft <- list(model_1_corn, model_1_soy) %>% 
  modelsummary::msummary(output = "flextable")

#--- check the class ---#
class(reg_table_ft)
[1] "flextable"


Example: gt

#--- create a regression table and turn it into a gt_tbl ---#
reg_table_gt <- list(model_1_corn, model_1_soy)%>% 
  modelsummary::msummary(output = "gt")

#--- check the class ---#
class(reg_table_gt)
[1] "gt_tbl" "list"  

The regression table created using modelsummary::msummary() with output = "flextable" is a flextable object.

So, you can use your knowledge of the flextable package to modify the regression table further.

For the details of how to use the flextable package, visit the flextable lecture notes.

Here I will just give you an example of the use of flextable operations.


Example

list(
  "Corn 1" = model_1_corn, 
  "Corn 2" = model_2_corn, 
  "Soy 1" = model_1_soy, 
  "Soy 2" = model_2_soy
) %>% 
modelsummary::msummary(
  output = "flextable",
  gof_omit = "IC|Adj"
) %>%  
flextable::bold(i = 9, j = c(3, 5), bold = TRUE) %>% 
flextable::color(i = 3, j = 2, color = "red")

             Corn 1     Corn 2     Soy 1      Soy 2
(Intercept)  181.978    183.882    56.049     56.202
             (0.678)    (0.690)    (0.288)    (0.295)
d1_5_9       -0.216     -0.367     -0.062     -0.069
             (0.135)    (0.133)    (0.055)    (0.055)
d2_5_9       -1.081     -0.836     -0.327     -0.298
             (0.124)    (0.129)    (0.053)    (0.055)
d3_5_9                  -0.754                -0.173
                        (0.158)               (0.090)
d4_5_9                  -2.194                -0.137
                        (0.320)               (0.213)
Num.Obs.     1956       1956       1100       1100
R2           0.050      0.099      0.047      0.052
Log.Lik.     -8899.218  -8847.985  -3733.873  -3731.078
F            51.768     53.480     27.207     15.043
RMSE         22.89      22.30      7.21       7.19

In the rendered table, the d4_5_9 estimates for Corn 2 and Soy 2 (row 9, columns 3 and 5) are bold, and the d1_5_9 estimate for Corn 1 (row 3, column 2) is red.

When the regression table is created with output = "gt", it is a gt_tbl object, so you can use your knowledge of the gt package to modify it.

For details on how to use the gt package, see its documentation. Here I will just give you an example of the use of gt operations.

Example

list(
  "Corn 1" = model_1_corn, 
  "Corn 2" = model_2_corn, 
  "Soy 1" = model_1_soy, 
  "Soy 2" = model_2_soy
) %>% 
  modelsummary::msummary(
    output = "gt",
    gof_omit = "IC|Adj"
  ) %>%  
  gt::tab_spanner(
    label = "Corn",
    columns = c("Corn 1", "Corn 2")
  ) %>%
  gt::tab_style(
    style = gt::cell_text(color = "red"),
    locations = gt::cells_body(rows = 7:8)
  )
                   Corn
             Corn 1     Corn 2     Soy 1      Soy 2
(Intercept)  181.978    183.882    56.049     56.202
             (0.678)    (0.690)    (0.288)    (0.295)
d1_5_9       -0.216     -0.367     -0.062     -0.069
             (0.135)    (0.133)    (0.055)    (0.055)
d2_5_9       -1.081     -0.836     -0.327     -0.298
             (0.124)    (0.129)    (0.053)    (0.055)
d3_5_9                  -0.754                -0.173
                        (0.158)               (0.090)
d4_5_9                  -2.194                -0.137
                        (0.320)               (0.213)
Num.Obs.     1956       1956       1100       1100
R2           0.050      0.099      0.047      0.052
Log.Lik.     -8899.218  -8847.985  -3733.873  -3731.078
F            51.768     53.480     27.207     15.043
RMSE         22.89      22.30      7.21       7.19

In the rendered table, rows 7 and 8 (the d3_5_9 estimates and their standard errors) are red.

Create summary tables


Example table

county_yield %>% 
  dplyr::filter(year %in% 2010:2012) %>% 
  modelsummary::datasummary(
    (Year = factor(year)) * (
      (`Corn Yield (bu/acre)` = corn_yield) + 
      (`Soy Yield (bu/acre)` = soy_yield) + 
      (`DI: category 4` = d4_5_9)
    ) ~ 
    state_name * (Mean + SD) ,
    data = .
  )  
Colorado Kansas Nebraska
Year Mean SD Mean SD Mean SD
2010 Corn Yield (bu/acre) 196.08 12.96 182.38 17.12 182.37 14.80
Soy Yield (bu/acre) 58.79 4.30
DI: category 4 0.00 0.00 0.00 0.00 0.00 0.00
2011 Corn Yield (bu/acre) 186.25 12.76 160.56 29.69 178.32 16.00
Soy Yield (bu/acre) 60.35 5.39
DI: category 4 0.00 0.00 1.52 3.33 0.00 0.00
2012 Corn Yield (bu/acre) 160.50 31.69 161.33 17.44 185.91 18.44
Soy Yield (bu/acre) 59.80 5.21
DI: category 4 1.79 1.60 6.16 3.59 3.05 2.65

modelsummary::datasummary()

Syntax:

modelsummary::datasummary(formula, data = dataset)

formula has two sides separated by ~, just like a regression formula.

Variables and statistics on the left-hand side form the rows of the table; those on the right-hand side form the columns.

Example

modelsummary::datasummary(
  corn_yield ~ Mean, #<<
  data = county_yield
)
Mean
corn_yield 178.25


Switching the order changes the structure of the resulting table:

modelsummary::datasummary(
  Mean ~ corn_yield, #<<
  data = county_yield
)
corn_yield
Mean 178.25

The modelsummary package offers multiple summary functions of its own:

  • Mean
  • SD
  • Min
  • Max
  • P0, P25, P50, P75, P100
  • Histogram

These functions set na.rm = TRUE internally, so they return a number even when the variable has missing values, whereas their base R counterparts (e.g., mean()) return NA unless you set na.rm = TRUE yourself.

For example, compare these two:

modelsummary::datasummary(
  corn_yield ~ Mean, #<<
  data = county_yield
)
Mean
corn_yield 178.25


modelsummary::datasummary(
  #--- mean from the base package ---#
  corn_yield ~ mean, #<<
  data = county_yield
)
mean
corn_yield 178.25
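
The two tables above agree because corn_yield has no missing values (all 1,956 rows enter the corn regressions). The difference only shows up for a variable that contains NA, as soy_yield does. A minimal base R sketch with a made-up vector shows the behavior of mean() that Mean protects you from:

```r
# A made-up vector with one missing value
x <- c(10, NA, 30)

mean(x)                # NA: base mean() propagates missing values
mean(x, na.rm = TRUE)  # 20: missing values are dropped first
```

The same logic applies to sd(), min(), and max() versus SD, Min, and Max.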

You can use a user-defined function that takes a vector of values and returns a single value.

Example:

#--- define a function ---#
MinMax <- function(x){
  paste0('[', min(x, na.rm = TRUE), ', ', max(x, na.rm = TRUE), ']')
} 

#--- use it ---#
modelsummary::datasummary(corn_yield ~ MinMax, data = county_yield) 
MinMax
corn_yield [0, 234.3]
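
Any function that maps a vector to a single value can be used in the same way. As a further sketch, the two helper functions below (NObs and CV are hypothetical names, not part of modelsummary) report the number of non-missing observations and the coefficient of variation. The built-in mtcars data is used here as a stand-in for county_yield.

```r
#--- number of non-missing observations ---#
NObs <- function(x) sum(!is.na(x))

#--- coefficient of variation: standard deviation relative to the mean ---#
CV <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

#--- use them like any other statistic ---#
modelsummary::datasummary(mpg + hp ~ NObs + CV, data = mtcars)
```

Handling NA inside the function, as done here, keeps the table free of missing cells.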

You can add more variables and statistics using +.

Example:

modelsummary::datasummary(
  corn_yield + soy_yield + d0_5_9 + d1_5_9
  ~ Mean + SD+ MinMax + Histogram, 
  data = county_yield
)
Mean SD MinMax Histogram
corn_yield 178.25 23.50 [0, 234.3] ▁▄▇▆▁
soy_yield 54.95 7.39 [15, 74.3] ▁▄▇▆▃▁
d0_5_9 3.92 3.94 [0, 21.3569] ▇▃▃▂▁
d1_5_9 3.15 4.15 [0, 21.4838] ▇▁▁▁▁

For each of the variables on the left-hand side, each of the statistics on the right-hand side is calculated and displayed.

You can use All() to create a summary table for all the numeric variables in the dataset.

At the moment, All() does not work on a tibble. So, if your dataset is a tibble, convert it to a data.frame on the fly as in the code below:

Example:

modelsummary::datasummary(
  All(data.frame(county_yield)) 
  ~ Mean + SD, 
  data = county_yield
)
Mean SD
corn_yield 178.25 23.50
soy_yield 54.95 7.39
year 2007.38 5.22
d0_5_9 3.92 3.94
d1_5_9 3.15 4.15
d2_5_9 2.82 4.51
d3_5_9 1.60 3.61
d4_5_9 0.41 1.69

More on datasummary()

You can nest categorical variables with *, meaning you can get summary statistics for each value of the categorical variable (like group_by() %>% summarize()).


Syntax

#--- single stat ---#
variable ~ category_variable * stat  

#--- multiple stats ---#
variable ~ category_variable * (stat 1 + stat 2 + ...)  


Examples:

modelsummary::datasummary(
  corn_yield + soy_yield + d0_5_9 + d1_5_9
  ~ state_name * (Mean + SD) + MinMax, #<< 
  data = county_yield
)
             Colorado         Kansas           Nebraska
             Mean     SD      Mean     SD      Mean     SD      MinMax
corn_yield   168.26   30.64   173.06   24.32   181.65   21.32   [0, 234.3]
soy_yield                     50.74    7.34    55.80    7.11    [15, 74.3]
d0_5_9       4.23     4.67    3.69     3.81    3.97     3.89    [0, 21.3569]
d1_5_9       2.66     3.52    2.96     4.19    3.28     4.20    [0, 21.4838]

For each value of state_name (Nebraska, Colorado, Kansas), Mean and SD are shown for each of the variables on the left-hand side. But, MinMax is for the entire sample.
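
To make the comparison with group_by() %>% summarize() concrete, here is a minimal sketch that computes the same grouped statistics both ways. It assumes the dplyr package and uses the built-in mtcars data, with cyl playing the role of the categorical variable, in place of county_yield.

```r
library(dplyr)

#--- group_by() %>% summarize(): one row per group ---#
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
by_cyl

#--- datasummary(): the same statistics, with groups spread across columns ---#
modelsummary::datasummary(
  mpg ~ factor(cyl) * (mean + sd),
  data = mtcars
)
```

The numbers are the same; datasummary() simply arranges them as a presentation-ready table rather than a long data frame.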

You can nest with multiple categorical variables by multiplying stats with multiple categorical variables.

Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield + soy_yield + d0_5_9 + d1_5_9
    ~ factor(year) * state_name * (Mean + SD) + MinMax, #<< 
    data = .
  )
2011 2012
Kansas Nebraska Kansas Nebraska
Mean SD Mean SD Mean SD Mean SD MinMax
corn_yield 160.56 29.69 178.32 16.00 161.33 17.44 185.91 18.44 [100, 217]
soy_yield 60.35 5.39 59.80 5.21 [48, 70.3]
d0_5_9 3.52 3.18 2.86 2.01 2.15 1.25 3.11 1.34 [0, 8.7386]
d1_5_9 5.05 3.28 0.01 0.05 2.62 1.17 2.74 1.39 [0, 10.1494]

For each of the unique combinations of state_name (Nebraska, Kansas) and year (2011, 2012), Mean and SD are shown for each of the variables on the left-hand side. But, MinMax is for the entire sample.

By default, variable and statistic names are used as the labels in the table.

You can provide labels by the following syntax: (label = variable/stat)


Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    (`Corn Yield (bu/acre)` = corn_yield) #<<
    ~ state_name * (Mean + (Std.Dev. = SD)), #<< 
    data = .
  )
Kansas Nebraska
Mean Std.Dev. Mean Std.Dev.
Corn Yield (bu/acre) 160.99 23.31 181.95 17.56
  • corn_yield is labeled as Corn Yield (bu/acre)
  • SD is labeled as Std.Dev.

Note: when you have spaces in the label, surround the label with backticks.

If you do not like this way of changing labels, you can always use the gt package.

You can pass optional arguments to a statistic function with: stat * Arguments(options)


Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield 
    ~ state_name * (mean + sd) * Arguments(na.rm = TRUE) +
      quantile * Arguments(probs = 0.1, na.rm = TRUE), 
    data = .
  )
             Kansas           Nebraska
             mean     sd      mean     sd      quantile
corn_yield   160.99   23.31   181.95   17.56   148.52


  • (mean + sd) * Arguments(na.rm = TRUE) adds the na.rm = TRUE option to mean() and sd()
  • quantile * Arguments(probs = 0.1, na.rm = TRUE) adds probs = 0.1 and na.rm = TRUE to quantile()

You can add a title and footnotes with the title and notes options.

Example

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield 
    ~ state_name * (mean + sd) * Arguments(na.rm = TRUE) + 
      quantile * Arguments(probs = 0.1, na.rm = TRUE),  
    data = .,
    title = "A title",
    notes = c("first note", "second note")
  )
A title
             Kansas           Nebraska
             mean     sd      mean     sd      quantile
corn_yield   160.99   23.31   181.95   17.56   148.52
first note
second note

You can use align to align columns. The available alignments are:

  • l: left
  • r: right
  • c: center

For the align option, you provide a string of these letters (e.g., "lrcl").

The nth letter corresponds to the nth column, counting the row-label column as the first column.

Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield 
    ~ state_name * (`This is M E A N` = mean) * Arguments(na.rm = TRUE) + 
      (`This is Q U A N T I L E` = quantile) * Arguments(probs = 0.1, na.rm = TRUE),  
    data = .,
    align = "lrlc"
  )
             Kansas           Nebraska
             This is M E A N  This is M E A N  This is Q U A N T I L E
corn_yield   160.99           181.95           148.52

You can use the output option to either export the table as a file or save it as R objects which you can further modify.

This works exactly the same way as the modelsummary::msummary() function.
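
As a minimal sketch of the second use, setting output = "data.frame" returns the summary table as a plain data.frame that you can inspect or post-process with ordinary R tools. The built-in mtcars data is assumed here in place of county_yield.

```r
#--- return the summary table as a data.frame instead of a rendered table ---#
summary_df <- modelsummary::datasummary(
  mpg + hp ~ mean + sd,
  data = mtcars,
  output = "data.frame"
)

#--- check the class ---#
class(summary_df)
summary_df
```

Swapping "data.frame" for "gt" or "flextable", or for a file name such as one ending in .docx, follows the same pattern as in the regression table examples.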

Convenience functions

If your data was generated through randomized experiments (or you are using natural experiments), then datasummary_balance() can be very useful as it can generate a variable balance table.


Syntax:

modelsummary::datasummary_balance(variables to summarize ~ treatment dummy)
  • variables to summarize: list of variables to summarize
  • treatment dummy: a dummy variable that indicates whether in the treated or control group


Example:

county_yield %>% 
  dplyr::filter(state_name %in% c("Nebraska", "Kansas")) %>% 
  dplyr::select(c(state_name, where(is.numeric))) %>% 
  dplyr::select(- year) %>% 
  modelsummary::datasummary_balance(
    All(data.frame(.)) ~ state_name,
    data = .
  )
             Kansas (N=534)        Nebraska (N=1268)
             Mean     Std. Dev.    Mean     Std. Dev.
corn_yield   173.1    24.3         181.7    21.3
soy_yield    50.7     7.3          55.8     7.1
d0_5_9       3.7      3.8          4.0      3.9
d1_5_9       3.0      4.2          3.3      4.2
d2_5_9       2.6      4.0          2.8      4.6
d3_5_9       1.6      3.4          1.5      3.5
d4_5_9       0.7      2.4          0.3      1.3

You can create a correlation table with datasummary_correlation().

county_yield %>% 
  dplyr::filter(state_name %in% c("Nebraska", "Kansas")) %>% 
  dplyr::select(c(state_name, where(is.numeric))) %>% 
  dplyr::select(- year) %>% 
  modelsummary::datasummary_correlation()
             corn_yield  soy_yield  d0_5_9  d1_5_9  d2_5_9  d3_5_9  d4_5_9
corn_yield   1           .          .       .       .       .       .
soy_yield    .71         1          .       .       .       .       .
d0_5_9       .13         .04        1       .       .       .       .
d1_5_9       -.13        -.12       .05     1       .       .       .
d2_5_9       -.24        -.21       -.28    .38     1       .       .
d3_5_9       -.20        -.12       -.30    -.02    .29     1       .
d4_5_9       -.22        -.04       -.18    -.04    .02     .34     1
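
If you want to verify the numbers in such a table, the same kind of lower-triangular matrix can be built with base R. This is a sketch under an assumption: that datasummary_correlation() by default reports Pearson correlations computed on pairwise-complete observations. Check the function's documentation before relying on that. The built-in mtcars data is used as a stand-in for county_yield.

```r
#--- pick a few numeric variables ---#
num_vars <- mtcars[, c("mpg", "hp", "wt")]

#--- Pearson correlations on pairwise-complete observations ---#
cor_mat <- round(
  cor(num_vars, method = "pearson", use = "pairwise.complete.obs"),
  2
)

#--- blank out the upper triangle, as in the table above ---#
cor_mat[upper.tri(cor_mat)] <- NA
cor_mat
```

Pairwise-complete observations matter for variables like soy_yield, where dropping every row with any NA would discard much of the sample.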