08-2: Make Regression and Summary Tables with `modelsummary`

Tips to make the most of the lecture notes

Interactive navigation tools
Running and writing codes

Click on the three horizontally stacked lines at the bottom left corner of the slide to see the table of contents, from which you can jump to the section you want.
Hit the letter “o” on your keyboard to get a panel view of all the slides.

The box area with a hint of blue as the background color is where you can write code (hereafter referred to as the “code area”).
Hit the “Run Code” button to execute all the code inside the code area.
You can evaluate (run) code selectively by highlighting the parts you want to run and hitting Command + Enter for Mac (Ctrl + Enter for Windows).
If you want to run the code on your computer, first click on the icon with two stacked sheets of paper (top right corner of the code chunk) to copy the code in the code area. You can then paste it into R on your computer.
You can click on the reload button (top right corner of the code chunk, to the left of the copy button) to revert to the original code.

Create regression tables

Create regression tables with the `modelsummary` package

Dataset
Initiate a regression table
Modify
Swapping VCOV
Save the table

Get the data
Look at the data

We use county_yield throughout this lecture.

First install the r.spatial.workshop.datasets package.

#--- install the r.spatial.workshop.datasets package ---#
install.packages("r.spatial.workshop.datasets", repos = c("https://tmieno2.r-universe.dev", "https://cran.r-project.org"))

Then, get the data:

#--- get the data ---#
data(county_yield, package = "r.spatial.workshop.datasets")

county_yield <- dplyr::select(county_yield, - geometry)

county_yield

# A tibble: 1,956 × 10
   corn_yield soy_yield  year county_code state_name d0_5_9 d1_5_9 d2_5_9 d3_5_9
        <dbl>     <dbl> <int> <chr>       <chr>       <dbl>  <dbl>  <dbl>  <dbl>
 1       123       42    2000 053         Kansas       2.49  2.87   0.134   0   
 2       188.      NA    2017 095         Kansas       8.72  0      0       0   
 3       169.      58.4  2016 095         Kansas       1     0      0       0   
 4       198.      NA    2015 095         Kansas       1.76  1.21   2.09    0   
 5       152.      NA    2012 095         Kansas       6.28  1.47   9.54    4.46
 6       170       42    2007 095         Kansas       0     0      0       0   
 7       193       49    2005 095         Kansas       4.32  0      0       0   
 8       173       47    2003 095         Kansas       2.29  5.16   4.46    1.09
 9       165       40    2002 095         Kansas       3.71  1.48   1.90    0   
10       171       52    2001 095         Kansas       9.88  0.188  0       0   
# ℹ 1,946 more rows
# ℹ 1 more variable: d4_5_9 <dbl>

Variable Definitions

soy_yield: soybean yield (bu/acre)
corn_yield: corn yield (bu/acre)
d0_5_9: ratio of weeks under drought severity of 0 from May to September
d1_5_9: ~ drought severity of 1 from May to September
d2_5_9: ~ drought severity of 2 from May to September
d3_5_9: ~ drought severity of 3 from May to September
d4_5_9: ~ drought severity of 4 from May to September

Prepare regression results
default table

Let’s first run regressions which we are going to report in tables.

model_1_corn <- lm(corn_yield ~ d1_5_9 + d2_5_9, data = county_yield)
model_2_corn <- lm(corn_yield ~ d1_5_9 + d2_5_9 + d3_5_9 + d4_5_9, data = county_yield)
model_1_soy <- lm(soy_yield ~ d1_5_9 + d2_5_9, data = county_yield)
model_2_soy <- lm(soy_yield ~ d1_5_9 + d2_5_9 + d3_5_9 + d4_5_9, data = county_yield)

Get White-Huber robust variance-covariance matrices for the regressions (vcovHC() comes from the sandwich package):

vcov_1_corn <- sandwich::vcovHC(model_1_corn)
vcov_2_corn <- sandwich::vcovHC(model_2_corn)
vcov_1_soy <- sandwich::vcovHC(model_1_soy)
vcov_2_soy <- sandwich::vcovHC(model_2_soy)
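
As a rough illustration of what a White-Huber (heteroskedasticity-robust) matrix contains, the sketch below computes the simplest variant, HC0, by hand in base R. It runs on a small simulated data set standing in for county_yield (the numbers are made up), and the names sim_data, fit, and vcov_hc0 are hypothetical. Note that sandwich::vcovHC() defaults to the HC3 variant, which rescales the residuals, so its numbers will differ somewhat from this sketch.

```r
#--- simulated stand-in for county_yield (values are made up) ---#
set.seed(123)
n <- 200
sim_data <- data.frame(d1_5_9 = runif(n, 0, 10), d2_5_9 = runif(n, 0, 10))

#--- the error variance grows with d1_5_9 (heteroskedasticity) ---#
sim_data$corn_yield <- 180 - 0.3 * sim_data$d1_5_9 - 1 * sim_data$d2_5_9 +
  rnorm(n, sd = 5 + 2 * sim_data$d1_5_9)

fit <- lm(corn_yield ~ d1_5_9 + d2_5_9, data = sim_data)

#--- HC0 sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1 ---#
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))
meat <- crossprod(X * e) # equals X' diag(e^2) X
vcov_hc0 <- bread %*% meat %*% bread

#--- compare default and robust standard errors ---#
cbind(
  default = sqrt(diag(vcov(fit))),
  robust_hc0 = sqrt(diag(vcov_hc0))
)
```

The square roots of the diagonal elements are the standard errors that end up in parentheses in a regression table.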

You can supply a list of regression results to modelsummary::msummary() to create a default regression table.

modelsummary::msummary(
  list(
    model_1_corn,
    model_2_corn,
    model_1_soy,
    model_2_soy
  )
)

	(1)	(2)	(3)	(4)
(Intercept)	181.978	183.882	56.049	56.202
	(0.678)	(0.690)	(0.288)	(0.295)
d1_5_9	-0.216	-0.367	-0.062	-0.069
	(0.135)	(0.133)	(0.055)	(0.055)
d2_5_9	-1.081	-0.836	-0.327	-0.298
	(0.124)	(0.129)	(0.053)	(0.055)
d3_5_9		-0.754		-0.173
		(0.158)		(0.090)
d4_5_9		-2.194		-0.137
		(0.320)		(0.213)
Num.Obs.	1956	1956	1100	1100
R2	0.050	0.099	0.047	0.052
R2 Adj.	0.049	0.097	0.046	0.049
AIC	17806.4	17708.0	7475.7	7474.2
BIC	17828.8	17741.4	7495.8	7504.2
Log.Lik.	-8899.218	-8847.985	-3733.873	-3731.078
F	51.768	53.480	27.207	15.043
RMSE	22.89	22.30	7.21	7.19

How
stars
coef_map
coef_omit
gof_omit
add_rows

modelsummary::msummary() offers multiple options to modify the default regression table to your liking:

title: put a title to the table
stars: place significance symbols (and modify the symbol placement rules)
coef_map: change the order and label of variable names
notes: add footnotes
fmt: change the format of numbers
statistic: type of statistics you display along with coefficient estimates
gof_map: define which model statistics to display
gof_omit: define which model statistics to omit from the default selection of model statistics
add_rows: add rows of arbitrary contents to the table

Add stars = TRUE in modelsummary::msummary() to add significance markers.

You can modify significance levels and markers by supplying a named vector with its elements being the significance levels and their corresponding names being the significance markers.

Example:

#--- create a named vector ---#
stars_label <- c("+" = 0.1, "&+" = 0.05, "+*+" = 0.01)

#--- create a table ---#
modelsummary::msummary(model_1_corn, stars = stars_label)

	(1)
(Intercept)	181.978+*+
	(0.678)
d1_5_9	-0.216
	(0.135)
d2_5_9	-1.081+*+
	(0.124)
Num.Obs.	1956
R2	0.050
R2 Adj.	0.049
AIC	17806.4
BIC	17828.8
Log.Lik.	-8899.218
F	51.768
RMSE	22.89
+ p < 0.1, &+ p < 0.05, +*+ p < 0.01
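
To make the rule behind the named vector concrete, here is a minimal base-R sketch of how p-values could be mapped to markers: each estimate gets the marker of the tightest threshold its p-value falls below. The helper mark_significance() is hypothetical and written only for illustration; it is not a function of the modelsummary package.

```r
#--- hypothetical helper: attach the marker of the tightest threshold met ---#
mark_significance <- function(p, stars) {
  stars <- sort(stars, decreasing = TRUE) # loosest threshold first
  out <- rep("", length(p))
  for (i in seq_along(stars)) {
    # tighter thresholds overwrite looser ones
    out[p < stars[i]] <- names(stars)[i]
  }
  out
}

stars_label <- c("+" = 0.1, "&+" = 0.05, "+*+" = 0.01)
markers <- mark_significance(c(0.2, 0.08, 0.03, 0.001), stars_label)
markers
```

A p-value of 0.2 gets no marker, while 0.03 is below both 0.1 and 0.05 and therefore receives the marker attached to 0.05.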

coef_map allows you to reorder coefficient rows and change their labels.

As with the stars option, you supply a named vector whose names are the existing labels and whose elements are the new labels.

In the table, the coefficient rows appear in the same order as the elements of the named vector.

#--- define a coef_map vector ---#
coef_map_vec <- c(
  "d1_5_9" = "DI: category 1", 
  "d2_5_9" = "DI: category 2", 
  "d3_5_9" = "DI: category 3", 
  "d4_5_9" = "DI: category 4", 
  "(Intercept)" = "Constant"
) 

#--- create a table ---#
modelsummary::msummary(
  list(model_2_corn, model_2_soy), 
  coef_map = coef_map_vec
)

	(1)	(2)
DI: category 1	-0.367	-0.069
	(0.133)	(0.055)
DI: category 2	-0.836	-0.298
	(0.129)	(0.055)
DI: category 3	-0.754	-0.173
	(0.158)	(0.090)
DI: category 4	-2.194	-0.137
	(0.320)	(0.213)
Constant	183.882	56.202
	(0.690)	(0.295)
Num.Obs.	1956	1100
R2	0.099	0.052
R2 Adj.	0.097	0.049
AIC	17708.0	7474.2
BIC	17741.4	7504.2
Log.Lik.	-8847.985	-3731.078
F	53.480	15.043
RMSE	22.30	7.19
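
The reorder-and-relabel logic of coef_map can be mimicked in base R with ordinary named-vector indexing. The sketch below is only an illustration under that assumption (est and est_mapped are hypothetical names, and the estimates are copied from the corn model above). It also shows a side effect worth knowing: terms that are not listed in the map are dropped.

```r
#--- estimates named as they come out of lm() ---#
est <- c(
  "(Intercept)" = 183.882, d1_5_9 = -0.367, d2_5_9 = -0.836,
  d3_5_9 = -0.754, d4_5_9 = -2.194
)

#--- a map that lists only three terms, in a new order ---#
coef_map_vec <- c(
  "d2_5_9" = "DI: category 2",
  "d1_5_9" = "DI: category 1",
  "(Intercept)" = "Constant"
)

#--- keep only mapped terms, in the order of the map, then relabel ---#
keep <- intersect(names(coef_map_vec), names(est))
est_mapped <- setNames(est[keep], coef_map_vec[keep])
est_mapped
```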

coef_omit lets you omit coefficient rows from the default selection.

You supply a regular expression as a string, and coefficient rows whose names match the pattern are omitted.

Example

modelsummary::msummary(
  list(model_2_corn, model_2_soy), 
  coef_omit ="d2"
)

d2 matches with d2_5_9, and rows associated with the coefficients on d2_5_9 are removed.

	(1)	(2)
(Intercept)	183.882	56.202
	(0.690)	(0.295)
d1_5_9	-0.367	-0.069
	(0.133)	(0.055)
d3_5_9	-0.754	-0.173
	(0.158)	(0.090)
d4_5_9	-2.194	-0.137
	(0.320)	(0.213)
Num.Obs.	1956	1100
R2	0.099	0.052
R2 Adj.	0.097	0.049
AIC	17708.0	7474.2
BIC	17741.4	7504.2
Log.Lik.	-8847.985	-3731.078
F	53.480	15.043
RMSE	22.30	7.19

gof_omit lets you omit model statistics like \(R^2\) from the default selection.

You supply a regular expression as a string, and statistics whose names match the pattern are omitted.

Example

modelsummary::msummary(
  list(model_2_corn, model_2_soy), 
  gof_omit ="IC|Adj"
)

IC matches with AIC and BIC, and Adj matches with R2 Adj

	(1)	(2)
(Intercept)	183.882	56.202
	(0.690)	(0.295)
d1_5_9	-0.367	-0.069
	(0.133)	(0.055)
d2_5_9	-0.836	-0.298
	(0.129)	(0.055)
d3_5_9	-0.754	-0.173
	(0.158)	(0.090)
d4_5_9	-2.194	-0.137
	(0.320)	(0.213)
Num.Obs.	1956	1100
R2	0.099	0.052
Log.Lik.	-8847.985	-3731.078
F	53.480	15.043
RMSE	22.30	7.19
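
Because coef_omit and gof_omit are matched as regular expressions, you can check what a pattern will remove before building the table. The sketch below does this with grepl() in base R on the row labels shown in the tables above; the object names are hypothetical.

```r
coef_names <- c("(Intercept)", "d1_5_9", "d2_5_9", "d3_5_9", "d4_5_9")
gof_names <- c("Num.Obs.", "R2", "R2 Adj.", "AIC", "BIC", "Log.Lik.", "F", "RMSE")

#--- rows that survive coef_omit = "d2" ---#
kept_coef <- coef_names[!grepl("d2", coef_names)]
kept_coef

#--- rows that survive gof_omit = "IC|Adj" ---#
kept_gof <- gof_names[!grepl("IC|Adj", gof_names)]
kept_gof

#--- a broader pattern: "R" also removes R2, R2 Adj., and RMSE ---#
kept_gof_broad <- gof_names[!grepl("IC|R", gof_names)]
kept_gof_broad
```

Short patterns match anywhere in the label, so it pays to check that a pattern like "R" does not remove more rows than intended.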

add_rows can be used to insert arbitrary rows into a table. Adding rows with add_rows is a two-step process:

Create a data.frame (or tibble) to insert

#--- create a table (data.frame) to insert ---#
(
rows <- data.frame(
  c1 = c("FE: County", "FE: Year"),
  c2 = c("Yes", "Yes"),
  c3 = c("No", "Now")
  )
)

          c1  c2  c3
1 FE: County Yes  No
2   FE: Year Yes Now

Specify the rows at which the data.frame is inserted with attr(data.frame, "position") <- row numbers.

#--- tell where to insert ---#
attr(rows, "position") <- c(3, 4)

#--- create a table with rows inserted ---#
modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy), 
  gof_omit = "IC|Adj",
  coef_omit = "d",
  add_rows = rows
)

	Model1	Model2
(Intercept)	183.882	56.202
	(0.690)	(0.295)
FE: County	Yes	No
FE: Year	Yes	Now
Num.Obs.	1956	1100
R2	0.099	0.052
Log.Lik.	-8847.985	-3731.078
F	53.480	15.043
RMSE	22.30	7.19
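
The "position" attribute refers to row numbers in the final table, after insertion. The base-R sketch below re-creates that placement rule on a small table so you can see where the new rows land. It is a hypothetical re-implementation for illustration only (tab, ord, and combined are made-up names), and it assumes, as in the example above, that the rows to insert have one column per column of the table, including the label column.

```r
#--- main table: the first column holds the row labels ---#
tab <- data.frame(
  term = c("(Intercept)", "", "Num.Obs.", "R2"),
  Model1 = c("183.882", "(0.690)", "1956", "0.099"),
  Model2 = c("56.202", "(0.295)", "1100", "0.052")
)

#--- rows to insert: same number of columns as the table ---#
rows <- data.frame(
  term = c("FE: County", "FE: Year"),
  Model1 = c("Yes", "Yes"),
  Model2 = c("No", "No")
)
position <- c(3, 4)

#--- new rows take the requested positions, old rows fill the rest ---#
n_total <- nrow(tab) + nrow(rows)
ord <- integer(n_total)
ord[position] <- nrow(tab) + seq_len(nrow(rows))
ord[-position] <- seq_len(nrow(tab))
combined <- rbind(tab, rows)[ord, ]
rownames(combined) <- NULL
combined
```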

Instruction
Compare

It is often the case that we replace the default variance-covariance matrix with a robust one for valid statistical testing.

You can achieve this using the vcov option (named statistic_override in older versions of modelsummary). You give a list of variance-covariance matrices in the order their corresponding regression results appear in the table.

Syntax:

vcov = list(vcov_1, vcov_2, ...)

Default:

modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy), 
  gof_omit = "IC|R",
  coef_omit = "d3|d4"
  #--- no vcov supplied ---#
)

	Model1	Model2
(Intercept)	183.882	56.202
	(0.690)	(0.295)
d1_5_9	-0.367	-0.069
	(0.133)	(0.055)
d2_5_9	-0.836	-0.298
	(0.129)	(0.055)
Num.Obs.	1956	1100
Log.Lik.	-8847.985	-3731.078
F	53.480	15.043

VCOV swapped:

modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy), 
  gof_omit = "IC|R",
  coef_omit = "d3|d4",
  vcov = list(vcov_2_corn, vcov_2_soy)
)

The coefficient estimates are the same as in the default table; only the standard errors in parentheses change, because they are now computed from the supplied matrices.

You can save the table to a file by providing a file name to the output option.

The supported file types are:

.html
.tex
.md
.txt
.docx, .pptx
.png
.jpg

Example:

The .docx option may be particularly useful for those who want to put finishing touches on the table manually in Microsoft Word:

modelsummary::msummary(
  list(Model1 = model_2_corn, Model2 = model_2_soy),
  output = "reg_results_table.docx"
)

Further modify regression tables with other packages

output type
edit with flextable
edit with gt

Using the output option in modelsummary::msummary(), you can turn the regression table into R objects that are readily modifiable by the gt, kableExtra, and flextable packages.

Example: flextable

#--- create a regression table and turn it into a flextable ---#
reg_table_ft <- list(model_1_corn, model_1_soy) %>% 
  modelsummary::msummary(output = "flextable")

#--- check the class ---#
class(reg_table_ft)

[1] "flextable"

Example: gt

#--- create a regression table and turn it into a gt_tbl ---#
reg_table_gt <- list(model_1_corn, model_1_soy)%>% 
  modelsummary::msummary(output = "gt")

#--- check the class ---#
class(reg_table_gt)

[1] "gt_tbl" "list"

The regression table created using modelsummary::msummary() with output = "flextable" is a flextable object, so we can use our knowledge of the flextable package to further modify it.

For details on how to use the flextable package, visit the flextable lecture notes.

Here I will just give you an example of the use of flextable operations.

Example

list(
  "Corn 1" = model_1_corn, 
  "Corn 2" =  model_2_corn, 
  "Soy 1" = model_1_soy, 
  "Soy 2" = model_2_soy
) %>% 
modelsummary::msummary(
  output = "flextable",
  gof_omit = "IC|Adj"
) %>%  
flextable::bold(i = 9, j = c(3, 5), bold = TRUE) %>% 
flextable::color(i = 3, j = 2, color = "red")

	Corn 1	Corn 2	Soy 1	Soy 2
(Intercept)	181.978	183.882	56.049	56.202
	(0.678)	(0.690)	(0.288)	(0.295)
d1_5_9	-0.216	-0.367	-0.062	-0.069
	(0.135)	(0.133)	(0.055)	(0.055)
d2_5_9	-1.081	-0.836	-0.327	-0.298
	(0.124)	(0.129)	(0.053)	(0.055)
d3_5_9		-0.754		-0.173
		(0.158)		(0.090)
d4_5_9		-2.194		-0.137
		(0.320)		(0.213)
Num.Obs.	1956	1956	1100	1100
R2	0.050	0.099	0.047	0.052
Log.Lik.	-8899.218	-8847.985	-3733.873	-3731.078
F	51.768	53.480	27.207	15.043
RMSE	22.89	22.30	7.21	7.19

Now that the regression table is a gt_tbl object, we can use our knowledge of the gt package to modify the regression table.

For details on how to use the gt package, see the gt documentation. Here I will just give you an example of the use of gt operations.

Example

list(
  "Corn 1" = model_1_corn, 
  "Corn 2" =  model_2_corn, 
  "Soy 1" = model_1_soy, 
  "Soy 2" = model_2_soy
) %>% 
  modelsummary::msummary(
    output = "gt",
    gof_omit = "IC|Adj"
  ) %>%  
  #--- add a spanner over the two corn columns ---#
  gt::tab_spanner(
    label = "Corn",
    columns = c("Corn 1", "Corn 2")
  ) %>%
  #--- color the text in rows 7 and 8 red ---#
  gt::tab_style(
    style = gt::cell_text(color = "red"),
    locations = gt::cells_body(rows = 7:8)
  )

	Corn		Soy 1	Soy 2
	Corn 1	Corn 2	Soy 1	Soy 2
(Intercept)	181.978	183.882	56.049	56.202
	(0.678)	(0.690)	(0.288)	(0.295)
d1_5_9	-0.216	-0.367	-0.062	-0.069
	(0.135)	(0.133)	(0.055)	(0.055)
d2_5_9	-1.081	-0.836	-0.327	-0.298
	(0.124)	(0.129)	(0.053)	(0.055)
d3_5_9		-0.754		-0.173
		(0.158)		(0.090)
d4_5_9		-2.194		-0.137
		(0.320)		(0.213)
Num.Obs.	1956	1956	1100	1100
R2	0.050	0.099	0.047	0.052
Log.Lik.	-8899.218	-8847.985	-3733.873	-3731.078
F	51.768	53.480	27.207	15.043
RMSE	22.89	22.30	7.21	7.19

Create summary tables

Example table

county_yield %>% 
  dplyr::filter(year %in% 2010:2012) %>% 
  modelsummary::datasummary(
    (Year = factor(year)) * (
      (`Corn Yield (bu/acre)` = corn_yield) + 
      (`Soy Yield (bu/acre)` = soy_yield) + 
      (`DI: category 4` = d4_5_9)
    ) ~ 
    state_name * (Mean + SD) ,
    data = .
  )

		Colorado		Kansas		Nebraska
Year		Mean	SD	Mean	SD	Mean	SD
2010	Corn Yield (bu/acre)	196.08	12.96	182.38	17.12	182.37	14.80
	Soy Yield (bu/acre)					58.79	4.30
	DI: category 4	0.00	0.00	0.00	0.00	0.00	0.00
2011	Corn Yield (bu/acre)	186.25	12.76	160.56	29.69	178.32	16.00
	Soy Yield (bu/acre)					60.35	5.39
	DI: category 4	0.00	0.00	1.52	3.33	0.00	0.00
2012	Corn Yield (bu/acre)	160.50	31.69	161.33	17.44	185.91	18.44
	Soy Yield (bu/acre)					59.80	5.21
	DI: category 4	1.79	1.60	6.16	3.59	3.05	2.65

Syntax:

modelsummary::datasummary(formula, data = dataset)

formula has two sides separated by ~, just like a regression formula.

Variables and statistics on the left-hand side form the rows, and those on the right-hand side form the columns.

Example

modelsummary::datasummary(
  corn_yield ~ Mean, #<<
  data = county_yield
)

	Mean
corn_yield	178.25

Switching the order changes the structure of the resulting table:

modelsummary::datasummary(
  Mean ~ corn_yield, #<<
  data = county_yield
)

	corn_yield
Mean	178.25

The modelsummary package offers multiple summary functions of its own:

Mean
SD
Min
Max
P0, P25, P50, P75, P100
Histogram

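These functions have na.rm = TRUE built in, so they do not return NA when the variable has missing values, unlike their base-R counterparts called with default arguments.

For example, compare these two (corn_yield has no missing values, so the two results coincide here; for a variable with missing values, such as soy_yield, the base mean would return NA):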

modelsummary::datasummary(
  corn_yield ~ Mean, #<<
  data = county_yield
)

	Mean
corn_yield	178.25

modelsummary::datasummary(
  #--- mean from the base package ---#
  corn_yield ~ mean, #<<
  data = county_yield
)

	mean
corn_yield	178.25
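
The difference between the two kinds of functions only shows up when a variable has missing values. As a small check in base R, the sketch below uses the first three soy_yield values from the data preview above (42, NA, 58.4); soy, m_base, and m_narm are hypothetical names.

```r
#--- first three soy_yield values shown in the data preview ---#
soy <- c(42, NA, 58.4)

#--- base mean() propagates the missing value ---#
m_base <- mean(soy)
m_base

#--- dropping missing values first, as Mean does ---#
m_narm <- mean(soy, na.rm = TRUE)
m_narm
```

The first call returns NA and the second returns 50.2, the average of the two non-missing values.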

You can use a user-defined function that takes a vector of values and returns a single value.

Example:

#--- define a function ---#
MinMax <- function(x){
  paste0('[', min(x, na.rm = TRUE), ', ', max(x, na.rm = TRUE), ']')
} 

#--- use it ---#
modelsummary::datasummary(corn_yield ~ MinMax, data = county_yield)

	MinMax
corn_yield	[0, 234.3]

You can add more variables and statistics using +.

Example:

modelsummary::datasummary(
  corn_yield + soy_yield + d0_5_9 + d1_5_9
  ~ Mean + SD+ MinMax + Histogram, 
  data = county_yield
)

	Mean	SD	MinMax	Histogram
corn_yield	178.25	23.50	[0, 234.3]	▁▄▇▆▁
soy_yield	54.95	7.39	[15, 74.3]	▁▄▇▆▃▁
d0_5_9	3.92	3.94	[0, 21.3569]	▇▃▃▂▁
d1_5_9	3.15	4.15	[0, 21.4838]	▇▁▁▁▁

For each of the variables on the left-hand side, each of the statistics on the right-hand side is calculated and displayed.

You can use All() to create a summary table for all the numeric variables in the dataset.

At the moment, All() does not work on a tibble. So, if your dataset is a tibble, convert it to a data.frame on the fly as in the code below:

Example:

modelsummary::datasummary(
  All(data.frame(county_yield)) 
  ~ Mean + SD, 
  data = county_yield
)

	Mean	SD
corn_yield	178.25	23.50
soy_yield	54.95	7.39
year	2007.38	5.22
d0_5_9	3.92	3.94
d1_5_9	3.15	4.15
d2_5_9	2.82	4.51
d3_5_9	1.60	3.61
d4_5_9	0.41	1.69

More on `datasummary()`

Nesting by group
deeper
renaming
function arguments
title and notes
align columns
Output

You can nest categorical variables with *, meaning you can get summary statistics for each value of the categorical variable (like group_by() %>% summarize()).

Syntax

#--- single stat ---#
variable ~ category_variable * stat  

#--- multiple stats ---#
variable ~ category_variable * (stat 1 + stat 2 + ...)

Examples:

modelsummary::datasummary(
  corn_yield + soy_yield + d0_5_9 + d1_5_9
  ~ state_name * (Mean + SD) + MinMax, #<< 
  data = county_yield
)

	Colorado		Kansas		Nebraska
	Mean	SD	Mean	SD	Mean	SD	MinMax
corn_yield	168.26	30.64	173.06	24.32	181.65	21.32	[0, 234.3]
soy_yield			50.74	7.34	55.80	7.11	[15, 74.3]
d0_5_9	4.23	4.67	3.69	3.81	3.97	3.89	[0, 21.3569]
d1_5_9	2.66	3.52	2.96	4.19	3.28	4.20	[0, 21.4838]

For each value of state_name (Nebraska, Colorado, Kansas), Mean and SD are shown for each of the variables on the left-hand side. But, MinMax is for the entire sample.
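
To see what the nested and non-nested parts of the formula compute, here is a base-R sketch on a tiny made-up data set (toy and its values are hypothetical, chosen only so the arithmetic is easy to follow). Statistics multiplied by state_name are computed within each state, while a statistic left outside the nesting uses all observations.

```r
#--- tiny made-up data set with the same column names ---#
toy <- data.frame(
  state_name = c("Kansas", "Kansas", "Nebraska", "Nebraska", "Nebraska"),
  corn_yield = c(160, 170, 180, 190, 185)
)

#--- per-state statistics: what state_name * (Mean + SD) reports ---#
state_mean <- tapply(toy$corn_yield, toy$state_name, mean)
state_sd <- tapply(toy$corn_yield, toy$state_name, sd)
rbind(Mean = state_mean, SD = state_sd)

#--- whole-sample statistic: what a stat outside the nesting reports ---#
overall_range <- range(toy$corn_yield)
overall_range
```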

You can nest with multiple categorical variables by multiplying stats with multiple categorical variables.

Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield + soy_yield + d0_5_9 + d1_5_9
    ~ factor(year) * state_name * (Mean + SD) + MinMax, #<< 
    data = .
  )

	2011				2012
	Kansas		Nebraska		Kansas		Nebraska
	Mean	SD	Mean	SD	Mean	SD	Mean	SD	MinMax
corn_yield	160.56	29.69	178.32	16.00	161.33	17.44	185.91	18.44	[100, 217]
soy_yield			60.35	5.39			59.80	5.21	[48, 70.3]
d0_5_9	3.52	3.18	2.86	2.01	2.15	1.25	3.11	1.34	[0, 8.7386]
d1_5_9	5.05	3.28	0.01	0.05	2.62	1.17	2.74	1.39	[0, 10.1494]

For each of the unique combinations of state_name (Nebraska, Kansas) and year (2011, 2012), Mean and SD are shown for each of the variables on the left-hand side. But, MinMax is for the entire sample.

By default variable and statistics names are used as the labels in the table.

You can provide labels by the following syntax: (label = variable/stat)

Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    (`Corn Yield (bu/acre)` = corn_yield) #<<
    ~ state_name * (Mean + (Std.Dev. = SD)), #<< 
    data = .
  )

	Kansas		Nebraska
	Mean	Std.Dev.	Mean	Std.Dev.
Corn Yield (bu/acre)	160.99	23.31	181.95	17.56

corn_yield is labeled as Corn Yield (bu/acre)
SD is labeled as Std.Dev.

Note: when you have spaces in the label, surround the label with backticks.

If you do not like this way of changing labels, you can always use the gt package.

You can pass optional arguments to the statistic function with: stat * Arguments(options)

Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield 
    ~ state_name * (mean + sd) * Arguments(na.rm = TRUE) + #<<
      quantile * Arguments(prob = 0.1, na.rm = TRUE), #<< 
    data = .
  )

	Kansas		Nebraska
	mean	sd	mean	sd	quantile
corn_yield	160.99	23.31	181.95	17.56	148.52

(mean + sd) * Arguments(na.rm = TRUE) adds the na.rm = TRUE option to mean() and sd()
quantile * Arguments(prob = 0.1, na.rm = TRUE) adds prob = 0.1 and na.rm = TRUE to quantile()
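
Conceptually, Arguments() fixes extra arguments of a summary function before it is applied to each variable. The base-R sketch below imitates that idea with a closure; with_args() is a hypothetical helper written for illustration and is not part of modelsummary. Note that the full name of the argument of quantile() is probs; prob = 0.1 in the example above works because R matches argument names partially.

```r
x <- c(1:11, NA)

#--- hypothetical helper: fix extra arguments of a summary function ---#
with_args <- function(f, ...) {
  extra <- list(...)
  function(v) do.call(f, c(list(v), extra))
}

#--- 10th percentile that ignores missing values ---#
q10 <- with_args(quantile, probs = 0.1, na.rm = TRUE)
q10(x) # same as quantile(x, probs = 0.1, na.rm = TRUE)
```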

Example

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield 
    ~ state_name * (mean + sd) * Arguments(na.rm = TRUE) + 
      quantile * Arguments(prob = 0.1, na.rm = TRUE),  
    data = .,
    title = "A title", #<<
    notes = c("first note", "second note") #<<
  )

A title
	Kansas		Nebraska
	mean	sd	mean	sd	quantile
corn_yield	160.99	23.31	181.95	17.56	148.52
first note
second note

You can use align to align columns. Available alignments are:

l: left
r: right
c: center

For align, you provide a string of these letters (e.g., "lrcl").

The nth letter corresponds to the nth column.

Example:

county_yield %>% 
  dplyr::filter(year %in% 2011:2012) %>% 
  dplyr::filter(state_name %in% c("Kansas", "Nebraska")) %>% 
  modelsummary::datasummary(
    corn_yield 
    ~ state_name * (`This is M E A N` = mean) * Arguments(na.rm = TRUE) + 
      (`This is Q U A N T I L E` = quantile) * Arguments(prob = 0.1, na.rm = TRUE),  
    data = .,
    align = "lrlc" #<<
  )

	Kansas	Nebraska
	This is M E A N	This is M E A N	This is Q U A N T I L E
corn_yield	160.99	181.95	148.52

You can use the output option to either export the table as a file or save it as an R object which you can further modify.

This works exactly the same way as in the modelsummary::msummary() function.

Convenience functions

balance table
correlation table

If your data was generated through randomized experiments (or you are using natural experiments), then datasummary_balance() can be very useful as it can generate a variable balance table.

Syntax:

modelsummary::datasummary_balance(variables to summarize ~ treatment dummy)

variables to summarize: list of variables to summarize
treatment dummy: a dummy variable that indicates whether an observation is in the treated or the control group

Example:

county_yield %>% 
  dplyr::filter(state_name %in% c("Nebraska", "Kansas")) %>% 
  dplyr::select(c(state_name, where(is.numeric))) %>% 
  dplyr::select(- year) %>% 
  modelsummary::datasummary_balance(
    All (data.frame(.))~ state_name, #<<
    data = .
  )

	Kansas (N=534)		Nebraska (N=1268)
	Mean	Std. Dev.	Mean	Std. Dev.
corn_yield	173.1	24.3	181.7	21.3
soy_yield	50.7	7.3	55.8	7.1
d0_5_9	3.7	3.8	4.0	3.9
d1_5_9	3.0	4.2	3.3	4.2
d2_5_9	2.6	4.0	2.8	4.6
d3_5_9	1.6	3.4	1.5	3.5
d4_5_9	0.7	2.4	0.3	1.3

You can create a correlation table with datasummary_correlation().

county_yield %>% 
  dplyr::filter(state_name %in% c("Nebraska", "Kansas")) %>% 
  dplyr::select(c(state_name, where(is.numeric))) %>% 
  dplyr::select(- year) %>% 
  modelsummary::datasummary_correlation()

	corn_yield	soy_yield	d0_5_9	d1_5_9	d2_5_9	d3_5_9	d4_5_9
corn_yield	1	.	.	.	.	.	.
soy_yield	.71	1	.	.	.	.	.
d0_5_9	.13	.04	1	.	.	.	.
d1_5_9	-.13	-.12	.05	1	.	.	.
d2_5_9	-.24	-.21	-.28	.38	1	.	.
d3_5_9	-.20	-.12	-.30	-.02	.29	1	.
d4_5_9	-.22	-.04	-.18	-.04	.02	.34	1
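
A table of this shape can be approximated in base R, which may help when you want to check individual entries. The sketch below uses a small made-up data set (toy is hypothetical) and computes pairwise Pearson correlations, skipping missing pairs; how datasummary_correlation() itself treats missing values is an assumption here, not something stated in these notes.

```r
#--- small made-up data with a missing value ---#
toy <- data.frame(
  corn_yield = c(150, 160, 170, 180, 190),
  soy_yield = c(45, NA, 52, 55, 60),
  d2_5_9 = c(9, 7, 4, 3, 0)
)

#--- pairwise Pearson correlations, skipping missing pairs ---#
cor_mat <- cor(toy, use = "pairwise.complete.obs")

#--- show only the lower triangle, as in the table above ---#
cor_tab <- round(cor_mat, 2)
cor_tab[upper.tri(cor_tab)] <- NA
cor_tab
```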
