Summarizing a continuous variable by two categorical variables with the gtsummary package

I am trying to summarize a continuous variable by two categorical variables, as shown below, but I am not able to do this correctly. Is there a way to get this with the gtsummary package? Thank you.

library("gtsummary")
library("tidyverse")
set.seed(123)
sex <- sample(c("Male", "Female"), size=100, replace=TRUE)
age <- rnorm(n=100, mean=20 + 4*(sex=="F"), sd=0.1)
height <- sample(c("Tall", "short"), size=100, replace=TRUE)
bmi <- rnorm(n=100, mean=10 + 4*(sex=="Female") + 2*(height=="Tall"), sd=1)

dsn <- data.frame(sex, age, bmi, height)


tab <- dsn %>% 
  dplyr::select(age, sex) %>% 
  tbl_summary(by = sex) %>% 
  bold_labels() 
tab

 #Characteristic         Female, N = 43           Male, N = 57  
                   ────────────────────────────────────────────────────────────────
                     age              20.00 (19.93, 20.06)   19.99 (19.94, 20.03)  
                   ────────────────────────────────────────────────────────────────
                     Statistics presented: median (IQR)                            


tab1 <- dsn %>% 
  filter(height == "Tall") %>% 
  dplyr::select(bmi, sex) %>% 
  tbl_summary(by = sex,
              label = list(bmi ~ "....   Tall"))
tab1

 #Characteristic         Female, N = 22           Male, N = 35  
                   ────────────────────────────────────────────────────────────────
                     ....  Tall       15.54 (15.32, 16.38)   12.09 (11.53, 12.87)  
                   ────────────────────────────────────────────────────────────────
                     Statistics presented: median (IQR) 

tab2 <- dsn %>% 
  filter(height == "Tall") %>% 
  dplyr::select(bmi, sex) %>% 
  tbl_summary(by = sex,
              label = list(bmi ~ "....   Short"))
tab2

 #Characteristic         Female, N = 22           Male, N = 35  
                   ────────────────────────────────────────────────────────────────
                     ....  Short      15.54 (15.32, 16.38)   12.09 (11.53, 12.87)  
                   ────────────────────────────────────────────────────────────────
                     Statistics presented: median (IQR)  

# I am trying to obtain the table below
tbl_stack(
  list(tab1, tab2, tab),
  group_header = c("BMI", "", ""))

   #Group   Characteristic         Female, N = 22           Male, N = 35  
               ────────────────────────────────────────────────────────────────────────
                 BMI     ....  Tall       15.54 (15.32, 16.38)   12.09 (11.53, 12.87)  
                         ....  Short      15.54 (15.32, 16.38)   12.09 (11.53, 12.87)  
                         age              20.00 (19.93, 20.06)   19.99 (19.94, 20.03)  
               ────────────────────────────────────────────────────────────────────────
                 Statistics presented: median (IQR)  

# Is there an easy way to do this using the gtsummary package?

The tbl_summary() function is not written with that type of output in mind, but you can get there by building one tbl_summary() table per level of the second categorical variable and combining them with tbl_stack().

library(gtsummary)
library(tidyverse)

trial %>%
  # keep the continuous var and the two categorical variables
  select(trt, age, grade) %>%
  group_nest(grade) %>%
  mutate(
    tbl = map2(
      grade, data, 
      ~tbl_summary(.y, by = trt, 
                   label = list(age = paste("Age: Grade", .x)), missing = "no")
    )
  ) %>%
  pull(tbl) %>%
  tbl_stack() %>%
  as_kable()
Characteristic    Drug A, N = 35    Drug B, N = 33
Age: Grade I      46 (36, 60)       48 (42, 55)
Age: Grade II     44 (31, 54)       50 (43, 57)
Age: Grade III    52 (42, 60)       45 (36, 52)

Created on 2020-09-23 by the reprex package (v0.3.0)
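
For reference, the same pattern can be applied to the simulated data from the question. This is only a sketch: it rebuilds a `dsn` data frame with the question's column names (`sex`, `bmi`, `height`), the object name `tbl_bmi_by_height` and the "BMI:" label prefix are made up for illustration, and the random draws will not necessarily match the numbers printed in the question.

```r
library(gtsummary)
library(dplyr)
library(purrr)

# simulated data along the lines of the question
set.seed(123)
sex <- sample(c("Male", "Female"), size = 100, replace = TRUE)
height <- sample(c("Tall", "short"), size = 100, replace = TRUE)
bmi <- rnorm(n = 100, mean = 10 + 4 * (sex == "Female") + 2 * (height == "Tall"), sd = 1)
dsn <- data.frame(sex, bmi, height)

tbl_bmi_by_height <-
  dsn %>%
  # keep the continuous variable and the two categorical variables
  select(sex, bmi, height) %>%
  # one nested data frame per height level
  group_nest(height) %>%
  mutate(
    tbl = map2(
      height, data,
      # summarize bmi by sex within this height level; label the row with the level
      function(h, d) {
        tbl_summary(d, by = sex,
                    label = list(bmi = paste("BMI:", h)), missing = "no")
      }
    )
  ) %>%
  pull(tbl) %>%
  # combine the per-level tables into one
  tbl_stack()

tbl_bmi_by_height
```

Each height level contributes one row, so the stacked table has one BMI row per level with a column per sex. As with the trial example above, the column headers of a stacked table are taken from the first table in the list.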

Wow... This is amazing. Thank you very much @statistishdan. I will use this. Do you know if it is possible to add a subheader above "Age: Grade I", such as "Age by grade"? Thanks again.

Perhaps, something like this would work for you?

library(gtsummary)
library(tidyverse)

# create table overall
tbl_age <-
  trial %>%
  select(trt, age) %>%
  tbl_summary(by = trt, missing = "no")

# create table stratified by Grade
tbl_age_by_grade <-
  trial %>%
  # keep the continuous var and the two categorical variables
  select(trt, age, grade) %>%
  group_nest(grade) %>%
  mutate(
    tbl = map2(
      grade, data, 
      ~tbl_summary(.y, by = trt, 
                   label = list(age = paste("Grade", .x)), missing = "no")
    )
  ) %>%
  pull(tbl) %>%
  tbl_stack()


# stacking the tables
tbl_stack(list(tbl_age, tbl_age_by_grade)) %>%
  # indenting the grade rows
  as_gt()  %>%
  gt::tab_style(style = gt::cell_text(indent = gt::px(10), align = "left"), 
                locations = gt::cells_body(columns = gt::vars(label), 
                                           rows = str_detect(label, "Grade")))

# printing the gt calls that as_gt() would run (useful when customizing with gt)
tbl_age %>%
  as_gt(return_calls = TRUE)

Amazing. Wonderful package here. Thank you so much.

Hi, thanks a lot for the answer. I still have some questions, though.
Do you think there's a way to have only "Age" as a subheader and hide the statistics for Age by treatment on that row?

Also, in a case like this, is there a way to edit the column names with modify_header()? Say I want to drop the N and keep only the level of the by variable.

Thanks a lot.

Yep, that is also possible. The table printed by gtsummary is stored in .$table_body, and it is just a data frame. In most cases, you can simply go in and make edits to it directly. The example below strips the estimates from the Age header row and modifies the table's column headers.

library(gtsummary)
library(tidyverse)

# create table overall
tbl_age <-
  trial %>%
  select(trt, age) %>%
  tbl_summary(by = trt, missing = "no") %>%
  modify_header(stat_by = "**{level}**") # CHANGE COLUMN HEADER

# REMOVE STATISTICS FOR AGE FROM TABLE
tbl_age$table_body <-
  tbl_age$table_body %>%
  mutate_at(vars(stat_1, stat_2), ~NA_character_)
  
# create table stratified by Grade
tbl_age_by_grade <-
  trial %>%
  # keep the continuous var and the two categorical variables
  select(trt, age, grade) %>%
  group_nest(grade) %>%
  mutate(
    tbl = map2(
      grade, data, 
      ~tbl_summary(.y, by = trt, 
                   label = list(age = paste("Grade", .x)), missing = "no")
    )
  ) %>%
  pull(tbl) %>%
  tbl_stack()


# stacking the tables
tbl_stack(list(tbl_age, tbl_age_by_grade)) %>%
  # indenting the grade rows
  as_gt()  %>%
  gt::tab_style(style = gt::cell_text(indent = gt::px(10), align = "left"), 
                locations = gt::cells_body(columns = gt::vars(label), 
                                           rows = str_detect(label, "Grade")))
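
As a side note, more recent gtsummary releases than the one used in this thread provide `all_stat_cols()` and `modify_table_body()`, which allow the same two changes to be made inside the pipeline rather than by assigning to `$table_body`. The following is a sketch under that assumption (the object name `tbl_age_header` is made up for illustration), not necessarily how the code above should be rewritten for every version.

```r
library(gtsummary)
library(dplyr)

tbl_age_header <-
  trial %>%
  select(trt, age) %>%
  tbl_summary(by = trt, missing = "no") %>%
  # show only the treatment level in each statistic column header
  modify_header(all_stat_cols() ~ "**{level}**") %>%
  # blank out the statistics so the Age row acts as a subheader
  modify_table_body(
    function(tb) mutate(tb, stat_1 = NA_character_, stat_2 = NA_character_)
  )

tbl_age_header
```

The resulting object can be passed to `tbl_stack()` in place of `tbl_age` in the example above.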

It works perfectly! Thanks a lot.
