Rearranging columns and rows in an estimates table (reprex included)

Hi all,

I'm working on creating a table of regression estimates from several models. Here is the data:

structure(list(term = c("age_ceo_state__rf", "", "mktrf", "", 
NA, NA), intercept = c("0.390***", "(19.455)", "0.673***", "(23.409)", 
NA, NA), term_2 = c("age_ceo_state__rf", "", "age_firm_state__rf", 
"", "mktrf", ""), intercept_2 = c("0.209***", "(9.449)", "0.405***", 
"(15.511)", "0.417***", "(13.255)"), term_3 = c("age_ceo_state__rf", 
"", "age_firm_state__rf", "", "mktrf", ""), intercept_3 = c("0.209***", 
"(9.449)", "0.405***", "(15.511)", "0.417***", "(13.255)")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L))

I have been breaking my head for the last week on how to make this presentable as a table of estimates:
This is the original

And this is the goal:

The goal is to make it look like any regression table layout with terms aligned on the same row and the terms not present in a model having a blank space in that column. Should be straightforward but all my attempts to come up with a generalized and easy to debug solution with have failed so far.

I don't see any way other than manual rearrangement.

TBL <- structure(list(term = c("age_ceo_state__rf", "", "mktrf", "", 
                               NA, NA), 
                      intercept = c("0.390***", "(19.455)", "0.673***", "(23.409)", 
                                    NA, NA), 
                      term_2 = c("age_ceo_state__rf", "", "age_firm_state__rf", 
                                 "", "mktrf", ""), 
                      intercept_2 = c("0.209***", "(9.449)", "0.405***", "(15.511)", "0.417***", "(13.255)"), 
                      term_3 = c("age_ceo_state__rf", "", "age_firm_state__rf", "", "mktrf", ""), 
                      intercept_3 = c("0.209***", 
                                      "(9.449)", "0.405***", "(15.511)", "0.417***", "(13.255)")), class = c("tbl_df","tbl", "data.frame"), row.names = c(NA, -6L))
TBL[5:6,1:2] <- TBL[3:4,1:2]
TBL[3,1] <- TBL[3,3]
TBL[,c(3,5)] <- ""
TBL
#>                 term intercept term_2 intercept_2 term_3 intercept_3
#> 1  age_ceo_state__rf  0.390***           0.209***           0.209***
#> 2                     (19.455)            (9.449)            (9.449)
#> 3 age_firm_state__rf  0.673***           0.405***           0.405***
#> 4                     (23.409)           (15.511)           (15.511)
#> 5              mktrf  0.673***           0.417***           0.417***
#> 6                     (23.409)           (13.255)           (13.255)

Created on 2022-11-03 with reprex v2.0.2

Hi,

This is totally doable as huxreg, modelsummary and pretty much any regression table package that accepts several models does that. It's very hard to parse the way how they do that as it's buried deep in the package but it is doable. Here is where I'm at right now:

fit_cols_1_deb <-
  structure(
    list(
      term = c("age_ceo_state__rf", "", "mktrf", "",
               NA, NA),
      intercept = c("0.390***", "(19.455)", "0.673***", "(23.409)",
                    NA, NA),
      term_2 = c("age_ceo_state__rf", "", "age_firm_state__rf",
                 "", "mktrf", ""),
      intercept_2 = c(
        "0.209***",
        "(9.449)",
        "0.405***",
        "(15.511)",
        "0.417***",
        "(13.255)"
      ),
      term_3 = c("age_ceo_state__rf",
                 "", "age_firm_state__rf", "", "mktrf", ""),
      intercept_3 = c(
        "0.209***",
        "(9.449)",
        "0.405***",
        "(15.511)",
        "0.417***",
        "(13.255)"
      )
    ),
    class = c("tbl_df",
              "tbl", "data.frame"),
    row.names = c(NA, -6L)
  )


fit_cols_1_deb_1 <- fit_cols_1_deb %>%mutate(id =as.integer(gl(n(),2,n()))) %>%
  mutate(across(contains("term"), ~na_if(., ""))) %>%
  group_by(id) %>% mutate(id_1=row_number(id)) %>% 
  tidyr::fill(contains("term"),.direction="down") %>% 
  mutate(across(contains("term"), ~ifelse(id_1==2 & !is.na(.),paste0(.,"_1"), .)))



var_names_tot <- fit_cols_1_deb_1 %>% ungroup() %>% 
  select(contains("term")) %>% 
  pivot_longer(cols = everything()) %>% distinct(value) %>% arrange(value) %>% drop_na()

names(var_names_tot) <- "var_names" 
fit_cols_1_deb_2 <- bind_cols(var_names_tot ,fit_cols_1_deb_1 )

So var_names has the vector of all var names that should be in the table and the key is to take every 2 columns of term + intercept and align the terms according to the var_names vector. Meaning that in Term column mktrf_1 should start at row=5, not row=3, and row(3,4) for the columns term+intercept should be set to NA. PRetty much move/shift the rows to align terms with the var_names column. This has to be done independently from columns term_2+in_2 and term_3+in_3.

The q atm is: How to move subsets of rows in subsets of columns matching a character vector?

2 Likes

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.