Hi-- just wondering what the best package/method would be to make a table of descriptive statistics if I have both continuous and categorical variables? For context, I'd want it to look something like this
I'd suggest creating a small data.frame with example data to give folks a sense of what you're working with.
There's a bunch of R packages that help you create summary tables. A good way to review which will work best for you is to check out the vignettes. For example, qwraps2 has this nice guide: https://cran.r-project.org/web/packages/qwraps2/vignettes/summary-statistics.html
Also the Tidyverse has a bunch of packages that make munging and summarizing data quite easy, and then tools like gt and kable help get your summary table polished for publication.
I personally like to just build the table I'd like to see. As an example of that, see below. But more often than not of the summary table packages have a set of functions that do this kind of thing for you.
library(dplyr)
library(tidyr)
library(glue)
df <- tibble(
ob = 1:4,
num1 = c(1,2,3,3),
num2 = c(3,4,4,5),
cat1 = sample(c(1,0), 4, replace = TRUE),
cat2 = sample(c(1,0), 4, replace = TRUE)
)
result <- bind_rows(
df %>%
select(ob, num1, num2) %>% #select(is.numeric()) requires dplyr 1.0.0
pivot_longer(c(-ob)) %>%
group_by(name) %>%
summarize(
`Mean (SD)` = glue("{mean(value)} ({sd(value) %>% round(2)})"),
),
df %>%
select(ob, cat1, cat2) %>%
pivot_longer(c(-ob,)) %>%
group_by(name) %>%
summarise(
`%` = mean(value)
)
) %>%
replace_na(
list(
`Mean (SD)` = "",
`%` = "-"
)
)
result
#> # A tibble: 4 x 3
#> name `Mean (SD)` `%`
#> <chr> <glue> <chr>
#> 1 num1 "2.25 (0.96)" -
#> 2 num2 "4 (0.82)" -
#> 3 cat1 "" 1
#> 4 cat2 "" 0.5
Created on 2020-04-25 by the reprex package (v0.3.0)
Hi, thanks for your help! For context, here's a bit of the data I'm working with.
read write math science socst female race ses schtyp prog
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|1|57|52|41|47|57|Male|White|Low|Public|General|
|2|68|59|53|63|61|Female|White|Medium|Public|Vocational|
|3|44|33|54|58|31|Male|White|High|Public|General|
|4|63|44|47|53|56|Male|White|High|Public|Vocational|
|5|47|52|57|53|61|Male|White|Medium|Public|Academic|
|6|44|52|51|63|61|Male|White|Medium|Public|Academic|
|7|50|59|42|53|61|Male|Hispanic|Medium|Public|General|
|8|34|46|45|39|36|Male|African American|Medium|Public|Academic|
|9|63|57|54|58|51|Male|White|Medium|Public|General|
Hi, I basically have the same question. I’ve looked through several options but haven't found a way to create a table that shows: means(sd) for multiple variables organized by group.
Using my data (below), I can use 'group_by()' and 'across()' to estimate the 𝑥̄ and sd.
means_sds <- list(
mean = ~mean(.x, na.rm = TRUE),
sd = ~sd(.x, na.rm = TRUE)
)
vars.avg.sd <- dat.sm %>%
group_by(group.factor) %>%
summarise(across(is.numeric, means_sds))
Yet, I have to transpose the data to approximate my goal but, the 𝑥̄ and sd are on alternating rows.
Any help or suggestions would be greatly appreciated.
Cheers,
~ Jason the #rstatsnewbie
t(vars.avg.sd)
Data
dput(head(dat.sm, 30))
structure(list(group.factor = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L), .Label = c("HC", "CLBP",
"FM", "TRUE"), class = "factor"), slpQual = c(0, 1, 1, 1, 1,
0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 2, 0, 2, 1,
1, 2, 1, 2), slpLat = c(1, 0, 2, 1, 1, 0, 1, 1, 1, 2, 0, 1, 1,
1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 2, 0, 1, 1, 1, 3), slpDur = c(1,
0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0,
1, 0, 2, 1, 1, 0, 1, 3), slpEff = c(0, 0, 1, 0, 1, 1, 0, 0, 0,
0, 0, 0, 0, 3, 3, 0, 3, 1, 0, 0, 3, 0, 0, 3, 1, 3, 0, 1, 1, 3
), slpDist = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), slpMeds = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1,
0, 0, 0, 0, 0, 3, 1), slpDayFcn = c(0, 0, 1, 0, 1, 0, 0, 1, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
), psqi_Global = c(3, 2, 7, 3, 5, 3, 3, 4, 2, 6, 0, 3, 4, 6,
7, 1, 7, 5, 6, 3, 5, 2, 6, 4, 8, 6, 4, 5, 8, 13), slpQualraw = c(0,
1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
2, 0, 2, 1, 1, 2, 1, 2), slpLatraw = c(12, 6, 33, 17, 17, 6,
19.5, 32, 12, 33, 6, 7, 12, 12, 18, 11, 17, 22, 24.5, 6, 13.5,
12, 22, 11, 33, 11, 17, 17, 19.5, 109), slpDurraw = c(6.5, 7.5,
7, 7.5, 8, 7, 7, 7.5, 7.5, 6.5, 9, 7.25, 8, 8, 6.5, 8, 7, 7,
7.5, 7, 7, 7.5, 7, 8, 5, 6, 7, 7.5, 7, 4.5), slpEffraw = c(92.8571428571429,
100, 82.3529411764706, 93.75, 84.2105263157895, 82.3529411764706,
93.3333333333333, 88.2352941176471, 100, 92.8571428571429, 100,
90.625, 100, 25, 20.9677419354839, 94.1176470588235, 21.875,
82.3529411764706, 85.7142857142857, 87.5, 21.875, 93.75, 93.3333333333333,
24.8062015503876, 83.3333333333333, 20, 93.3333333333333, 83.3333333333333,
82.3529411764706, 50), slpDistraw = c(2, 3, 1, 4, 5, 3, 5, 8,
1, 7, 0, 2, 7, 5, 6, 1, 6, 3, 4, 5, 8, 5, 7, 3, 3, 4, 3, 16,
11, 13), slpMedsraw = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 1), slpDayFcnraw = c(0,
0, 1, 2, 1, 0, 0, 1, 0, 1, 0, 0, 1, 2, 3, 0, 3, 0, 1, 3, 2, 0,
2, 0, 0, 0, 0, 3, 3, 0), psqi_Globalraw = c(113.357142857143,
117.5, 125.352941176471, 125.25, 116.210526315789, 98.3529411764706,
124.833333333333, 137.735294117647, 120.5, 141.357142857143,
115, 107.875, 129, 53, 55.4677419354839, 114.117647058824, 55.875,
115.352941176471, 125.714285714286, 109.5, 52.375, 118.25, 134.333333333333,
46.8062015503876, 126.333333333333, 42, 121.333333333333, 128.833333333333,
126.852941176471, 179.5), ess_total = c(5, 5, 4, 7, 10, 2, 5,
5, 4, 9, 6, 10, 8, 9, 6, 8, 9, 3, 6, 10, 8, 10, 3, 0, 10, 6,
6, 6, 9, 6), isi_total = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3L, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), bdi_total = c(0, 1, 1, 13, 5, 0, 1, 6, 0,
7, 0, 2, 0, 6, 0, 3, 1, 0, 0, 7, 12, 5, 7, 0, 2, 8, 1, 9, 12,
4), mcgill_total = c(0, 0, 0, 0, 0, 0, 9, 0, 0, 4, 0, 1, 0, 5,
0, 0, 0, 0, 0, 38, 34, 16, 0, 0, 0, 5, 2, 14, 0, 0), TIB = c(7,
7.5, 8.5, 8, 9.5, 8.5, 7.5, 8.5, 7.5, 7, 9, 8, 8, 32, 31, 8.5,
32, 8.5, 8.75, 8, 32, 8, 7.5, 32.25, 6, 30, 7.5, 9, 8.5, 9)), row.names = c(NA,
30L), class = "data.frame")
I dont have the version of dplyr with across so I used the old summarise_if
means_sds <- list(
mean = ~ mean(.x, na.rm = TRUE),
sd = ~ sd(.x, na.rm = TRUE)
)
vars.avg.sd <- dat.sm %>%
group_by(group.factor) %>%
summarise_if(is.numeric, means_sds)
vars.sd <- vars.avg.sd %>% select(group.factor, ends_with("_sd"))
vars.avg <- vars.avg.sd %>% select(group.factor, ends_with("_mean"))
(vars.sdp <- pivot_longer(vars.sd ,cols = -1))
(vars.avgp <- pivot_longer(vars.avg,cols = -1))
(vars_both <- bind_cols(vars.avgp, vars.sdp))
(vars_both_tidy <- select(vars_both, 1:3, value1) %>%
rename(mean = value, sd = value1) %>% rowwise() %>% mutate(
nameparts = str_split(name, "_"),
name = paste0(nameparts[1:(length(nameparts) - 1)], collapse = "_")
) %>% select(-nameparts))
# A tibble: 63 x 4
group.factor name mean sd
<fct> <chr> <dbl> <dbl>
1 HC slpQual 0.826 0.576
2 HC slpLat 0.826 0.650
3 HC slpDur 0.522 0.593
4 HC slpEff 0.870 1.22
5 HC slpDist 0.957 0.209
6 HC slpMeds 0.130 0.458
7 HC slpDayFcn 0.261 0.449
8 HC psqi_Global 4.39 2.13
9 HC slpQualraw 0.826 0.576
10 HC slpLatraw 16.5 9.26
# ... with 53 more rows
Thank you, this is very helpful! I'll need some time to figure out how all of this works.
My next goal is to put this into a table like you did above. I'll try using glue and report back soon!
Edit 1 = failure
Ok, I’ve tried for several hours now and I cannot figure out how to convert the tibble above to a table I want.
My goal is a table similar to this:
Variable | HC | CLBP | FM |
---|---|---|---|
slpQual | mean(SD) | mean(SD) | mean(SD) |
slpLat | mean(SD) | mean(SD) | mean(SD) |
slpDur | mean(SD) | mean(SD) | mean(SD) |
I know this should be a simple task, but I simply can't figure out a way to make this simple table. Any help is greatly appreciated.
~ Jason the #rstatsnewbie
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.