Function to summarize number of studies by counts of values in columns in df

Hi, everyone!

I'm trying to write a function that summarizes, for each column in a df, how many studies received each value. In the following example, the first column holds study names, and Item1:Item3 are variables containing codes that a coding team assigned to information in each study. I want to be able to pass different dfs to the function; each df will always have the same "Study" column, but the number of variables might change from one df to the next.

Can someone help me turn this into a function? Thank you! (Thanks to woodman for getting me going here.)

library(tidyverse)

# Mixing numbers with "NS"/"OT" in c() coerces every code to character
df <- tibble(
  Study = c( rep("Wash_2009", 5), 
             rep("Zoey_2001", 12),
             rep("Jane_1999", 10),
             rep("Todd_1993", 15),
             rep("Coco_2019", 5),
             rep("Xena_2016", 3) ),
  Item1 = sample( c(1, 2, 3, 4, 5, "NS", "OT"), 50, replace = TRUE),
  Item2 = sample( c(1, 2, 3, 4, 5, "NS", "OT"), 50, replace = TRUE),
  Item3 = sample( c(1, 2, 3, 4, 5, "NS", "OT"), 50, replace = TRUE)
)

Item1 <- df %>%
  group_by(Study) %>%
  count(Item1) %>%
  group_by(Item1) %>%
  summarise(Studies = n())

Item2 <- df %>%
  group_by(Study) %>%
  count(Item2) %>%
  group_by(Item2) %>%
  summarise(Studies = n())

Item3 <- df %>%
  group_by(Study) %>%
  count(Item3) %>%
  group_by(Item3) %>%
  summarise(Studies = n())

Item1
Item2
Item3

One option is to reshape your data to long format, which is one way to avoid having to know in advance how many columns you're summarizing. For example:

library(tidyverse)

summary1 = df %>% 
  gather(key, value, -Study) %>% 
  group_by(key, value) %>% 
  summarise(n = length(unique(Study)))

summary1
# A tibble: 21 x 3
# Groups:   key [3]
   key   value     n
   <chr> <chr> <int>
 1 Item1 1         5
 2 Item1 2         4
 3 Item1 3         5
 4 Item1 4         3
 5 Item1 5         3
 6 Item1 NS        5
 7 Item1 OT        4
 8 Item2 1         2
 9 Item2 2         5
10 Item2 3         2
# … with 11 more rows

If you'd like the results in wide format, you can do:

summary1 %>% spread(key, n)
# A tibble: 7 x 4
  value Item1 Item2 Item3
  <chr> <int> <int> <int>
1 1         5     2     2
2 2         4     5     5
3 3         5     2     5
4 4         3     4     5
5 5         3     4     3
6 NS        5     6     4
7 OT        4     4     4

To turn this into a function:

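sfnc = function(data, wide.output=FALSE) {
  data = data %>% 
    # Stack every column except Study into key/value pairs
    gather(key, value, -Study) %>% 
    group_by(key, value) %>% 
    # Count the distinct studies that received each value in each column
    summarise(n = length(unique(Study)))
  
  # Optionally return one column of counts per original variable
  if(wide.output) {
    data = data %>% spread(key, n)
  }
  
  data
}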

sfnc(df)

sfnc(df, wide.output=TRUE)
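
Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(), so here is a rough sketch of the same idea with the newer verbs. The function name count_studies, the id_col argument, and the toy data are made up for illustration. Two choices differ from sfnc() and are assumptions on my part: the item columns are converted to character before reshaping (so numeric and text codes can share one column), and empty cells in the wide output are filled with 0 rather than left as NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: same idea as sfnc(), written with pivot_longer()/pivot_wider()
count_studies <- function(data, id_col = "Study", wide_output = FALSE) {
  out <- data %>%
    # Assumption: treat every code as text so all item columns can be stacked
    mutate(across(-all_of(id_col), as.character)) %>%
    # Stack every column except the study column into key/value pairs
    pivot_longer(-all_of(id_col), names_to = "key", values_to = "value") %>%
    group_by(key, value) %>%
    # Number of distinct studies that received each value in each column
    summarise(n = n_distinct(.data[[id_col]]), .groups = "drop")

  if (wide_output) {
    # Assumption: a value that never occurs in a column is reported as 0, not NA
    out <- out %>%
      pivot_wider(names_from = key, values_from = n, values_fill = 0L)
  }

  out
}

# Small made-up data set, just to exercise the function
toy <- tibble(
  Study = c("A", "A", "B", "B", "C"),
  Item1 = c("1", "1", "1", "NS", "2"),
  Item2 = c("2", "NS", "2", "2", "2")
)

count_studies(toy)
count_studies(toy, wide_output = TRUE)
```

Because the result is ungrouped (.groups = "drop"), it can be passed straight to pivot_wider() or any other step without an explicit ungroup().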