Function to summarize counts for each unique value in each column in df

Hi, everyone!

I am working with several colleagues to code information in studies about language education. I want to create a function that summarizes the counts for each coding decision (0 = No, 1 = Yes, CT = Can't tell, and NS = Not sure!) in a column of a df. In the df below, the first column is the name of a study. Columns 2:4 are variables that contain codes.

In the end, I'm trying to write a function that tells me how many 0's, 1's, CT's, etc. were coded for each item or column in a df after the study name. I tried saving the output as a list, but I'm just not having any luck. I think I'm missing something simple. Help, anyone?


df <- tibble(
  StudyName = c("Wash_2001", "Mal_1999", "Jane_2002", "Zoey_1976"),
  Item1 = c(1, 1, 0, 1),
  Item2 = c(0, 0, 0, "CT"),
  Item3 = c(1, 0, 1, "NS")
)

df %>%            # I'm hoping each element in the list will have 
  count(Item1)  # this for each column in the df.

I'm trying to do something like the following. Where an I coming up short?


Count_Codes <- function(df){
  
  Output <- vector("list", ncol(df))
  
  for(i in seq_along(df)){
    Output[[i]] <- count(i)
  }
  Output
}

I think this will do it? This applies the table function to each column of the dataframe. The result is a list of named vectors..

lapply(df, table)

1 Like

Hi woodard,

Excellent! Thanks a bunch! One more quick, follow-up question: Is there a way to group that by StudyName? I have a hierarchical spreadsheet in which one StudyName can have multiple rows. I would love to use the above function to obtain not the total number of rows for each code but collapse rows by StudyName. Is that possible? Thanks, again.

I'd probably start using gather and summarise at that point. Most things are easier when you have your data in "gathered" format.

library(tibble)
library(tidyr)
library(dplyr)

df <- tibble(
    StudyName = sample(c("Wash_2001", "Mal_1999", "Jane_2002", "Zoey_1976"), 100, TRUE),
    Item1 = sample(c(1, 1, 0, 1), 100, TRUE),
    Item2 = sample(c(0, 0, 0, "CT"), 100, TRUE),
    Item3 = sample(c(1, 0, 1, "NS"), 100, TRUE)
)

df %>% 
    gather(key, value, starts_with("Item")) %>% 
    group_by(StudyName, key, value) %>% 
    summarise(count=n())
#> # A tibble: 28 x 4
#> # Groups:   StudyName, key [12]
#>    StudyName key   value count
#>    <chr>     <chr> <chr> <int>
#>  1 Jane_2002 Item1 0         5
#>  2 Jane_2002 Item1 1        20
#>  3 Jane_2002 Item2 0        20
#>  4 Jane_2002 Item2 CT        5
#>  5 Jane_2002 Item3 0         6
#>  6 Jane_2002 Item3 1        14
#>  7 Jane_2002 Item3 NS        5
#>  8 Mal_1999  Item1 0         7
#>  9 Mal_1999  Item1 1        24
#> 10 Mal_1999  Item2 0        22
#> # ... with 18 more rows

Created on 2019-06-25 by the reprex package (v0.3.0)

3 Likes

Wonderful! Thanks for the help!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.