Create a summary table from a list column

dpepper · June 29, 2020, 10:53pm

I have a dataframe from which I'd like to create a list column, and then create a summary table from data in one column of the nested df.

Here's an example:

df <- tribble(
  ~Location,  ~Customer_ID,    ~Status,     ~Item,
  "A",           101,       "Elite",      "Yoga",
  "A",           102,       "Standard",   "Weights",
  "A",           103,       "Standard",   "Yoga",
  "A",           104,       "Pro",        "Tennis",
  "B",           105,       "Elite",      "Aerobics",
  "B",           106,       "Pro",        "Tennis",
  "B",           107,       "Pro",        "Yoga",
  "B",           108,       "Standard",   "Yoga"
)

I'd like to take this tibble and, first, group by location and nest, so that there's one row per location, with the other three columns nested into a single data column.

Then I'd like to take the data in the Item column for each location, create a count summary from it, and then store that table in a new column, which we can call item_count. So for Location A, the resulting item_count table would look like this:

item_count <- tribble(
  ~Item      ~n,
  "Yoga",     2,
  "Weights",  1,
  "Tennis",   1
)

I don't care too much which function is used to create this table; it can be dplyr::count or janitor::tabyl or anything similar.

In essence, then, all I need is to map the count() function onto the Item column in the nested data frame, and save the resulting table in a new column. But I've been stumped as to exactly how to do this.

Any help greatly appreciated.

woodward · June 30, 2020, 2:49am

The tribble doesn't work for me.

Error in list2(...) : object 'A' not found

woodward · June 30, 2020, 2:53am

I haven't used nested dataframes, how do they work? Is it something like this?

library(tibble)
library(purrr)
library(dplyr)
library(tidyr)

A <- "A"
B <- "B"

df <- tribble(
  ~Location,  ~Customer_ID,    ~Status,     ~Item,
  A,           101,       "Elite",      "Yoga",
  A,           102,       "Standard",   "Weights",
  A,           103,       "Standard",   "Yoga",
  A,           104,       "Pro",        "Tennis",
  B,           105,       "Elite",      "Aerobics",
  B,           106,       "Pro",        "Tennis",
  B,           107,       "Pro",        "Yoga",
  B,           108,       "Standard",   "Yoga"
)

df %>%
  group_by(Location) %>%
  nest() %>% 
  mutate(n = unlist(lapply(data, nrow)))

#> # A tibble: 2 x 3
#> # Groups:   Location [2]
#>   Location data                 n
#>   <chr>    <list>           <int>
#> 1 A        <tibble [4 x 3]>     4
#> 2 B        <tibble [4 x 3]>     4

^{Created on 2020-06-30 by the reprex package (v0.3.0)}

dpepper · June 30, 2020, 3:11am

Oops, forgot the quotes in the tribble definition. It's been fixed now.

woodward · June 30, 2020, 3:12am

Or do you need to nest it?

df %>%
  group_by(Location, Item) %>%
  summarise(n = n()) 

# A tibble: 6 x 3
# Groups:   Location [2]
  Location Item         n
  <chr>    <chr>    <int>
1 A        Tennis       1
2 A        Weights      1
3 A        Yoga         2
4 B        Aerobics     1
5 B        Tennis       1
6 B        Yoga         2

dpepper · June 30, 2020, 3:14am

I did want to nest it first, because in my actual application I have a nested column like that which I want to use.

woodward · June 30, 2020, 3:22am

Or using a custom function and lapply. I think this is getting near a purrr solution.

# custom function
mysummary <- function(mydf){
  mydf %>% 
    group_by(Item) %>% 
    summarise(n = n())
}

# do the thing
df <- df %>%
  group_by(Location) %>% 
  nest() %>% 
  mutate(item_count = lapply(data, mysummary))

dpepper · July 1, 2020, 1:04am

Thanks, lapply does do the job. I'm surprised that there's isn't a purrr/map way to do this, but at least I have something that works.

woodward · July 1, 2020, 2:17am

You probably can but I have never really understood purrr and map. I tend to use dplyr and loops. But am beginning to understand how useful lapply is (even though it's old technology).

nirgrahamuk · July 1, 2020, 8:53am

a solution with purrr

library(tidyverse)
df <- tribble(
  ~Location,  ~Customer_ID,    ~Status,     ~Item,
  "A",           101,       "Elite",      "Yoga",
  "A",           102,       "Standard",   "Weights",
  "A",           103,       "Standard",   "Yoga",
  "A",           104,       "Pro",        "Tennis",
  "B",           105,       "Elite",      "Aerobics",
  "B",           106,       "Pro",        "Tennis",
  "B",           107,       "Pro",        "Yoga",
  "B",           108,       "Standard",   "Yoga"
)

 df2 <- df %>% group_by(Location) %>% nest() 
# > df2
# # A tibble: 2 x 2
# # Groups:   Location [2]
# Location data            
# <chr>    <list>          
#   1 A        <tibble [4 x 3]>
#   2 B        <tibble [4 x 3]>

 #   > df2$data
# [[1]]
# # A tibble: 4 x 3
# Customer_ID Status   Item   
# <dbl> <chr>    <chr>  
# 1         101 Elite    Yoga   
# 2         102 Standard Weights
# 3         103 Standard Yoga   
# 4         104 Pro      Tennis 
# 
# [[2]]
# # A tibble: 4 x 3
# Customer_ID Status   Item    
# <dbl> <chr>    <chr>   
# 1         105 Elite    Aerobics
# 2         106 Pro      Tennis  
# 3         107 Pro      Yoga    
# 4         108 Standard Yoga    

df2$Item_count <- map(.x = df2$data,
                      .f = ~group_by(.,Item) %>% 
                        summarise(n=n())
                      )
# > df2
# # A tibble: 2 x 3
# # Groups:   Location [2]
# Location data             Item_count      
# <chr>    <list>           <list>          
#   1 A        <tibble [4 x 3]> <tibble [3 x 2]>
#   2 B        <tibble [4 x 3]> <tibble [3 x 2]>

# df2$Item_count
# [[1]]
# # A tibble: 3 x 2
# Item        n
# <chr>   <int>
# 1 Tennis      1
# 2 Weights     1
# 3 Yoga        2
# 
# [[2]]
# # A tibble: 3 x 2
# Item         n
# <chr>    <int>
# 1 Aerobics     1
# 2 Tennis       1
# 3 Yoga         2

system · July 8, 2020, 8:53am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.