Pivot or Separate

Hi !

I have summarized the problem below, using reprex, which somewhere lies between pivot and separate from the tidy-verse.

Will appreciate help. Thanks !


# Make sure to include library calls for all the libraries that you are using in your example
library(tidyverse)

# Remember to include the sample data that you have generated
structure(list(customer = c("Alpha", "Beta"), reqts = c("soap, shampoo", 
"shampoo, detergent, sanitizer")), class = c("spec_tbl_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -2L), spec = structure(list(
    cols = list(customer = structure(list(), class = c("collector_character", 
    "collector")), reqts = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

# Narrow down your code to just the problematic part.
## Tried pivot_wider and separate
## but unable to produce the desired output as follows
## reqts customer_count
## soap      1
## shampoo   2
## detergent 1
## sanitizer 1

library(tidyverse)

sample_df <- structure(list(customer = c("Alpha", "Beta"), reqts = c("soap, shampoo", 
                                                                     "shampoo, detergent, sanitizer")), class = c("spec_tbl_df", "tbl_df", 
                                                                                                                  "tbl", "data.frame"), row.names = c(NA, -2L), spec = structure(list(
                                                                                                                      cols = list(customer = structure(list(), class = c("collector_character", 
                                                                                                                                                                         "collector")), reqts = structure(list(), class = c("collector_character", 
                                                                                                                                                                                                                            "collector"))), default = structure(list(), class = c("collector_guess", 
                                                                                                                                                                                                                                                                                  "collector")), skip = 1L), class = "col_spec"))
sample_df %>% 
    separate_rows(reqts) %>% 
    count(reqts, name = "customer_count")
#> # A tibble: 4 × 2
#>   reqts     customer_count
#>   <chr>              <int>
#> 1 detergent              1
#> 2 sanitizer              1
#> 3 shampoo                2
#> 4 soap                   1

Created on 2021-07-28 by the reprex package (v2.0.0)

1 Like

Thank you so much for the prompt reply. It works ...

May I ask how to manage the case where the reqts are multi-word seperated by comma

structure(list(customer = c("Alpha", "Beta"), reqts = c("beauty soap, hair shampoo", 
"hair shampoo, detergent, hand and body sanitizer")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L), spec = structure(list(
    cols = list(customer = structure(list(), class = c("collector_character", 
    "collector")), reqts = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

I tried as follows:

sample_df %>%

 separate_rows(reqts,sep=",") %>%

 count(reqts)

to get:

structure(list(reqts = c(" detergent", " hair shampoo", " hand and body sanitizer", 
"beauty soap", "hair shampoo"), n = c(1L, 1L, 1L, 1L, 1L)), row.names = c(NA, 
-5L), groups = structure(list(reqts = c(" detergent", " hair shampoo", 
" hand and body sanitizer", "beauty soap", "hair shampoo"), .rows = structure(list(
    1L, 2L, 3L, 4L, 5L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))```

Or in another words:
image

Where it should be:

|beauty soap||1|
|hair shampoo||2|
|detergent||1|
|hand and body sanitizer||1|

A proper regex for the sep argument would be ,\\s

library(tidyverse)

sample_df <- structure(list(customer = c("Alpha", "Beta"), reqts = c("beauty soap, hair shampoo", 
                                                                     "hair shampoo, detergent, hand and body sanitizer")), class = c("spec_tbl_df", 
                                                                                                                                     "tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L), spec = structure(list(
                                                                                                                                         cols = list(customer = structure(list(), class = c("collector_character", 
                                                                                                                                                                                            "collector")), reqts = structure(list(), class = c("collector_character", 
                                                                                                                                                                                                                                               "collector"))), default = structure(list(), class = c("collector_guess", 
                                                                                                                                                                                                                                                                                                     "collector")), skip = 1L), class = "col_spec"))
sample_df %>% 
    separate_rows(reqts, sep = ",\\s") %>% 
    count(reqts, name = "customer_count")
#> # A tibble: 4 × 2
#>   reqts                   customer_count
#>   <chr>                            <int>
#> 1 beauty soap                          1
#> 2 detergent                            1
#> 3 hair shampoo                         2
#> 4 hand and body sanitizer              1

Created on 2021-07-28 by the reprex package (v2.0.0)

Works like a charm ! Thanks again !

@andresrcs - may I request you for some more.

Is it possible to get the list of customers in a column like this:

structure(list(reqts = c("beauty soap", "detergent", "hair shampoo",
"hand and body sanitizer"), customer_count = c(1, 1, 2, 1), customer_name = c("Alpha",
"Beta", "Alpha,Beta", "Beta")), class = "data.frame", row.names = c(NA,
-4L))

Should I post this as as new question, with link to the previous one - please let me know and I will do so.

Thanks !

library(tidyverse)

sample_df <- structure(list(customer = c("Alpha", "Beta"), reqts = c("beauty soap, hair shampoo", 
                                                                     "hair shampoo, detergent, hand and body sanitizer")), class = c("spec_tbl_df", 
                                                                                                                                     "tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L), spec = structure(list(
                                                                                                                                         cols = list(customer = structure(list(), class = c("collector_character", 
                                                                                                                                                                                            "collector")), reqts = structure(list(), class = c("collector_character", 
                                                                                                                                                                                                                                               "collector"))), default = structure(list(), class = c("collector_guess", 
                                                                                                                                                                                                                                                                                                     "collector")), skip = 1L), class = "col_spec"))
sample_df %>% 
    separate_rows(reqts, sep = ",\\s") %>% 
    group_by(reqts) %>% 
    summarise(customer_count = n(),
              customer_name = paste(unique(customer), collapse = ", "))
#> # A tibble: 4 × 3
#>   reqts                   customer_count customer_name
#>   <chr>                            <int> <chr>        
#> 1 beauty soap                          1 Alpha        
#> 2 detergent                            1 Beta         
#> 3 hair shampoo                         2 Alpha, Beta  
#> 4 hand and body sanitizer              1 Beta

Created on 2021-08-01 by the reprex package (v2.0.0)

This is becoming a little repetitive, if you want to learn about data wrangling in general, I recommend you to read this free e-book

https://r4ds.had.co.nz/

Thanks again @andresrcs . Will definitely read it.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.