How to remove comma-separated values in a factor

Hi all,
I am having a bit of fun running statistics on my record collection using R-Studio. An issue I've run into is that the factor "Format" in my data frame has too many levels. Instead of just one level for "LP", I get dozens. That in turn is due to extra values, separated by commas, that serve no purpose and that I would like to delete. To illustrate with an example (see also the screenshot):
"LP, Album"
"LP, Album, RE"
"LP, Album + 7"
"LP, Album, Blu"
"LP, Comp"

I'd like to delete everything after the first comma and keep just "LP". Any tips on how that could be done using subsetting / filtering or any other useful way in R-Studio?

Thanks!

You can use something like substring("LP, Album", 1, regexpr(",", "LP, Album") - 1) in conjunction with mutate():

library(dplyr)
# Keep only the text before the first comma
your_data <- your_data %>%
    mutate(format = substring(format, 1, regexpr(",", format) - 1))
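
One thing to watch out for: if some values have no comma at all (say a plain "LP"), regexpr() returns -1 and the substring() call gives back an empty string. A minimal sketch of an alternative with base R's sub(), which leaves comma-free values alone. The sample data and the column name format are assumptions for illustration, not taken from the real collection:

```r
# Hypothetical sample data; the column name `format` is an assumption
your_data <- data.frame(
  format = factor(c("LP, Album", "LP, Album, RE", "LP, Comp", "LP", "CD, Album"))
)

# Drop the first comma and everything after it;
# values without a comma are returned unchanged
cleaned <- sub(",.*$", "", as.character(your_data$format))

# Rebuild the factor so the levels collapse to the cleaned values
your_data$format <- factor(cleaned)

levels(your_data$format)
#> [1] "CD" "LP"
```

Rebuilding the factor at the end matters: if you only edit the values, the old levels may otherwise still hang around.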

You can use dplyr::case_when for your problem:

library(magrittr)

input <- tibble::tribble(
  ~format, 
  "LP, Album",
  "LP, Album, RE",
  "LP, Album + 7",
  "LP, Album, Blu",
  "LP, Comp",
  "something else"
)

input %>%
  dplyr::mutate(is_lp = dplyr::case_when(
    grepl("LP, ", format) ~ "LP",
    TRUE                  ~ "Not LP"
  ))
#> # A tibble: 6 x 2
#>   format         is_lp 
#>   <chr>          <chr> 
#> 1 LP, Album      LP    
#> 2 LP, Album, RE  LP    
#> 3 LP, Album + 7  LP    
#> 4 LP, Album, Blu LP    
#> 5 LP, Comp       LP    
#> 6 something else Not LP

Created on 2019-01-02 by the reprex package (v0.2.1)
I guess you'll have more formats in your data that you want to clean up a bit, so case_when seems like a good fit for this type of task. If you are only doing it for "LP"/"Not LP", then you can even use something like if_else.
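
To make that a bit more concrete, here is a rough sketch with several formats handled in one case_when, plus the if_else variant for the two-way split. The "CD" and "Cass" values and the clean_format column name are made up for the example; adjust them to whatever is actually in your data:

```r
library(dplyr)

# Hypothetical input; the "CD" and "Cass" values are invented for illustration
input <- tibble::tribble(
  ~format,
  "LP, Album",
  "LP, Comp",
  "CD, Album",
  "Cass, Album",
  "something else"
)

result <- input %>%
  mutate(
    # case_when checks the conditions in order and uses the first match
    clean_format = case_when(
      grepl("^LP", format)   ~ "LP",
      grepl("^CD", format)   ~ "CD",
      grepl("^Cass", format) ~ "Cassette",
      TRUE                   ~ "Other"
    ),
    # if_else is enough when there are only two outcomes
    is_lp = if_else(grepl("^LP", format), "LP", "Not LP")
  )

result
```

Anchoring the pattern with ^ means it only matches at the start of the value, so something like "Box, LP" would not be counted as an LP by accident.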
