Hi all,
I am having a bit of fun running statistics on my record collection using RStudio. One issue I have run into is that the factor "Format" in my data frame has too many levels. Instead of just one level for "LP", I get dozens. This is because the values contain extra comma-separated descriptors that serve no purpose for me and that I would like to delete. To illustrate with an example (see also the screenshot):
"LP, Album"
"LP, Album, RE"
"LP, Album + 7"
"LP, Album, Blu"
"LP, Comp"
I'd like to delete everything after the first comma and keep just "LP". Any tips on how that could be done using subsetting, filtering, or any other useful approach in RStudio?
library(magrittr)

input <- tibble::tribble(
  ~format,
  "LP, Album",
  "LP, Album, RE",
  "LP, Album + 7",
  "LP, Album, Blu",
  "LP, Comp",
  "something else"
)

input %>%
  dplyr::mutate(is_lp = dplyr::case_when(
    grepl("LP, ", format) ~ "LP",
    TRUE ~ "Not LP"
  ))
#> # A tibble: 6 x 2
#>   format         is_lp
#>   <chr>          <chr>
#> 1 LP, Album      LP
#> 2 LP, Album, RE  LP
#> 3 LP, Album + 7  LP
#> 4 LP, Album, Blu LP
#> 5 LP, Comp       LP
#> 6 something else Not LP
Created on 2019-01-02 by the reprex package (v0.2.1)
I guess you'll have more formats in your data that you want to clean up a bit, so case_when seems like a good fit for this type of task. If you only need the "LP"/"Not LP" distinction, you can use something simpler like if_else.
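For completeness, here is a minimal sketch of two alternatives, assuming the column is called format as in the example above. The data frame name records and the new column names format_clean and is_lp are just placeholders, and the sample values are copied from the question. The first step uses base R's sub() to drop the first comma and everything after it, which is closer to what was literally asked. The second shows the if_else variant of the "LP"/"Not LP" split.

```r
# Hypothetical sample data, mirroring the values from the question
records <- data.frame(
  format = c("LP, Album", "LP, Album, RE", "LP, Album + 7",
             "LP, Comp", "something else"),
  stringsAsFactors = FALSE
)

# Remove the first comma and everything after it, then convert to a factor
# so the descriptors no longer create extra levels
records$format_clean <- factor(sub(",.*$", "", records$format))

# Two-way split with if_else, using the same pattern as the case_when example
records$is_lp <- dplyr::if_else(grepl("LP, ", records$format), "LP", "Not LP")

levels(records$format_clean)
#> [1] "LP"             "something else"
```

Note that sub() only touches values that contain a comma, so entries without one (like "something else") are kept unchanged. If your real column is already a factor, sub() converts it to character first, which is why the result is wrapped in factor() again.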