Multiple Values in one cell


get a photo of it.

My question: is there something akin to OpenRefine's value.split(";") in RStudio, so that I can, for example, count and plot the company abbreviations? I don't want a cell like WD; LNER; BR to be counted as a single value; I want it counted as one WD, one LNER, and one BR.

Welcome to the community @kiankier! I believe separate_rows() is the function you're looking for.

library(tidyverse)

df = data.frame(
  row = 1:3,
  abbr = c('WD; LNER; BR', 
           'WD; LNWR; BR', 
           'WD; GCR; LNER' )
)

df
#>   row          abbr
#> 1   1  WD; LNER; BR
#> 2   2  WD; LNWR; BR
#> 3   3 WD; GCR; LNER

out = df %>%
  separate_rows(abbr, sep = '; ')

out
#> # A tibble: 9 × 2
#>     row abbr 
#>   <int> <chr>
#> 1     1 WD   
#> 2     1 LNER 
#> 3     1 BR   
#> 4     2 WD   
#> 5     2 LNWR 
#> 6     2 BR   
#> 7     3 WD   
#> 8     3 GCR  
#> 9     3 LNER

count(out, abbr)
#> # A tibble: 5 × 2
#>   abbr      n
#>   <chr> <int>
#> 1 BR        2
#> 2 GCR       1
#> 3 LNER      2
#> 4 LNWR      1
#> 5 WD        3

Created on 2023-01-07 with reprex v2.0.2.9000
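For comparison, the closest base R counterpart to OpenRefine's value.split(";") is probably strsplit(). The sketch below is not part of the answer above; it reuses the same three example strings and assumes the separator is always a semicolon, optionally followed by spaces. The names parts, long, and counts are only illustrative.

```r
# Base R sketch: split each cell on ";", trim whitespace, then tabulate.
abbr <- c("WD; LNER; BR", "WD; LNWR; BR", "WD; GCR; LNER")

# strsplit() returns a list with one character vector per original cell
parts <- strsplit(abbr, ";", fixed = TRUE)

# One row per abbreviation, keeping track of the original row number
long <- data.frame(
  row  = rep(seq_along(parts), lengths(parts)),
  abbr = trimws(unlist(parts))
)

# Frequency of each abbreviation across all cells
counts <- table(long$abbr)
counts
```

Because trimws() removes the surrounding spaces, this also copes with cells where the space after the semicolon is missing.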

I will try it out, but I have a follow-up question: could I use the count or plot commands after this R script? For example, could I make a plot of the sorted abbreviations?

Yes, that's possible. Below is a continuing example that puts it all together in one script and uses ggplot to create a bar graph.

df %>%
  separate_rows(abbr, sep = '; ') %>%
  count(abbr) %>%
  ggplot(aes(x = abbr, y = n)) +
  geom_bar(stat = 'identity')
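If "sorted" means ordering the bars by count, one possible approach is to wrap the x variable in reorder(). This is a minimal sketch, assuming ggplot2 is installed; the counts are typed in by hand from the count(out, abbr) output earlier in the thread, and geom_col() is used as a shorthand for geom_bar(stat = 'identity').

```r
library(ggplot2)

# Counts copied from the count(out, abbr) output shown earlier
counts <- data.frame(
  abbr = c("BR", "GCR", "LNER", "LNWR", "WD"),
  n    = c(2, 1, 2, 1, 3)
)

# reorder() sorts the levels of abbr by n; using -n gives descending order
p <- ggplot(counts, aes(x = reorder(abbr, -n), y = n)) +
  geom_col() +
  labs(x = "abbr")
p
```

Abbreviations with equal counts keep their alphabetical order relative to each other.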

Many thanks! I will use it tomorrow, and hopefully I can make up for the time I spent looking for a solution.

So, is this normal for a plot?

Are there NA values in the data? If so, you can edit the code like this:

mydataframe %>%
  drop_na() %>% # add this line
  ggplot(…….)
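To check whether NA values are actually present, and in which column, a quick base R count can help. The data frame below is hypothetical: the year column and the NA positions are invented purely for illustration.

```r
# Hypothetical data: the 'year' column and the NA positions are made up
df_na <- data.frame(
  abbr = c("WD; BR", NA, "LNER"),
  year = c(1923, 1948, NA)
)

# Number of NA values in each column
na_per_column <- colSums(is.na(df_na))
na_per_column

# Keep only rows where abbr is present; rows with NA in other columns survive
kept <- df_na[!is.na(df_na$abbr), ]
kept
```

Note that drop_na() without arguments drops a row if any column is NA. With tidyr, drop_na(abbr) restricts the check to one column, which matches the base R filter above.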

No, not in this variable.

Is it a case where the "missing" bars are for values that appear only once? Below is an example showing that when the largest value is 22,500, the bar for n = 1 is too short to be visible.

library(tidyverse)

df = data.frame(abbr = c('a', 'b', 'c', 'd'),
                n = c(1, 2, 5, 22500))

ggplot(df, aes(x = abbr, y = n)) +
  geom_bar(stat = 'identity') +
  coord_flip()

Created on 2023-01-08 with reprex v2.0.2.9000
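If the small counts need to stay visible, one option (a sketch, not the only way) is to draw points on a log10 axis instead of bars. Bars are a poor fit for a log scale because a bar for n = 1 would start and end at the same position. The example reuses the data above and assumes ggplot2 is installed; share is only an illustrative name.

```r
library(ggplot2)

df <- data.frame(abbr = c("a", "b", "c", "d"),
                 n = c(1, 2, 5, 22500))

# Each bar's length relative to the longest bar; n = 1 is about 0.004 %
share <- df$n / max(df$n)
share

# Points on a log10 axis keep every category visible, including n = 1
p <- ggplot(df, aes(x = abbr, y = n)) +
  geom_point(size = 3) +
  scale_y_log10() +
  coord_flip()
p
```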

Final question: which command do I add to make the x-axis labels (the abbr values in this plot, such as BR, GCR, LNER) bigger or bolder?

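Add the theme() line below. You can specify formatting for axis.text.x and/or axis.text.y. Because of coord_flip(), the abbr labels sit on the vertical axis here, so axis.text.y is the one to change.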

ggplot(df, aes(x = abbr, y = n)) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text.y = element_text(face = 'bold', size = 20))
