dplyr and filter special character

Hello,
I am using data that contains country and city as variables.
The problem is that city sometimes contains two locations separated by "/" or "-". I think I solved the first case ("/") like this:

library(dplyr)

country=c("USA","JPN","JPN","KOR","FRA","FRA","ITA","ITA")
city=c("Miami","Tokyo/Yokohama","Kyoto","Seul","Paris","Lyon/Marseille","Rome","Torino/Firenze")
data1=data.frame(country, city)
data1
# Keep only the text before the "/"
data1 %>% mutate(city2=gsub("(.*)/.*","\\1", city))

But I can't work out how to add more conditions, in this case the "-" separator.
I partially solved it, but only by applying mutate() twice:

country=c("USA","JPN","JPN","KOR","FRA","FRA","ITA","ITA")
city=c("Miami","Tokyo/Yokohama","Kyoto","Seul","Paris","Lyon/Marseille","Rome","Torino-Firenze")
data1=data.frame(country, city)
data1
data1 %>% mutate(city2=gsub("(.*)/.*","\\1", city)) %>% mutate(city2=gsub("(.*)-.*","\\1", city2)) 

How can I write the conditions in the same "mutate"?
Thanks for your time and interest.

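You can use tidyr::separate(). Its sep argument is a regular expression, so the character class "[/-]" splits on either "/" or "-".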

library(tidyr)

country <- c("USA", "JPN", "JPN", "KOR", "FRA", "FRA", "ITA", "ITA")
city <- c("Miami", "Tokyo/Yokohama", "Kyoto", "Seul", "Paris", "Lyon/Marseille", "Rome", "Torino-Firenze")
data1 <- data.frame(country, city)

separate(data1, city, into = c("city1", "city2"), sep = "[/-]", remove = FALSE)
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 5 rows [1, 3, 4,
#> 5, 7].
#>   country           city  city1     city2
#> 1     USA          Miami  Miami      <NA>
#> 2     JPN Tokyo/Yokohama  Tokyo  Yokohama
#> 3     JPN          Kyoto  Kyoto      <NA>
#> 4     KOR           Seul   Seul      <NA>
#> 5     FRA          Paris  Paris      <NA>
#> 6     FRA Lyon/Marseille   Lyon Marseille
#> 7     ITA           Rome   Rome      <NA>
#> 8     ITA Torino-Firenze Torino   Firenze

Created on 2020-08-21 by the reprex package (v0.3.0)
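
If only the first location is needed, the two conditions from the question can also be combined in a single mutate() by putting both separators in one character class. The following is a minimal sketch rather than part of the answer above; it assumes dplyr is installed, reuses the column name city2 from the question, and stores the result in a variable called result purely for illustration.

```r
library(dplyr)

country <- c("USA", "JPN", "JPN", "KOR", "FRA", "FRA", "ITA", "ITA")
city <- c("Miami", "Tokyo/Yokohama", "Kyoto", "Seul", "Paris",
          "Lyon/Marseille", "Rome", "Torino-Firenze")
data1 <- data.frame(country, city, stringsAsFactors = FALSE)

# "[/-]" matches either "/" or "-"; ".*" then matches the rest of the string.
# sub() replaces that match with "", leaving the text before the first separator.
result <- data1 %>%
  mutate(city2 = sub("[/-].*", "", city))

result
```

Note that this keeps the text before the first separator, whereas the greedy pattern "(.*)/.*" in the question keeps everything before the last one. For the data shown here both give the same result.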


Hello,
I tried something and it works. Give it a shot.

library(tidyverse)  # attaches dplyr and stringr, so stringr need not be loaded separately

country=c("USA","JPN","JPN","KOR","FRA","FRA","ITA","ITA","IND")

city=c("Miami","Tokyo/Yokohama","Kyoto","Seul","Paris",
"Lyon/Marseille","Rome","Torino/Firenze", "Mumbai-Bombay")

data1=data.frame(country, city)

data1

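data1 %>% 
  mutate(
    city2 = ifelse(
      str_detect(city, "[/-]"),            # does the city contain a separator?
      ifelse(
        str_detect(city, pattern = "/"),
        str_extract(city, ".*(?=/)"),      # keep the text before "/"
        str_extract(city, ".*(?=-)")       # otherwise keep the text before "-"
      ),
      as.character(city)                   # no separator: keep the city as is
    )
  )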

Hope this helps,
Ayush
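
The nested ifelse() above could probably be collapsed into one stringr call, because a negated character class can take everything up to the first separator of either kind. This is a sketch under that assumption, not the method from the reply; the variable name result is made up for illustration.

```r
library(dplyr)
library(stringr)

data1 <- data.frame(
  country = c("USA", "JPN", "JPN", "KOR", "FRA", "FRA", "ITA", "ITA", "IND"),
  city = c("Miami", "Tokyo/Yokohama", "Kyoto", "Seul", "Paris",
           "Lyon/Marseille", "Rome", "Torino/Firenze", "Mumbai-Bombay"),
  stringsAsFactors = FALSE
)

# "^[^/-]+" matches from the start of the string up to, but not including,
# the first "/" or "-". Cities without a separator are matched in full.
result <- data1 %>%
  mutate(city2 = str_extract(city, "^[^/-]+"))

result
```

One difference to keep in mind: str_extract() returns NA when nothing matches, so a value that starts with a separator or is empty would give NA here.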
