I have a manually created list of countries. I also have a dataframe which contains countries,
continents and cities. How can I select only the rows where the df$location value contains one of the elements in this list?
Also, if, say, Bosnia is one of the elements in countries_list and the dataframe has Bosnia and Herzegovina, that row needs to be included too.
So far I have tried a pandas-style function, but it does not allow for partial matches:
data <- data[grepl(isin(countries_list),data$Location),]
RStudio offers many handy cheatsheets, including one for regular expressions, which might be useful if you want to keep using the grep family of functions.
Personally, I prefer a different approach that still uses basic regex: "|" as an OR operator, combined with str_detect() from stringr.
library(tidyverse)
countries_list <- "Norway|Sweden|Finland|Bosnia|Spain|Germany"
df <- tribble(
~location, ~continent, ~city,
"Australia", "Oceania", "Canberra",
"France", "Europe", "Paris",
"Bosnia and Herzegovina", "Europe", "Sarajevo",
"Peru", "South America", "Lima"
)
# Keep only rows whose location matches any of the alternatives in the pattern
df %>% filter(str_detect(location, countries_list))
#> # A tibble: 1 x 3
#>   location               continent city
#>   <chr>                  <chr>     <chr>
#> 1 Bosnia and Herzegovina Europe    Sarajevo
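If your countries are stored as a character vector rather than a single string, you can probably build the same "|" pattern with paste() and then filter with base R's grepl(), so no extra packages are needed. This is only a minimal sketch: the names countries_vec, pattern and matched are made up for illustration, and the data frame simply mirrors the example above.

```r
# Hypothetical character vector standing in for the manually created list
countries_vec <- c("Norway", "Sweden", "Finland", "Bosnia", "Spain", "Germany")

# Collapse the vector into one regex: "Norway|Sweden|Finland|Bosnia|Spain|Germany"
pattern <- paste(countries_vec, collapse = "|")

# Same example data as above, as a plain data frame
df <- data.frame(
  location  = c("Australia", "France", "Bosnia and Herzegovina", "Peru"),
  continent = c("Oceania", "Europe", "Europe", "South America"),
  city      = c("Canberra", "Paris", "Sarajevo", "Lima"),
  stringsAsFactors = FALSE
)

# grepl() takes the pattern first and the vector to search second;
# it returns TRUE for every location containing any of the alternatives
matched <- df[grepl(pattern, df$location), ]
matched
```

Note that this is a plain substring match, so a short entry such as "Niger" would also select "Nigeria". If that matters for your data, wrapping the pattern in word boundaries (\\b) may help, and any names containing regex metacharacters such as "." or "(" would need escaping first.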