I have a dataframe in which one column holds several values per row, separated by semicolons. I want to remove every row where that column contains one or more values from an external list.
I think I could do this by using 'mutate' to create a new column that flags whether any value in the row matches the external list (which I think would need a couple of loops), and then filtering on that flag. However, the actual dataframe will have millions of rows, so I'm looking for the simplest and fastest way to do this.
# Dataframe #
df <- data.frame(id = c(1, 2, 3, 4, 5, 6),
                 species = c("species1", "species1", "species2", "species3", "species3", "species4"),
                 issue = c("a b;d", "e;f;g", "b;c d;f", NA, "d", "e;g"))
# Values that mark a row for removal (any row whose "issue" column contains at least one of these should be removed)
issues_to_remove <- list("a b", "b", "c d")
# Desired output #
df.new <- data.frame(id = c(2, 4, 5, 6),
                     species = c("species1", "species3", "species3", "species4"),
                     issue = c("e;f;g", NA, "d", "e;g"))
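For reference, here is a minimal sketch of the flag-then-filter idea in base R, without 'mutate'. The names `parts`, `to_remove`, and `flagged` are placeholders I made up for illustration, and I have not benchmarked this on millions of rows, so treat it as a starting point rather than the fastest option.

```r
# Example data copied from the question
df <- data.frame(id = c(1, 2, 3, 4, 5, 6),
                 species = c("species1", "species1", "species2", "species3", "species3", "species4"),
                 issue = c("a b;d", "e;f;g", "b;c d;f", NA, "d", "e;g"))
issues_to_remove <- list("a b", "b", "c d")

# Flatten the list once so %in% compares against a plain character vector
to_remove <- unlist(issues_to_remove)

# Split every "issue" string on ";" in one vectorized call.
# fixed = TRUE treats ";" as a literal rather than a regular expression.
# An NA input yields NA, which never matches, so NA rows are kept.
parts <- strsplit(as.character(df$issue), ";", fixed = TRUE)

# Flag a row when any of its values appears in the removal vector
flagged <- vapply(parts, function(x) any(x %in% to_remove), logical(1))

# Keep only the unflagged rows (ids 2, 4, 5, 6 for the example data)
df.new <- df[!flagged, ]
```

Matching whole values after splitting, rather than searching the raw string with a pattern, is what keeps "b" from matching inside "a b". Note that the row names of the result keep their original values (2, 4, 5, 6) unless they are reset.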