Optimizing a regex of a long list pattern

I have a file of 500,000 queries in which I want to flag matches against several long lists: companies (30,000 names), locations, and more. However, the labelling takes a very long time. Is there a better way to do this?

g_query_samp[str_detect(g_query_samp$search_q, regex(paste0(Loc$cities, collapse = ".*|.*"), ignore_case = T)),  "city"] <- 1

Hi, and welcome!

A reproducible example, called a reprex, yields more and better answers than a code fragment.

Without knowing the structure of g_query_samp, how its search_q variable is delimited, or what assigning 1 to the matching rows is supposed to represent, I'd be speculating too much to provide a useful answer.

All I can say is that, in general, you are better off operating on whole columns with vectorized functions than assigning into subsets. Do you have some representative data you could share in a reprex? It doesn't have to be big, just enough to show what it is that needs parsing.
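To give an idea of the size of reprex that would help, here is a minimal sketch. The three queries and three city names are made up, and I'm assuming search_q is a plain character column and that the city names contain no regex metacharacters. It uses base R's grepl() in place of str_detect() so that it runs without extra packages, and it joins the names with a plain "|" and writes the result for the whole column in one step. I can't say whether this is fast enough for 30,000 names without seeing real data.

```r
# Hypothetical stand-ins for the data in the question (not the real data)
g_query_samp <- data.frame(
  search_q = c("cheap flights to Paris",
               "acme corp careers",
               "hotels in new york"),
  stringsAsFactors = FALSE
)
Loc <- data.frame(
  cities = c("Paris", "New York", "Berlin"),
  stringsAsFactors = FALSE
)

# One alternation pattern: "Paris|New York|Berlin"
# (assumes the names contain no regex metacharacters)
city_pattern <- paste(Loc$cities, collapse = "|")

# Vectorized over the whole column: 1 where any city matches, 0 otherwise
g_query_samp$city <- as.integer(
  grepl(city_pattern, g_query_samp$search_q, ignore.case = TRUE)
)

print(g_query_samp)
```

Something this small, with the output you expect for each row, is enough for others to test alternatives against.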
