# Find matching elements row by row, fast

Hi R experts,

I want to extract the characters that column x and column y have in common, row by row. Here is my code:

```r
data <- data.frame(x = c('xdcff', 'dfghj', 'erbmp'), y = c('aaaa', 'dvbgg', 'tg'))
data$x <- as.character(data$x)
data$y <- as.character(data$y)
data$m <- 0
for (i in 1:nrow(data)) {
  if (nchar(data$x[i]) > 1) {
    data$m[i] <- paste(intersect(strsplit(data$x[i], split = '')[[1]],
                                 strsplit(data$y[i], split = '')[[1]]),
                       collapse = '')
  }
}
```

`data$m` is the result I want. The strings to compare may also contain Chinese characters, so the split function is needed.
The problem is that I have 500 thousand rows and the loop takes forever to run. I would appreciate it if you could share faster ways to do this.

Best,
Veda

Hi Veda,

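First, wrap the same logic in a function:

```r
this_function <- function(x, y) {
  if (nchar(as.character(x)) > 1) {
    paste(intersect(strsplit(x, split = '')[[1]],
                    strsplit(y, split = '')[[1]]),
          collapse = '')
  } else {
    # Same placeholder the loop leaves in data$m for skipped rows;
    # without this branch the function returns NULL and mapply() yields a list.
    "0"
  }
}
```

Then apply `this_function` to the vectors `data$x` and `data$y`:

```r
# USE.NAMES = FALSE stops mapply() from naming the result after data$x
data$m <- mapply(this_function, data$x, data$y, USE.NAMES = FALSE)
```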
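A possible variation on the same idea, offered as a sketch rather than a benchmarked result: `strsplit()` is itself vectorised, so each column can be split once up front, leaving only the `intersect()` step to run per row. The helper name `common_chars` is made up for this example, and it drops the `nchar() > 1` guard, which is an assumption about what you need.

```r
# Hypothetical helper: characters of x[i] that also occur in y[i], per row.
common_chars <- function(x, y) {
  # Split every string into single characters, one vectorised call per column.
  xs <- strsplit(as.character(x), split = "")
  ys <- strsplit(as.character(y), split = "")
  # intersect() keeps the order of its first argument and drops duplicates.
  vapply(
    seq_along(xs),
    function(i) paste(intersect(xs[[i]], ys[[i]]), collapse = ""),
    character(1)
  )
}

data <- data.frame(
  x = c("xdcff", "dfghj", "erbmp"),
  y = c("aaaa", "dvbgg", "tg"),
  stringsAsFactors = FALSE
)
data$m <- common_chars(data$x, data$y)
data$m
#> [1] ""   "dg" ""
```

`vapply()` with `character(1)` guarantees a plain character vector, so the result can be assigned to a data frame column without surprises.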

It worked super fast. Thanks Rafael.

This would be a `tidyverse`-based solution:

```r
library(tidyverse)

data <- data.frame(stringsAsFactors = FALSE,
                   x = c('xdcff', 'dfghj', 'erbmp'),
                   y = c('aaaa', 'dvbgg', 'tg'),
                   m = 0)

data %>%
  rowwise() %>%
  mutate(m = paste(intersect(str_split(x, pattern = "", simplify = TRUE),
                             str_split(y, pattern = "", simplify = TRUE)),
                   collapse = "")) %>%
  ungroup()
#> # A tibble: 3 x 3
#>   x     y     m
#>   <chr> <chr> <chr>
#> 1 xdcff aaaa  ""
#> 2 dfghj dvbgg "dg"
#> 3 erbmp tg    ""
```

Created on 2020-04-10 by the reprex package (v0.3.0.9001)
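
If you want to check how the approaches in this thread behave on something closer to the 500 thousand rows in the question, a rough timing sketch along these lines might help. It uses base R only; the synthetic data, the function names (`random_strings`, `loop_match`, `mapply_match`) and the sample size are all assumptions for illustration, and the timings will depend on your machine.

```r
set.seed(42)  # reproducible synthetic data

# Hypothetical generator: n random lowercase strings of 2 to 8 characters.
random_strings <- function(n) {
  vapply(
    sample(2:8, n, replace = TRUE),
    function(k) paste(sample(letters, k, replace = TRUE), collapse = ""),
    character(1)
  )
}

n <- 5000  # raise towards 500000 to mimic the size in the question
x <- random_strings(n)
y <- random_strings(n)

# Row-by-row loop over a data frame column, as in the original question.
# Every synthetic string has at least 2 characters, so the nchar() guard is omitted.
loop_match <- function(x, y) {
  d <- data.frame(x = x, y = y, m = "", stringsAsFactors = FALSE)
  for (i in seq_len(nrow(d))) {
    d$m[i] <- paste(intersect(strsplit(d$x[i], split = "")[[1]],
                              strsplit(d$y[i], split = "")[[1]]),
                    collapse = "")
  }
  d$m
}

# mapply() over the two vectors, as in the first reply.
mapply_match <- function(x, y) {
  mapply(
    function(a, b) paste(intersect(strsplit(a, split = "")[[1]],
                                   strsplit(b, split = "")[[1]]),
                         collapse = ""),
    x, y, USE.NAMES = FALSE
  )
}

loop_time <- system.time(m_loop <- loop_match(x, y))
mapply_time <- system.time(m_mapply <- mapply_match(x, y))

# Both approaches should produce the same character vector.
identical(m_loop, m_mapply)

# Elapsed seconds for each approach.
c(loop = loop_time[["elapsed"]], mapply = mapply_time[["elapsed"]])
```

Checking `identical()` before comparing timings makes sure a faster version has not quietly changed the result.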
