Compare each record value from two different data sets


#1

Hi All,

I have two datasets / data tables , and wanted to compare each value from one column to another dataset entire column. And result should show in new column stating matched or not matched. It's like vlookup in excel. Kindly help.


#2

Without a reprex, I am not sure I understand what you want to do.

Here is a solution based on what I understood (but that may not be what you are trying to do).

Let's create some data:

library(tidyverse)

a <- tibble(
  x = 1:5,
  y = letters[1:5]
)

b <- tibble(
  var1 = sample(1:10, 10, replace = T),
  var2 = sample(1:10, 10, replace = T)
)

I understood that you want to see if each value of a$x (for instance) matches the entire values of b$var1 (for instance). And you want the results in a new column. I understood this as "a new column in b" and, unless a$x only contains one value, there will be as many new columns in b as there are values in a$x.

Here is the solution for this:

bind_cols(b, map_dfc(a$x, ~ transmute(b, new_col = a$x[.] == b$var1)))

It gives:

# A tibble: 10 x 7
    var1  var2 names names1 names2 names3 names4
   <int> <int> <lgl> <lgl>  <lgl>  <lgl>  <lgl> 
 1     1     9 TRUE  FALSE  FALSE  FALSE  FALSE 
 2     8     9 FALSE FALSE  FALSE  FALSE  FALSE 
 3     6     4 FALSE FALSE  FALSE  FALSE  FALSE 
 4     5     4 FALSE FALSE  FALSE  FALSE  TRUE  
 5     7     4 FALSE FALSE  FALSE  FALSE  FALSE 
 6     1     9 TRUE  FALSE  FALSE  FALSE  FALSE 
 7     2     7 FALSE TRUE   FALSE  FALSE  FALSE 
 8    10     4 FALSE FALSE  FALSE  FALSE  FALSE 
 9     2     5 FALSE TRUE   FALSE  FALSE  FALSE 
10     7     9 FALSE FALSE  FALSE  FALSE  FALSE

#3

Dear Prosoitos,

Thanks for your time and solution. Here is my exact requirement.

Column A
A & S FCU
CLEARVIEW FCU
FRIENDLY FCU
WEST-AIRCOMM FCU
MON VALLEY COMMUNITY FCU

Column B
A & S FCU
ALTOONA VA HOSPITAL FCU
FRIENDLY FCU
AMERICAN PRIDE CU
BLAIR COUNTY FCU

Each value from column 1 should check if available in Column A. below is the result column for the above two columns.

Result Column
Found in Column B
Not Found in Column B
Found in Column B
Not Found in Column B
Not Found in Column B

Column A and Column B are from different tables.

Is this achievable through regular expressions . ?


#4

Hi @Janakiram,

You should definitely learn to make a reprex. It would make it a lot easier for people trying to help you to understand what you are trying to achieve.

A few things are still unclear:

  • where do you want your result column? (as a new column in the dataset holding column B?)
  • more importantly: your example does not make it clear whether the result should be TRUE only when the values are on the same row or not. It sounds like it does not have to be, but in your example they are. (When making an example, try not to make a special case.)

In case I am not being clear: if A & S FCU was, say, on the 2nd row in column B, should this still return TRUE?

Lastly, it would also make it easier for others if you gave the data in a form that is directly usable by copy-paste (e.g. a tibble or a data frame). So basically, learn what a reprex is and how to make one :slightly_smiling_face:


#5

So, let's make some new data matching the new information you gave:

library(tidyverse)

a <- tibble(
  x = 1:5,
  y = letters[1:5],
  col_A = c(
    "A & S FCU",
    "CLEARVIEW FCU",
    "FRIENDLY FCU",
    "WEST-AIRCOMM FCU",
    "MON VALLEY COMMUNITY FCU"
  )
)

b <- tibble(
  var1 = sample(1:10, 5, replace = T),
  var2 = sample(1:10, 5, replace = T),
  col_B = c(
    "A & S FCU",
    "ALTOONA VA HOSPITAL FCU",
    "FRIENDLY FCU",
    "AMERICAN PRIDE CU",
    "BLAIR COUNTY FCU"
  )
)

If you want to add the result column to the first dataset, you can use:

mutate(a, result = a$col_A %in% b$col_B)

This gives:

# A tibble: 5 x 4
      x y     col_A                    result
  <int> <chr> <chr>                    <lgl> 
1     1 a     A & S FCU                TRUE  
2     2 b     CLEARVIEW FCU            FALSE 
3     3 c     FRIENDLY FCU             TRUE  
4     4 d     WEST-AIRCOMM FCU         FALSE 
5     5 e     MON VALLEY COMMUNITY FCU FALSE 

#6

Start here!