Compare each record value from two different data sets

Janakiram · April 22, 2018, 2:18pm

Hi All,

I have two datasets / data tables , and wanted to compare each value from one column to another dataset entire column. And result should show in new column stating matched or not matched. It's like vlookup in excel. Kindly help.

prosoitos · April 22, 2018, 4:57pm

Without a reprex, I am not sure I understand what you want to do.

Here is a solution based on what I understood (but that may not be what you are trying to do).

Let's create some data:

library(tidyverse)

a <- tibble(
  x = 1:5,
  y = letters[1:5]
)

b <- tibble(
  var1 = sample(1:10, 10, replace = T),
  var2 = sample(1:10, 10, replace = T)
)

I understood that you want to see if each value of a$x (for instance) matches the entire values of b$var1 (for instance). And you want the results in a new column. I understood this as "a new column in b" and, unless a$x only contains one value, there will be as many new columns in b as there are values in a$x.

Here is the solution for this:

bind_cols(b, map_dfc(a$x, ~ transmute(b, new_col = a$x[.] == b$var1)))

It gives:

# A tibble: 10 x 7
    var1  var2 names names1 names2 names3 names4
   <int> <int> <lgl> <lgl>  <lgl>  <lgl>  <lgl> 
 1     1     9 TRUE  FALSE  FALSE  FALSE  FALSE 
 2     8     9 FALSE FALSE  FALSE  FALSE  FALSE 
 3     6     4 FALSE FALSE  FALSE  FALSE  FALSE 
 4     5     4 FALSE FALSE  FALSE  FALSE  TRUE  
 5     7     4 FALSE FALSE  FALSE  FALSE  FALSE 
 6     1     9 TRUE  FALSE  FALSE  FALSE  FALSE 
 7     2     7 FALSE TRUE   FALSE  FALSE  FALSE 
 8    10     4 FALSE FALSE  FALSE  FALSE  FALSE 
 9     2     5 FALSE TRUE   FALSE  FALSE  FALSE 
10     7     9 FALSE FALSE  FALSE  FALSE  FALSE

Janakiram · April 23, 2018, 6:33am

Dear Prosoitos,

Thanks for your time and solution. Here is my exact requirement.

Column A
A & S FCU
CLEARVIEW FCU
FRIENDLY FCU
WEST-AIRCOMM FCU
MON VALLEY COMMUNITY FCU

Column B
A & S FCU
ALTOONA VA HOSPITAL FCU
FRIENDLY FCU
AMERICAN PRIDE CU
BLAIR COUNTY FCU

Each value from column 1 should check if available in Column A. below is the result column for the above two columns.

Result Column
Found in Column B
Not Found in Column B
Found in Column B
Not Found in Column B
Not Found in Column B

Column A and Column B are from different tables.

Is this achievable through regular expressions . ?

prosoitos · April 23, 2018, 3:57pm

Hi @Janakiram,

You should definitely learn to make a reprex. It would make it a lot easier for people trying to help you to understand what you are trying to achieve.

A few things are still unclear:

where do you want your result column? (as a new column in the dataset holding column B?)
more importantly: your example does not make it clear whether the result should be TRUE only when the values are on the same row or not. It sounds like it does not have to be, but in your example they are. (When making an example, try not to make a special case.)

In case I am not being clear: if A & S FCU was, say, on the 2nd row in column B, should this still return TRUE?

Lastly, it would also make it easier for others if you gave the data in a form that is directly usable by copy-paste (e.g. a tibble or a data frame). So basically, learn what a reprex is and how to make one

prosoitos · April 23, 2018, 4:07pm

So, let's make some new data matching the new information you gave:

library(tidyverse)

a <- tibble(
  x = 1:5,
  y = letters[1:5],
  col_A = c(
    "A & S FCU",
    "CLEARVIEW FCU",
    "FRIENDLY FCU",
    "WEST-AIRCOMM FCU",
    "MON VALLEY COMMUNITY FCU"
  )
)

b <- tibble(
  var1 = sample(1:10, 5, replace = T),
  var2 = sample(1:10, 5, replace = T),
  col_B = c(
    "A & S FCU",
    "ALTOONA VA HOSPITAL FCU",
    "FRIENDLY FCU",
    "AMERICAN PRIDE CU",
    "BLAIR COUNTY FCU"
  )
)

If you want to add the result column to the first dataset, you can use:

mutate(a, result = a$col_A %in% b$col_B)

This gives:

# A tibble: 5 x 4
      x y     col_A                    result
  <int> <chr> <chr>                    <lgl> 
1     1 a     A & S FCU                TRUE  
2     2 b     CLEARVIEW FCU            FALSE 
3     3 c     FRIENDLY FCU             TRUE  
4     4 d     WEST-AIRCOMM FCU         FALSE 
5     5 e     MON VALLEY COMMUNITY FCU FALSE

jcblum · April 23, 2018, 6:34pm

Start here!

FAQ: What's a reproducible example (`reprex`) and how do I create one? meta

Why reprex? Getting unstuck is hard. Your first step here is usually to create a reprex, or reproducible example. The goal of a reprex is to package your code, and information about your problem so that others can run it and feel your pain. Then, hopefully, folks can more easily provide a solution. What's in a Reproducible Example? Parts of a reproducible example: background information - Describe what you are trying to do. What have you already done? complete set up - include any library() calls and data to reproduce your issue. data for a reprex: Here's a discussion on setting up data for a reprex make it run - include the minimal code required to reproduce your error on the data…