Match same rows from 2 data frames

I have 2 data frames below.
By the way, I tried to use datapasta but I got error:

> Error in if ((nchar(joined_s) + indent_context) > defn_width) { : 
>   argument is of length zero

So I have to paste the tables here

January data frame:

Plugin ID CVE CVSS Risk Host Protocol Port Name
1 80101 CVE-2013-4786 7.8 High 10.10.1.254 udp 623 IPMI v2.0 Password Hash Disclosure
2 10815 CVE-2002-1060 4.3 Medium 10.10.1.79 tcp 80 Web Server Generic XSS
3 18405 CVE-2005-1794 5.1 Medium 10.92.102.11 tcp 3389 Microsoft Windows Remote Desktop Protocol Server Man-in-the-Middle Weakness
4 35291 CVE-2004-2761 5.0 Medium 10.92.102.11 tcp 3389 SSL Certificate Signed Using Weak Hashing Algorithm
5 42873 CVE-2016-2183 5.0 Medium 10.92.102.11 tcp 3389 SSL Medium Strength Cipher Suites Supported (SWEET32)
6 45411 NA 5.0 Medium 10.92.102.11 tcp 3389 SSL Certificate with Wrong Hostname
7 51192 NA 6.4 Medium 10.92.102.11 tcp 3389 SSL Certificate Cannot Be Trusted
8 57582 NA 6.4 Medium 10.92.102.11 tcp 3389 SSL Self-Signed Certificate
9 57608 NA 5.0 Medium 10.92.102.11 tcp 445 SMB Signing not required
10 57690 NA 4.3 Medium 10.92.102.11 tcp 3389 Terminal Services Encryption Level is Medium or Low

July data frame:

Plugin ID CVE CVSS Risk Host Protocol Port Name
1 57608 NA 5.0 Medium 10.92.102.11 tcp 445 SMB Signing not required
2 57690 NA 4.3 Medium 10.92.102.11 tcp 3389 Terminal Services Encryption Level is Medium or Low
3 58453 NA 4.3 Medium 10.92.102.11 tcp 3389 Terminal Services Doesn't Use Network Level Authentication (NLA) Only
4 79638 CVE-2014-6321 10.0 Critical 10.92.102.11 tcp 3389 MS14-066: Vulnerability in Schannel Could Allow Remote Code Execution (2992611) (uncredentialed check)
5 82828 CVE-2015-1635 10.0 Critical 10.92.102.11 tcp 80 MS15-034: Vulnerability in HTTP.sys Could Allow Remote Code Execution (3042553) (uncredentialed check)
6 90510 CVE-2016-0128 6.8 Medium 10.92.102.11 tcp 49155 MS16-047: Security Update for SAM and LSAD Remote Protocols (3148527) (Badlock) (uncredentialed check)
7 97833 CVE-2017-0143 10.0 Critical 10.92.102.11 tcp 445 MS17-010: Security Update for Microsoft Windows SMB Server (4013389) (ETERNALBLUE) (ETERNALCHAMPION) (ETERNALROMANCE) (ETERNALSYNERGY) (WannaCry) (EternalRocks) (Petya) (uncredentialed check)
8 99439 NA 10.0 Critical 10.92.102.11 tcp 445 SMB Server DOUBLEPULSAR Backdoor / Implant Detection (EternalRocks)
9 108797 NA 10.0 Critical 10.92.102.11 tcp 0 Unsupported Windows OS
10 18405 CVE-2005-1794 5.1 Medium 10.92.102.12 tcp 3389 Microsoft Windows Remote Desktop Protocol Server Man-in-the-Middle Weakness
11 35291 CVE-2004-2761 5.0 Medium 10.92.102.12 tcp 3389 SSL Certificate Signed Using Weak Hashing Algorithm
12 42873 CVE-2016-2183 5.0 Medium 10.92.102.12 tcp 3389 SSL Medium Strength Cipher Suites Supported (SWEET32

My actual question is how do I match common rows on both data frames but only for certain columns? For example I want rows with values that match in column Risk, Host, Port and Name only?

I have tried using setdiff() and compare() but they seem to compare only by one column and do not have extra flags to do an advanced comparison.

Thanks

I quite like dplyr's set operations for tables, where you can definitely select specific and multiple columns using dplyr::select to narrow down your tables to only those columns you are interested in comparing.

library(dplyr)
a <- tibble(
  c1 = 1:10,
  c2 = 5:14,
  c3 = 11:20
)
b <- tibble(
  c1 = 1:5,
  c2 = c(5,6,1,1,1),
  c3 = c(1,12,1:3)
)

dplyr::intersect(a,b)
#> # A tibble: 1 x 3
#>      c1    c2    c3
#>   <int> <dbl> <dbl>
#> 1     2     6    12

a %>% 
  select(c1,c2) %>% 
  dplyr::intersect(
    b %>% select(c1,c2)
  )
#> # A tibble: 2 x 2
#>      c1    c2
#>   <int> <dbl>
#> 1     1     5
#> 2     2     6

Created on 2021-02-24 by the reprex package (v1.0.0)

Hi Sir,

Thank you for the solution, it works very well.

My next question is how can I get

  1. the count of intersecting rows
  2. the rows and the count of non-intersecting rows which belong to data frame 1 only
  3. the rows and the count of non-intersecting rows which belong to data frame 2 only

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.