library(tidyverse)
library(reprex)
df <- tibble::tribble(
  ~V1, ~V2,
  "5th Generation Builder", "5th Generation Builder, LLC",
  "5th Generation Builders Inc.", "5th Generation Builders",
  "89 Contractors LLC", "89 Contractors LLC",
  "906 Studio Architects LLC", "906 Studio Architects LLC",
  "A & A Glass Co.", "Paragon Const.",
  "A & E Farm", "A & E Farm",
  "A & H GLASS", "C & C Contractors",
  "A & J Homeworks,Painting, and Restoration", "A.W. Builders",
  "A & K Construction Co.", "J. Jones Restoration",
  "A & L Construction", "A & L Const.")
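# Attempt 1: drop rows where the same (V1, V2) pair repeats
output <- df %>%
  distinct(V1, V2, .keep_all = TRUE)
# Attempt 2: keep only the last row within each (V1, V2) group
output <- df %>%
  group_by(V1, V2) %>%
  dplyr::mutate(row_number = dplyr::row_number()) %>%
  filter(row_number == max(row_number))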
Created on 2020-04-03 by the reprex package (v0.3.0)
I am very new to the community, so I'll apologize in advance.
I am trying to produce a list by comparing V2 (leads) against V1 (the CRM, used as the reference). I want to filter out the duplicates, or at least flag them, leaving me with something like a V3 column that shows anything in V2 that was not found in V1.
I feel I've tried almost everything related to duplicate removal, but I can't seem to produce the output I'm looking for.
I truly hope this helps and that my reprex is acceptable.
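For illustration, one way to read the goal described above is as an exact-match lookup: flag each V2 value by whether it appears anywhere in V1, then keep the ones that do not. The following is only a minimal sketch under that assumption, using a cut-down copy of the data. The `in_crm` flag and the `flagged` and `new_leads` names are hypothetical, and near-matches such as "A & L Const." versus "A & L Construction" would still be reported as new, because only exact string equality is checked.

```r
library(dplyr)

# Cut-down copy of the data from the reprex
df <- tibble::tribble(
  ~V1, ~V2,
  "5th Generation Builder", "5th Generation Builder, LLC",
  "89 Contractors LLC", "89 Contractors LLC",
  "A & A Glass Co.", "Paragon Const.",
  "A & E Farm", "A & E Farm",
  "A & L Construction", "A & L Const."
)

# Assumption: a lead counts as "already in the CRM" only on an exact
# string match against any value in V1, not just the value on the same row.
flagged <- df %>%
  mutate(
    in_crm = V2 %in% V1,                     # hypothetical flag column
    V3 = if_else(in_crm, NA_character_, V2)  # leads not found in V1
  )

# Just the leads that are new relative to the CRM
new_leads <- flagged %>%
  filter(!in_crm) %>%
  pull(V2)
```

If this reading is right, `flagged` keeps every row with a V3 column that is `NA` for leads already present in V1, while `new_leads` is a plain character vector of the unmatched leads. Handling abbreviations or punctuation differences would need some kind of normalisation or approximate matching on top of this.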