library(tidyverse)
library(reprex)

df <- tibble::tribble(
  ~V1, ~V2,
  "5th Generation Builder", "5th Generation Builder, LLC",
  "5th Generation Builders Inc.", "5th Generation Builders",
  "89 Contractors LLC", "89 Contractors LLC",
  "906 Studio Architects LLC", "906 Studio Architects LLC",
  "A & A Glass Co.", "Paragon Const.",
  "A & E Farm", "A & E Farm",
  "A & H GLASS", "C & C Contractors",
  "A & J Homeworks,Painting, and Restoration", "A.W. Builders",
  "A & K Construction Co.", "J. Jones Restoration",
  "A & L Construction", "A & L Const."
)

output <- df %>%
  distinct(V1, V2, .keep_all = TRUE)

output <- df %>%
  group_by(V1, V2) %>%
  dplyr::mutate(row_number = dplyr::row_number()) %>%
  filter(row_number == max(row_number))

Created on 2020-04-03 by the reprex package (v0.3.0)
I am very new to the community, so I'll apologize in advance.
I am trying to compare V2 (the leads) against V1 (the CRM, which is my reference). I would like to filter out the duplicates, or at least flag them, so that I end up with something like a V3 column listing everything in V2 that was not found in V1.
I feel I've tried almost everything related to duplicate removal, but I can't seem to produce the output I'm looking for.
I truly hope this helps and that my reprex is acceptable.
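
To make the goal described above concrete, here is a minimal sketch of one possible approach. It assumes that exact string matching between V2 and V1 is acceptable, and it compares each lead against the whole V1 column rather than only the value on the same row. The names `in_crm`, `flagged`, and `new_leads` are illustrative and do not appear in the reprex; `V3` is the column name suggested in the question.

```r
library(dplyr)
library(tibble)

# Same data as in the reprex above.
df <- tribble(
  ~V1, ~V2,
  "5th Generation Builder", "5th Generation Builder, LLC",
  "5th Generation Builders Inc.", "5th Generation Builders",
  "89 Contractors LLC", "89 Contractors LLC",
  "906 Studio Architects LLC", "906 Studio Architects LLC",
  "A & A Glass Co.", "Paragon Const.",
  "A & E Farm", "A & E Farm",
  "A & H GLASS", "C & C Contractors",
  "A & J Homeworks,Painting, and Restoration", "A.W. Builders",
  "A & K Construction Co.", "J. Jones Restoration",
  "A & L Construction", "A & L Const."
)

# Flag each lead (V2) by whether the exact same string occurs anywhere in V1.
# V3 (illustrative) holds the lead only when it was NOT found in the CRM.
flagged <- df %>%
  mutate(
    in_crm = V2 %in% V1,
    V3 = if_else(in_crm, NA_character_, V2)
  )

# Alternatively, keep only the distinct leads that are missing from the CRM.
new_leads <- df %>%
  distinct(V2) %>%
  filter(!V2 %in% df$V1)
```

Under exact matching, near-duplicates such as "A & L Const." and "A & L Construction" are treated as different names, so they would still show up as unmatched leads. Catching those would need some form of name normalization or approximate string matching, which this sketch does not attempt.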