I have two dfs, each with >100K rows, that I'm looking to combine into one. Before merging them I wanted to check for any potential redundancies between the two dfs. I went through the process of modifying each one so that both dfs share the same schema, and then did an inner join on ~6 columns to check for redundant entries based on those criteria. The result was a new df with >23K rows.

I'm relieved that worked, but is there a way to efficiently process that 23K+ row df? Manually comparing it against each individual df feels cumbersome and leaves me prone to errors of omission/repetition in the final combined df. Does anyone have tips or resources on how to go about comparing large datasets like this? I know this is relatively small by some standards, but it's the largest dataset I've worked with so far, and I'd love to learn from people with more experience.
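For context, here's roughly what I did so far. The file paths and column names are placeholders, not my actual schema:

```python
import pandas as pd

# Two source dataframes, each >100K rows, already modified to share the same schema
df_a = pd.read_csv("source_a.csv")  # placeholder path
df_b = pd.read_csv("source_b.csv")  # placeholder path

# The ~6 columns I used as matching criteria (placeholder names)
key_cols = ["col1", "col2", "col3", "col4", "col5", "col6"]

# Inner join on those columns to surface entries that appear in both dfs
overlap = df_a.merge(df_b, on=key_cols, how="inner")
print(len(overlap))  # >23K rows in my case
```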
Thanks in advance for your time!