Sidenote: those ID numbers are reeeeaaaaaally long (19 digits, I count). So long, in fact, that R's integer type can't hold them (it's 32-bit and tops out at 2,147,483,647), and the double class (which is definitely not appropriate for storing ID numbers) can only represent them to about 15 to 17 significant figures:
The 53-bit significand precision gives from 15 to 17 significant decimal digits precision.
If read_csv is reading these ID numbers in as doubles, you're likely losing some of those figures in the import process, which could explain why you're getting so few duplicates. I would've thought you'd get an error or warning if that was happening, but it's hard to tell without seeing what the data frame looks like once it's read into R.
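Here's a quick way to see the precision problem for yourself. This is just a sketch in base R; the two IDs are made up for illustration and aren't from your data.

```r
# Two made-up 19-digit IDs that differ only in the last digit
id_a <- "1234567890123456789"
id_b <- "1234567890123456788"

# Compared as strings, they are different
id_a == id_b                           # FALSE

# Converted to double, both round to the same representable value,
# so R can no longer tell them apart
as.numeric(id_a) == as.numeric(id_b)   # TRUE

# Doubles only represent every integer exactly up to 2^53 (16 digits)
sprintf("%.0f", 2^53)                  # "9007199254740992"

# R's integer type is 32-bit, so it is far too small for these IDs
.Machine$integer.max                   # 2147483647
```

If your real IDs behave like this after import, some digits have already been lost.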
If you can't make these ID numbers smaller and you need to use them to find duplicates, maybe have read_csv read them in as strings.
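One way to do that, as far as I can tell, is readr's col_types argument. This is a minimal sketch: the file contents and the column names (id, value) are made up, so swap in your own path and column name.

```r
library(readr)

# Hypothetical stand-in for the real file: three rows, one repeated ID
path <- tempfile(fileext = ".csv")
writeLines(
  c("id,value",
    "1234567890123456789,a",
    "1234567890123456788,b",
    "1234567890123456789,c"),
  path
)

# Force the id column to character so no digits are lost on import;
# the other columns are still guessed as usual
df <- read_csv(path, col_types = cols(id = col_character()))

# Duplicates are now found by exact string comparison
sum(duplicated(df$id))   # 1
```

Since the IDs are only labels, nothing is lost by keeping them as character, and duplicated() works on strings the same way it does on numbers.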
EDIT: Excel also uses double precision, so if you read your data into Excel and let Excel assume this column is numeric, it'll also probably ruin the numbers.