Hello all,
I am reading a patient file with about 78K records. Each record is for one patient with one unique ID. When I open the file in Excel, it looks OK.
But when I read it into RStudio with:
df <- read.table("patient_level_043020.txt", sep="\t", header=TRUE)
and count the unique IDs versus all IDs, I get different numbers. That means the data frame I read in contains duplicated IDs. I printed some of the duplicated IDs, and they are new IDs that do not exist in the original file at all. This started happening in the past 3 days; before that, everything seemed to be OK. The patient ID is a 17-digit number. Has anyone experienced something similar?
length(unique(df$patientId))   # 77191
length(df$patientId)           # 78379
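
One thing that may be worth checking, on the assumption that read.table is parsing patientId as a numeric (double) column: a double holds only about 15 to 16 significant decimal digits, so distinct 17-digit IDs can be rounded to the same value, which would look like new, duplicated IDs. The sketch below uses a made-up two-row file (the IDs and the age column are hypothetical, not from my data) and compares the default read with one that forces the ID column to character via colClasses.

```r
# Hypothetical two-row file: 17-digit IDs that differ only in the last digit
tmp <- tempfile(fileext = ".txt")
writeLines(c("patientId\tage",
             "12345678901234567\t40",
             "12345678901234568\t51"), tmp)

# Default read: the ID column is converted to numeric (double),
# which cannot represent every 17-digit integer exactly
df_num <- read.table(tmp, sep = "\t", header = TRUE)
class(df_num$patientId)
length(unique(df_num$patientId))   # may be smaller than nrow(df_num)

# Forcing only the ID column to character keeps every digit;
# columns not named in colClasses are still auto-detected
df_chr <- read.table(tmp, sep = "\t", header = TRUE,
                     colClasses = c(patientId = "character"))
class(df_chr$patientId)
length(unique(df_chr$patientId))
```

If the unique count matches the row count once the column is read as character, the duplicates would be coming from numeric rounding rather than from the file itself.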