I desperately need help with a code for consolidating patient info. I need to write a code that merges some info is a name has been inputted more than once, but replaces duplicate info into the same row!

So far I have gotten this far;
merged_data <- Mock_spreadsheet %>%
group_by(pt name) %>%
summarize(notes_merge = paste(notes, collapse = ","), date = paste(date of contact, collapse = ","))

However, this only works on the columns i need merged, but exludes columns where the info is the same( such as contact info ect).
Data looks kinda like this: I need for example Eva to be merged by status, notes and dates, everything else can be replaced since its the same... Hope this makes sense, I would really appreciate any help.

pt name contact # ins. Provider date of contact Referral ReferralName Status staff notes

1 eva malso 123.456.7890 cigna 1.2.21 RRRC Tyler BOV initiated KO pt called to ask about some services. Ins info provided. Billing Solutions contacted.
2 mark smith 234.567.8901 bcbs 1.2.21 DS Susie Car1 info KO Pt interested in some night sessions. Info provided. No ins. Collected.
3 john doe 345.678.9012 united 1.3.21 CP Jack LOL initiated SB pt interested in medicine. Ins info provided. Billing solutions contacted.
4 eva malso 123.456.7890 cigna 1.4.21 RRRC Tyler ASC intiated SB Deductible of $14. Met $21.34. company process initiated...

Similar issue in this thread

1 Like

I made this into a light reprex and took a shot at it. Basically, what it does is:

  • Removes pt (that looks to be a row number in your example; if it's a patient ID that would actually be the same value for "eva malso", then it could be part of the group_by, too)
  • For the summarizing, it evaluates each column and gets the number of unique values
  • If the total unique values (the length) is 1, then it uses the first value for the summary
  • Otherwise, it pastes all of the values and collapses them with a " , " separator

This works for the example you provided. What it won't do is gracefully handle a scenario where, for instance, there are 3 records for one patient, and there are, say, two unique contact numbers. It would wind up doing the "paste" and collapsing all three...even though two of them are duplicates of each other. But, it's not clear if you would expect that to happen with much frequency. That might be addressable with a nested grouping within the summarize_at...but could get a little tricky.

library(tidyverse)

# Create a date frame to work with
df <- data.frame(
  pt = c("1", "2", "3", "4"),
  name = c("eva malso", "mark smith", "john doe", "eva malso"),
  contact_num = c("123.456.7890", "234.567.8901", "345.678.9012", "123.456.7890"),
  ins_provider = c("cigna", "bcbs", "united", "cigna"),
  date_of_contact = c("1.2.21", "1.2.21", "1.3.21", "1.4.21"),
  referral = c("RRRC", "DS", "CP", "RRRC"),
  referral_name = c("Tyler BOV", "Susie Car1", "Jack LOL", "Tyler ASC"),
  status = c("initiated", "info", "initiated", "intiated"),
  staff = c("KO", "KO", "SB", "SB"),
  notes = c("pt called to ask about some services. Ins info provided. Billing Solutions contacted", 
            "Pt interested in some night sessions. Info provided. No ins. Collected.", 
            "pt interested in medicine. Ins info provided. Billing solutions contacted.", 
            "Deductible of $14.00 Met $21.34. company process initiated")
)

# Do the de-duplication
df2 <- df %>% 
  select(-pt) %>% 
  group_by(name) %>% 
  summarize_all(~ if_else(length(unique(.x)) == 1, 
                          first(.x),
                          paste(.x, collapse = ", ")))
1 Like

Thank you so much, this works perfectly. I truly appreciate it, this is brilliant!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.