Importing data and adding an ID specific to each file source

chris_turner · August 3, 2021, 3:02pm

I have one data frame that matches each patient_id to the patient's name.

Each patient has their own data file, FirstNameLastName.csv. To anonymize the data I wrote the function read_in, which reads one FirstNameLastName.csv and adds the specified patient_id to it.

For further analysis I want all the anonymized data in a single data frame. I tried this with map_df() from the purrr package, but I can't get the right ID attached to each .csv file that is read in. Could somebody help me fix this, so that the result is one data frame containing all the data with the respective IDs?

> patient_names
  patient_id        patient_name  
1      1            Tina Turner
2      2            Michael Jackson 
3      3            Michael Jordan  
4      4            Dom Toretto
5      5            Lebron James
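Because every file follows the FirstNameLastName.csv convention, the expected file name can be derived from patient_name by dropping the space. A minimal base R sketch, assuming each name contains exactly one space and no other punctuation; the file_name column is a hypothetical addition, not part of the original table:

```r
# Rebuild the patient_names table shown above
patient_names <- data.frame(
  patient_id   = 1:5,
  patient_name = c("Tina Turner", "Michael Jackson", "Michael Jordan",
                   "Dom Toretto", "Lebron James")
)

# Hypothetical helper column: "Tina Turner" -> "TinaTurner.csv"
patient_names$file_name <- paste0(gsub(" ", "", patient_names$patient_name), ".csv")

patient_names
```

With that column in place, each file can be tied to its patient by name rather than by the order in which files happen to be read.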

read_csv("LebronJames.csv")

Year         Injury                  
<chr>        <chr>                
2020       Sprained Ankle             
1990       Torn ACL       
1995       Bruised Knee       
2011       Sore Neck  
2014       Headache 
2019       Broken Leg

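library(readr)
library(tibble)
read_in <- function(path, patient_id = 1) {
  data <- read_delim(path, delim = ";", col_names = TRUE)
  # ID stored at row `patient_id` of the global patient_names table
  id_value <- patient_names[["patient_id"]][patient_id]
  add_column(data, patient_id = id_value, .before = 1)
}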

  patient_id       Year         Injury                  
       <int>       <chr>        <chr>                
 1      5          2020       Sprained Ankle             
 2      5          1990       Torn ACL       
 3      5          1995       Bruised Knee       
 4      5          2011       Sore Neck  
 5      5          2014       Headache 
 6      5          2019       Broken Leg
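A possible simplification is to pass the ID value itself instead of a row position, so the function no longer depends on the global patient_names table. This is only a sketch: read_in_with_id is a hypothetical name, the file is assumed to be semicolon-delimited as in read_in, and the two rows written to a temporary file are made up:

```r
library(readr)
library(tibble)

# Hypothetical variant of read_in: the caller supplies the ID value directly
read_in_with_id <- function(path, patient_id) {
  data <- read_delim(path, delim = ";", col_names = TRUE,
                     col_types = cols(.default = col_character()))
  # A length-1 patient_id is recycled to every row of the file
  add_column(data, patient_id = patient_id, .before = 1)
}

# Small made-up example file in a temporary location
path <- tempfile("LebronJames", fileext = ".csv")
writeLines(c("Year;Injury", "2020;Sprained Ankle", "1990;Torn ACL"), path)

lebron <- read_in_with_id(path, patient_id = 5L)
lebron
```

Keeping the lookup outside the reader means the same function works whether the ID comes from a join, a named vector, or a loop counter.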

list.files(path = "/directory", pattern = "\\.csv$", full.names = TRUE) %>%
  map_df(read_in)

# A tibble: 1234 x 3
    patient_id   Year    Injury
    <int>        <chr>   <chr>        
 1      1        2012    Ankle   
 2      1        2014    Broken Arm 
 3      1        1999    Concussion 
 4      1        1987    Broken Finger
...    ...       ...     ...
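In the map_df(read_in) call, map_df() hands each file path to read_in as its first argument and nothing else, so patient_id never moves away from its default. A small sketch with a hypothetical stand-in function (toy_read_in, which reads no files and only records its arguments) makes this visible:

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for read_in: records the path it was given
# and the patient_id it ended up with, without reading anything
toy_read_in <- function(path, patient_id = 1) {
  tibble(patient_id = patient_id, file = path)
}

# map_df() supplies only the path, so every call uses patient_id = 1
combined <- map_df(c("TinaTurner.csv", "LebronJames.csv"), toy_read_in)
combined
```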

cactusoxbird · August 3, 2021, 3:09pm

It sounds like you might want to use the .id argument in map_df(). Would using map_df(read_in, .id = "id") do what you're hoping for?

It's not clear to me how the data frame you show in the final code block differs from what you want.
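One way the .id suggestion could be used: if the vector of paths is named, map_df() stores those names in the .id column, so naming the paths by their file names keeps track of which rows came from which file. The ID can then be joined on a file name derived from patient_name. This is a sketch under assumptions: the files are semicolon-delimited, the two example files and their rows are made up, and file_name is a hypothetical column name:

```r
library(purrr)
library(readr)
library(dplyr)

patient_names <- tibble(
  patient_id   = 1:5,
  patient_name = c("Tina Turner", "Michael Jackson", "Michael Jordan",
                   "Dom Toretto", "Lebron James")
)

# Made-up example files in a fresh temporary directory
data_dir <- tempfile("patients_")
dir.create(data_dir)
writeLines(c("Year;Injury", "2020;Sprained Ankle"), file.path(data_dir, "LebronJames.csv"))
writeLines(c("Year;Injury", "2012;Ankle"), file.path(data_dir, "TinaTurner.csv"))

files <- list.files(path = data_dir, pattern = "\\.csv$", full.names = TRUE)
files <- set_names(files, basename(files))  # these names end up in the .id column

anonymized <- files %>%
  map_df(read_delim, delim = ";",
         col_types = cols(.default = col_character()),
         .id = "file_name") %>%
  # Match "LebronJames.csv" to "Lebron James" via a derived file name
  left_join(mutate(patient_names,
                   file_name = paste0(gsub(" ", "", patient_name), ".csv")),
            by = "file_name") %>%
  select(patient_id, Year, Injury)  # keep only the anonymized columns
anonymized
```

Joining on the name avoids any dependence on the order in which list.files() returns the files.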

chris_turner · August 4, 2021, 8:09am

The last data frame just shows what the result should look like. The problem is that the .id argument can only name a column, not supply a number that iterates.
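For comparison, a small sketch suggests that .id can carry an iteration number: when the input list is unnamed, map_df() fills the .id column with each element's position, stored as text, so it is converted to integer here. The two data frames are made-up stand-ins for files that were read in:

```r
library(purrr)
library(tibble)

# Two unnamed data frames standing in for two files already read in
pieces <- list(
  tibble(Injury = "Sprained Ankle"),
  tibble(Injury = c("Ankle", "Concussion"))
)

# With unnamed input, .id records each element's position
stacked <- map_df(pieces, identity, .id = "position")
stacked$position <- as.integer(stacked$position)
stacked
```

That position only equals patient_id if the files are processed in the same order as the rows of patient_names.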

nirgrahamuk · August 4, 2021, 9:37am

Instead of map_df(), use imap_dfr(). Inside the formula you can then refer to the current element's name, or its position if the vector is unnamed, via .y:

list.files(path = "/directory", pattern = "\\.csv$", full.names = TRUE) %>%
  imap_dfr(~ read_in(path = .x,
                     patient_id = .y))
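One thing worth checking with this approach: for the unnamed vector returned by list.files(), .y is the position in that vector, and list.files() sorts alphabetically, while read_in treats the number as a row of patient_names. The sketch below uses file names derived from the five patients in the question; lookup is a hypothetical helper that maps each file name to its real ID:

```r
library(purrr)
library(tibble)

patient_names <- tibble(
  patient_id   = 1:5,
  patient_name = c("Tina Turner", "Michael Jackson", "Michael Jordan",
                   "Dom Toretto", "Lebron James")
)
expected_files <- paste0(gsub(" ", "", patient_names$patient_name), ".csv")

# Alphabetical order, as list.files() would return the files
files <- sort(expected_files)

# Hypothetical helper: file name -> real patient_id
lookup <- set_names(patient_names$patient_id, expected_files)

check <- imap_dfr(files, ~ tibble(
  file       = .x,
  position   = .y,                 # unnamed input: .y is the position
  patient_id = unname(lookup[.x])  # ID looked up from the file name
))
check
```

Here no position matches the patient's ID (TinaTurner.csv comes last but has patient_id 1), so looking the ID up by file name, or joining on it, is safer than relying on order.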

system · August 25, 2021, 9:37am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
