Importing data and adding a source-specific ID to each file

I have one data frame that matches each patient_id to the patient's name.

Each patient has their own data file, FirstNameLastName.csv. To anonymize the data I wrote the function read_in, which reads in one FirstNameLastName.csv and adds the specified patient_id to it.

For further analysis I now want all the anonymized data in one data frame. I tried this using the map_df() function from the purrr package, but I am having problems matching the ID to each .csv file that is read in. Could somebody help fix that, so that the result is a data frame containing all the data with the respective ID?

> patient_names
  patient_id        patient_name  
1      1            Tina Turner
2      2            Michael Jackson 
3      3            Michael Jordan  
4      4            Dom Toretto
5      5            Lebron James

read_csv("LebronJames.csv")

Year         Injury                  
<chr>        <chr>                
2020       Sprained Ankle             
1990       Torn ACL       
1995       Bruised Knee       
2011       Sore Neck  
2014       Headache 
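2019       Broken Leg 

read_in <- function(path, patient_id = 1) {
  # Read one semicolon-delimited patient file
  data <- read_delim(path, delim = ";", col_names = TRUE)
  # Look up the ID by row position in patient_names
  id_value <- patient_names[["patient_id"]][patient_id]
  # Prepend the ID as the first column and return the result
  add_column(data, patient_id = id_value, .before = 1)
}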

  patient_id       Year         Injury                  
       <int>       <chr>        <chr>                
 1      5          2020       Sprained Ankle             
 2      5          1990       Torn ACL       
 3      5          1995       Bruised Knee       
 4      5          2011       Sore Neck  
 5      5          2014       Headache 
 6      5          2019       Broken Leg 

list.files(path = "/directory", pattern = "\\.csv$", full.names = TRUE) %>%
  map_df(read_in)

# A tibble: 1234 x 3
    patient_id   Year    Injury
    <int>        <chr>   <chr>        
 1      1        2012    Ankle   
 2      1        2014    Broken Arm 
 3      1        1999    Concussion 
 4      1        1987    Broken Finger
...    ...       ...     ...

It sounds like you might want to use the .id argument in map_df(). Would using map_df(read_in, .id = "id") do what you're hoping for?

It's not clear to me how the data frame you show in the final code block differs from what you want.
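For reference, here is a minimal sketch of what .id does, using map_dfr() (which map_df() is an alias for). The list injuries and the column name "source" are made up for illustration; the two tibbles stand in for two patient files. When the input is named, the new column holds each element's name, so it would still need to be mapped to a patient_id afterwards.

```r
library(purrr)
library(tibble)

# Hypothetical in-memory stand-ins for two patient files
injuries <- list(
  TinaTurner  = tibble(Year = c("2012", "2014"), Injury = c("Ankle", "Broken Arm")),
  LebronJames = tibble(Year = "2020", Injury = "Sprained Ankle")
)

# .id = "source" adds a column holding the name of the element each row came from
combined <- map_dfr(injuries, identity, .id = "source")
combined
```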

The last data frame is just there to show how the result should look. The problem is that the .id argument can only specify a column name, not a number that iterates.

Instead of map_df(), use imap_dfr(). Within the formula you can then reference either the name or the position of the current element via the .y shortcut (.x is the element itself):

list.files(path = "/directory", pattern = "\\.csv$", full.names = TRUE) %>%
  imap_dfr(~ read_in(path = .x,
                     patient_id = .y))
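One caveat with position-based IDs: list.files() returns paths sorted alphabetically, so the position of a file need not match the row of that patient in patient_names. A possible alternative, sketched below, is to look the ID up from the file name instead. The toy files, the helper read_one, and the variables data_dir, file_keys, name_keys and ids are all assumptions made for this example; it also assumes every file name is exactly the patient's name with the spaces removed.

```r
library(purrr)
library(readr)
library(tibble)

# Toy setup (hypothetical data) so the sketch runs on its own
patient_names <- tibble(
  patient_id   = 1:2,
  patient_name = c("Tina Turner", "Lebron James")
)
data_dir <- tempfile("patients")
dir.create(data_dir)
write_delim(tibble(Year = "2012", Injury = "Ankle"),
            file.path(data_dir, "TinaTurner.csv"), delim = ";")
write_delim(tibble(Year = "2020", Injury = "Sprained Ankle"),
            file.path(data_dir, "LebronJames.csv"), delim = ";")

files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Derive "FirstNameLastName" from each file name and look up the matching ID
file_keys <- tools::file_path_sans_ext(basename(files))
name_keys <- gsub(" ", "", patient_names$patient_name)
ids <- patient_names$patient_id[match(file_keys, name_keys)]

# Hypothetical reader: takes the ID itself rather than a row position
read_one <- function(path, id) {
  data <- read_delim(path, delim = ";",
                     col_types = cols(.default = col_character()))
  add_column(data, patient_id = id, .before = 1)
}

# Iterate over paths and IDs in parallel and row-bind the results
all_data <- map2_dfr(files, ids, read_one)
all_data
```

If a file has no matching name, match() returns NA for it, so checking anyNA(ids) before reading is a cheap safeguard.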
