insert part of file name as a column value when multiple trials for each subject.

Hello,
I am a novice in R and I would like some help. I have an .xlsx data file for each subject with 288 trials each. Before I merge all these excel files together I want to add the subject number which is part of each file name (e.g, subject-103) as a new column next to each trial. So for each of the 288 trials the subject number should be listed.

How can i do it?

Thank you in advance.

You can do this using tidyr::separate(), which allows you to split a character column into multiple columns, and you'd keep the original column (if you wish to do so) be setting the remove argument to FALSE (remove = FALSE).

You could also use substr() or stringr::str_extract() to take the part of the file name (the number) and create a new variable.

Here's an example using substr() to choose the last three characters of each of the filename variables:

    library(tidyverse)
    example <- tibble::tribble(
          ~filename,
      "subject-103",
      "subject-222",
      "subject-042",
      "subject-236"
      )

    example <- example %>%
      mutate(subject_number = substr(filename, 9, 11))

    example
    #> # A tibble: 4 x 2
    #>   filename    subject_number
    #>   <chr>       <chr>         
    #> 1 subject-103 103           
    #> 2 subject-222 222           
    #> 3 subject-042 042           
    #> 4 subject-236 236

Created on 2020-09-15 by the reprex package (v0.3.0.9001)

Dear @mara thank you for your response.
Ok, here is the case: i have around 123 subjects and each subject has 288 trials. The trials for each subject are saved in a separate excel file.
So, i have the following questions:

  1. Do i need to import in R-studio all the separate excel files, add the extra column with the subject number (using the script you have suggested) in each file and then merge all the files together?
    or
  2. Is there a way to direct the system to the main folder with all the individual excel files and ask to do the same thing as above, that is create a new column adding in each of the 288 rows the subject id based on the file name and then merge the files together once it is done with the previous process?

Thank you in advance,
Maria

You can do this all with a script. There are many ways to do this in R, but the how-to using the purrr package is described very well in this post, the only difference for you is that you are reading in an .xlsx file (I assume) instead of .csv.


You can read and combine them together using purrr::map_dfr() (again, the post has all the details), and save the source (i.e. the filename) by using the .id argument. At that point you'll have all the data in one data frame, and you can extract the subject id from the column you created from .id.
1 Like

I do something like this all the time:

library(tidyverse)
library(readxl)

dir.list <- dir()[grepl("subject", dir())] # if you have your data files is separate directory that can be cleaner, use paste0 to add the directory to the filename

all.data <- dir.list %>% map_df(.f=function(FILE){
data.out <- read_excel(FILE) %>%
mutate(subject = str_sub(FILE, 9,11))
return(data.out)
})

Dear @mara and @Wendell_Miyaji thank you for your responses.