Combining multiple Excel files with multiple sheets into one dataframe.

Hello,

I have multiple excel files that I need to combine into one dataframe.
The files have several sheets (exact same number of sheets, column headers, number of observations).

I have managed to merge them into one dataset based on the information in thise very helpful post:

My files are named 100.xlsx, 101.xlsx, 102.xlsx etc. When reading the files into R and combining them into one dataframe I would like to add a separate column named "subject ID" corresponding to the name of the individual files being uploaded.

This is what my code looks like so far:

install.packages("tidyverse", "readxl")

library(tidyverse)
library(readxl)

files <- list.files(
  path = "my path",
  pattern = ".xlsx",
  full.names = TRUE)

all_file_contents <- map(.x = files,
                         .f = ~ {
                           
                           temp_file <- .x
                           
                           map(.x = excel_sheets(.x),
                               .f = ~ read_excel(path = temp_file,
                                                 sheet = .x) %>%
                                 mutate(sheet_name = .x))
                         })

Data_collated <- transpose(all_file_contents) %>%
  map_df(.x = .,
         .f = ~ .x)

I have tried adding in the following at various locations but no luck yet...:

x$files = i

Any help would be greatly appreciated! Thanks in advance!

You can build on the examples provided here:
readxl Workflows • readxl (tidyverse.org)

1 Like

Thanks a lot for the reply!
I have managed to add file names to the data. However, now I cannot transpose the data to get all the sheets into one df, as two of my sheets have less observations. Before adding in the file names, this was not a problem and I deleted them after.
Is there a way to specify which sheets to extract from the files (in my code above) rather than R extracting all files?
I have tried adding in '2:7' at various places but it isn't happy with that :see_no_evil:

One way would be to filter after the excel_sheets()step.

2 Likes

I think you are close. I have a small recommendation. Write a function to read a single sheet read_tab <- function(filename, sheetname) that writes the sheetname as a column. Then write a function to read a single file that maps across the sheets. Then map that function over the files. In some cases you may need map2 or imap.

I sense that trying to do so much with the inline ~ functions is making the problem more difficult for you.

1 Like

Thank you. I've managed to make it work now :slight_smile:

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.