extract some part of filnames and add them as value in columns

re123 · April 25, 2022, 2:38pm

Hi.
I have imported multiple excel files (more then 20) at once in R and I want to add values to columns in the dataframe based on the file name.

Let us say that the filename are the following: T1_123, T1_222 , T1_333 , T2_123, T2_222, T2_333 , T3_123, ect...

I want to create two column where the row in column_1 will show the T1, T2, T3.

So a row will show T1 if these row are from the file T1_123, T1_222 , T1_333
A row will show T2 if these row are from the file T2_123, T2_222, T2_333

column 2 should show 123 or 222 or 333 also based on the name of the file.

I have already imported all the files

#import all the excel files
file.list <- list.files(path = "xxxxxxxxxx",    
                       pattern = "*.xlsx",
                       full.names = TRUE) 

alldata<-file.list %>%
  map_dfr(~read_excel(.x)%>%
  mutate_all(as.character))

I need a solution that can be generalized to a large number of variables
How can I do this?
Thanks in advance

xvalda · April 25, 2022, 3:31pm

Hi @re123 ,

Does this work for you?
I didn't get the part about generalizing to a large number of variables. Did you mean large number of excel files to read?

library(tidyverse); library(fs)

#create listing (fs_path object) of relevant files to read
excel_files <- dir_ls(regexp = "^test.*\\.xlsx$")
#read all files and store their name in `source` variable
all_files <- excel_files %>% 
  set_names() %>% 
  map_dfr(read_excel, .id = "source")
#remove extension of the file name and separate into two columns
all_files %>% 
  mutate(source = str_remove(source, "\\.xlsx$")) %>% 
  separate(source, into = c("test", "ID"), sep = "_")

Hope it helps.

re123 · April 25, 2022, 4:42pm

Thanks @xvalda .
I have update the original post with a better example of the file name.
I also get an error about the combination of some columns .

Error in dplyr::bind_rows():
! Can't combine T1_123.xlsx$Column_A <<character.>> and T1_222.xlsx$Column_A <double.>.
Run rlang::last_error() to see where the error occurred.
There were 50 or more warnings (use warnings() to see the first 50)

In my original post I used the following so that all the columns will be to character. Is it possible to add mutate_all(as.character)) to your suggestion?

alldata<-file.list %>%
  map_dfr(~read_excel(.x)%>%
  mutate_all(as.character))

xvalda · April 26, 2022, 9:30am

Hi @re123

It should be working, can you try again like this?:

library(tidyverse); library(fs); library(readxl)
excel_files <- dir_ls(regexp = "^test.*\\.xlsx$")
all_files <- excel_files %>% 
  set_names() %>% 
  map_dfr(~read_excel(.x) %>% mutate_all(as.character), .id = "source")
all_files %>% 
  mutate(source = str_remove(source, "\\.xlsx$")) %>% 
  separate(source, into = c("test", "ID"), sep = "_")

system · May 3, 2022, 9:30am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.