importing multiple untidy excel sheets and variables

Hi! I am struggling with the import of a large and very chaotic excel database. Already tried many functions like readxl etc. but it seems like my database is too complicated.

  • i have 96 worksheets in the file (each is one month, database includes multiple years)
  • i would like to have them in one large dataframe/tibble
  • columns are not always the same. range is from 4 to 10 columns per sheet.
  • i would like to import 4 columns per sheet, but their headers are not precise. eg. "DOB" and "Date of Birth" etc.

Does someone know a workaround in readxl or similar to specify the columns it want to import?
Like providing a vector to look out for while importing?
E.g. column "DOB" in R should contain all variables from columns "DOB", "Date of Birth" or "Birth"
column "Operation" in R should contain all variables from columns "OP", "Operation", or "Procedure" and so on...

Thanks so much in advance! Greets from Innsbruck!

If you can't fix the problem using Excel Find & Replace, which is where I would start if I had Excel, can you import every sheet, fix the column names and then select the columns you want? The work flow would look something like the following except that I only fixed two columns.

library(dplyr, warn.conflicts = FALSE)
library(stringr)
DF <- data.frame(DOB = 1:4, NotGood = 2:5, OP = 1:4)
DF
#>   DOB NotGood OP
#> 1   1       2  1
#> 2   2       3  2
#> 3   3       4  3
#> 4   4       5  4

NMS <- colnames(DF)
NMS <- str_replace(NMS, "Birth|DOB|Date of Birth", "DateOfBirth")
NMS <- str_replace(NMS, "Operation|Procedure|OP", "Oper")

colnames(DF) <- NMS
DF <- select(DF, DateOfBirth, Oper)
DF
#>   DateOfBirth Oper
#> 1           1    1
#> 2           2    2
#> 3           3    3
#> 4           4    4

Created on 2021-03-24 by the reprex package (v0.3.0)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.