Yours is actually a common use case. It happens quite often with datasets that have a standardized structure, such as logs, partitioned into individual files by date or region.
My preferred way to handle it is by offloading the data to a local SQLite database (there are folks who swear by DuckDB; the difference is very minor).
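Just for illustration, if you went the DuckDB route instead, roughly the only thing that changes is the connection call (the file name here is just an example); the rest of the workflow below stays the same:
library(DBI)
library(duckdb)
# drop-in alternative: a file-backed DuckDB database instead of SQLite
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "./data/auta.duckdb")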
This is the way I do it:
- first get a list of all files in a directory; {fs} is of great help here
- connect to a local SQLite database (the file gets created if it does not exist yet)
- (re)create a target table inside the database
- iterate over the vector of Excel files, reading each one in turn and appending it to the table
- close the connection, as polite people do

Once done you can easily access the data either with your favorite database manager (I suggest DBeaver) or as remote tables via {dbplyr}; a short sketch of the latter is at the end of this post.
As a piece of code for your inspiration, consider this; you will have to tune the DDL part a bit to match your actual data.
library(dplyr)
library(DBI)
library(RSQLite)
library(readxl)
library(fs)
ddl_registrace <- "CREATE TABLE `registrace` (
  `some_field` REAL,
  `some_other_field` TEXT
  -- more columns here as required by your data
);"
# RSQLite runs a single SQL statement per call, so the index gets its own DDL
ddl_index <- "CREATE INDEX some_field_IDX ON registrace (some_field);"
# get list of all XLSX files in a directory
files <- fs::dir_info("./data", glob = "*.xlsx")
# connect the database; file will be created if not present
con <- DBI::dbConnect(RSQLite::SQLite(), "./data/auta.sqlite")
# out with the old...
DBI::dbExecute(con, "DROP TABLE IF EXISTS registrace;")
# ... in with the new!
DBI::dbExecute(con, ddl_registrace)
DBI::dbExecute(con, ddl_index)
for (file in files$path) {
  # read the file in
  wrk_excel <- read_excel(file)
  # append the data to the database table
  DBI::dbAppendTable(con, "registrace", wrk_excel)
}
DBI::dbDisconnect(con) # don't leave the connection hanging
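And the {dbplyr} access mentioned above, as a minimal sketch; the filter condition is made up, just to show that it gets translated to SQL and executed inside the database:
library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), "./data/auta.sqlite")

# a lazy remote table; no data is pulled into R until collect()
registrace <- tbl(con, "registrace")

registrace %>%
  filter(some_field > 100) %>%  # translated to SQL, runs inside SQLite
  collect()

DBI::dbDisconnect(con)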