Trying to combine multiple .xlsx files into a single dataframe, using a key value to signify specific cell ranges. Then converting from long to wide.

Pandamonium · September 15, 2022, 2:31pm

I'm fairly new to R so excuse any shortsightedness here.

I have about 57 .xlsx files sitting in the same directory. They all have the same structure and formatting, with some of the cell ranges off by 1 or 2 between files, and I don't want to combine all of the data found in each file, I only want to combine specific cell ranges from each file into a Master Excel file as output later on.

This workflow will be something I spend a lot of time doing every month, so I'd like to write some R code to automate most of the manual work.

I need help figuring out a way to ingest all of the individual Excel files, selecting specific cell ranges based on 4 unique key values (the key values are the same cost codes across all files, but represent 4 different ranges (54, 55, 56, 58) in each file), combining them into a single dataframe, then pivoting from long to wide.

I basically need to isolate these 4 cell ranges and drop the rest of the file's data, then combine and pivot each file's extracted ranges into a wide format where each unique file represents one observation in the new dataframe. A new column that provides the original filename would be great too.

ANY help would be greatly appreciated.

All data is Double. All data is found in 1 sheet. The 4 row ranges I need to keep are rows where the key value is 54, 55, 56 and 58. In every file.

After a day of researching, this is all I've got so far and I just can't make any progress:

# Loads your packages
library(readxl)
library(tidyr)
library(readr)
library(dplyr)

# Sets the file path for your files to be loaded from
setwd(directory)

# Creates a list in RStudio of all of the files for you to check against
my_files <- list.files(pattern = "*.xlsx")
read_excel('filename.xlsx', col_types = "numeric", col_names = TRUE, 
           range = "A38:C38", "A51:C51", "A75:C75", "A102:C102")

read_excel('filename', sheet =1, col_names = TRUE, 
           range = "A37:C37", "A50:C50", "A75:C75", "A102:C102")

DavoWW · September 16, 2022, 1:24am

Hi @Pandamonium,
Unless your 57 files are HUGE I think your best approach is to read in the whole sheet from each file, and then do the subsetting within R before joining and pivoting. It’s hard to provide further help unless we have access to the data files.

system · October 7, 2022, 1:24am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.