Importing multiple excel files and turn them into long format

goodMan909 · October 25, 2020, 10:24am

Hi, I'm new to r and have a question about data input. Any suggestion is greatly appreciated.

I am trying to read and merge multiple excel files into a data frame.
Each of the excel file is a time series of a variable, but it is in the following format:

I would like to input 50 of these excel files and create a data frame in the format that each row contains a time series variable, and the columns are the time frame: 2003-12, 2004-01, 2004-02, ...... 2020-05.

I was wondering what should I do to read and merge these files and at the same time turn them from wide format into long format?

Thank you in advance!

paul · October 25, 2020, 11:21am

Hi, the code below should do everything you need assuming each excel file is the same as your screenshot, the data is in sheet 1 of each file and all the files are in one directory.

I added comments but let me know if you have any questions!

library(tidyverse)
library(readxl)
library(fs)

# path to directory containing excel files
excel_dir <- here::here("excel_dir/")

df_combined_long <- 
  # list all the excel file paths in the directory (assuming they are xlsx files not xls)
  dir_ls(excel_dir, regexp = "[.]xlsx$") %>% 
  # name the list with the filename portion of the path with no file extension
  # this will added as a variable in the final dataframe 
  set_names(path_ext_remove(path_file(.))) %>% 
  # read each file and combine together into single df
  # set a 'variable' column with the value of the filename created above
  map_dfr(read_excel, .id = "variable") %>% 
  # convert to long format with a month and value column
  pivot_longer(cols = c(-variable, -Year), names_to = "month", values_to = "value") %>% 
  # combine year and month columns into a single date column
  unite(date, Year, month, sep = "-") %>% 
  # convert date column to R date format
  mutate(date = parse_date(date, format = "%Y-%b"))

goodMan909 · October 25, 2020, 1:59pm

Hi, thank you so much for the detailed comments. I really appreciate it.
But the excel files actually have some headings. Sorry for not clarifying that in my original post.

I guess because of this, I received error message:
Error: Can't subset columns that don't exist. x The column Year doesn't exist.

I was wondering how should I specify the selection range from the excel? (The range is the same for all files.)

Thank you again for your help.

paul · October 25, 2020, 2:27pm

Ah ok, so you can actually pull out the information in the header using cell references and add it to the data frame when reading each file, then supply a skip = 10 argument to read the main data table starting from row 11.

Like this (assuming the first cell in your screenshot is A1):

library(tidyverse)
library(readxl)
library(fs)

# path to directory containing excel files
excel_dir <- here::here("excel_dir/")

df_combined_long <- 
  # list all the excel file paths in the directory (assuming they are xlsx files not xls)
  dir_ls(excel_dir, regexp = "[.]xlsx$") %>% 
  # read each file and combine together into single df
  map_dfr(~{
    series_id <- read_excel(path = .x, range = "B4", col_names = FALSE) %>% pull()
    series_label <- read_excel(path = .x, range = "B5", col_names = FALSE) %>% pull()
    industry <- read_excel(path = .x, range = "B6", col_names = FALSE) %>% pull()
    product <- read_excel(path = .x, range = "B7", col_names = FALSE) %>% pull()
    
    read_excel(.x, skip = 10) %>% 
      mutate(
        series_id = series_id,
        series_label = series_label,
        industry = industry,
        product = product
      ) %>% 
      select(series_id, series_label, industry, product, everything())
  }) %>% 
  # convert to long format with a month and value column
  pivot_longer(cols = Jan:Dec, names_to = "month", values_to = "value") %>% 
  # combine year and month columns into a single date column
  unite(date, Year, month, sep = "-") %>% 
  # convert date column to R date format
  mutate(date = parse_date(date, format = "%Y-%b"))

goodMan909 · October 25, 2020, 3:21pm

Wow this worked perfectly. I still need to study it line by line, but thank you very very much!!

system · November 1, 2020, 3:21pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.