Combining Multiple Excel Files & Filling a Column from Cell

I am working on combining multiple Excel files of statistics about several countries. The excel file already comes in a standard format where the excel file name is Country1.xlsx, Country2.xlsx, Country3.xlsx ... and so on.

Here's an example of how one of the files look like for Country1.xlsx, that is Jordan:
image

The name of the country in all of the files sets always on cell A1 in Excel. I use this code to combine the excel files into one data frame which works well:

library(tidyverse)
library(dplyr)
library(readxl)
library(purrr)
library(stringr)

Excel_dir <- "<MyExcelDirectoryPath>"

df_combined_long <- 
               dir_ls(Excel_dir, regex = "[.]xlsx$") %>%
               set_names(path_ext_remove(path_file(.))) %>%
               map_dfr(read_excel, .id = "CountryName")

.id = "CountryName" will set the first column of the data frame to the name of the excel file for each combined file.
However, I would like the observations of this column to be the country name from each file; the one setting in cell A1. Manually, I can rename each excel file to the perspective country, but this would be for a large number of files and this is how they're formatted from their original source.

This is how I would like the final data frame to look like, as an example:

df_combined_long

CountryName     Year     Population     Migrants     Median Age
Jordan          2020     10203134         10220          23.8
Jordan          2019     10101694         10220          22.4
...             ...      ...              ...            ...
...             ...      ...              ...            ...
Palestine       2020     1235461          6325           20.6
Palestine       2019     1267891          6035           20.9
...             ...      ...              ...            ...
...             ...      ...              ...            ...

Is there a way to set the first column in the data frame based on a value in a certain cell from the excel sheet?

Use an anonymous function to feed to purrr::map_dfr instead of just using read_excel. I would do something like this:

map_dfr(
    flist, # your vector of file names
    function(file) {
        raw_data <- read_excel(file)
        country_name = names(raw_data[1])
        col_names <- c('CountryName', raw_data[3, ]) %>% 
            sapply(c)
        real_data <- raw_data[4:nrow(raw_data), ] %>% 
            mutate(
                country_name = country_name
            ) %>% 
            relocate(
                country_name,
                .before = everything()
            )
        names(real_data) <- col_names
        real_data
    }
)
1 Like

Thanks very much! Works like magic :sunglasses:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.