Looping through excel worksheets to read data and export as separate files

Glove303 · August 18, 2020, 3:47pm

Hi,

I have the following code, which reads in data from an excel file named "sitn". It then merges and filters this with a table named "all_sum" and performs some basic calculations before exporting it.

I would like to adapt this code so that it will read in tables (each with identical formatting) from each worksheet in "sitn" and perform the same calculations and formatting, but then export a file for worksheet.

Basically, it will do the same as it does now, but save a lot of time as I don't have to rerun the process each time.

itn <- read.csv(sitn,stringsAsFactors = FALSE)

# first, convert relevant character data to numeric
jtdata$N <- as.numeric(jtdata$N)
jtdata$Av_jt <- as.numeric(jtdata$Av_jt)

# then calculate the average for each link and time period
all_sum <- group_by(jtdata,Select_t,link_id) %>%
    summarise(AVJT_all = weighted.mean(Av_jt,N)/100,
              obs_all = sum(N))

# merge in the ITN data
all_sum <- merge(all_sum,itn[,c("Link_id","Classifica","Lnklength")],by.x = "link_id", by.y = "Link_id",all.x = TRUE) %>%
  filter(Classifica!="N/A")

# add in speed
all_sum <- mutate(all_sum,
                  speed = (Lnklength/1000)/(AVJT_all/3600))

write.csv(all_sum, "JT_Averages_by_link.csv", row.names = F, quote = F)

Thanks.

cderv · August 18, 2020, 4:08pm

You are using a csv reader function ? Are they really xls files ?

To read / write excel files I would recommed readxl, writexl, openxlsx package.
You may be able to read from a specific range for your tables

Glove303 · August 18, 2020, 4:23pm

Good point, they are not xls, although they could be. Alternatively could I point to a folder with a list of csv files? There may be over 100 so it will save a lot of time to loop through them all rather than running the code each time.

cderv · August 18, 2020, 4:25pm

I would recommend the purrr package to iterate with a function

See the recent blog post

Glove303 · August 18, 2020, 7:24pm

Thanks, I've taken a look but to be honest I'm struggling to make sense of it all as I'm still new to this.

It seems to suggest I would need to create a list of all the files I need to read in, but could I not specify a folder and ask it to loop until it has read every file in the folder? And then to export each time to a new file named using the filename of the csv is read in?

EliMiller · August 18, 2020, 8:58pm

You can use the list.files function to do this. list.files("my_directory", full.names = TRUE) will return a character vector you can iterate over to open the files.

The following will create a list of data.frames that you can operate on with purrr functions.
my_df_list <- map(list.files("path/to/directory", full.names = TRUE), read.csv)

Once you have that you can perform the needed logic with purrr::map_dfc or purrr::map_dfr which will row bind or column bind your results respectively. purrr::map_dfr also takes an id for joining which it looks like you have the logic figured out already

system · September 8, 2020, 8:58pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.