Writing to multiple worksheets in an Excel workbook

Hi,

I would like to write some code that opens an excel workbook and then goes through every worksheet (which will have identically formatted tables) to perform a merge from a master table (that is held in R) so that for every worksheet, a cross reference is made.

In other words, for each worksheet a lookup is made for each row in column A to return a value from the master table (in R) and enter it into column F.

I am fairly new to this, but I have started creating some code that will open the excel workbook, and I can perform a merger to make the lookup. However, I am stuck with how to cycle through all worksheets and also how to write to them.

Thanks.

How did you read from the worksheets? If you use readxl you can list the available sheets with excel_sheets(), and simply open them one at a time.

As to how to loop, either you use a for loop or you put your operation in a function to call with map. You can see here for more information.

For writing, I don't think readxl writes in Excel (you'd have to write to csv, but then you can't have multiple sheets). A package that should work is openxlsx, but I don't think it can tell you how many sheets an Excel file has.

So easiest solution could be:

nb_sheets <- readxl::excel_sheets(filename)
wb <- createWorkbook(new_filename)

for(sheet in seq_len(nb_sheets)){
  content <- openxlsx::read.xlsx(filename, sheet = sheet)
  #do stuff...
  
  openxlsx::addWorksheet(processed_data, wb)
}

Thanks, I'm having a play with this. I installed the openxlsx package but I get the error message that it can't find the function createWorkbook.

With the code you have provided, can I just check what this is doing please:

  1. Opening and reading a workbook (I replace "filename" with the file I want it to open - do I need to provide the path or will it default to look in the folder the R script is in?
  2. Creates a new workbook with the name "new_filename".
  3. Loops through the worksheets in "filename". I'm not too sure what seq_len and nb_sheets are here though, do these not need to be defined?
  4. Not too sure about content either, is this defined as whatever the current sheet is?
  5. Then obviously I add the code to merge the data
  6. Finally the code adds a new worksheet with the merged data, but into which workbook as I haven't seen reference again to the new_filename workbook?

Apologies for the basic questions, I'm familiar with VBA but very new to R and struggling to get my head around it, despite hours of googling!

Yes, at the beginning it's normal. If you have some time I highly recommend reading through the first chapters of R for data science, the book structure makes it more efficient than googling around problem after problem.

Both nb_sheets and content are defined in the script. The <- operator assigns a variable, could be replaced by = as in most other languages.

Sorry, my previous script was inexact as I aimed to give a general idea of the process. Here is a much better one that works on a real file:

library(openxlsx) # so that we don't need to write it everytime

# define file names
input_path <- "my_input.xlsx"
new_filename <- "my_output.xlsx"

# what are the names of the sheets in that file? We get a vector
sheet_names <- getSheetNames(input_path)
sheet_names
#> [1] "Sheet1" "Sheet2" "Sheet3"

# Initialize an output workbook. We store it in the 'wb' object
wb <- createWorkbook()
wb
#> A Workbook object.
#>  
#> Worksheets:
#>  No worksheets attached

# Now let's select the first sheet
cur_sheet <- sheet_names[1]

# Read its contents
content <- read.xlsx(input_path, sheet = cur_sheet)
content
#>    colA1    colB1
#> 1    170 1.636704
#> 2     80 1.299625
#> 3     42 1.157303
#> 4     93 1.348315

# We can do whatever operations we want:
content$colA1 <- content$colA1 + 1

# And add it to the Workbook we are storing in "wb"
addWorksheet(wb = wb, sheetName = cur_sheet)
writeData(wb, sheet = cur_sheet, x = content)

# Note that at this point, we are only modifying the "wb" object, we haven't actually written anything to disk
# When we are finished, we can save the whole "wb" object at once

saveWorkbook(wb, new_filename)

Now that's for a single sheet, but that let's you do things step by step and look inside the objects at each step to understand what's happening. Once you want to do it for all sheets, you can use a for loop:

for(cur_sheet in sheet_names){
...
}

At each iteration of the loop, cur_sheet will take the next value from sheet_names. So it's equivalent to do:

cur_sheet <- sheet_names[1]
# do stuff
cur_sheet <- sheet_names[2]
# do stuff
cur_sheet <- sheet_names[3]
...

And finally, it's very informative to look at the documentation of any function. You can get it easily with ?name_of_function. For example, try ?saveWorkbook.

This is absolutely brilliant, thanks so much for taking the time to explain it fully. I've been having a play this afternoon and have set up a dummy workbook with tables on multiple tabs, and this code now produces a new workbook where each tab has a table with a new column that R has calculated based on data from the other columns.

I've also started taking a look at R of data science, thanks for providing that link, it looks like a really good place to start.

So if I wanted to merge in data to each table from a table in R I could do something like the following:

cur_sheet <- merge(cur_sheet,temp_data[,c("Link_ID","Time","Speed")],by.x = "Link_ID", by.y = "Link_ID",all.x = TRUE)

I feel like this is close but it's not working at the moment.
temp_data is the master table stored in R and Link_ID is the cross reference.

From the code I wrote above, cur_sheet is the name of the current sheet. It's the contents that you want to merge.

Ah yes, of course! Thanks again.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.