Filter by unique category and save as uniqueCategory.xlsx in a folder

I have a dataframe as provided below;

ID Department Strength DoD Group_ID
82950 Training 96 953 A
32045 Training 86 5277 A
84817 Sales 77 1633 C
2857 Logistics 88 5629 Z
55587 Training 46 6710 L
73103 Sales 51 1892 M
5837 Sales 62 2394 A
7140 Sales 81 2854 C
70450 Logistics 56 5449 A
34207 Manufacturing 85 1332 S
20095 Manufacturing 58 4589 X
30200 Sales 91 1094 W
47211 Manufacturing 36 5862 C
8932 Manufacturing 75 4847 A
82663 Manufacturing 39 5798 A

I do this activity everyday manually filter column category one by one, copy filtered data, open a new .xlsx file, name it by the filtered category name, password protect and save it in a folder.

edit: This is not a groupby summarise task but getting subset by their categories.

Is there a way to automate this process ?

Many thanks

Short answer: For sure is this possible. Depending on the exakt specs of your task you could read your xl file using e.g. readxl::read_excel, loop over your categories and filter the data accordingly, store the filtered datasets in a named list, loop over list to write the results to separate and password protected xl files.

@stefan1, ok but a bit confused. I have data in a dataframe already (shown in table above). Then is there a reason why I should be using read_excel ?
Kindly clarify

Okay. I see. My bad. Thought you were doing your whole workflow in Excel.

no worries.
I'm currently exploring dplyr but there should be a way to perform this.

I'm awaiting to see if someone else can resolve this.

This is an example of how to do it, this produces independent password protected xlsx files by each unique value on the "Department" column.

library(dplyr)
library(purrr)
library(xlsx)

# Sample data on a copy/paste friendly format
sample_df <- data.frame(
  stringsAsFactors = FALSE,
                ID = c(82950,32045,84817,2857,
                       55587,73103,5837,7140,70450,34207,20095,30200,47211,
                       8932,82663),
        Department = c("Training","Training","Sales",
                       "Logistics","Training","Sales","Sales","Sales",
                       "Logistics","Manufacturing","Manufacturing","Sales",
                       "Manufacturing","Manufacturing","Manufacturing"),
          Strength = c(96,86,77,88,46,51,62,81,
                       56,85,58,91,36,75,39),
               DoD = c(953,5277,1633,5629,6710,
                       1892,2394,2854,5449,1332,4589,1094,5862,4847,5798),
          Group_ID = c("A","A","C","Z","L","M",
                       "A","C","A","S","X","W","C","A","A")
)

sample_df %>% 
    group_nest(Department) %>% 
    walk2(.x = .$Department,
          .y = .$data,
          .f = ~ write.xlsx(x = .y,
                            file = paste(.x, ".xlsx"),
                            password = "1234")
          )
1 Like

@andresrcs, it works for the sample_df but when I tried this with the actual dataset, error message below throws up;

Error in .jnew("java/io/FileOutputStream", jFile) :
java.io.FileNotFoundException: C:\Users\timetraveller\Desktop\Analysis\1011""AEROGFDTECH, INC"" .xlsx (The filename, directory name, or volume label syntax is incorrect)

I did some research to fix this:

Do you think, we may need to do some modifications to our code ?
Awaiting your response.

The path that gets generated has invalid characters, you would need to make some text cleaning first.

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.