Problem for loop with Janitor and Rlang

Hi everyone,

I'ts my first contact with the janitor and rlang packages.

I try to do a for loop with janitor and rlang but something is wrong, I don´t know what happend please help me to fix it.

My session info
version version R version 3.6.0 (2019-04-26)
os os Windows >= 8 x64
system system i386, mingw32

Packages' version:
dplyr * 0.8.2 2019-06-29 [1] CRAN (R 3.6.0)
janitor * 1.2.0 2019-04-21 [1] CRAN (R 3.6.0)
rlang * 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)

First of all I make a simple code with janitor's function all fine:

prueba <- mtcars %>% 
  tabyl(am,cyl) %>%
  adorn_totals(c("row", "col")) %>% 
  adorn_percentages("all") %>% 
  adorn_pct_formatting() %>% 
  adorn_ns(position = "front")

I saw the @mara ´s account in Twitter reply the rlang's new version and I would like to prove this and I mix janitor and rlang packages

(tab_freq <- function(data, v1, v2) {
  data %>%
    tabyl({{ v1 }}, {{ v2 }}) %>%
    adorn_totals(c("row", "col")) %>% 
    adorn_percentages("all") %>% 
    adorn_pct_formatting() %>% 
    adorn_ns(position = "front") %>% 
    print()
})
tab_freq(mtcars, am, cyl)

Good, I did it
But why not make a for loop with this?
And make something like this...

for(i in seq2(1,10)){
  prim <- names(mtcars)[i]
  secun <- names(mtcars)[i+1]
  (tab_freq(mtcars,{{prim}},{{secun}}))
}

Oh boy, what the...?
I don´t know how can I fix it :frowning:

Error: Strings must match column names. Unknown columns: cyl Call rlang::last_error() to see a backtrace

You've written the function to take unquoted expressions, but you're passing it strings. You can convert by wrapping in sym:

for(i in seq(1,10)){
    prim <- sym(names(mtcars)[i])
    secun <- sym(names(mtcars)[i+1])
    (tab_freq(mtcars,{{prim}},{{secun}}))
}

You're rigth @alistaire it works, thanks a lot!

Hey @alistaire another problem how can I assign diferents names to every (tab_freq)?
Please help me something like that:

for(i in seq(1,10)){
  prim <- sym(names(mtcars)[1])
  secun <- sym(names(mtcars)[i+1])
  assign(paste({{ sym(v1) }}, {{ sym(v2) }}, sep="")) <- (tab_freq(mtcars,{{prim}},{{secun}}))
}

assign is going down a bad road; unless you're working with environments, it encourages bad idioms. If you trade for loops for list mapping functions, the code gets easier. Stick the lists in list columns of a data frame, and it can be downright pretty. For example,

library(tidyverse)
library(janitor)

tab_freq <- function(data, v1, v2) {
    data %>%
        tabyl({{ v1 }}, {{ v2 }}) %>%
        adorn_totals(c("row", "col")) %>% 
        adorn_percentages("all") %>% 
        adorn_pct_formatting() %>% 
        adorn_ns(position = "front")    # no need to print here
}

Instead of writing a loop, you write a function to call on each element of the vector or list:

tab_mtcars <- function(v1, v2){
    tab_freq(mtcars, !!ensym(v1), !!ensym(v2))    # use ensym to substitute the string arguments
}

tab_df <- tibble(col1 = names(mtcars)[-length(mtcars)],    # make a data frame of cols
                 col2 = names(mtcars)[-1]) %>% 
    mutate(name = paste(col1, col2, sep = "_"), 
           tab = map2(col1, col2, tab_mtcars))    # map the function

tab_df
#> # A tibble: 10 x 4
#>    col1  col2  name      tab                
#>    <chr> <chr> <chr>     <list>             
#>  1 mpg   cyl   mpg_cyl   <df[,5] [26 × 5]>  
#>  2 cyl   disp  cyl_disp  <df[,29] [4 × 29]> 
#>  3 disp  hp    disp_hp   <df[,24] [28 × 24]>
#>  4 hp    drat  hp_drat   <df[,24] [23 × 24]>
#>  5 drat  wt    drat_wt   <df[,31] [23 × 31]>
#>  6 wt    qsec  wt_qsec   <df[,32] [30 × 32]>
#>  7 qsec  vs    qsec_vs   <df[,4] [31 × 4]>  
#>  8 vs    am    vs_am     <df[,4] [3 × 4]>   
#>  9 am    gear  am_gear   <df[,5] [3 × 5]>   
#> 10 gear  carb  gear_carb <df[,8] [4 × 8]>

Everything you want is already in that data frame, but you can put the names on the list column and pull it out if you like:

tab_list <- tab_df %>% 
    mutate(tab = set_names(tab, name)) %>% 
    pull(tab)

tab_list[8:9]
#> $vs_am
#>     vs          0          1       Total
#>      0 12 (37.5%)  6 (18.8%) 18  (56.2%)
#>      1  7 (21.9%)  7 (21.9%) 14  (43.8%)
#>  Total 19 (59.4%) 13 (40.6%) 32 (100.0%)
#> 
#> $am_gear
#>     am          3          4         5       Total
#>      0 15 (46.9%)  4 (12.5%) 0  (0.0%) 19  (59.4%)
#>      1  0  (0.0%)  8 (25.0%) 5 (15.6%) 13  (40.6%)
#>  Total 15 (46.9%) 12 (37.5%) 5 (15.6%) 32 (100.0%)

Note that nested tidy eval can go weird sometimes—if you try to write tab_mtcars as an anonymous function within mutate (which like tabyl uses tidy eval), weird things may happen. (Curiously, it's because the code is using expressions instead of quosures, which carry their environment with them in order to solve this exact problem.) Defining the function beforehand keeps the scopes separated and avoids the issue.

3 Likes

Mmm nice!!! Thanks @alistaire

Now I wanna export to Excel file every tabyl, I create next function:

printer <- function(tabla){
  write.xlsx( {{ tabla }}, file = "C:/Users/Leisdir Bernal/Documents/R_250818/Scripts/myworkbook.xlsx", append=TRUE)
}

This function put into this code

prueba <- tab_df %>% 
  select(tab) %>%
  map(printer)

It´s nice but Worksheet's name is "Sheet N°"
I want to create a Workbook which has like Worksheet's name...the column "name" of corresponding tab_df something like this:

printer <- function(tabla, hoja){
  write.xlsx( {{ tabla }}, file = "C:/Users/Leisdir Bernal/Documents/R_250818/Scripts/myworkbook.xlsx", sheetName = {{ hoja }}, append=TRUE)
}

I don´t understand yet this issue with rlang and purrr packages :frowning:

prueba <- tab_df %>% 
  select(name, tab) %>%
  map(tab, printer(!!ensym(name)))

write.xlsx doesn't use tidy eval (it takes strings, not unquoted names), so you don't need {{/ensym/rlang. You may need to pass two parameters to each call, though: the name and the data. You can use map2 to iterate in parallel across two vectors, if you pass them in correctly and write a function (likely anonymous) to handle them.

You can't pipe in a data frame, because then the data frame will become the first parameter and map will iterate over the columns, which is not what you want. (Piping into pmap actually would work, though you'd have to set up your function carefully.) You could write

map2(
    tab_df$tab, 
    tab_df$name, 
    function(tab, name) write.xlsx(x = tab, file = paste0(name, '.xlsx'), worksheet = name)
)

or with a purrr-style anonymous function

map2(
    tab_df$tab, 
    tab_df$name, 
    ~ write.xlsx(x = .x, file = paste0(.y, '.xlsx'), worksheet = .y)
)

Really, you could use walk2 because you don't care about the output, just the side-effect of writing to files. Tweak to your liking.

Sorry but I need put all tables in the same Excel file like some this:

map2(
  tab_df$tab, 
  tab_df$name, 
  ~ write.xlsx(x = .x, file = "All.xlsx", worksheet = .y, sheetName = .y, overwrite = T)
)

But I don´t know how can I do

That looks plausible, but I don't use openxlsx often. How is it failing?

It looks like there's an interface with addWorksheet, too, but to do that you may need to use reduce2 with an .init argument, which is a little harder to understand.

I understand @alistaire but do you help me with another way or package to solve this problem? please. I aprecciated for your help.

If you look at the examples of openxlsx, they show roughly what needs to happen, the difference being that you need to iterate.

A difficulty: Unlike the tidyverse, openxlsx functions mutate its workbook objects in place, meaning the output of a call is not important, but its side-effect of modifying an object that's stored in-memory somewhere is. For instance,

library(openxlsx)

wb <- createWorkbook()
wb
#> A Workbook object.
#>  
#> Worksheets:
#>  No worksheets attached

addWorksheet(wb, 'excellent_sheet')
wb
#> A Workbook object.
#>  
#> Worksheets:
#>  Sheet 1: "excellent_sheet"
#>  
#>  Worksheet write order: 1

Note that wb was not reassigned; it was modified just by calling a function on it. This is not usually how R works.

This actually makes iterating a little easier than I thought—you don't need reduce, since you don't care about the output of the call, just the side-effect. You can use walk, which is just like map, but doesn't return anything.

library(openxlsx)
library(purrr)

wb <- createWorkbook()    # start a workbook
walk(tab_df$name, ~addWorksheet(wb, .x))    # add the worksheets
walk2(tab_df$name, tab_df$tab, ~writeData(wb, .x, .y))    # write the data to the worksheets
saveWorkbook(wb, 'All.xlsx')    # save the workbook

The result:

Ha, just noticed that ?openxlsx::write.xlsx's x parameter is documented as

object or a list of objects that can be handled by writeData to write to file

which made me realize you can just pass it tab_list:

openxlsx::write.xlsx(tab_list, 'All.xlsx')

which does the same thing as the above. It even picks up the list's names as sheet names, which is nice.

2 Likes

Oh nice my friend !!!
I put this code you write me and here it is:

library(openxlsx)
library(purrr)

wb <- createWorkbook() 
walk(tab_df$name, ~addWorksheet(wb, .x)) 
walk2(tab_df$name, tab_df$tab, ~writeData(wb, .x, .y)) 
saveWorkbook(wb, 'All.xlsx') 

I got it...It works but the line you suggest me in the last message

openxlsx::write.xlsx(tab_list, 'All.xlsx')

This code It´s better than the other one, openxlsx is not much intutive

openxlsx::write.xlsx(tab_df$tab, sheetName = tab_df$name, 'All2.xlsx')

Really thank you so much @alistaire , I am a beginner Rstudio user yet but I hope to get better with more practice :slight_smile:

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.