Are verbose pipelines possible?

paul-james · January 29, 2018, 9:47pm

For dplyr pipelines that take a long time to run (long in time, not in code), is there a way to insert messages so that progress is reported as the pipeline runs?
I run a lot of scripts from the command line with $ Rscript my_script.R, and I use the message() function often.
magrittr's tee operator (%T>%) doesn't work for this purpose (aside: I wanted to use a magrittr tag for this topic, but one isn't available).

# 01: Currently what I'm doing --------------------------------

message(sprintf("doing a bunch of things to %s, be back in a few...", fname))

file.path(
      data_path
    , "feathers"
    , fname
  ) %>%
  read_feather(
    columns = keep
  ) %>%
  filter(
    sample_col == "processed"
  ) %>%
  mutate(
    region = RGN %>% tolower() %>% trimws()
  ) %>%
  left_join(
      .
    , region_df
  )

message("finished now, thanks for waiting on me")

# 02: What I'd like to do (comment lines for emphasis) --------

file.path(
      data_path
    , "feathers"
    , fname
  ) %>%
  #
  message(sprintf("reading file: %s", fname)) %>%
  #
  read_feather(
    columns = keep
  ) %>%
  #
  message("...filtering and prepping for join...") %>%
  #
  filter(
    sample_col == "processed"
  ) %>%
  mutate(
    region = RGN %>% tolower() %>% trimws()
  ) %>%
  #
  message(sprintf("joining: %s with %s", fname, region_df_name)) %>%
  #
  left_join(
      .
    , region_df
  )

elben10 · January 30, 2018, 2:28pm

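I think this will solve your problem: wrap the message in a small helper that prints it and then returns its first argument unchanged, so the pipeline can continue.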

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(feather)

data_path <- tempdir()
dir.create(file.path(data_path, "feather"))

keep <- c("a", "sample_col", "RGN")
fname <- "file1.feather"

region_df <- tibble(a = sample(1:10, 50, TRUE),
                    d = sample(letters, 50, TRUE))

# Print a progress message, then return the input unchanged so the pipe continues
message_ <- function(df, msg) {
  cat("This is a message:\n", msg, "\n")
  df
}

tibble(a = sample(1:10, 50, TRUE),
       RGN = sample(c(TRUE, FALSE), 50, TRUE),
       sample_col = sample(c("processed", "non-processed"), 50, TRUE)) %>%
  write_feather(file.path(data_path, "feather", fname))

file.path(data_path, "feather", fname) %>%
  message_(sprintf("reading file: %s", fname)) %>%
  read_feather(columns = keep) %>%
  message_("...filtering and prepping for join...") %>%
  filter(sample_col == "processed") %>%
  mutate(region = RGN %>% tolower() %>% trimws()) %>%
  message_(sprintf("joining: %s with %s", fname, as.character(quote(region_df)))) %>%
  left_join(region_df, by = "a")
#> This is a message:
#>  reading file: file1.feather 
#> This is a message:
#>  ...filtering and prepping for join... 
#> This is a message:
#>  joining: file1.feather with region_df
#> # A tibble: 126 x 5
#>        a sample_col RGN   region d    
#>    <int> <chr>      <lgl> <chr>  <chr>
#>  1    10 processed  T     true   v    
#>  2    10 processed  T     true   f    
#>  3    10 processed  T     true   v    
#>  4    10 processed  T     true   s    
#>  5    10 processed  T     true   i    
#>  6    10 processed  T     true   v    
#>  7    10 processed  T     true   z    
#>  8     8 processed  F     false  g    
#>  9     8 processed  F     false  k    
#> 10     8 processed  F     false  i    
#> # ... with 116 more rows

nutterb · January 30, 2018, 2:43pm

I think it would be preferable to use the message() function inside message_, for example:

message_ <- function(df, ..., domain = NULL, appendLF = TRUE) {
  message(..., domain = domain, appendLF = appendLF)
  df
}
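
As a rough usage sketch, the helper above drops into a pipeline the same way as the cat()-based one. The built-in mtcars data and the filter/summarise steps here are only stand-ins I picked for illustration; they are not from the original script.

```r
library(dplyr)

# Same helper as above: emit a message, then hand the data on unchanged.
message_ <- function(df, ..., domain = NULL, appendLF = TRUE) {
  message(..., domain = domain, appendLF = appendLF)
  df
}

# mtcars is a stand-in dataset (assumption, not the original feather file).
result <- mtcars %>%
  message_("...filtering...") %>%
  filter(cyl == 4) %>%
  # Because of `...`, several pieces are pasted together like in message()
  message_("...summarising ", "by gear...") %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg))
```

Since the arguments are passed through `...`, the sprintf() calls from the original pipeline should work here without changes.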

elben10 · January 30, 2018, 2:46pm

It works equally well using message().
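
One practical difference that may matter when running under Rscript: cat() writes to stdout, while message() writes to stderr and can be silenced with suppressMessages(). A minimal sketch of that, where message_cat and message_msg are hypothetical names used only to keep the two variants apart:

```r
# cat()-based variant (hypothetical name): prints to stdout
message_cat <- function(df, msg) {
  cat(msg, "\n")
  df
}

# message()-based variant (hypothetical name): signals a message on stderr
message_msg <- function(df, ...) {
  message(...)
  df
}

# capture.output() collects stdout by default, so it sees the cat() text only
out_cat <- capture.output(x <- message_cat(1:3, "via cat"))

# suppressMessages() silences the message() variant; nothing reaches stdout
out_msg <- capture.output(y <- suppressMessages(message_msg(1:3, "via message")))
```

Both helpers return their input unchanged, so the choice only affects where the progress text ends up and whether callers can switch it off.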