For longer dplyr
pipelines (time long, not code long), is there a way to insert messages to make the progress verbose?
I run a lot of scripts by calling $ Rscript my_script.R
from the command line and I use the message()
functions often.
magrittr
's tee (%T>%
) operator doesn't work for this purpose (aside: I wanted to use a magrittr
tag for this topic, but one isn't available).
# 01: Currently what I'm doing --------------------------------
message(sprintf("doing a bunch of things to %s, be back in a few...", fname))
file.path(
data_path
, "feathers"
, fname
) %>%
read_feather(
columns = keep
) %>%
filter(
sample_col == "processed"
) %>%
mutate(
region = RGN %>% tolower() %>% trimws()
) %>%
left_join(
.
, region_df
)
message("finished now, thanks for waiting on me")
# 02: What I'd like to do (comment lines for emphasis) --------
file.path(
data_path
, "feathers"
, fname
) %>%
#
message(sprintf("reading file: %s", fname)) %>%
#
read_feather(
columns = keep
) %>%
#
message("...filtering and prepping for join...") %>%
#
filter(
sample_col == "processed"
) %>%
mutate(
region = RGN %>% tolower() %>% trimws()
) %>%
#
message(sprintf("joining: %s with %s", fname, region_df_name)) %>%
#
left_join(
.
, region_df
)
3 Likes
I think this will solve your problem.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(feather)
data_path <- tempdir()
dir.create(file.path(data_path, "feather"))
keep <- c("a", "sample_col", "RGN")
fname <- "file1.feather"
region_df <- tibble(a = sample(1:10, 50, TRUE),
d = sample(letters, 50, TRUE))
message_ <- function(df, msg) {
cat("This is a message:\n", msg, "\n")
df
}
tibble(a = sample(1:10, 50, TRUE),
RGN = sample(c(TRUE, FALSE), 50, TRUE),
sample_col = sample(c("processed", "non-processed"), 50, TRUE)) %>%
write_feather(file.path(data_path, "feather", fname))
file.path(data_path, "feather", fname) %>%
message_(sprintf("reading file: %s", fname)) %>%
read_feather(columns = keep) %>%
message_("...filtering and prepping for join...") %>%
filter(sample_col == "processed") %>%
mutate(region = RGN %>% tolower() %>% trimws()) %>%
message_(sprintf("joining: %s with %s", fname, as.character(quote(region_df)))) %>%
left_join(region_df, by = "a")
#> This is a message:
#> reading file: file1.feather
#> This is a message:
#> ...filtering and prepping for join...
#> This is a message:
#> joining: file1.feather with region_df
#> # A tibble: 126 x 5
#> a sample_col RGN region d
#> <int> <chr> <lgl> <chr> <chr>
#> 1 10 processed T true v
#> 2 10 processed T true f
#> 3 10 processed T true v
#> 4 10 processed T true s
#> 5 10 processed T true i
#> 6 10 processed T true v
#> 7 10 processed T true z
#> 8 8 processed F false g
#> 9 8 processed F false k
#> 10 8 processed F false i
#> # ... with 116 more rows
2 Likes
I would think it would be preferable to use the message
function within message_
, such as
message_ <- function(df, ..., domain = NULL, appendLF = TRUE) {
message(..., domain = domain, appendLF = appendLF)
df
}
2 Likes
It works equally well using message
.
1 Like