Convert dplyr::filter condition into human-readable text

Hi there,

I use R Markdown (specifically {bookdown}) for reproducibility and for dynamic control of the human-readable text. I wonder whether it is possible to automatically convert dplyr::filter() conditions into human-readable text.

For instance,

library(tidyverse)

threshold <- 3

# Condition 1
mpg %>%
  filter(displ > threshold) %>%
  ggplot(aes(x = cty,
             y = hwy,
             color = as.factor(cyl))) +
  geom_point() +
  theme(legend.position = "bottom")

In this case, the plot description can be written with inline code like this:

City miles per gallon vs. highway miles per gallon for cars with engine displacements greater than `r threshold`.

However, if I change the filtering condition itself (not just the threshold), the text no longer matches the code.

# Condition 2
mpg %>%
  # >= used instead of the > in Condition 1
  filter(displ >= threshold) %>%
  ggplot(aes(x = cty,
             y = hwy,
             color = as.factor(cyl))) +
  geom_point() +
  theme(legend.position = "bottom")

or

# Condition 3
threshold <- c(2.5, 5)

mpg %>%
  filter(between(displ, threshold[[1]], threshold[[2]])) %>%
  ggplot(aes(x = cty,
             y = hwy,
             color = as.factor(cyl))) +
  geom_point() +
  theme(legend.position = "bottom")

So, would it be possible to dynamically set the filtering condition and then compile a human-readable text from it?

Hello,

If I understand you correctly, you would like to define the filtering condition as a variable and then plug it into both dplyr::filter() and the text, so that both update at the same time? As far as I know, there is no simple solution for this short of building considerably more complex logic, which would defeat the purpose. Is there a specific reason you want to do this? I think the easiest approach for now would be to edit the text for each condition by hand.

Example condition 2:

City miles per gallon vs. highway miles per gallon for cars with engine displacements greater than or equal to `r threshold`.

Example condition 3:

City miles per gallon vs. highway miles per gallon for cars with engine displacements between `r threshold[1]` and `r threshold[2]`.

Maybe someone knows a more elegant solution, but the logic inside filters can become very complex, so I imagine automatic translation into text would be difficult.
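A partial step in that direction, sketched under the assumption that using the {rlang} package (installed with the tidyverse) is acceptable: capture the condition once with `rlang::expr()`, splice it into `filter()` with `!!`, and reuse the same object for the text. Note that `rlang::expr_text()` only returns R syntax such as `displ >= 3`, not English, so this keeps code and text in sync without actually translating anything.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset
library(rlang)

threshold <- 3

# Capture the condition once; !! inlines the current value of threshold
cond <- expr(displ >= !!threshold)

# Splice the very same expression into filter()
filtered <- mpg %>% filter(!!cond)

# Deparse it for the description, e.g. `r cond_text` in the R Markdown text
cond_text <- expr_text(cond)
cond_text
```

Changing `>=` to `>` in the single `expr()` call then changes both the filtered data and the inline text.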

PJ

Thank you for your reply. I suspected there was no such solution.

The specific reason for this "automation" is that I currently have to remember to update the text every time I change a filtering condition. This is a problem at the early stage of a project, when I write the text explanations alongside the data wrangling (so as not to forget why I do things). If something changes in the upstream analysis, I might need to change the filtering conditions and forget to update the text accordingly.

Anyway, it is not a big issue, and I just need to re-read the text once in a while to check that it is still okay.

Hi,

I completely understand; this happens to me all the time as well :)
What I would do is use a regular expression to quickly find every place in the code where filter() is used, and then check whether the text still matches. Here is a regex that matches filter() calls containing a comparison operator or a between() call:

(filter\([^>=<%]+[>=<]+)|(filter\(\s*between\()

If you paste this into the RStudio search box and tick the Regex option, you can jump from one filter() call to the next and quickly check whether the text below still matches.
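The same search can also be run from R across a whole project, which may be handy before a final read-through. A minimal sketch, assuming the analysis lives in `.Rmd` files under one folder; `find_filters()` is a hypothetical helper name, and the pattern is the regex above with its backslashes doubled for an R string.

```r
# The regex from above, escaped for an R string
pattern <- "(filter\\([^>=<%]+[>=<]+)|(filter\\(\\s*between\\()"

# Hypothetical helper: list every matching line in the .Rmd files under `dir`
find_filters <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.Rmd$", recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    idx <- grep(pattern, lines, perl = TRUE)
    if (length(idx) == 0) return(NULL)
    data.frame(file = f, line = idx, code = trimws(lines[idx]))
  })
  # rbind() skips the NULL entries; returns NULL if nothing matched
  do.call(rbind, hits)
}
```

Running `find_filters("path/to/project")` gives one row per match with its file and line number, which can then be opened side by side with the surrounding text.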

Hope this helps,
PJ


For me it depends on the variability and complexity of the filter statements you intend to use. If you limit yourself to single filter statements of a few fixed types, then setting up a function that works as you describe should be relatively straightforward, if somewhat tedious. It's a question of how ambitious you want to be.
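As a rough illustration of that idea, here is a sketch limited to single conditions of the form `variable <op> value` and `between(variable, lower, upper)`. `describe_condition()` is a hypothetical helper, not part of dplyr; it assumes the condition is captured with `rlang::expr()`, so the same object can also be passed to `filter(!!cond)`.

```r
library(rlang)

# Hypothetical helper (not part of dplyr): turns a few simple filter()
# conditions into English. Only `var <op> value` comparisons and
# between(var, lower, upper) are handled; anything else is an error.
describe_condition <- function(cond, labels = list(), env = caller_env()) {
  # Use a readable label for the variable if one was supplied
  label_of <- function(x) {
    name <- as_string(x)
    if (is.null(labels[[name]])) name else labels[[name]]
  }
  # Evaluate the value (a literal or a variable in `env`) and format it
  value_of <- function(x) format(eval(x, env))

  comparisons <- c(">"  = "greater than",
                   ">=" = "greater than or equal to",
                   "<"  = "less than",
                   "<=" = "less than or equal to",
                   "==" = "equal to")

  fn <- call_name(cond)
  args <- call_args(cond)

  if (!is.null(fn) && fn %in% names(comparisons)) {
    paste(label_of(args[[1]]), comparisons[[fn]], value_of(args[[2]]))
  } else if (identical(fn, "between")) {
    paste(label_of(args[[1]]), "between",
          value_of(args[[2]]), "and", value_of(args[[3]]))
  } else {
    abort(paste("Don't know how to describe:", expr_text(cond)))
  }
}

threshold <- c(2.5, 5)
cond <- expr(between(displ, !!threshold[[1]], !!threshold[[2]]))
describe_condition(cond, labels = list(displ = "engine displacement"))
# "engine displacement between 2.5 and 5"
```

Anything outside those patterns (`&`, `|`, `%in%`, functions applied to the variable) raises an error, which at least serves as a visible reminder to update the text by hand.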

