Getting argument name when passed via %>%. I keep getting "."

preposterior · July 23, 2020, 8:56pm

I would like to get the argument name passed to a function using the pipe operator "%>%". I can only get ".". Here is a simple MWE:

library(tidyverse)

myDF = data.frame(x=c(1,2))

testfun = function(objName) {
  z = rlang::as_name(rlang::ensym(objName))
  print(z)
}

## returns "myDF" like I want it to
testfun(myDF)

## returns "." , but I want myDF
myDF %>% testfun()

Thanks!

technocrat · July 23, 2020, 9:50pm

library(magrittr)

myDF <- data.frame(x=c(1,2))

myDF %>% rlang::quo()
#> <quosure>
#> expr: ^.
#> env:  0x7fdadc5886e0

Created on 2020-07-23 by the reprex package (v0.3.0)

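The %>% pipe hands the left-hand side to the function under the placeholder name "." (shown above as expr: ^.), so quoting the argument gives "." rather than "myDF". The function operates on the contents of the myDF object, not on the name myDF.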

preposterior · July 23, 2020, 10:22pm

Thanks for this. So is the name unrecoverable after being passed through the pipe? Or is there some way to extract it from the quosure environment?

technocrat · July 23, 2020, 10:43pm

Good question. I don't see an obvious way of doing this, although it may be possible somehow using rlang::env_bind. You might check Wickham's Advanced R. When I get a longer break, I may come back to this with you.

MyKo101 · July 23, 2020, 11:19pm

Okay, so I quite enjoyed the challenge of figuring out the solution to this one. It's an interesting one. The way the pipe works is that it creates a sequence of functions like the one below:

function(.)
testfun(.)

and evaluates them in turn, applying each function to the result of the previous one. This means that the variable being passed to testfun(), when used in a pipeline, is actually . and it has the same value as whatever the previous step produced. Within the calling of these functions, the pipe actually names that argument value, and that's why, if you play with some of the {rlang} functions, you'll get value out:

myDF %>% ensym

In this case, it's the same value as myDF, but it's now got a different name, ., and that's why it gives that result. When you think of it this way, your function is doing exactly what it's supposed to. It's the pipe that's being weird.
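Here is a minimal base-R sketch of that renaming, assuming the pipe behaves like the function(.) wrapper shown above. The names show_name() and wrapper() are hypothetical and only for illustration, and deparse(substitute()) stands in for the rlang calls:

```r
# Hypothetical helper: report the expression the caller supplied
show_name <- function(obj) deparse(substitute(obj))

myDF <- data.frame(x = c(1, 2))

# Direct call: the supplied expression is the symbol myDF
name_direct <- show_name(myDF)

# Stand-in for what the pipe builds: a function of `.` that forwards `.`
wrapper <- function(.) show_name(.)
name_wrapped <- wrapper(myDF)

print(name_direct)   # "myDF"
print(name_wrapped)  # "."
```

Same data both times; only the expression that show_name() sees has changed.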

You can, however, look back over the call-stack where the current function is being evaluated (which is what error-finding functions like traceback() do). Within the pipe, it actually creates a relatively deeply nested set of calls (about 9 calls deep). However, the sys.calls() function can return this stack. Compare for example the following two outputs:

stack_fun <- function(x){
  sys.calls()
}
stack_fun(myDF)
myDF %>% stack_fun

The first element of this stack will be the initial call, in this case myDF %>% stack_fun. This will be a call object and so we can pull out the left-hand side by extracting the second element (the %>% is the first element, and stack_fun is the third). Therefore, the testfun() function can be written as:

testfun <- function(objName){
  first_call <- sys.calls()[[1]] #get the first entry on the call stack
  lhs <- first_call[[2]] #get the second element of this entry
  z <- rlang::as_name(lhs)
  print(z)
}

myDF %>% testfun()
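To see why [[2]] is the left-hand side, here is a small sketch that pulls apart a quoted pipe call. Nothing is evaluated, so it should run without magrittr loaded; the variable names are just for illustration:

```r
# Quote the call instead of running it, then index into it like a list
cl <- quote(myDF %>% stack_fun())

op  <- cl[[1]]  # the operator, `%>%`
lhs <- cl[[2]]  # the left-hand side, the symbol myDF
rhs <- cl[[3]]  # the right-hand side, the call stack_fun()

lhs_name <- as.character(lhs)
print(lhs_name)  # "myDF"
```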

But, that's not the end of our tale!

This is just looking for the initial call, and isn't strictly going to seek out where there is a pipe. For example, it wouldn't work with the following function, since f() would be at the top of the stack:

f <- function(x){
  x %>% testfun
}

And, in theory you would want this to return "x", since that's what's being piped into testfun(). This could also cause other problems when nested inside other functions and/or pipelines, etc... It's only ever looking at what the user has called, which is not necessarily where you want this function to look.

However, by inspecting the entire stack for a pipe, we can pull out the most recent (i.e. the lowest) entry that is a pipe:

get_lhs <- function(){
  calls <- sys.calls()
  
  call_firsts <- lapply(calls,`[[`,1) 
  pipe_calls <- vapply(call_firsts,identical,logical(1),quote(`%>%`))
  if(all(!pipe_calls)){
    NULL
  } else {
    pipe_calls <- which(pipe_calls)
    pipe_calls <- pipe_calls[length(pipe_calls)]
    calls[[c(pipe_calls,2)]]
  }
}
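If you want to see what the lapply()/vapply() steps are doing without a live call stack, here is a rough sketch on a hand-built list of calls. The list is only a stand-in I made up for what sys.calls() might return inside f(); the real stack under the pipe has more entries:

```r
# Made-up stand-in for a call stack, outermost call first
calls <- list(
  quote(f(myDF)),
  quote(x %>% testfun),
  quote(testfun(.))
)

# First element of each call: the function or operator being called
heads <- lapply(calls, `[[`, 1)

# TRUE where that function is the pipe
is_pipe <- vapply(heads, identical, logical(1), quote(`%>%`))

# Most recent pipe on the stack, then its left-hand side
last_pipe <- tail(which(is_pipe), 1)
lhs <- calls[[last_pipe]][[2]]

print(is_pipe)  # FALSE TRUE FALSE
print(lhs)      # x
```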

So, you can re-write your testfun() function to be:

testfun <- function(objName){
  lhs <- get_lhs()
  if(is.null(lhs)){
    lhs <- rlang::ensym(objName)
  }
  z <- rlang::as_name(lhs)
  print(z)
}

This means that the following both return "myDF":

testfun(myDF)
myDF %>% testfun

These will return "x":

f(myDF)
myDF %>% f

And this even works with fseq-style functions in an interesting way:

g <- . %>% testfun

This is a function that we can use in one of two ways, either as a regular function (e.g. g(myDF)) or by piping into it (e.g. myDF %>% g), and these return two different results:

g(myDF) #returns "."

This is because it's essentially the same as defining g() as a function:

g <- function(.){
  . %>% testfun
}

So, this makes sense. BUT when we pipe it, it gets weird, but still a good result:

myDF %>% g # returns "myDF"

This is because it's essentially chaining the two pipelines together into a single, longer chain (much more apparent if you had many elements in your two pipelines).

Sorry for the long answer, but I thought this was an interesting challenge. I've recently started a blog about my adventures in R and coding, and so I think I'm going to copy this long-winded response into a post on there. So thank you for the inspiration

technocrat · July 24, 2020, 12:01am

Great takedown! Thanks for running this to ground.

MyKo101 · July 24, 2020, 1:16am

Oops, sorry. The get_lhs() function doesn't work if the pipeline is more than two functions long. Here's an update with an explanation as to why it doesn't work:

get_lhs <- function(){
  calls <- sys.calls()
  
  #pull out the function or operator (e.g. the `%>%`)
  call_firsts <- lapply(calls,`[[`,1) 
  
  #check which ones are equal to the pipe
  pipe_calls <- vapply(call_firsts,identical,logical(1),quote(`%>%`))
  
  #if we have no pipes, then get_lhs() was called incorrectly
  if(all(!pipe_calls)){
    NULL
  } else {
    #Get the most recent pipe, lowest on the stack
    pipe_calls <- which(pipe_calls)
    pipe_calls <- pipe_calls[length(pipe_calls)]
    
    #Get the second element of the pipe call
    this_call <- calls[[c(pipe_calls,2)]]
    
    #We need to dig down into the call to find the original
    while(is.call(this_call) && identical(this_call[[1]],quote(`%>%`))){
      this_call <- this_call[[2]]
    }
    this_call
    
  }
}

Once we have the call, getting the lhs of it requires digging down. If we have a pipeline, then it's actually a nested sequence of operators. For example, 2+3+4 makes sense to us, but R can't add three things in one go; it breaks this down by calculating from left to right. Basically it does (2 + 3) + 4, which is the same as `+`(`+`(2,3),4). R does this with the pipe too.

If we're piping a few things together, we write this: my_variable %>% fun1 %>% fun2 %>% fun3, and R reads it as this: ((my_variable %>% fun1) %>% fun2) %>% fun3.

So we repeatedly check whether the current function/operator/call name is a pipe. If it is, we grab the second entry (which is what is being piped into the current pipe). If it isn't, we've dug down far enough.
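The digging-down can be tried on quoted expressions, roughly like this. It's only a sketch of the same loop outside get_lhs(); nothing is evaluated, so fun1, fun2 and fun3 don't need to exist, and the variable names are mine:

```r
# The parser nests the pipeline to the left
cl <- quote(my_variable %>% fun1 %>% fun2 %>% fun3)
outer_lhs <- cl[[2]]  # my_variable %>% fun1 %>% fun2

# Keep taking the left-hand side while we are still looking at a pipe call
origin <- cl
while (is.call(origin) && identical(origin[[1]], quote(`%>%`))) {
  origin <- origin[[2]]
}
print(origin)  # my_variable

# Same nesting for ordinary arithmetic
sum_call <- quote(2 + 3 + 4)
inner_sum <- sum_call[[2]]  # 2 + 3
```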

Also, as I mentioned, here is a blog post about it

technocrat · July 24, 2020, 1:25am

The () would be much more intuitive in Haskell

preposterior · July 24, 2020, 1:31am

I am in awe of the quality and speed of this response. Kudos and thank you!
