Tidyverse equivalent of switch for running a sequence of commands

Isaiah · November 6, 2018, 3:42am

I can get switch to run different sequences of commands, but not to work in a pipe.
I can get case_when to work in a pipe, but when a branch contains more than one command, I get the output of every command in every branch (L1, L2, W1, W2, W2) rather than what I hoped for (W1, W2).

Is there a tidyverse way to get the switch output?

library(tidyverse)
library(reprex)
x <- Sys.info() %>% pluck("sysname")

switch (x,
        "Linux" = {
          print("L1")
          print("L2")
        },
        "Windows" = {
          print("W1")
          print("W2")
        })
#> [1] "W1"
#> [1] "W2"

case_when(x == "Linux" ~{
  print("L1")
  print("L2")
},
x == "Windows" ~  {
  print("W1")
  print("W2")
},
TRUE ~ {
  ("Not Linux or Windows")
})
#> [1] "L1"
#> [1] "L2"
#> [1] "W1"
#> [1] "W2"
#> [1] "W2"

cderv · November 6, 2018, 6:26am

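case_when has a particular property: it evaluates every right-hand side (RHS) first and then selects among the results according to the left-hand side (LHS) conditions. In your case, that is why you see output from every branch:

all the print() calls are evaluated, so everything is printed to the console
only the last value of the matching block is returned: print() returns its argument invisibly, so the Windows block yields "W2", which is then printed once more as the result of case_when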
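A small base-R sketch can make the contrast with switch() concrete. It does not use dplyr and is not case_when's actual implementation; it only mimics the "evaluate every RHS, then select" behaviour described above. `branch()` and `ran` are hypothetical names used only to record which branches were evaluated.

```r
# Base-R sketch only: mimics "evaluate every RHS, then select".
# This is NOT dplyr's implementation of case_when().
ran <- character(0)

# Hypothetical helper: records that a branch was evaluated, returns its value
branch <- function(label, value) {
  ran <<- c(ran, label)
  value
}

x <- "Windows"

# Eager: both right-hand sides are computed before one is picked
candidates <- list(
  Linux   = branch("Linux", "L1"),
  Windows = branch("Windows", "W1")
)
eager_result <- candidates[[x]]
eager_ran <- ran   # both "Linux" and "Windows" were recorded

# Lazy: switch() only evaluates the alternative that matches x
ran <- character(0)
lazy_result <- switch(x,
  Linux   = branch("Linux", "L1"),
  Windows = branch("Windows", "W1")
)
lazy_ran <- ran    # only "Windows" was recorded
```

Both approaches return "W1", but only the eager one runs the side effect of the Linux branch, which is the same thing that happens with the print() calls and assignments in this thread.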

To illustrate,

If you assign the result of case_when to y, the values are still printed while every RHS is evaluated, but y holds only the last value of the matching block, the one print() returned invisibly.

library(tidyverse)
x <- Sys.info() %>% pluck("sysname")
y <- case_when(x == "Linux" ~{
  print("L1")
  print("L2")
},
x == "Windows" ~  {
  print("W1")
  print("W2")
},
TRUE ~ {
  ("Not Linux or Windows")
})
#> [1] "L1"
#> [1] "L2"
#> [1] "W1"
#> [1] "W2"
# y holds only the last value of the matching block,
# just as with any braced expression
y
#> [1] "W2"

If, instead of printing inside each branch, you build the vector you want with c() and let the result of case_when print directly, you get the result you expected.

library(tidyverse)
x <- Sys.info() %>% pluck("sysname")
case_when(
  x == "Linux"   ~ c("L1","L2"),
  x == "Windows" ~  c("W1", "W2"),
  TRUE           ~ "Not Linux or Windows"
  )
#> [1] "W1" "W2"

Created on 2018-11-06 by the reprex package (v0.2.1)

Is that clearer?

For what you want to achieve, switch could be the solution: not everything has to be translated into tidyverse code.
One big advantage of case_when is that it is vectorised, unlike switch, and takes a default value (TRUE ~ ...) for elements where no condition matches, which is very practical when manipulating data frames.

library(tidyverse)
x <- c("Windows", "Linux", "other")
case_when(
  x == "Linux"   ~  "L1",
  x == "Windows" ~  "W1",
  TRUE           ~  "Not Linux or Windows"
)
#> [1] "W1"                   "L1"                   "Not Linux or Windows"
switch(x, 
  Linux   =  "L1",
  Windows =  "W1"
)
#> Error in switch(x, Linux = "L1", Windows = "W1") :  EXPR must be a length 1 vector

case_when is also fairly strict: every RHS must return the same type, so you can only replace values with values of that type.
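If you want switch-style matching over a whole vector in base R, two options are sketched below. The first simply calls switch() once per element; the second, a named lookup vector, is a swapped-in technique that was not mentioned in the thread. Both use the same three inputs as the case_when example above.

```r
x <- c("Windows", "Linux", "other")

# Option 1: call switch() once per element; the final unnamed
# argument is switch()'s default for values that match no name
via_switch <- vapply(
  x,
  function(s) switch(s, Linux = "L1", Windows = "W1", "Not Linux or Windows"),
  character(1),
  USE.NAMES = FALSE
)

# Option 2: index a named lookup vector, then fill unmatched (NA) entries
lookup <- c(Linux = "L1", Windows = "W1")
via_lookup <- unname(lookup[x])
via_lookup[is.na(via_lookup)] <- "Not Linux or Windows"
```

Both give "W1", "L1", "Not Linux or Windows", matching the case_when output. Option 1 keeps switch()'s lazy evaluation per element; option 2 is usually simpler when each branch is just a constant.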

Isaiah · November 6, 2018, 7:36am

Ah, that makes sense! Below is closer to what I was trying to do.

library(tidyverse)
library(reprex)
x <- Sys.info() %>% pluck("sysname")
case_when(x == "Linux" ~{
  (a <- "L1")
  (b <- "L2")
},
x == "Windows" ~  {
  (c <- "W1")
  (d <- "W2")
},
TRUE ~ {
  ("Not Linux or Windows")
})

For this, is there a vector solution?

mara · November 6, 2018, 12:35pm

Since you're using case_when(), this is already vectorized. The code above works (at least for me).

library(tidyverse)
x <- Sys.info() %>% pluck("sysname")
case_when(x == "Linux" ~{
  (a <- "L1")
  (b <- "L2")
},
x == "Windows" ~  {
  (c <- "W1")
  (d <- "W2")
},
TRUE ~ {
  ("Not Linux or Windows")
})
#> [1] "Not Linux or Windows"

Created on 2018-11-06 by the reprex package (v0.2.1.9000)
(Note that to create a reprex, I don't actually load the reprex package in the code; reprex is run on the code itself. See the community reprex FAQ for more details.)

Isaiah · November 6, 2018, 8:35pm

Thanks Mara!

I think the case_when output is correct for both of us, but the assignments in all the branches are carried out:

library(tidyverse)
a <- b <- c <- NULL
x <- Sys.info() %>% pluck("sysname")
case_when(
  x == "Linux"   ~ (a <- "L1"),
  x == "Windows" ~ (b <- "W1"),
  TRUE           ~ (c <- "N1")
)
#> [1] "W1"
a
#> [1] "L1"
b
#> [1] "W1"
c
#> [1] "N1"

This seems strange and is different from nested ifs, ifelse, and select.

rensa · November 6, 2018, 8:45pm

It seems like all of these solutions are trying hard to use case_when() to do what switch() is designed to do. Can you give us an example of the kind of pipe you'd like to use switch() in, @Isaiah?

Isaiah · November 6, 2018, 8:58pm


library(tidyverse)
Sys.info() %>% pluck("sysname") %>% switch ("Linux" = {
  (a <- "L1")
},
"Windows" = {
  (b <- "W1")
})

OK, good call. I guess I was lost down a rabbit hole. Above works fine! Thank you.

cderv · November 6, 2018, 9:31pm

Yes, your first example shows what case_when does by design: it evaluates every RHS before selecting the matching condition, so all the assignments are made.

switch is a better candidate here. I don't think case_when is meant to be used this way; it is designed to compute a value that you then assign to a variable.

@Isaiah, if your question is resolved, could you mark the topic as solved? (FAQ: How do I mark a solution?)

rensa · November 6, 2018, 10:07pm

Haha, that's okay! We all go down rabbit holes sometimes. Two other tricks that might come in handy if you want to use switch() in a pipe:

If you want a more complex switch() condition, rather than using the pipe input directly as the condition, remember that you can wrap the switch() statement in braces to stop the pipe from inserting the input as its first argument. For example:

Sys.info() %>%
pluck("sysname") %>%
{
  # okay, this is a terrible example. the point is you can do this if
  # you need to, lol
  switch(toupper(.),
    "LINUX" = {
      # …
    },
    "WINDOWS" = {
      # …
    })
} %>%
more_stuff()
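Since the branch bodies and `more_stuff()` above are placeholders, here is a minimal runnable version of the braces trick. It assumes magrittr is installed and uses a fixed string instead of Sys.info() so the result does not depend on the machine; the variable names are illustrative.

```r
library(magrittr)

os <- "Windows"  # fixed input so the sketch is deterministic

# The braces stop %>% from inserting `.` as switch()'s first argument,
# so `.` is used only where we write it: inside toupper(.)
code <- os %>% {
  switch(toupper(.),
         LINUX   = "L1",
         WINDOWS = "W1",
         "other")
}
```

Without the braces, magrittr would still insert the input as the first argument (a `.` nested inside toupper() does not count), so the call would become switch(., toupper(.), ...), which is not what you want.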

If your switch statement only involves side effects (i.e. nothing you do in it affects the rest of the pipe), you can use the tee pipe %T>% so that the input to the switch statement passes straight on to the next step of the chain, instead of the switch's output:

Sys.info() %>%
pluck("sysname") %T>%
# another terrible example XD
switch ("Linux" = { autoplot(.) }, "Windows" = { summary(.) }) %>%
# more_stuff() is getting the output of pluck(), not autoplot() or summary()
more_stuff()

(Note that the tee operator isn't exported by the other tidyverse packages, so you'll need to call library(magrittr) to use it!)
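A minimal runnable sketch of the tee pipe, assuming magrittr is installed. A fixed string replaces Sys.info(), and message() stands in for a real side effect such as plotting or logging.

```r
library(magrittr)

# %T>% runs switch() only for its side effect (a message here) and then
# passes its *input* on, so toupper() receives "Windows", not NULL
out <- "Windows" %T>%
  switch(Linux   = message("Running Linux-only side effect"),
         Windows = message("Running Windows-only side effect")) %>%
  toupper()
```

message() returns NULL, so with a plain %>% instead of %T>% the next step would receive NULL; with the tee, `out` is "WINDOWS".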

Hopefully those tools give you some more flexibility when building your pipes.

Isaiah · November 7, 2018, 12:04am

library(tidyverse)
b <- NULL; x <- NULL
x <- Sys.info() %>% pluck("sysname")
switch (x,
        "Linux" = {
          (a <- "L1")
        },
        "Windows" = {
          (b <- "W1")
         })
#> [1] "W1"
b
#> [1] "W1"
# b is assigned

b <- NULL; x <- NULL

Sys.info() %>% pluck("sysname") %>% switch ("Linux" = {
  (a <- "L1")
},
"Windows" = {
  (b <- "W1")
})
#> [1] "W1"
# b is not assigned

So with switch, I get different behaviour with pipes. Without pipes, the assignment occurs; with pipes, it does not.

rensa · November 7, 2018, 12:17am

Hmmm, that's interesting—I was able to replicate this. If I modify the examples like this:

switch (x,
        "Linux" = {
          (a <- "L1")
        },
        "Windows" = {
          (b <- "W1"); print(b); browser()
         })

Sys.info() %>% pluck("sysname") %>% switch ("Linux" = {
  (a <- "L1")
},
"Windows" = {
  (b <- "W1"); browser()
})

When the piped version hits the breakpoint at browser(), inspecting the environment shows that b is there and assigned. If I do the same with the non-piped example, the breakpoint seems to be in the global environment. Maybe one of the others can shed light on this, but it seems to me that the pipe introduces a new environment, which is why you lose b once the switch() statement ends.

EDIT: from the pipe documentation:

For most purposes, one can disregard the subtle aspects of magrittr's evaluation, but some functions may capture their calling environment, and thus using the operators will not be exactly equivalent to the "standard call" without pipe-operators.

It seems like this could be a possibility.

jcblum · November 7, 2018, 12:33am

I think maybe it's the sentence before that?

First a function is produced from all of the individual right-hand side expressions, and then the result is obtained by applying this function to the left-hand side.

Or, as the vignette says about the ability to pipe into lambdas:

Since all right-hand sides are really “body expressions” of unary functions, this is only the natural extension the simple right-hand side expressions.

magrittr builds a new function, so the assignment happens inside that function's environment. You can see this if you put the browser() call before the assignment and then compare where browser() says you are when it launches in the pipe and non-pipe versions:

With the pipe, browser() launches as: Called from: function_list[[k]](value)
You can also see that RStudio opens a viewer showing something like:
```
function (.) 
switch(., Linux = {
  browser()
  a <- "L1"
}, Windows = {
  b <- "W1"
})
```

Without the pipe, browser() launches as: Called from: top level

If you use the <<- assignment operator inside the switch() portion of the pipe, then the desired assignments (in the global environment) happen.

Edited to add: I forgot to say why I think it's clearer with browser() before the assignment! When browser() launches, the environment is empty, and then a or b is created when you advance one step.
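The same effect can be reproduced without magrittr. This base-R sketch assumes the pipe behaves roughly like wrapping the right-hand side in a function of `.`, as the documentation quoted above describes for the magrittr version in this thread (newer magrittr releases may evaluate differently). `piped_step()` and `piped_step_global()` are hypothetical names.

```r
# Rough base-R stand-in for the pipe turning the RHS into a function of `.`
b <- NULL

piped_step <- function(.) {
  # `<-` creates a *local* b inside this function's environment
  switch(., Linux = { a <- "L1" }, Windows = { b <- "W1" })
}
res <- piped_step("Windows")   # "W1" is returned...
b_after_local <- b             # ...but the outer b is still NULL

piped_step_global <- function(.) {
  # `<<-` assigns to the b found in an enclosing environment
  switch(., Linux = { a <<- "L1" }, Windows = { b <<- "W1" })
}
piped_step_global("Windows")
b_after_global <- b            # now "W1"
```

This matches what Isaiah saw: the value comes out of the pipe, but an assignment made with `<-` disappears with the function's environment, while `<<-` reaches the outer variable.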

rensa · November 7, 2018, 12:37am

More philosophically, when I think about what the pipe is designed for, it's generally about transforming data. So the primary use cases for using a switch() inside a pipe are going to be either conditionally transforming the data that you're passing through (based on either the data itself or on something in a parent environment) or conditionally performing side effects. My gut feeling is that making global assignments inside pipes doesn't come up a whole lot. But it's great to know about <<- for making those assignments!

jcblum · November 7, 2018, 1:00am

I agree that the pipe envisions you passing values down a chain of expressions, not reaching outside the pipeline to assign values to external variables. My gut feeling is that making global assignments inside pipelines should be approached with great caution, and maybe as an opportunity to ask oneself if there's another, less side-effect-y way to handle the problem — but I admit that's kind of how I feel about global assignments in general!

@Isaiah, I'm curious — what are the scenarios where you find yourself wanting to assign to variables outside the pipeline like this?

Isaiah · November 7, 2018, 2:17am

It's paths to data files, which differ depending on whether I'm using Linux or Windows.

rensa · November 7, 2018, 3:21am

If that's the case, it might just be better to assign the base path for your data files globally at the start of your script (or at least outside the pipe)—that's what I've done in the past.
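One way to do that with switch(), sketched under assumptions: the directory strings below are hypothetical placeholders for your own shares, and `data_root()` is an illustrative helper name. The unnamed last argument makes switch() stop with an error on an unrecognised OS instead of silently returning NULL.

```r
# Hypothetical base directories: substitute your own paths
data_root <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
         Linux   = "/mnt/share/data",
         Windows = "//server/share/data",
         stop("No data path configured for ", sysname))
}

# In a script you would call data_root() once at the top, with no argument.
# Here the OS is passed explicitly so the result does not depend on the machine.
linux_file   <- file.path(data_root("Linux"), "raw.csv")
windows_file <- file.path(data_root("Windows"), "raw.csv")
```

Because the path is computed once and assigned outside any pipe, no global assignment is needed inside the pipeline itself.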

You could also use the here package, which is generally best practice for files inside your project directory, but I'm not sure whether it can handle platform-dependent external files like shares. I'd love to hear other solutions for this, since I have a similar problem (e.g. sensitive/confidential data files that can't be included in the project).

system · November 14, 2018, 3:21am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.