Syntax consistency for renaming functions


#1

There are several ways to recode the values of character vectors and factors in the tidyverse, and each uses a different syntax, which makes it hard for a part-time user like me to remember the correct one.

Consider the simple reprex below.

dplyr::recode uses a sequence of named replacements, with the old value first (name) and the new value second (value).

stringr::str_replace_all uses a vector of named replacements, with the pattern first (name) and the replacement second (value).

forcats::fct_recode uses a sequence of named replacements, with the new level first (name) and the old level second (value).


library(tidyverse)
library(forcats)
library(stringr)

df <- tribble( ~ alpha, ~ beta,
               "green",  1,
               "blue",   2,
               "yellow", 3)

df %>%
  mutate(alpha = dplyr::recode(alpha,
                               green = "black",
                               blue = "white"))
#> # A tibble: 3 x 2
#>    alpha  beta
#>    <chr> <dbl>
#> 1  black     1
#> 2  white     2
#> 3 yellow     3

df %>%
  mutate(alpha = stringr::str_replace_all(alpha,
                                          c(green = "black",
                                            blue = "white")))
#> # A tibble: 3 x 2
#>    alpha  beta
#>    <chr> <dbl>
#> 1  black     1
#> 2  white     2
#> 3 yellow     3

df %>%
  mutate(alpha = as_factor(alpha)) %>%
  mutate(alpha = forcats::fct_recode(alpha,
                                     black = "green",
                                     white = "blue"))
#> # A tibble: 3 x 2
#>    alpha  beta
#>   <fctr> <dbl>
#> 1  black     1
#> 2  white     2
#> 3 yellow     3
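
One caveat about the comparison above: the three functions only agree here because every value is matched in full. stringr::str_replace_all treats each name as a regular expression and replaces it wherever it occurs in a string, while dplyr::recode only matches whole values. A minimal sketch of the difference, where the vector `x` is made up for illustration:

```r
library(dplyr)
library(stringr)

# Illustrative input: "greenish" contains "green" as a substring
x <- c("green", "greenish", "blue")

# recode() matches whole values only, so "greenish" is left alone
recoded <- dplyr::recode(x, green = "black")

# str_replace_all() treats the name as a regular expression and replaces
# every match, including the "green" inside "greenish"
replaced <- stringr::str_replace_all(x, c(green = "black"))

print(recoded)
print(replaced)
```

So the two are only interchangeable when no old value appears as a substring of another value (or the pattern is anchored, e.g. "^green$").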

I wish it were always as simple as renaming columns in dplyr, with no quotes and the new name first:


library(tidyverse)

df <- tribble( ~ alpha, ~ beta,
               "green",  1,
               "blue",   2,
               "yellow", 3)

df %>% 
  dplyr::select(delta = alpha, omega = beta)
#> # A tibble: 3 x 2
#>    delta omega
#>    <chr> <dbl>
#> 1  green     1
#> 2   blue     2
#> 3 yellow     3

df %>% 
  dplyr::rename(delta = alpha, omega = beta)
#> # A tibble: 3 x 2
#>    delta omega
#>    <chr> <dbl>
#> 1  green     1
#> 2   blue     2
#> 3 yellow     3

Any chance this can be harmonised across the tidyverse?


#2

I am also frequently tripped up by these inconsistencies, especially the difference between forcats::fct_recode (newthing = "oldthing") and dplyr::recode (oldthing = "newthing"). For me, the dplyr::recode ordering intuitively makes more sense (though for the life of me I can't explain why!), but I can see how the forcats ordering is more analogous to variable assignment and more consistent with dplyr::rename.

In the end, I'd be happy with either ordering, as long as it's consistent (and I'd actually prefer minimal NSE, for the sake of simplicity in programming).
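
As a rough illustration of what a consistent, NSE-free interface could look like, here is a minimal base R sketch. The helper name `recode_values` and its `mapping` argument are hypothetical and not part of dplyr or forcats. It takes a plain named character vector (old value as the name, new value as the value) and applies it to both character vectors and factors:

```r
# Hypothetical helper, not a tidyverse function: recode x using a named
# character vector where names are old values and values are new values.
recode_values <- function(x, mapping) {
  stopifnot(is.character(mapping), !is.null(names(mapping)))

  if (is.factor(x)) {
    # For factors, rewrite the levels so the underlying codes are untouched
    lv <- levels(x)
    hit <- lv %in% names(mapping)
    lv[hit] <- unname(mapping[lv[hit]])
    levels(x) <- lv
    return(x)
  }

  # For character vectors, replace whole values that appear in the mapping
  hit <- x %in% names(mapping)
  x[hit] <- unname(mapping[x[hit]])
  x
}

mapping <- c(green = "black", blue = "white")

chr_result <- recode_values(c("green", "blue", "yellow"), mapping)
fct_result <- recode_values(
  factor(c("green", "blue", "yellow"), levels = c("green", "blue", "yellow")),
  mapping
)

print(chr_result)
print(fct_result)
```

Because the mapping is an ordinary vector, it can be built or passed around programmatically without any quoting tricks. Swapping the roles of names and values would give the other ordering; the point is only that one convention is used for both types.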

There's already an issue about pan-tidyverse conventions in the tidyverse/tidyverse issue tracker. Would that be the right place to raise this, too?

ETA: I don't know how realistic it is to standardize this, since it seems like it would inevitably break a lot of existing code... :confused: