How to do this with Purrr (or something else?)

I'm riffing off the question asked "I don't know how to do this for loop with Purrr" from 2019-02-02.

I was trying to use the answer to that question to work out an approach for a related, but somewhat different, problem. The solution there builds a modifier from the "old" strings (the strings to be replaced) and the "new" strings, and then maps it over a vector.

I want to do something similar with a dataframe. But instead of mapping over a single vector, I want to replace strings in colB and colC in a dataframe.
For my solution:

  • I want to reference the columns by their names,
  • the columns will not always be adjacent, so I don't want to use a solution like "colB:colC" (although that would be handy to know how to do!)
df <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L),
  colA = c("caacute","caacute","caacute",
           "capitaacute","mara","mara","capitaacute","capitaacute","capitaacute"),
  colB = c("mara","mara","pido", "pido","caacute","pido","caacute","intreacute","intreacute"),
  colC = c("pido","pido","pido", "pido","caacute","pido","caacute","empty","kitten")
)

This is the result I'm looking for:

dfFinal <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L),
  colA = c("caacute","caacute","caacute",
           "capitaacute","mara","mara","capitaacute","capitaacute","capitaacute"),
  colB = c("all your life", "all your life", "you were only", "you were only", "caacute", "you were only", "caacute", "waiting for this moment", "waiting for this moment"),
  colC = c("you were only", "you were only", "you were only", "you were only", "caacute", "you were only", "caacute", "empty", "kitten")
)

Code I copied from a previous question, and my modifications:

# old strings to replace
args_1 <- list('mara', 'pido', 'intreacute')
# new strings
args_2 <- list('all your life','you were only', 'waiting for this moment')

# I want to change colB and colC. list_1 and list_2 are two different ways
# I tried to specify the columns; neither worked.
list_1 <- list('colB', 'colC')
list_2 <- c('colB', 'colC')

# modifier which replaces the old strings with the new strings
modifier <- purrr::map2(args_1, args_2, ~purrr::partial(stringr::str_replace_all, pattern = .x, replacement = .y)) %>%
  purrr::reduce(purrr::compose) 

  1. Note: I don't know what this part of the script does: %>% purrr::reduce(purrr::compose).
    When I tried to cobble together a solution using this code I got partway there, so I'm just giving it a go.
  2. Here are some of the approaches I tried that didn't work:
df1 <- df %>% 
 purrr::map_chr(list_1, modifier)

df1 <- df %>% 
   purrr::map_chr(list_2, modifier)

Can you help me figure out how to develop an approach that will include:

  1. List of columns to change
  2. List of old strings to be replaced
  3. List of new strings which will replace the list of old strings.

Actually, I'm not married to solving this using purrr, but I'd like to figure out an approach.
Thank you!

This gets the desired result with a for loop. I had to use that because I do not know how to do all the replacements at once.

df <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L),
  colA = c("caacute","caacute","caacute",
           "capitaacute","mara","mara","capitaacute","capitaacute","capitaacute"),
  colB = c("mara","mara","pido", "pido","caacute","pido","caacute","intreacute","intreacute"),
  colC = c("pido","pido","pido", "pido","caacute","pido","caacute","empty","kitten")
)

dfFinal <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L),
  colA = c("caacute","caacute","caacute",
           "capitaacute","mara","mara","capitaacute","capitaacute","capitaacute"),
  colB = c("all your life", "all your life", "you were only", "you were only", "caacute", "you were only", "caacute", "waiting for this moment", "waiting for this moment"),
  colC = c("you were only", "you were only", "you were only", "you were only", "caacute", "you were only", "caacute", "empty", "kitten")
)

args_1 <- list('mara', 'pido', 'intreacute')

args_2 <- list('all your life','you were only', 'waiting for this moment')

library(stringr)
library(dplyr)

MyFunc <- function(V, L1, L2) {
  for (i in seq_along(L1)) {
    V <- str_replace(V, L1[[i]], L2[[i]])
  }
  V
}

dfOut <- mutate_at(df, c("colB", "colC"), .funs = MyFunc, L1 = args_1, L2 = args_2)
identical(dfOut, dfFinal)
#> [1] TRUE

Created on 2020-08-25 by the reprex package (v0.3.0)
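
Since the question asked about purrr specifically: the loop in MyFunc could probably also be written with purrr::reduce2(). This is only a minimal sketch, assuming purrr, stringr and dplyr are installed. replace_pairs and toy are names made up for illustration, and the sketch uses str_replace_all() rather than str_replace(), so every occurrence in a string is replaced, not just the first.

```r
library(purrr)
library(stringr)
library(dplyr)

args_1 <- list("mara", "pido", "intreacute")
args_2 <- list("all your life", "you were only", "waiting for this moment")

# replace_pairs is a hypothetical helper, not part of purrr or stringr.
# reduce2() starts from .init (the vector v) and, for each old/new pair,
# feeds the result of the previous replacement into the next one.
replace_pairs <- function(v, old, new) {
  reduce2(old, new, function(acc, o, n) str_replace_all(acc, o, n), .init = v)
}

# Small made-up data frame with the same column names as the question
toy <- data.frame(
  stringsAsFactors = FALSE,
  id = 1:3,
  colB = c("mara", "caacute", "intreacute"),
  colC = c("pido", "empty", "kitten")
)

toy_out <- mutate(toy, across(c(colB, colC), ~ replace_pairs(.x, args_1, args_2)))
toy_out
```

The accumulator plays the role of V in the loop, so the replacements are still applied one after another, just without writing the loop by hand.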


The linked post had a nice way to do all the replacements at once using a named vector:

dict <- c('mara' = 'all your life',
          'pido' = 'you were only',
          'intreacute' = 'waiting for this moment')

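# passing extra arguments through across() is deprecated in recent dplyr
# versions, so the call is wrapped in a lambda
df_final <- df %>%
  mutate(across(c(colB, colC), ~ str_replace_all(.x, dict)))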

all.equal(dfFinal, df_final)
#> [1] TRUE

And if you need to pass the arguments as lists, you can first convert them to a named vector:

args_1 <- list('mara', 'pido', 'intreacute')
args_2 <- list('all your life','you were only', 'waiting for this moment')

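# set_names() is called with its purrr namespace so it works without attaching purrr
dict <- as.character(args_2) %>%
  purrr::set_names(as.character(args_1))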

Similarly if you want to pass a vector of column names:

my_cols <- c("colB", "colC")

df_final <- df %>%
  mutate(across(all_of(my_cols), ~ str_replace_all(.x, dict)))
# equivalent, using the older mutate_at(), which takes the character vector directly
df_final <- df %>%
  mutate_at(my_cols, str_replace_all, dict)
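
The question also mentioned that a range like colB:colC would be handy to know. As far as I know across() accepts that kind of tidyselect range directly. A minimal sketch, assuming dplyr and stringr are installed; df_small is a made-up data frame just for illustration:

```r
library(dplyr)
library(stringr)

# Made-up example data; colA deliberately contains "mara" and "pido"
# to show that columns outside the range are left alone
df_small <- tibble(
  id = 1:3,
  colA = c("mara", "caacute", "pido"),
  colB = c("mara", "pido", "intreacute"),
  colC = c("pido", "empty", "kitten")
)

dict <- c("mara" = "all your life",
          "pido" = "you were only",
          "intreacute" = "waiting for this moment")

# colB:colC selects every column from colB to colC by position
out <- df_small %>%
  mutate(across(colB:colC, ~ str_replace_all(.x, dict)))
out
```

This only helps when the columns are adjacent, which is why all_of() with a vector of names is the safer choice for the original problem.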

edit: @AlexisW beat me to it while I was writing.

I can sort of understand what the example you've cited is doing with purrr::partial and purrr::compose but I'm not sure it's the most helpful way to do this. Though it looks very clever.
You asked about

purrr::reduce(purrr::compose)

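  • I think this collapses the list of partial functions into a single function: reduce() combines the functions pairwise with compose(), so calling the result applies every replacement in turn.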
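
To make that concrete, here is a small sketch of the same idea with only two replacements, assuming purrr and stringr are installed. replace_mara and replace_pido are names made up for illustration.

```r
library(purrr)
library(stringr)

# Two partially applied functions: each one only needs the string vector
replace_mara <- partial(str_replace_all, pattern = "mara", replacement = "all your life")
replace_pido <- partial(str_replace_all, pattern = "pido", replacement = "you were only")

# reduce() calls compose(replace_mara, replace_pido), giving one function.
# By default compose() applies its functions right to left, so this is
# replace_mara(replace_pido(x)).
modifier <- reduce(list(replace_mara, replace_pido), compose)

result <- modifier(c("mara", "pido", "kitten"))
result
```

The order of application only matters if one replacement text contains another pattern, which is not the case here.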

Note that if you just map over two lists of strings using purrr::map2 to pass each pair of strings (pattern and replacement) to str_replace_all you'll get a list of data frames as your output, one for each pair, not a single data frame. I don't think this is what you want.
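
A quick sketch of that problem, shown on a single character vector rather than a whole data frame to keep it short (assuming purrr and stringr are installed; x is a made-up vector):

```r
library(purrr)
library(stringr)

x <- c("mara", "pido", "intreacute")
args_1 <- list("mara", "pido", "intreacute")
args_2 <- list("all your life", "you were only", "waiting for this moment")

# One result per pattern/replacement pair: each list element has had
# only ONE of the three replacements applied to the original x
res <- map2(args_1, args_2, ~ str_replace_all(x, .x, .y))

length(res)
res[[1]]
```

So map2() on its own gives three partially replaced copies rather than one fully replaced result, which is why the replacements need to be chained or done with a named vector.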

I may be missing something but if I were just trying to get from your initial df to your dfFinal I'd just use dplyr::case_when, or dplyr::mutate with str_replace_all() as in the following reprexes. The third example below shows how you can pass three vectors (match strings, replacement strings, column names) to a mutating function - this seems like it provides what you are wanting to be able to do?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

# initial data frame
df <- tibble(
  id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L),
  colA = c(
    "caacute", "caacute", "caacute",
    "capitaacute", "mara", "mara", "capitaacute", "capitaacute", "capitaacute"
  ),
  colB = c("mara", "mara", "pido", "pido", "caacute", "pido", "caacute", "intreacute", "intreacute"),
  colC = c("pido", "pido", "pido", "pido", "caacute", "pido", "caacute", "empty", "kitten")
)

head(df, 4)
#> # A tibble: 4 x 4
#>      id colA        colB  colC 
#>   <int> <chr>       <chr> <chr>
#> 1     1 caacute     mara  pido 
#> 2     2 caacute     mara  pido 
#> 3     3 caacute     pido  pido 
#> 4     4 capitaacute pido  pido

# use dplyr::case_when to modify columns
df %>%
  mutate(across(c(colB, colC), ~ case_when(
    . == "mara" ~ "all your life",
    . == "pido" ~ "you were only",
    . == "intreacute" ~ "waiting for this moment",
    TRUE ~ .
  ))) %>% 
  head(4)
#> # A tibble: 4 x 4
#>      id colA        colB          colC         
#>   <int> <chr>       <chr>         <chr>        
#> 1     1 caacute     all your life you were only
#> 2     2 caacute     all your life you were only
#> 3     3 caacute     you were only you were only
#> 4     4 capitaacute you were only you were only

# use stringr::str_replace_all with a named vector
df %>%
  mutate(across(
    c(colB, colC),
    ~ str_replace_all(., c(
      "mara" = "all your life",
      "pido" = "you were only",
      "intreacute" = "waiting for this moment"
    ))
  )) %>% 
  head(4)
#> # A tibble: 4 x 4
#>      id colA        colB          colC         
#>   <int> <chr>       <chr>         <chr>        
#> 1     1 caacute     all your life you were only
#> 2     2 caacute     all your life you were only
#> 3     3 caacute     you were only you were only
#> 4     4 capitaacute you were only you were only


# set up vectors to be passed to a function
origs <- c("mara", "pido", "intreacute")
repls <- c("all your life", "you were only", "waiting for this moment")
cols <- c("colB", "colC")


modify_columns <- function(df, origs, repls, cols) {
  names(repls) <- origs
  df %>% 
    mutate(across(all_of(cols), ~ str_replace_all(., repls)))
}


df %>% 
  modify_columns(origs, repls, cols) %>% 
  head(4)
#> # A tibble: 4 x 4
#>      id colA        colB          colC         
#>   <int> <chr>       <chr>         <chr>        
#> 1     1 caacute     all your life you were only
#> 2     2 caacute     all your life you were only
#> 3     3 caacute     you were only you were only
#> 4     4 capitaacute you were only you were only

Created on 2020-08-26 by the reprex package (v0.3.0)


I have a question about the function "modify_columns" in this solution. It's super nifty, accomplishes what I need, and I can see how to use it for lots of different purposes. I was surprised, though, that "df" is an input of the function, yet when the function is executed, df is piped in. If I remember correctly from experimenting with this last night, the function doesn't work if you enter df inside the () with the other function arguments. If I use df with a pipe "inside" the function, does that mean I can also pipe df into it? I guess the answer is yes, because that's how it works in your example. This is just new to me, and perhaps explains why I've been having trouble getting my functions to work.


I can understand how it's confusing. I found it a battle to get my head round the syntax and structure of functions for quite a while. But now I love them and make them all the time! I used to think they were a big deal, only to be used in niche or advanced situations but now I know that's not true.

Basically, when you use the pipe, the left-hand side (LHS) gets piped in as the first argument of the function call on the right-hand side (RHS). So when I pipe df into modify_columns, it gets used as the df (i.e. first) argument of the function. (It's just a coincidence, albeit a deliberate one, that the data frame "df" in the global environment has the same name as the first argument of the function. I could just as well have named the argument banana. The same goes for origs and repls: I've given the function arguments the same names as the data objects I'm planning to supply, which I find simpler but could be confusing for others.)
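
Here is a tiny sketch of that equivalence, assuming dplyr is installed (it provides %>%). toy and count_matches are made up for illustration, and the first argument is called banana on purpose:

```r
library(dplyr)

toy <- tibble(id = 1:3, colB = c("mara", "pido", "mara"))

# Hypothetical helper: counts how often `value` appears in colB.
# The data frame argument can have any name at all.
count_matches <- function(banana, value) {
  sum(banana$colB == value)
}

piped  <- toy %>% count_matches("mara")   # toy becomes the first argument
direct <- count_matches(toy, "mara")      # the same call written out in full

identical(piped, direct)
```

Both calls do the same thing, so a function that takes the data frame as its first argument can be used with or without the pipe.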

It makes sense to me to include the df in the function, because the purpose of the function is mutating a df. You could omit it, and not pipe in a df when you use the function, but then you'd need to hardcode the name of the data frame into the function.

I didn't need to use a pipe in the function, I could have just included the df explicitly as the first (.data) argument to mutate. But I like the pipe...

modify_columns <- function(banana, origs, repls, cols) {
  names(repls) <- origs
  mutate(banana, across(all_of(cols), ~ str_replace_all(., repls)))
}

What you can't do is this:

modify_columns <- function(origs, repls, cols) {
  names(repls) <- origs
  mutate(df, across(all_of(cols), ~ str_replace_all(., repls)))
}


df %>% 
  modify_columns(origs, repls, cols)

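because the function is no longer expecting a data frame as an argument: df is already baked into mutate() as the name of the data frame. The piped-in df gets passed as the first argument anyway (in the origs position), which leaves one argument too many, so R stops with an "unused argument" error.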
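
If you want to see the failure without stopping your script, something like the following sketch should do it, assuming dplyr and stringr are installed. The small df and the name modify_columns_no_data are made up for illustration, and tryCatch() is only there to capture the error message.

```r
library(dplyr)
library(stringr)

df <- tibble(colB = c("mara", "pido"))
origs <- "mara"
repls <- "all your life"
cols <- "colB"

# Same shape as the broken version: no data frame argument,
# df is looked up in the global environment instead
modify_columns_no_data <- function(origs, repls, cols) {
  names(repls) <- origs
  mutate(df, across(all_of(cols), ~ str_replace_all(.x, repls)))
}

# Piping df in supplies four arguments to a three-argument function
piped_result <- tryCatch(
  df %>% modify_columns_no_data(origs, repls, cols),
  error = function(e) conditionMessage(e)
)
piped_result

# Without the pipe it runs, but only because df happens to exist globally
unpiped_result <- modify_columns_no_data(origs, repls, cols)
unpiped_result
```

The unpiped call working is exactly the hardcoding mentioned earlier: the function is tied to whatever object is called df at the time.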

Hope that all helps clear it up a bit!


Thank you! This is immensely helpful. I do appreciate your description that writing functions seemed challenging. That's encouraging for a beginner to hear. Writing scripts seemed confusing when I started with R, so I'm going to keep plugging away learning to write functions.
