Which tidyverse is the equivalent to search & replace from spreadsheets?

Is any str_replace_all for an entire dataframe?

Nowadays, I use

df <- data.frame(lapply(df, function(x) {
                  gsub("valueToChange", "newVar", x)
              }))

I'm assuming you want to find and replace strings. I've been digging into dplyr lately, so this is how I'd do it.

library(tidyverse)
df <- mutate_if(df, 
                is.character, 
                str_replace_all, 
                pattern = "valueToChange", 
                replacement = "newVar")

mutate_if documentation

3 Likes

I think you need to add an as_tibble in there:


suppressPackageStartupMessages(library(tidyverse))
df <- data.frame(col1 = c("a", "b", "c"), col2 = c("d", "a", "e"), stringsAsFactors = F)
df <- dplyr::mutate_if(
   tibble::as_tibble(df), 
   is.character, 
   stringr::str_replace_all, pattern = "a", 
   replacement = "x")

df
#> # A tibble: 3 x 2
#>    col1  col2
#>   <chr> <chr>
#> 1     x     d
#> 2     b     x
#> 3     c     e
7 Likes

Right you are. I'm actually surprised by this. I thought it would work with a data.frame (or else coerce the data.frame to tibble) but here's what I found:


library(tidyverse)

df <- tibble(c1 = letters[1:5], c2 = 5:9, c3 = LETTERS[5:9])
df %>% mutate_if(is.character, str_replace_all, pattern = '[aeiouAEIOU]', replacement = '%%%%')
#> # A tibble: 5 x 3
#>      c1    c2    c3
#>   <chr> <int> <chr>
#> 1  %%%%     5  %%%%
#> 2     b     6     F
#> 3     c     7     G
#> 4     d     8     H
#> 5  %%%%     9  %%%%
df2 <- data.frame(c1 = letters[1:5], c2 = 5:9, c3 = LETTERS[5:9])
df2 %>% mutate_if(is.character, str_replace_all, pattern = '[aeiouAEIOU]', replacement = '%%%%')
#>   c1 c2 c3
#> 1  a  5  E
#> 2  b  6  F
#> 3  c  7  G
#> 4  d  8  H
#> 5  e  9  I
1 Like

You’re seeing that because data.frame automatically converts characters to factors.

6 Likes

For the most part the tidyverse works with tibble's not data.frames. However a tibble can be substituted for a data.frame because it inherits data.frame.

suppressPackageStartupMessages(library(tidyverse))
df <- data.frame(col1 = c("a", "b", "c"), col2 = c("d", "a", "e"))
class(df)
#> [1] "data.frame"
typeof(df)
#> [1] "list"
df <- dplyr::mutate_if(tibble::as_tibble(df),
                                is.character,
                                stringr::str_replace_all, pattern = "a", replacement = "x")
# order of inheritence is most derived to the left, base class 
# all the way to the right
class(df)
#> [1] "tbl_df"     "tbl"        "data.frame"
typeof(df)
#> [1] "list"

I've grown so accustomed to using tibbles, I totally forgot about stringsAsFactors = false. Thanks @hadley for making that possible. Also, thanks @danr for your feedback.

For those watching at home:


library(tidyverse)
df3 <- data.frame(c1 = letters[1:5], c2 = 5:9, c3 = LETTERS[5:9], stringsAsFactors = FALSE)
df3 %>% mutate_if(is.character, str_replace_all, pattern = '[aeiouAEIOU]', replacement = '%%%%')
#>     c1 c2   c3
#> 1 %%%%  5 %%%%
#> 2    b  6    F
#> 3    c  7    G
#> 4    d  8    H
#> 5 %%%%  9 %%%%
1 Like

Missed that, works for data.frame too if stringsAsFactors = FALSE

suppressPackageStartupMessages(library(tidyverse))
df <- data.frame(col1 = c("a", "b", "c"), col2 = c("d", "a", "e"), 
                                 stringsAsFactors = F)
df <- dplyr::mutate_if(df,
                                is.character,
                                stringr::str_replace_all, pattern = "a", replacement = "x")
df
#>   col1 col2
#> 1    x    d
#> 2    b    x
#> 3    c    e