How do I filter a variable by a value and replace all matching values?

SBA · November 28, 2017, 12:50pm

I have a data frame in which each variable has particular values I want to change. How would I filter a variable by a specific value and then change each of those values to something else? For example, if I had a variable height and wanted to change all values of 20 cm to NA, how would I do that?

martin.R · November 28, 2017, 12:57pm

df[df$height == 20, "height"] <- NA
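One caveat, assuming height may already contain NAs: `df$height == 20` is NA for those rows, and data frame assignment with an NA row index stops with a "missing values are not allowed" error. A minimal base R sketch (the toy `df` is made up for illustration) wraps the condition in `which()` so only the TRUE positions are used:

```r
# Hypothetical toy data: row 2 already has a missing height
df <- data.frame(id = 1:4, height = c(20, NA, 35, 20))

# which() drops the NA produced by NA == 20, so only rows 1 and 4 are selected
df[which(df$height == 20), "height"] <- NA

print(df$height)
#> [1] NA NA 35 NA
```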

Mark6 · November 28, 2017, 1:05pm

You could also use dplyr.

library(dplyr)
# Overwrite height, turning every 20 into NA
df <- df %>% mutate(height = replace(height, height == 20, NA))

Note, though, that you may want to keep your original data and add a new variable rather than overwrite values.
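Following that suggestion, here is a base R sketch that leaves the original column alone and stores the cleaned values in a new one. The column name `height_clean` and the toy data are hypothetical; `replace()` is the same base R function used inside the mutate() call above.

```r
df <- data.frame(height = c(20, 15, 20, 30))

# New column with 20 replaced by NA; df$height itself is not modified
df$height_clean <- replace(df$height, df$height == 20, NA)

print(df)
```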

SBA · November 28, 2017, 1:09pm

Ignoring specific variables this time, if I just do
df[df == 20] <- NA
does this replace every value of 20 in the whole data set with NA?

martin.R · November 28, 2017, 1:32pm

It should, yes. You should also check stackoverflow.com for questions like this.
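A quick check with a made-up two-column data frame suggests it does, and existing NAs are left alone. One thing to keep in mind (a note on base R comparison, not part of the original answer): `df == 20` compares every column, so a character column containing the text "20" would be matched as well.

```r
df <- data.frame(a = c(20, 1, NA), b = c(5, 20, 20))

# df == 20 is a logical matrix with the same shape as df;
# every cell where it is TRUE becomes NA, and NA cells are skipped
df[df == 20] <- NA

print(df)
```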

pjperez · November 29, 2017, 8:22am

I think all the other answers work, but I usually do:
df <- df %>% mutate(height = ifelse(height == 20, NA, height))

nick · November 30, 2017, 2:04pm

Careful with ifelse: it strips attributes and is often slow. if_else is generally a better idea if you are already working in tidyverse-land.
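To see what "strips attributes" means in practice, here is a small base R sketch with a Date vector (the dates are arbitrary). A Date is a double with a class attribute; ifelse() returns the bare number, while base `replace()` keeps the class.

```r
d <- as.Date(c("2017-11-28", "2017-11-29"))

# ifelse() builds its result from the logical test, so the Date class is lost
via_ifelse <- ifelse(d == as.Date("2017-11-28"), NA, d)
class(via_ifelse)
#> [1] "numeric"

# replace() assigns into d itself, so the Date class survives
via_replace <- replace(d, d == as.Date("2017-11-28"), NA)
class(via_replace)
#> [1] "Date"
```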

pjperez · December 1, 2017, 11:45am

Thanks, nick. Yes, I knew about if_else and that it is stricter/safer, but I don't understand why. I also don't understand what you mean by the "it strips attributes" part of your reply. Thanks again.

nick · December 1, 2017, 3:35pm

Attributes are essentially "metadata" about variables that can be stored and retrieved, and several common R data types rely on them. The most common example is probably factors, whose attributes store the factor's levels. The if_else documentation has a good example of the factor attributes getting stripped by ifelse:

library(dplyr)
# Unlike ifelse, if_else preserves types
x <- factor(sample(letters[1:5], 10, replace = TRUE))
ifelse(x %in% c("a", "b", "c"), x, factor(NA))
#>  [1]  2  3  1 NA NA NA  3 NA  3 NA
if_else(x %in% c("a", "b", "c"), x, factor(NA))
#>  [1] b    c    a    <NA> <NA> <NA> c    <NA> c    <NA>
#> Levels: a b c d e
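You can inspect the attributes directly with base R's `attributes()`. In this small sketch (the factor is made up), the factor carries `levels` and `class`, while ifelse() hands back plain integer codes with no attributes at all:

```r
x <- factor(c("a", "b", "c"))
attributes(x)   # shows $levels and $class

y <- ifelse(x == "a", x, NA)
y
#> [1]  1 NA NA
attributes(y)
#> NULL
```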

ifelse also lets you mix types in the output, which can happen inadvertently in some cases. This can lead to unstable results, where the output type depends on which branch each element takes:

ifelse(c(TRUE, FALSE, TRUE), c(1, 2, 3), c("a", "b", "c"))
#> [1] "1" "b" "3"
ifelse(c(TRUE, TRUE, TRUE), c(1, 2, 3), c("a", "b", "c"))
#> [1] 1 2 3
dplyr::if_else(c(TRUE, TRUE, TRUE), c(1, 2, 3), c("a", "b", "c"))
#> Error: `false` must be type double, not character

pjperez · December 2, 2017, 10:28am

OK, understood... I think. At least I will always remember to use if_else(). Thanks, nick.

rensa · December 3, 2017, 3:46am

Basically, the type that comes out of ifelse isn't always the type that you intended to receive, even if you're doing something apparently simple, like replacing a few values with NA.

@nick's examples there demonstrate that it's hard to predict what'll happen when you mix types. But I can't tell you how many times I've had bugs crop up lines down the road because ifelse had spat out a numeric full of garbage instead of the type I'd put into it.
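For the original question (turning particular values into NA), one way to sidestep ifelse altogether is to assign NA into the vector itself, which keeps its class. The helper `set_na_where()` below is a hypothetical name, not from any package; dplyr's `na_if()` covers a similar use case.

```r
# Hypothetical helper: set every element equal to `value` to NA,
# keeping the class of x (factor, Date, numeric, ...)
set_na_where <- function(x, value) {
  # which() ignores NA comparisons, so existing NAs simply stay NA
  x[which(x == value)] <- NA
  x
}

f <- factor(c("a", "b", "c", "b"))
f_clean <- set_na_where(f, "b")      # still a factor with levels a, b, c
print(f_clean)

d <- as.Date(c("2017-12-01", "2017-12-03"))
d_clean <- set_na_where(d, as.Date("2017-12-01"))   # still a Date
print(d_clean)
```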