How do I filter a variable by a value and replace all these values?


#1

I have a data frame in which I want to change particular values of some variables. How would I filter a variable by a specific value and then change each matching value to something else? For example, if I had a variable height and wanted to change all values of 20cm to NA, how would I do that?


#2

df[df$height == 20, "height"] <- NA
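One caveat, as far as I know: if height already contains NA, the comparison `df$height == 20` returns NA for those rows, and data frame assignment does not accept NA in a row index. A minimal sketch of a workaround, using made-up data (the thread never shows the real data frame), is to wrap the comparison in which():

```r
# Hypothetical data with a missing value already present in height.
df <- data.frame(id = 1:5, height = c(20, 35, NA, 20, 42))

# which() returns only the positions where the comparison is TRUE,
# so the row where height is already NA never ends up in the index.
rows <- which(df$height == 20)
df[rows, "height"] <- NA

df
```

The helper name `rows` is just for illustration; `df[which(df$height == 20), "height"] <- NA` on one line does the same thing.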


#3

You could also use dplyr.

df <- df %>% mutate(height = replace(height, height == 20, NA))

Although note that you may want to leave your original data and add a new variable, rather than change values.
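A rough sketch of that "keep the original" approach, assuming dplyr is installed and using a made-up data frame and a made-up column name (`height_clean`). It uses dplyr's na_if(), which turns values equal to its second argument into NA:

```r
library(dplyr)

# Hypothetical example data; not the original poster's data frame.
df <- data.frame(id = 1:5, height = c(20, 35, NA, 20, 42))

# Leave `height` untouched and put the cleaned values in a new column.
# `height_clean` is an illustrative name, not something from the thread.
df <- df %>% mutate(height_clean = na_if(height, 20))

df
```

That way you can always compare the two columns later and see which values were changed.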


#4

Ignoring specific variables this time, if I just do
df[df == 20] <- NA
does this replace all values at 20 in the whole data set to NA?


#5

It should do, yes. You should check stackoverflow.com for such queries.
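A quick way to convince yourself is to try it on a small toy data frame. This is only a sketch with invented, all-numeric columns and no existing NAs:

```r
# Hypothetical toy data: the value 20 appears in two different columns.
df <- data.frame(height = c(20, 35, 42), width = c(10, 20, 30))

# df == 20 is a logical matrix with the same shape as df;
# every cell where it is TRUE is replaced with NA.
df[df == 20] <- NA

df
colSums(is.na(df))  # number of NAs per column after the replacement
```

Note that this hits every column, so it is worth checking that 20 is not a legitimate value in some other variable first.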


#6

I think all the other answers work, but usually I do:
df <- df %>% mutate(height = ifelse(height == 20, NA, height))


#7

Careful with ifelse – it strips attributes and is often slow. if_else is generally a better idea if you are already working in tidyverse-land.


#8

Thanks nick. Yes, I knew about if_else and that it is stricter/safer, but I don’t understand why. I also don’t understand what the “it strips attributes” part of your reply means. Thanks again.


#9

Attributes are essentially “metadata” about variables that can be stored and retrieved, and are used by several common systems. The most common is probably factors, where the attributes store what the levels of the factor are. The if_else documentation has a good example of the factor attributes getting stripped by ifelse:

# Unlike ifelse, if_else preserves types
x <- factor(sample(letters[1:5], 10, replace = TRUE))
ifelse(x %in% c("a", "b", "c"), x, factor(NA))
#>  [1]  2  3  1 NA NA NA  3 NA  3 NA
if_else(x %in% c("a", "b", "c"), x, factor(NA))
#>  [1] b    c    a    <NA> <NA> <NA> c    <NA> c    <NA>
#> Levels: a b c d e

ifelse also lets you mix types in the output, which can happen inadvertently in some cases. This can lead to unstable results:

ifelse(c(TRUE, FALSE, TRUE), c(1, 2, 3), c("a", "b", "c"))
#> [1] "1" "b" "3"
ifelse(c(TRUE, TRUE, TRUE), c(1, 2, 3), c("a", "b", "c"))
#> [1] 1 2 3
dplyr::if_else(c(TRUE, TRUE, TRUE), c(1, 2, 3), c("a", "b", "c"))
#> Error: `false` must be type double, not character

#10

OK. Understood... I think. At least I will always remember to use if_else(). Thanks nick


#11

Basically, the type that comes out of ifelse isn’t always the type that you intended to receive—even if you’re doing something apparently simple, like replacing a few values with NA.

@nick’s examples there demonstrate that it’s hard to predict what’ll happen when you mix types. But I can’t tell you how many times I’ve had bugs crop up lines down the road because ifelse had spat out a numeric full of garbage instead of the type I’d put into it.
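Dates are a typical case of the "numeric full of garbage" problem. Here is a small base R sketch with invented dates (the variable names are just for illustration) showing ifelse dropping the Date class when a value is replaced with NA, next to replace(), which keeps it:

```r
# Hypothetical dates; one of them should become NA.
dates <- as.Date(c("2020-01-01", "2020-06-15", "2020-12-31"))
bad   <- as.Date("2020-06-15")

# ifelse() drops the Date class, leaving the underlying day counts.
cleaned <- ifelse(dates == bad, NA, dates)
class(cleaned)  # "numeric"

# replace() assigns into the original vector, so the class survives.
kept <- replace(dates, dates == bad, NA)
class(kept)     # "Date"
```

Nothing errors or warns in the ifelse version, which is why this kind of bug tends to show up much later.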