I have a data frame with particular values in each variable that I want to change. How would I filter a variable by a specific value and then change each matching value to something else? For example, if I had a variable height and wanted to change all values of 20 cm to NA, how would I do that?
df[df$height == 20, "height"] <- NA
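One caveat, as far as I know: this form throws an error if height already contains missing values, because the comparison then puts NA into the row index. A minimal sketch on made-up data (the df below is purely illustrative), using which() to keep only the positions where the comparison is TRUE:

```r
# Hypothetical data: height already contains one missing value
df <- data.frame(height = c(10, 20, NA, 20))

# The logical row index contains NA, which data frame assignment rejects
res <- tryCatch({
  df[df$height == 20, "height"] <- NA
  "ok"
}, error = function(e) "error")

# which() drops the NA positions from the index, so this assignment goes through
df[which(df$height == 20), "height"] <- NA
```

Working on the column directly, as in df$height[df$height == 20] <- NA, should also avoid the error.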
You could also use dplyr:
df <- df %>% mutate(height = replace(height, height == 20, NA))
Although note that you may want to leave your original data and add a new variable, rather than change values.
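A minimal sketch of keeping the original column, in base R so it runs without extra packages. The data and the name height_clean are made up for illustration:

```r
# Hypothetical data
df <- data.frame(height = c(15, 20, 32, 20))

# Write the cleaned values to a new column; the original height is left untouched
df$height_clean <- replace(df$height, df$height == 20, NA)
```

The same idea should work inside mutate() by assigning to a new name instead of height.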
Ignoring specific variables this time, if I just do
df[df == 20] <- NA
does this replace every value of 20 in the whole data set with NA?
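A quick way to check is to try it on a small all-numeric data frame. This is only a sketch on made-up data:

```r
# Hypothetical data: two numeric columns, both containing the value 20
df <- data.frame(height = c(15, 20, 32), width = c(20, 8, 20))

# df == 20 is a logical matrix with one cell per value;
# every TRUE cell is replaced, whichever column it is in
df[df == 20] <- NA
```

In this small case every 20 in both columns becomes NA. With columns of other types (character, factor) the comparison involves coercion, so it is worth checking the result before relying on it.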
I think all the other answers work, but usually I do:
df <- df %>% mutate(height = ifelse(height == 20, NA, height))
Careful with ifelse: it strips attributes and is often slow. if_else is generally a better idea if you are already working in tidyverse-land.
Thanks, nick. Yes, I knew about if_else and that it is stricter/safer, but I don't understand why. I also don't understand what the "it strips attributes" part of your reply means. Thanks again.
Attributes are essentially "metadata" about variables that can be stored and retrieved, and are used by several common systems. The most common is probably factors, where the attributes store what the levels of the factor are. The if_else documentation has a good example of the factor attributes getting stripped by ifelse:
# Unlike ifelse, if_else preserves types
x <- factor(sample(letters[1:5], 10, replace = TRUE))
ifelse(x %in% c("a", "b", "c"), x, factor(NA))
#> [1] 2 3 1 NA NA NA 3 NA 3 NA
if_else(x %in% c("a", "b", "c"), x, factor(NA))
#> [1] b c a <NA> <NA> <NA> c <NA> c <NA>
#> Levels: a b c d e
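To look at the attributes themselves, attributes() lists them. The sketch below uses base R only, with made-up values; it also shows the same stripping on a Date vector, which is another common place to run into it:

```r
# A factor stores its levels and its class as attributes
x <- factor(c("a", "b", "a"))
attrs <- attributes(x)

# Dates are numbers with a class attribute on top
d <- as.Date(c("2020-01-01", "2020-06-01"))
cutoff <- as.Date("2020-03-01")

# ifelse builds its result from the logical test vector,
# so the Date class is not carried over to the output
out <- ifelse(d > cutoff, d, NA)

class(d)    # the input is a Date
class(out)  # the output is a plain numeric
```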
ifelse also lets you mix types in the output, which can happen inadvertently in some cases. This can lead to unstable results:
ifelse(c(TRUE, FALSE, TRUE), c(1, 2, 3), c("a", "b", "c"))
#> [1] "1" "b" "3"
ifelse(c(TRUE, TRUE, TRUE), c(1, 2, 3), c("a", "b", "c"))
#> [1] 1 2 3
dplyr::if_else(c(TRUE, TRUE, TRUE), c(1, 2, 3), c("a", "b", "c"))
#> Error: `false` must be type double, not character
OK, understood... I think. At least I will always remember to use if_else(). Thanks, nick.
Basically, the type that comes out of ifelse isn't always the type that you intended to receive, even if you're doing something apparently simple, like replacing a few values with NA.
@nick's examples there demonstrate that it's hard to predict what'll happen when you mix types. But I can't tell you how many times I've had bugs crop up lines down the road because ifelse had spat out a numeric full of garbage instead of the type I'd put into it.
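One way to catch this early is to compare the class before and after the replacement. A minimal sketch in base R, assuming a Date column; the values and the names cleaned and same_type are made up:

```r
# Hypothetical input: a Date vector where late dates should become NA
d <- as.Date(c("2020-01-01", "2020-06-01", "2020-02-15"))
cutoff <- as.Date("2020-03-01")

# replace() assigns into the original vector, so its class is kept
cleaned <- replace(d, d > cutoff, NA)

# Cheap guard: confirm the output has the same class as the input
same_type <- identical(class(cleaned), class(d))
stopifnot(same_type)
```

If you are in the tidyverse, dplyr::if_else gives a similar guarantee by erroring when the true and false branches have incompatible types.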