A line of code I use nearly every time I open R is df %>% mutate_at(vars("variable1", "variable2", etc.), factor) to change multiple columns into factors at once. Now, however, I see that mutate_at is being replaced by across(). I've tried at least half a dozen different ways to convert the old code into the format across() uses, and I can't get it to work. Does anyone know a solution?
The equivalent for your example would be something like this:
library(tidyverse)
# Subtract 1,000 from the selected columns
# Note that both the variable selection and the mutating function are
# inside across
mtcars %>%
  mutate(across(c(mpg, hp), ~ . - 1000))
Some other examples:
# Operate only on numeric values
iris %>%
  mutate(across(where(is.numeric), round))
# Get the standard deviation of all numeric columns and,
# separately (outside of across), a count of rows
iris %>%
  summarise(across(where(is.numeric), sd), n = n())
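For the factor conversion in the original question, a minimal sketch along the same lines might look like this. It uses the built-in mtcars data, and cyl and gear are only stand-ins for your own column names:

```r
library(dplyr)

# Convert several columns to factors in one step.
# Old style: mtcars %>% mutate_at(vars("cyl", "gear"), factor)
mtcars_fct <- mtcars %>%
  mutate(across(c(cyl, gear), factor))

# cyl and gear are now factors; mpg is left untouched
str(mtcars_fct[c("cyl", "gear", "mpg")])
```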
This almost got me there, but I'm still having some trouble. Working with my data set and using the vignette, I was able to write a line that half works: pubdata=pubdata %>% mutate(across(.cols = "Race", "Sex", .fns = factor))
This transforms the Race variable into a factor, but for some reason it doesn't do anything with the Sex variable. It also completely destroys the Race variable somehow:
'data.frame': 1225 obs. of 2 variables:
 $ Race: Factor w/ 1 level "Sex": NA NA NA NA NA NA NA NA NA NA ...
 $ Sex : int 2 2 2 1 2 1 1 1 2 2 ...
Going back to the vignette, I tried to tweak the example near the bottom (in the "how to transform existing code" section) to this: pubdata %>% mutate(across(c(pubdata, "Race","Sex")), factor)
But this doesn't work at all, and just returns an error:
Error: Problem with `mutate()` input `..1`.
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type `data.frame<
  Race: factor<9f711>
  Sex : integer
>`.
i It must be numeric or character.
i Input `..1` is `across(c(pubdata, "Race", "Sex"))`.
Run `rlang::last_error()` to see where the error occurred.
For reference, this is what the old code does, and is exactly what I'm trying to duplicate:
pubdata is the name of the data frame, rather than a column of the data frame, which is causing the error. Also, you don't need quotation marks around the column names. The call will work without them as well.
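Putting that together, a sketch of the corrected call might look like the following. The small pubdata data frame here is a made-up stand-in, since the real one isn't shown in this thread. Note also that factor has to go inside across(); in the second attempt it ended up as a separate argument to mutate(). The last few lines show what most likely happened in the first attempt: with .cols and .fns both named, the extra "Sex" was not treated as a column but passed along to factor(), where it landed in the levels argument, so every value became NA.

```r
library(dplyr)

# Hypothetical stand-in for the real pubdata (values are invented)
pubdata <- data.frame(
  Race = c(1L, 2L, 1L, 3L),
  Sex  = c(2L, 2L, 1L, 2L)
)

# Select both columns with c(), and keep factor inside across()
pubdata <- pubdata %>%
  mutate(across(c(Race, Sex), factor))

str(pubdata)

# Why the first attempt gave a one-level factor full of NAs:
# "Sex" ended up as the levels argument of factor()
broken <- factor(c(1L, 2L, 1L, 3L), "Sex")
levels(broken)      # "Sex"
all(is.na(broken))  # TRUE
```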