Suppose I have a character variable that I want to convert to a factor with the `mutate` function. The variable has hundreds of values, so adding the levels manually isn't an efficient way to go. I want to be able to do: `mutate(sleep_total_discr = factor(sleep_total_discr, levels = sleep_total_discr))`
Here is a reprex with only 4 levels, which is manageable by adding the levels manually.
You want to create a column of factors based on a column of integers? Are all the relationships "greater than"? Do you have a list of the cut-offs and factor values? If so, you can probably write a little function to do the lifting for you.
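If the cut-offs and labels are known, base R's `cut()` may already do that lifting inside `mutate()` without a custom function. A minimal sketch, assuming a numeric sleep-total column; the break points, labels, and the `sleep_df` data are hypothetical placeholders, not values from the thread:

```r
library(dplyr)

# Hypothetical cut-offs and labels: there is one more break than labels
breaks <- c(-Inf, 8, 12, 16, Inf)
labels <- c("short", "moderate", "long", "very long")

# Hypothetical data standing in for msleep$sleep_total
sleep_df <- tibble(sleep_total = c(3.9, 9.1, 14.4, 19.9, 12.0))

sleep_df <- sleep_df %>%
  mutate(
    # cut() returns a factor whose levels follow `labels`;
    # intervals are right-closed by default, so 12.0 falls in (8, 12]
    sleep_total_discr = cut(sleep_total, breaks = breaks, labels = labels)
  )

levels(sleep_df$sleep_total_discr)
```

Adding `ordered_result = TRUE` to `cut()` gives an ordered factor instead, if the categories have a natural order.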
I like this solution, but you will need to add `droplevels()` to clean up unused levels! And again, if your list_of_categories had hundreds or even thousands of entries, you would have to populate the levels vector manually, right? Or is there another way?
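The levels do not have to be typed out: they can be derived from the column itself with `unique()`, either in order of first appearance or sorted. A minimal sketch with a hypothetical `category` column, which also shows the `droplevels()` clean-up after filtering:

```r
library(dplyr)

# Hypothetical data with repeated category values
df <- tibble(category = c("b", "a", "c", "a", "b"))

df <- df %>%
  mutate(
    # levels in order of first appearance, taken from the data itself
    cat_appearance = factor(category, levels = unique(category)),
    # levels in sorted order (what factor() does by default)
    cat_sorted = factor(category, levels = sort(unique(category)))
  )

levels(df$cat_appearance)  # "b" "a" "c"
levels(df$cat_sorted)      # "a" "b" "c"

# After filtering, the unused level "c" is kept until droplevels() is called
df_sub <- df %>%
  filter(category != "c") %>%
  droplevels()
levels(df_sub$cat_sorted)  # "a" "b"
```

This scales to any number of categories, because the level vector is computed rather than written by hand.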
Thanks, PJ. Your suggestion works, but I'm not sure why we can't use `factor` and `levels` inside `mutate`. I even tried `levels = msleep$sleep_total_discr`.
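A likely reason that `levels = msleep$sleep_total_discr` fails is not `mutate()` itself: `levels` must list each value only once, and a data column repeats its values, so `factor()` stops with a "duplicated" level error. Wrapping the column in `unique()` avoids that. A minimal sketch with hypothetical values (not the actual `msleep` data):

```r
library(dplyr)

# Hypothetical character column with repeated values
sleep_df <- tibble(sleep_total_discr = c("low", "high", "low", "medium"))

# Using the column itself as `levels` errors, because levels must be unique
err <- tryCatch(
  factor(sleep_df$sleep_total_discr, levels = sleep_df$sleep_total_discr),
  error = function(e) conditionMessage(e)
)
print(err)

# De-duplicating the levels works inside mutate()
sleep_df <- sleep_df %>%
  mutate(sleep_total_discr = factor(sleep_total_discr,
                                    levels = unique(sleep_total_discr)))
levels(sleep_df$sleep_total_discr)  # "low" "high" "medium"
```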
You should be able to use `factor` and `levels` with `mutate`. Can you please provide a reproducible example illustrating the problems you're having with your code?
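But if your levels are unordered, you do not need to specify them: `factor()` or `as.factor()` alone is enough. Compare z3 and z4, and check their levels against those of z2: z3 and z4 are identical, with levels sorted alphabetically, while z2 keeps the order given in `levels`.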
```r
set.seed(seed = 47715)
library(dplyr)

fake_df <- tibble(x = seq.int(to = 30),
                  y = runif(n = 30,
                            min = -100,
                            max = 100))

fake_df_mod <- fake_df %>%
  mutate(z1 = case_when(y < -50 ~ "very low",
                        y < 0 ~ "low",
                        y < 50 ~ "high",
                        TRUE ~ "very high"),
         # explicit levels: an ordered factor in the order listed
         z2 = factor(x = z1,
                     levels = c("very low", "low", "high", "very high"),
                     ordered = TRUE),
         # no levels given: factor() and as.factor() sort them alphabetically
         z3 = factor(x = z1),
         z4 = as.factor(x = z1))

str(object = fake_df_mod)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 30 obs. of 6 variables:
#> $ x : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ y : num 86.806 13.028 -0.782 -17.397 99.254 ...
#> $ z1: chr "very high" "high" "low" "low" ...
#> $ z2: Ord.factor w/ 4 levels "very low"<"low"<..: 4 3 2 2 4 1 1 4 1 3 ...
#> $ z3: Factor w/ 4 levels "high","low","very high",..: 3 1 2 2 3 4 4 3 4 1 ...
#> $ z4: Factor w/ 4 levels "high","low","very high",..: 3 1 2 2 3 4 4 3 4 1 ...
```
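When levels are supplied explicitly, as for z2, any value missing from the level vector (for example a spelling mismatch such as "very_low" vs "very low") silently becomes `NA`. A quick `setdiff()` check catches this before it spreads. A minimal sketch with hypothetical values, where "very_low" is misspelled on purpose:

```r
# Hypothetical level vector and values; "very_low" is deliberately misspelled
lvls <- c("very low", "low", "high", "very high")
z1 <- c("very high", "high", "very_low", "low")

# Values present in the data but absent from `levels`
setdiff(unique(z1), lvls)  # "very_low"

# factor() turns those values into NA without an error
z2 <- factor(z1, levels = lvls, ordered = TRUE)
sum(is.na(z2))  # 1

# An ordered factor supports comparisons based on the level order
z2[1] > z2[2]  # TRUE: "very high" > "high"
```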