Hey @Dong! I think part of the problem here is that stringr::str_replace() takes 'Either a character vector, or something coercible to one.'
A factor is essentially an integer vector in which the possible labels (the levels) are stored once, separately. By using str_replace(), you're converting your factor to character (which rewrites the entire vector), searching and replacing in every value, and then converting the whole thing back. The same thing happens at creation: you create a character column, and data.frame() then converts it to a factor automatically (the default behaviour before R 4.0.0, where stringsAsFactors = TRUE).
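To see what that means in practice, here's a small base-R sketch. The vector `f` is just a made-up example, not your data:

```r
# A tiny made-up factor (hypothetical example data)
f <- factor(c("A_something", "B_something", "A_something"))

# Underneath, it is integer codes plus a short lookup table of levels
codes <- unclass(f)
typeof(codes)   # "integer"
levels(f)       # the labels, stored once

# Renaming the levels only rewrites the lookup table;
# the integer codes stay exactly as they were
g <- f
levels(g) <- sub("_something", "", levels(g), fixed = TRUE)
identical(as.integer(f), as.integer(g))
```

With three levels, only three strings get touched, no matter how long the vector is.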
I think both your factor creation and releveling would go a lot faster this way, using the forcats package to change the levels without touching the values:
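library(forcats)
library(stringr)  # for str_replace()
df <- data.frame(
  y = rnorm(3e7),
  # integer codes 1:3, with the labels supplied once via `labels`
  Grp = factor(rep(1:3, 1e7), levels = 1:3,
               labels = c("A_something", "B_something", "C_something"))
)
# fct_relabel() applies the function to the levels only, not to every value
df$Grp <- fct_relabel(df$Grp, str_replace, "_something", "")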
The original releveling took about a minute on my fairly new laptop; using fct_relabel() took a fraction of a second.
Creating the original data frame column directly as a factor also helps a bit; it took 2–3 seconds versus about 10 using a character vector!
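If you want to check the difference on your own machine, something like this rough sketch should do it. It assumes forcats and stringr are installed; `n`, `grp`, `slow` and `fast` are names I made up for the example, and I've used far fewer rows than the 3e7 above so it finishes quickly:

```r
library(forcats)
library(stringr)

n <- 1e5  # hypothetical size, much smaller than the real data
grp <- factor(rep(1:3, n), levels = 1:3,
              labels = c("A_something", "B_something", "C_something"))

# Slow path: go through character, replace in every value, rebuild the factor
slow_time <- system.time(
  slow <- factor(str_replace(as.character(grp), "_something", ""))
)

# Fast path: only the three levels are passed to str_replace()
fast_time <- system.time(
  fast <- fct_relabel(grp, str_replace, "_something", "")
)

# Both routes should give the same levels and the same codes
identical(levels(slow), levels(fast))
identical(as.integer(slow), as.integer(fast))

slow_time
fast_time
```

The exact timings will depend on your machine and on `n`, so treat them as a rough comparison rather than a benchmark.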