I often use geom_smooth() to add smooths to my data. I just discovered that when the y-axis is transformed (e.g. a log axis), geom_smooth() unexpectedly fits its smooth to the transformed data (here log10(y)) rather than to the raw data. This gives a biased smooth compared to smoothing the raw data. As far as I can see, the documentation gives no warning that this happens.
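A quick way to see why the log-space smooth sits lower: averaging in log10 space and back-transforming gives a geometric mean, which can never exceed the arithmetic mean (Jensen's inequality). The sketch below is a hypothetical illustration with made-up values, not part of the reprex; a smoother fitted to log10(y) behaves like a local version of `geo_mean`, while a smoother fitted to raw y behaves like a local version of `arith_mean`.

```r
# Hypothetical illustration of the bias: back-transformed log-space averages
# estimate the geometric mean, which is never larger than the arithmetic mean.
set.seed(1)
y <- exp(runif(1000))             # positive, right-skewed values

arith_mean <- mean(y)              # what a raw-scale smooth targets locally
geo_mean   <- 10^mean(log10(y))    # what a log10-scale smooth targets locally

c(arithmetic = arith_mean, geometric = geo_mean)
```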
```r
library(ggplot2)
library(dplyr)
library(mgcv)

set.seed(1)

# data
df <- data.frame(x = seq(0, 1, 0.01)) %>%
  mutate(y = exp(runif(n()) + 6 * (x * (1 - x) * (0.5 - x) + 0.1 * x)))

# smooth in normal space
mod <- gam(y ~ s(x, bs = "cs"), data = df, method = "REML")
df$pred <- predict(mod)

# smooth in log10 space, back-transformed
mod2 <- gam(y ~ s(x, bs = "cs"),
            data = df %>% mutate(y = log10(y)), method = "REML")
df$pred2 <- 10 ^ predict(mod2)

df %>%
  ggplot() +
  labs(colour = "Smooth") +
  geom_point(aes(x = x, y = y)) +
  geom_smooth(aes(x = x, y = y, colour = "geom_smooth"), method = "gam", size = 4) +
  geom_line(aes(x = x, y = pred, colour = "normal space"), size = 1) + # does not match geom_smooth
  geom_line(aes(x = x, y = pred2, colour = "log space"), size = 1) +   # matches geom_smooth
  scale_y_log10()
#> `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
```
Created on 2021-12-20 by the reprex package (v2.0.1)
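One possible workaround, sketched here under the assumption that ggplot2 applies coordinate transformations after statistical transformations (as its `coord_trans()` documentation describes), is to drop `scale_y_log10()` and log the axis with `coord_trans(y = "log10")` instead. The stat then sees raw `y`, so the smooth should match the normal-space `gam()` fit while the axis is still drawn on a log scale. The code below rebuilds the same data without dplyr and uses `layer_data()` to pull out the values computed by the smooth layer for comparison; the names `p`, `sm` and `raw_pred` are just illustrative.

```r
library(ggplot2)
library(mgcv)

set.seed(1)
df <- data.frame(x = seq(0, 1, 0.01))
df$y <- exp(runif(nrow(df)) + 6 * (df$x * (1 - df$x) * (0.5 - df$x) + 0.1 * df$x))

# Reference fit on the raw (untransformed) scale
mod <- gam(y ~ s(x, bs = "cs"), data = df, method = "REML")

# coord_trans() transforms the axis after stat_smooth() has run,
# so the GAM is fitted to raw y while the axis is still drawn on log10.
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs")) +
  coord_trans(y = "log10")

# Values computed by the smooth layer (layer 2), on the raw y scale
sm <- layer_data(p, 2)

# Raw-scale GAM predictions at the same x positions, for comparison
raw_pred <- as.numeric(predict(mod, newdata = data.frame(x = sm$x)))
```

Printing `p` should show the smooth following the "normal space" curve from the reprex. Recent ggplot2 releases may point you to `coord_transform()` as the newer name for `coord_trans()`. Note that a coordinate transformation also affects how other geoms are drawn, and a confidence ribbon that dips below zero cannot be shown on a log axis.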