Unable to add regression line to plot

Hi. I am unable to add a regression line to my plot. I would appreciate your advice. Thank you.

anscombe <- data.frame(cbind(x1=c(10,8,13,9,11,14,6,4,12,7,5),
                             y1=c(8.04,6.95,7.58,8.81,8.33,9.96,7.24,4.26,10.84,4.82,5.68)))
library(ggplot2)
p1 <- ggplot(anscombe) +
  geom_point(aes(x1, y1), color = "darkorange", size = 1.5) +
  scale_x_continuous(breaks = seq(0,20,2)) +
  scale_y_continuous(breaks = seq(0,12,2)) +
  expand_limits(x = 0, y = 0) +
  labs(x = "x1", y = "y1",
       title = "Dataset 1" ) +
  theme_bw() +
  geom_smooth(method="lm", formula = y1 ~ x1, data= anscombe, col="blue", se=FALSE)
p1

Error: stat_smooth requires the following missing aesthetics: x and y
Backtrace:
     x
  1. +-(function (x, ...) ...
  2. \-ggplot2:::print.ggplot(x)
  3.   +-ggplot2::ggplot_build(x)
  4.   \-ggplot2:::ggplot_build.ggplot(x)
  5.     \-ggplot2:::by_layer(function(l, d) l$compute_statistic(d, layout))
  6.       \-ggplot2:::f(l = layers[[i]], d = data[[i]])
  7.         \-l$compute_statistic(d, layout)
  8.           \-ggplot2:::f(..., self = self)
  9.             \-self$stat$compute_layer(data, params, layout)
 10.               \-ggplot2:::f(..., self = self)
 11.                 \-ggplot2:::check_required_aesthetics(...)
Execution halted

Try:

geom_smooth(method="lm", col="blue", se=FALSE)

anscombe <- data.frame(cbind(x1=c(10,8,13,9,11,14,6,4,12,7,5),
                             y1=c(8.04,6.95,7.58,8.81,8.33,9.96,7.24,4.26,10.84,4.82,5.68)))
library(ggplot2)
p1 <- ggplot(anscombe) +
  aes(x=x1, y=y1) +
  geom_point(color = "darkorange", size = 1.5) +
  scale_x_continuous(breaks = seq(0,20,2)) +
  scale_y_continuous(breaks = seq(0,12,2)) +
  expand_limits(x = 0, y = 0) +
  labs(x = "x1", y = "y1",
       title = "Dataset 1" ) +
  theme_bw() +
  geom_smooth(method="lm", 
              formula = y ~ x,
              col="blue", se=FALSE)
p1

Nirgrahamuk, your suggestion worked, thank you. But why does your version work when mine failed? Apparently I supplied more information than was necessary.

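In your original code, you have formula = y1 ~ x1. However, the formula in geom_smooth must be written in terms of the generic aesthetic names y and x, regardless of the actual names of the columns you're plotting. Whatever columns are mapped to the x and y aesthetics are what the formula refers to.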
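To make that concrete, here is a minimal sketch that checks whether the line drawn with formula = y ~ x is the same fit you would get from lm(y1 ~ x1) on the original columns. It assumes the ggplot2 helper layer_data() is available in your installed version, and anscombe1, smooth_df and the other object names are my own, not from the thread.

```r
library(ggplot2)

# Same data as in the question, stored under a name made up for this example
anscombe1 <- data.frame(
  x1 = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
  y1 = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68)
)

# x1 and y1 are mapped to the x and y aesthetics, so the formula says y ~ x
p <- ggplot(anscombe1, aes(x = x1, y = y1)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Pull out the values computed by the smooth layer (the second layer)
smooth_df <- layer_data(p, 2)

# Fit the same model by hand, using the real column names this time
fit <- lm(y1 ~ x1, data = anscombe1)
expected <- predict(fit, newdata = data.frame(x1 = smooth_df$x))

# TRUE if the plotted line and the hand-fitted line agree
same_fit <- isTRUE(all.equal(smooth_df$y, unname(expected)))
same_fit
```

If same_fit comes back TRUE on your machine, that confirms y and x in the formula are just placeholders for whatever you mapped in aes().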

Also, for method="lm", the default formula is y ~ x, so it's not necessary to specify it explicitly. If you want to plot a different type of lm regression model, then you would need to specify the formula. For example:

# Polynomial
geom_smooth(method="lm",
            formula = y ~ poly(x,2),
            col="blue", se=FALSE)

# Spline
library(splines)
geom_smooth(method="lm",
            formula = y ~ bs(x, df=4),
            col="blue", se=FALSE)
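In case it helps to see those layers in a complete plot, here is a rough sketch using the data from the question. The object names (anscombe1, base, p_linear and so on) are my own, and I have trimmed the scales and labels to keep it short, so treat it as an illustration rather than a drop-in replacement.

```r
library(ggplot2)
library(splines)  # provides bs() for the spline formula

anscombe1 <- data.frame(
  x1 = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
  y1 = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68)
)

# Shared base plot: the mapping is set once and inherited by every layer
base <- ggplot(anscombe1, aes(x = x1, y = y1)) +
  geom_point(color = "darkorange", size = 1.5) +
  theme_bw()

# Straight line (this formula is also the default for method = "lm")
p_linear <- base +
  geom_smooth(method = "lm", formula = y ~ x, color = "blue", se = FALSE)

# Quadratic polynomial
p_quad <- base +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), color = "blue", se = FALSE)

# B-spline with 4 degrees of freedom
p_spline <- base +
  geom_smooth(method = "lm", formula = y ~ bs(x, df = 4), color = "blue", se = FALSE)

# Fitted values behind each curve, in case you want to compare them numerically
linear_fit <- layer_data(p_linear, 2)$y
quad_fit   <- layer_data(p_quad, 2)$y
spline_fit <- layer_data(p_spline, 2)$y

p_quad  # print any of the three plots to view it
```

Only the formula changes between the three versions; the x and y in it still refer to the aesthetics, not to x1 and y1.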

The error message itself comes from the position of the "aes(x=x1, y=y1)".
In your example it is inside geom_point(), so only the point layer knows which columns are x and y. geom_smooth() still receives the data, but it has no x and y mappings to work with, which is exactly what "missing aesthetics: x and y" reports.
Nirgrahamuk solved this by adding the "aes(x=x1, y=y1)" at the plot level, right after ggplot(), so all layers inherit the mapping.
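Here is a small sketch of the three placements side by side, in case it is useful. The object names are made up for the example, and I wrap the failing version in tryCatch() so the script keeps running. The exact wording of the error may differ between ggplot2 versions.

```r
library(ggplot2)

anscombe1 <- data.frame(
  x1 = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
  y1 = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68)
)

# Mapping only inside geom_point(): the smooth layer has nothing to inherit
p_bad <- ggplot(anscombe1) +
  geom_point(aes(x1, y1)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The error appears when the plot is built (or printed), not when it is defined
bad_result <- tryCatch(
  {
    ggplot_build(p_bad)
    "built"
  },
  error = function(e) paste("failed:", conditionMessage(e))
)
bad_result

# Option A: mapping at the plot level, inherited by every layer
p_global <- ggplot(anscombe1, aes(x = x1, y = y1)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Option B: mapping repeated in each layer that needs it
p_local <- ggplot(anscombe1) +
  geom_point(aes(x1, y1)) +
  geom_smooth(aes(x1, y1), method = "lm", formula = y ~ x, se = FALSE)

# Both working versions should produce the same fitted line
global_fit <- layer_data(p_global, 2)$y
local_fit  <- layer_data(p_local, 2)$y
```

Option A is usually less typing when every layer uses the same columns. Option B is handy when different layers need different mappings.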

Thanks to all who helped me with this. Ah, so much to learn :)
