Regression line in ggplot2

I am trying to create a scatterplot with a regression line in ggplot2. The code below returns a scatterplot but no regression line, and I am not sure why, as there are no error or warning messages. Any suggestions would be greatly appreciated.

##Call Packages##
library(ggplot2)
library(tidyverse)
options(stringsAsFactors = FALSE)

##Set Directories##
setwd("/Users/davidowoods/Desktop/RainManSR/Data/GPP")
IN_dir="/Users/davidowoods/Desktop/RainManSR/Data/GPP/RainMan_GPPAveragesbyTreatement_Woods.csv"
OUT_dir= "/Users/davidowoods/Desktop/RainManSR/Data/GPP/Figures"

##Import Data##
GPPAverages <- read.csv(IN_dir, header = TRUE)

##Plot##
DATE <- GPPAverages$Date           # make data into object
S1AVERAGE <- GPPAverages$S1Average # make data into object
                                   # (these seem redundant but it fixed an issue)
S1 <- ggplot(data = GPPAverages, aes(x = DATE, y = S1AVERAGE)) +
  geom_point(color = 'blue') +                 # plot data points
  geom_smooth(stat = 'smooth',                 # plot regression line
              method = lm, formula = y ~ x, aes(DATE, S1AVERAGE)) +
  ggtitle('Average GPP for Treatment One') +   # add title
  xlab('Date') +                               # add x label
  ylab('GPP')                                  # add y label
print(S1) # print graph

Your code looks like it should work, but it's hard to say more without a sample of your data. For now, here's an example that shows your code works with a built-in data frame:

library(tidyverse)

# Create data frame with same names as yours
GPPAverages = mtcars %>% rename(DATE=mpg, S1AVERAGE=hp)

ggplot(data = GPPAverages, aes(x = DATE, y = S1AVERAGE)) + 
  geom_point(color='blue') + #plot data points
  geom_smooth(stat = 'smooth', #plot regression line
              method = lm, formula = y~x, aes(DATE, S1AVERAGE))+
  ggtitle('Average GPP for Treatment One') + #add title
  xlab('Date') + #add x label
  ylab('GPP')

Also, you have some redundant code in geom_smooth. Here's the equivalent plot without the unnecessary components:

ggplot(data = GPPAverages, aes(x = DATE, y = S1AVERAGE)) + 
  geom_point(color='blue') + 
  geom_smooth(method = lm) +
  labs(x="Date", y="GPP", title="Average GPP for Treatment One")

By any chance, is your DATE column of class Date or class character? Run class(GPPAverages$DATE). If it's character, geom_smooth will not produce a regression line: both variables need to be numeric (the Date class is stored as a number "under the hood").

Your S1Average values on the y axis seem to be character rather than numeric. Notice that the first "numbers" on the axis (0.756, 1.039, 1.069) are not evenly spaced. Where does S1Average come from?
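One way to check this is sketched below with invented values. The "n/a" entry is only a hypothetical example of the kind of stray text that can make read.csv treat a whole column as character; the real file may contain something different.

```r
# Invented values standing in for the S1Average column (assumption)
s1 <- c("0.756", "1.039", "n/a", "1.069")

# A character column is treated as discrete, so the axis is not evenly spaced
s1_class <- class(s1)

# Convert to numeric; entries that are not numbers become NA
s1_num <- suppressWarnings(as.numeric(s1))

# Count how many entries could not be converted
n_bad <- sum(is.na(s1_num))
```

If n_bad is greater than zero, looking at those rows in the CSV usually shows what needs cleaning before plotting.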

I got it! Thanks for the help, everyone.

Another question: how do you add the equation and r^2 value of the line onto the graph?

```r
##Call Packages##
library(ggplot2)
library(tidyverse)
library(ggpmisc)
options(stringsAsFactors = FALSE)

##Set Directories##
setwd("/Users/davidowoods/Desktop/RainManSR/Data/GPP")
IN_dir="/Users/davidowoods/Desktop/RainManSR/Data/GPP/RainMan_GPPAveragesbyTreatement_Woods.csv"
OUT_dir= "/Users/davidowoods/Desktop/RainManSR/Data/GPP/Figures"

##Import Data##
df <- read.csv(IN_dir, header = TRUE)
GPPAverages <- na.omit(df)
View(GPPAverages)

##Plot S1##
GPPAverages$Date = lubridate::ymd(GPPAverages$Date)
S1 <- ggplot(data = GPPAverages, aes(x = Date, y = S1Average)) +
  geom_point(color='blue') +
  geom_smooth(method = lm, formula = y~x, color = 'red', se = FALSE) +
  stat_poly_eq() +
  labs(x="Date", y="GPP", title="Average GPP for Treatment One")
print(S1) #print graph

##Plot S2##
S2 <- ggplot(data = GPPAverages, aes(x = Date, y = S2Average)) +
  geom_point(color='blue') +
  geom_smooth(method = lm, formula = y~x, color = 'red', se = FALSE) +
  labs(x="Date", y="GPP", title="Average GPP for Treatment Two")
print(S2) #print graph

##Plot S3##
S3 <- ggplot(data = GPPAverages, aes(x = Date, y = S3Average)) +
  geom_point(color='blue') +
  geom_smooth(method = lm, formula = y~x, color = 'red', se = FALSE) +
  labs(x="Date", y="GPP", title="Average GPP for Treatment Three")
print(S3) #print graph

##Plot S4##
S4 <- ggplot(data = GPPAverages, aes(x = Date, y = S4Average)) +
  geom_point(color='blue') +
  geom_smooth(method = lm, formula = y~x, color = 'red', se = FALSE) +
  labs(x="Date", y="GPP", title="Average GPP for Treatment Four")
print(S4) #print graph
```

stat_poly_eq() isn't working for some reason
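It's hard to say exactly what is going wrong without seeing the output, but one thing worth trying: as far as I know, stat_poly_eq() does not show the fitted equation unless you map it to the label yourself. Here is a minimal sketch, assuming ggpmisc is installed and ggplot2 is version 3.3.0 or later (for after_stat()). It uses mtcars as a stand-in for your data, so the column names are not yours.

```r
library(ggplot2)
library(ggpmisc)

# mtcars stands in for GPPAverages (assumption: numeric x and y columns)
p <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, color = "red", se = FALSE) +
  # Paste the equation and R^2 labels computed by the stat into one
  # plotmath expression; "~~~" adds a little space between them
  stat_poly_eq(
    aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")),
    formula = y ~ x,
    parse = TRUE
  )
print(p)
```

The formula passed to stat_poly_eq() should match the one given to geom_smooth(), otherwise the label describes a different fit from the line that is drawn.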

I ran that and got back NULL. How do I make date into a numeric variable?
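A possible explanation for the NULL, assuming the column in the data frame is actually named Date (as the GPPAverages$Date call in the original code suggests): column names in R are case-sensitive, and `$` gives NULL for a column that does not exist. A small sketch with a made-up data frame:

```r
# Hypothetical stand-in for GPPAverages; the values are invented
demo <- data.frame(Date = c("2020-11-15", "2020-11-16"),
                   S1Average = c(0.756, 1.039),
                   stringsAsFactors = FALSE)

# There is no column called DATE, so this is NULL
missing_col <- demo$DATE
class_missing <- class(missing_col)

# The real column exists, but it holds character strings
class_actual <- class(demo$Date)

# Convert the strings to Date class with base R
demo$Date <- as.Date(demo$Date, format = "%Y-%m-%d")
class_converted <- class(demo$Date)
```

So it may be worth running class(GPPAverages$Date) with the exact capitalization shown by names(GPPAverages).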

If column DATE is currently character strings of the form "2020-11-15", then you can do:

GPPAverages$DATE = lubridate::ymd(GPPAverages$DATE)

The graph now is making a line, but it is running over the points only. Do you know how to get R to ignore those and not put them into the plot?

Okay this is the code now:

```r
##Import Data##
GPPAverages <- read.csv(IN_dir, header = TRUE)
na.omit(GPPAverages)

##Plot##
GPPAverages$Date = lubridate::ymd(GPPAverages$Date)
S1 <- ggplot(data = GPPAverages, aes(x = Date, y = S1Average)) +
  geom_point(color='blue') +
  geom_smooth(method = lm, formula = y~x, color = 'red') +
  labs(x="Date", y="GPP", title="Average GPP for Treatment One")
print(S1) #print graph
```

And it outputs this error and warning:

Error in seq.int(0, to0 - from, by) : 'to' must be a finite number
In addition: Warning message:
Removed 18 rows containing non-finite values (stat_smooth).
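One thing that stands out in the code above: na.omit(GPPAverages) is called without assigning its result, so the rows with missing values are still in GPPAverages when it is plotted. That would be consistent with the "Removed 18 rows" warning, and the missing values may also be related to the error, though I can't be sure without the data. A minimal sketch with an invented data frame:

```r
# Invented data with missing values in both columns (assumption)
demo <- data.frame(Date = as.Date(c("2020-11-15", NA, "2020-11-17")),
                   S1Average = c(0.756, 1.039, NA))

# na.omit() returns a new data frame; calling it alone leaves demo unchanged
na.omit(demo)
rows_before <- nrow(demo)

# Assign the result to actually keep only the complete rows
demo_clean <- na.omit(demo)
rows_after <- nrow(demo_clean)
```

In the code above that would mean writing GPPAverages <- na.omit(GPPAverages) before the plotting step.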