ggplot2 using all coefficients when plotting regression line

I am trying to plot a regression line showing how wing length for a species of bird changes over the course of a year. I am using data from multiple years, but I want to collapse all years together so that only the change by day of the year is shown. For this I have taken the date and reformatted it to include only month and day:
date <- format(BIRD$DATUM, "%b %d")
I have then attempted to define which coefficients the plot will use:
coefs <- coef(lm(WING~ date, data = SISKIN))

Finally I have the code for the plot:
ggplot(BIRD, aes(date, WING, group = date)) +
geom_boxplot(fill="white", color="black", width= 0.8) +
ggtitle("BIRD DOY") +
xlab("DOY") +
ylab("Wing length (mm)") +
theme_bw() +
theme(axis.title = element_text(size = 9)) +
stat_summary(fun = mean, color="red", geom = "point") +
theme(axis.text= element_text(size= 7, angle = 0)) +
geom_abline(intercept = coefs[1], slope = coefs[2], color = "red")

Once I run this, the regression line only shows the relationship between the first and second data entry disregarding all other points. I want it to include all points.

BONUS: I also want to sequence the x-axis so that it is not so cluttered.

Dummy of data:


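I think your problem starts with using date <- format(BIRD$DATUM, "%b %d"). The format function returns a character value, which lm() treats as a categorical variable with its levels ordered alphabetically. The model then has one coefficient per date rather than a single slope, so coefs[2] is only the difference between the first two levels. Try using the yday function from the lubridate package, which returns a number for the day of the year. For example, Feb. 20 is day 51.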

BIRD$DATUM_NUM <- lubridate::yday(BIRD$DATUM)

and then fit WING to DATUM_NUM. I am not sure of the relationship between BIRD and SISKIN, so I cannot provide the exact code for the next step.
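As a rough sketch of the difference, the example below fits the model both ways. The data are simulated and only stand in for the real BIRD data frame; the column name date_chr and all numeric values are assumptions made for illustration.

```r
library(lubridate)

# Simulated stand-in for the BIRD data (hypothetical values, not the real data set)
set.seed(1)
BIRD <- data.frame(
  DATUM = as.Date("2015-01-01") + sample(0:1500, 200, replace = TRUE)
)
BIRD$WING <- 70 + 0.01 * yday(BIRD$DATUM) + rnorm(200, sd = 1)

# Character dates: lm() converts them to a factor, giving one coefficient per level
BIRD$date_chr <- format(BIRD$DATUM, "%b %d")
n_coef_chr <- length(coef(lm(WING ~ date_chr, data = BIRD)))
n_coef_chr

# Numeric day of year: a single intercept and a single slope
BIRD$DATUM_NUM <- yday(BIRD$DATUM)
coefs <- coef(lm(WING ~ DATUM_NUM, data = BIRD))
coefs
```

With the numeric predictor, coefs[1] and coefs[2] really are the intercept and slope of one line through all the points, which is what geom_abline expects.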

You are telling geom_abline to use the first coefficient as the intercept and the second coefficient as the slope; it doesn't consider any data points or your data frame at all.

If you want to plot the results of a regression, I would recommend calculating the predicted values by running predict() on your model, adding them to your data frame, and picking a geom_ to plot them in ggplot2 along with your other data.
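A minimal sketch of that approach, assuming a numeric day-of-year column called DATUM_NUM and simulated data in place of the real measurements (WING_FIT is a column name made up for this example):

```r
library(ggplot2)

# Simulated stand-in data (hypothetical); DATUM_NUM is the numeric day of year
set.seed(2)
BIRD <- data.frame(DATUM_NUM = sample(1:365, 200, replace = TRUE))
BIRD$WING <- 70 + 0.01 * BIRD$DATUM_NUM + rnorm(200)

# Fit the model and store its predictions in the data frame
fit <- lm(WING ~ DATUM_NUM, data = BIRD)
BIRD$WING_FIT <- predict(fit, newdata = BIRD)

# Draw the raw points and the predicted line from the same data frame
p <- ggplot(BIRD, aes(x = DATUM_NUM, y = WING)) +
  geom_point(alpha = 0.4) +
  geom_line(aes(y = WING_FIT), color = "red")
p
```

For a simple linear fit, geom_smooth(method = "lm") should draw the same line without the explicit predict() step; the version above is mainly useful when the model is fitted separately or is more complex.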

all the best.

Forgot to change that part, SISKIN = BIRD. But that will give me day of the year on the x-axis and I need to show calendar day.

Here is an example of how you can plot with one column and label the axis with information from another source. I plot with day of year but build the labels from a Date column. The dates do not need to be part of the data frame; I could have used an independent vector.
You need to think about how to handle leap years. Perhaps you can use the first 365 days of the year in those cases.

library(ggplot2)
DF <- data.frame(DATUM_NUM = 1:365,
                 Value = rnorm(365),
                 DATUM = seq.Date(from = as.Date("1985-01-01"), 
                                  to = as.Date("1985-12-31"), by = 1))
Brks <- c(1, 60, 121, 182, 244, 305)
Lbls <- format(DF[Brks, "DATUM"], "%b %d")
ggplot(DF, aes(x = DATUM_NUM, y = Value)) + geom_point() +
  scale_x_continuous(labels = Lbls, breaks = Brks)

Created on 2020-06-10 by the reprex package (v0.3.0)
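On the leap-year point, one possible alternative to dropping day 366 is to map every date onto a 365-day calendar so that the same calendar day always gets the same number. This is only a sketch under that assumption: doy_365 is a hypothetical helper, not part of any package, and it folds Feb 29 onto Feb 28.

```r
# Hypothetical helper: day of year on a 365-day calendar
doy_365 <- function(d) {
  doy  <- as.integer(format(d, "%j"))   # day of year, 1..366
  yr   <- as.integer(format(d, "%Y"))
  leap <- (yr %% 4 == 0 & yr %% 100 != 0) | yr %% 400 == 0
  # In leap years, shift Feb 29 and every later day back by one
  ifelse(leap & doy >= 60, doy - 1L, doy)
}

doy_365(as.Date(c("2019-03-01", "2020-02-29", "2020-03-01", "2020-12-31")))
# 60 59 60 365
```

With this mapping, Mar 1 is day 60 in every year, so breaks and labels built from a non-leap reference year such as 1985 stay aligned across years.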

That was perfect, thank you so much!
