So, from both a stylistic and a behind-the-scenes R standpoint, it's better to define the function outside the ggplot call than inside it. See Hadley's page on code style, which explains the reasoning. Breaking code up into smaller chunks makes it easier to troubleshoot, debug, and read.
```r
gm_mean <- function(x) {
  # geometric mean: exponentiate the mean of the logs
  exp(mean(log(x)))
}
```
Or with pipes, which is easier to read but relies on magrittr (or dplyr, which re-exports `%>%`) being loaded:
```r
gm_mean <- function(x) {
  x %>%
    log() %>%
    mean() %>%
    exp()
}
```
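A quick sanity check that the two definitions agree, as a minimal sketch (the test vectors are made up for illustration, and the pipe version is renamed `gm_mean_pipe` here only so both can coexist). The geometric mean of 1, 10 and 100 is 10. Note that `log(0)` is `-Inf`, so a single zero pulls the geometric mean down to 0, and negative values give `NaN` with a warning; `gm_mean()` is only meaningful for strictly positive values.

```r
library(magrittr)  # provides %>% (dplyr re-exports it as well)

# base R version
gm_mean <- function(x) {
  exp(mean(log(x)))
}

# pipe version: same computation, different style (hypothetical name)
gm_mean_pipe <- function(x) {
  x %>%
    log() %>%
    mean() %>%
    exp()
}

x <- c(1, 10, 100)                      # illustrative values
gm_mean(x)                              # (1 * 10 * 100)^(1/3), i.e. 10 up to rounding
identical(gm_mean(x), gm_mean_pipe(x))  # both versions do the same arithmetic

# a zero in the data makes log() return -Inf, so the result collapses to 0
gm_mean(c(0, 10, 100))
```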
Next, as @FJCC pointed out, the one thing I was missing was the mid_year column, which I've now added. You should never turn off conflict warnings (`warn.conflicts = FALSE` in `library()`), because they tell you when another package has a function with the same name, e.g. `dplyr::select()`. `select` is a common function name in many packages, so when you load packages it's helpful to see that information and avoid running into conflicts. It's also nice to load packages in alphabetical order, since that makes it easy to see which ones are loaded and which ones are missing.
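To see why the conflict messages are worth keeping, here is a minimal sketch using MASS, a recommended package that ships with most R installations and also exports a `select()` function. MASS is only an illustration here and isn't needed for the plot. Whichever package is attached last wins, and the explicit `pkg::fun()` form sidesteps the ambiguity.

```r
library(dplyr)
library(MASS)   # startup message reports that select() from dplyr is now masked

# find() lists every attached package that defines `select`, most recent first
find("select")

# a plain select() call would now go to MASS::select(), which does something
# unrelated; the namespaced call works regardless of load order
dplyr::select(mtcars, mpg, cyl) %>% head(3)
```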
```r
# load packages ----
library(dplyr)
library(ggplot2)
library(lubridate)
library(tibble)

# create dataframe ----
tpexample <- tribble(
  ~lake, ~source, ~date, ~parm, ~value, ~unit,
  "lake1", "lakewatch", "30-Jul-92", "tp", 0.06, "mg/L",
  "lake1", "lakewatch", "18-Aug-92", "tp", 0.07, "mg/L",
  "lake1", "lakewatch", "29-Sep-92", "tp", 0.13, "mg/L",
  "lake1", "lakewatch", "29-Oct-92", "tp", 0.1, "mg/L",
  "lake1", "lakewatch", "16-Nov-92", "tp", 0.16, "mg/L",
  "lake1", "lakewatch", "16-Dec-92", "tp", 0.13, "mg/L",
  "lake1", "lakewatch", "19-Jan-93", "tp", 0.09, "mg/L",
  "lake1", "lakewatch", "10-Feb-93", "tp", 0.09, "mg/L",
  "lake1", "lakewatch", "24-Mar-93", "tp", 0.05, "mg/L",
  "lake1", "lakewatch", "28-Apr-93", "tp", 0.05, "mg/L",
  "lake1", "lakewatch", "28-May-93", "tp", 0.04, "mg/L",
  "lake1", "lakewatch", "29-Jun-93", "tp", 0.04, "mg/L",
  "lake1", "lakewatch", "29-Jul-93", "tp", 0.04, "mg/L",
  "lake1", "lakewatch", "20-Aug-93", "tp", 0.03, "mg/L",
  "lake1", "lakewatch", "27-Sep-93", "tp", 0.03, "mg/L",
  "lake1", "lakewatch", "20-Oct-93", "tp", 0.04, "mg/L",
  "lake1", "lakewatch", "22-Nov-93", "tp", 0.04, "mg/L",
  "lake1", "lakewatch", "22-Dec-93", "tp", 0.05, "mg/L",
  "lake1", "lakewatch", "26-Jan-94", "tp", 0.06, "mg/L",
  "lake1", "lakewatch", "23-Feb-94", "tp", 0.05, "mg/L",
  "lake1", "lakewatch", "23-Mar-94", "tp", 0.04, "mg/L",
  "lake1", "lakewatch", "27-Apr-94", "tp", 0.03, "mg/L",
  "lake1", "lakewatch", "28-Jun-94", "tp", 0.05, "mg/L",
  "lake1", "lakewatch", "9-Jul-94", "tp", 0.05, "mg/L",
  "lake1", "lakewatch", "5-Oct-94", "tp", 0.08, "mg/L",
  "lake1", "lakewatch", "1-Nov-94", "tp", 0.09, "mg/L",
  "lake1", "lakewatch", "22-Dec-94", "tp", 0.11, "mg/L",
  "lake1", "lakewatch", "31-Jan-95", "tp", 0.1, "mg/L",
  "lake1", "lakewatch", "16-Feb-95", "tp", 0.08, "mg/L",
  "lake1", "lakewatch", "14-Mar-95", "tp", 0.08, "mg/L",
  "lake1", "lakewatch", "13-Apr-95", "tp", 0.06, "mg/L",
  "lake1", "lakewatch", "11-May-95", "tp", 0.05, "mg/L",
  "lake1", "lakewatch", "20-Jun-95", "tp", 0.03, "mg/L",
  "lake1", "lakewatch", "25-Jul-95", "tp", 0.03, "mg/L",
  "lake1", "lakewatch", "21-Aug-95", "tp", 0.17, "mg/L",
  "lake1", "lakewatch", "12-Sep-95", "tp", 0.15, "mg/L",
  "lake1", "lakewatch", "16-Oct-95", "tp", 0.1, "mg/L",
  "lake1", "lakewatch", "14-Nov-95", "tp", 0.1, "mg/L",
  "lake1", "lakewatch", "12-Dec-95", "tp", 0.06, "mg/L",
  "lake1", "lakewatch", "23-Jan-96", "tp", 0.05, "mg/L",
  "lake1", "lakewatch", "15-Feb-96", "tp", 0.07, "mg/L",
  "lake1", "lakewatch", "20-Mar-96", "tp", 0.06, "mg/L"
)

# reformat the date column, then add the year and mid_year columns
# (mid_year is 1 June of each year, where the yearly summaries are plotted).
# Put each pipe at the end of a line, not at the start of the next one.
tpexample <- tpexample %>%
  mutate(date = dmy(date),
         year = year(date),
         mid_year = make_date(year = year, month = 6, day = 1))

# use glimpse() to assess the structure of the dataframe
glimpse(tpexample)
```
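If the `mutate()` step is unclear, here is a small sketch on two of the dates above. `dmy()` parses day-month-year strings with abbreviated month names and two-digit years (assuming an English locale for the month names), and `make_date()` builds 1 June of each year, which is the x position used for the yearly summary points.

```r
library(lubridate)

# two raw strings copied from the example data
dates <- dmy(c("30-Jul-92", "20-Mar-96"))
dates          # two-digit years 92 and 96 are read as 1992 and 1996

year(dates)    # extracts the year as a number

# 1 June of each year: the mid_year column used on the x axis
make_date(year = year(dates), month = 6, day = 1)
```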
```r
# plot -----
# I'll be honest: I missed a few things (commas and brackets), since I
# originally wrote this freehand in the reply box without running it on
# any data. Apologies for that. This is the fixed plot, which works with
# the sample data. A few things are left out because they don't run with
# the reprex data:
# - group = year was wrong, sorry about that. It makes each year its own
#   group, so the line layer gets one point per group and draws nothing.
# - facet_wrap() is commented out because the reprex data contain only one
#   lake, and lak.labs isn't defined here.
graph3 <- tpexample %>%
  ggplot() +
  geom_point(aes(x = date, y = value, color = source)) +
  # yearly geometric mean as red points, placed at mid_year (1 June)
  stat_summary(aes(x = mid_year, y = value),
               fun = gm_mean,
               geom = "point",
               size = 1,
               color = "red") +
  # the same yearly geometric means joined by a purple line
  stat_summary(aes(x = mid_year, y = value),
               fun = gm_mean,
               geom = "line",
               linewidth = 1,  # ggplot2 >= 3.4.0; use size = 1 on older versions
               color = "purple") +
  # facet_wrap(. ~ lake, ncol = 1, scales = "free",
  #            labeller = labeller(lake = lak.labs)) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  theme(legend.position = "bottom") +
  labs(x = "Year", y = expression(paste("TP (mg ", L^-1, ")")))

graph3
```
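If you want to check what the red and purple layers show: `stat_summary()` with `x = mid_year` summarises `value` at each unique `mid_year`, applying `gm_mean()` within each year. The same numbers can be computed with dplyr. This sketch uses only a handful of rows copied from `tpexample` to stay short (`tp_subset` and `yearly_gm` are hypothetical names), so its values cover those rows only, not the full data.

```r
library(dplyr)
library(lubridate)
library(tibble)

gm_mean <- function(x) exp(mean(log(x)))

# a few rows copied from tpexample, just to keep the sketch short
tp_subset <- tribble(
  ~date,       ~value,
  "30-Jul-92", 0.06,
  "18-Aug-92", 0.07,
  "29-Sep-92", 0.13,
  "19-Jan-93", 0.09,
  "10-Feb-93", 0.09,
  "24-Mar-93", 0.05
)

# one row per year: the geometric mean that stat_summary() would plot
yearly_gm <- tp_subset %>%
  mutate(date = dmy(date),
         year = year(date),
         mid_year = make_date(year = year, month = 6, day = 1)) %>%
  group_by(mid_year) %>%
  summarise(gm_tp = gm_mean(value), n = n())

yearly_gm
```

Computing the summaries this way also gives you a table you can print or export alongside the plot.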
This produces the same graph whether you define the function inside the ggplot call or outside it. However, it's better to do any extra work (defining a function, filtering your data) before the ggplot call. Hope this helps, and sorry for not taking a more careful approach the first time: I wrote all of that freehand from what had already been posted and didn't actually run any of it, which was my mistake. When you make a reprex and then update it, make sure you keep using the same data, or at least the same data structure, because some of the code doesn't work with the posted reprex data. That makes it difficult for others to help.
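One way to confirm that defining the function outside the ggplot call doesn't change the result is to compare the data ggplot2 computes for each layer with `layer_data()`. The sketch below uses a small made-up data frame (`df` and its values are hypothetical) and compares a named `gm_mean` with an inline anonymous function.

```r
library(ggplot2)

gm_mean <- function(x) exp(mean(log(x)))

# hypothetical toy data: three values at each of two x positions
df <- data.frame(x = rep(c(1, 2), each = 3),
                 y = c(1, 10, 100, 2, 8, 32))

# named function defined outside the ggplot call
p_named <- ggplot(df, aes(x, y)) +
  stat_summary(fun = gm_mean, geom = "point")

# the same computation written inline
p_inline <- ggplot(df, aes(x, y)) +
  stat_summary(fun = function(v) exp(mean(log(v))), geom = "point")

# layer_data() returns the computed data ggplot2 draws for a layer
named_y  <- layer_data(p_named)$y
inline_y <- layer_data(p_inline)$y
all.equal(named_y, inline_y)
named_y   # geometric means at x = 1 and x = 2 (about 10 and 8)
```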