Plotting Annual Geometric Means On a Time Series Scatterplot

You can use this little function and then use stat_summary(). See this stackoverflow link....I simply googled the following "how to use stat_summary in for geometric mean" and the second link has an answer.

gm_mean <- function(x) {
 exp(mean(log(x)))
}

graph3 <- ocoee %>%
  filter(parm == "tn", year > 2009) %>% 
  ggplot() +
  geom_point(aes(x = date, y = value, color = source)) +
  stat_summary(aes(x = date, y = value), fun = gm_mean,  
               geom = "point",
               size = 1, 
               color = "red") +
  stat_summary(aes(x = date, y = value, linetype = source), 
               fun = gm_mean, 
               geom = "line" 
               size = 1, 
               color = "purple") + 
  facet_wrap(.~ lake, ncol = 1, scales =  "free", 
             labeller = labeller(lake = lak.labs))
  scale_x_date(date_breaks = "1 years", date_labels = "%Y") + 
  theme(legend.position = "bottom") +
  labs(x = "Year", y = expression(paste("TN (mg ", L^-1), ")")))

@benjaminhlina---thanks for these stylistic reccomendations--especially the spaces. Much easier to read.

Also, thanks for the facet_wrap order changes--I have wondered about this, and never seen a clear answer.

Finally, yes on the mg/L--way better

@benjaminhlina This code certainly simplifies thing, however, I am actually looking for the annual geometric means (i.e., geo mean of all values in 2010, 2011, etc.). The code looks like it is producing the geometric mean for that specific date? I looked at the stackoverflow page that you cited, and could not see anything about specifying a certain group for geo mean calcuation.

This is the plot that was created:

As far as the stylistic changes/order in which you call functions in ggplot. You can call ggplot functions however you want under ggplot() + but the idea that Hadley has around ggplot is that you build a figure in layers that follow a logical order hence the order that I provided which is the order Hadley uses in his books/documentation of ggplot.

The SO link I provided gives you the geometic mean function..this is where knowing a bit about the data and what you need to do comes in handy. You're never going to find an example that fits exactly your problem but there are examples of grouping the data that should help out. With that said pretty positive you need to add in the group argument within your aes(). As such

gm_mean <- function(x) {
 exp(mean(log(x)))
}

graph3 <- ocoee %>%
  filter(parm == "tn", year > 2009) %>% 
  ggplot() +
  geom_point(aes(x = date, y = value, color = source)) +
  stat_summary(aes(x = date, y = value, group = year), fun = gm_mean,  
               geom = "point",
               size = 1, 
               color = "red") +
  stat_summary(aes(x = date, y = value, group = year, 
                   linetype = source), 
               fun = gm_mean, 
               geom = "line" 
               size = 1, 
               color = "purple") + 
  facet_wrap(.~ lake, ncol = 1, scales =  "free", 
             labeller = labeller(lake = lak.labs))
  scale_x_date(date_breaks = "1 years", date_labels = "%Y") + 
  theme(legend.position = "bottom") +
  labs(x = "Year", y = expression(paste("TN (mg ", L^-1), ")")))

Noticed one other thing...you may want to remove linetype = source with in the second stat_summary() aes() call if you don't want it to calculate the geometric mean by source and instead overall regardless of source.

@benjaminhlina So, ran your code and resulted in this plot:

What I am looking for is this:

I made year a factor, but then I could not filter out data after year>2009, as it does not recognize this as a factor.

Any ideas?

Here is how I would use stat-summary().

library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
tpexample <- tibble::tribble(
  ~lake,     ~source,       ~date, ~parm, ~value,  ~unit,
  "lake1", "lakewatch", "30-Jul-92",  "tp",   0.06, "mg/L",
  "lake1", "lakewatch", "18-Aug-92",  "tp",   0.07, "mg/L",
  "lake1", "lakewatch", "29-Sep-92",  "tp",   0.13, "mg/L",
  "lake1", "lakewatch", "29-Oct-92",  "tp",    0.1, "mg/L",
  "lake1", "lakewatch", "16-Nov-92",  "tp",   0.16, "mg/L",
  "lake1", "lakewatch", "16-Dec-92",  "tp",   0.13, "mg/L",
  "lake1", "lakewatch", "19-Jan-93",  "tp",   0.09, "mg/L",
  "lake1", "lakewatch", "10-Feb-93",  "tp",   0.09, "mg/L",
  "lake1", "lakewatch", "24-Mar-93",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "28-Apr-93",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "28-May-93",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "29-Jun-93",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "29-Jul-93",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "20-Aug-93",  "tp",   0.03, "mg/L",
  "lake1", "lakewatch", "27-Sep-93",  "tp",   0.03, "mg/L",
  "lake1", "lakewatch", "20-Oct-93",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "22-Nov-93",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "22-Dec-93",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "26-Jan-94",  "tp",   0.06, "mg/L",
  "lake1", "lakewatch", "23-Feb-94",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "23-Mar-94",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "27-Apr-94",  "tp",   0.03, "mg/L",
  "lake1", "lakewatch", "28-Jun-94",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch",  "9-Jul-94",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch",  "5-Oct-94",  "tp",   0.08, "mg/L",
  "lake1", "lakewatch",  "1-Nov-94",  "tp",   0.09, "mg/L",
  "lake1", "lakewatch", "22-Dec-94",  "tp",   0.11, "mg/L",
  "lake1", "lakewatch", "31-Jan-95",  "tp",    0.1, "mg/L",
  "lake1", "lakewatch", "16-Feb-95",  "tp",   0.08, "mg/L",
  "lake1", "lakewatch", "14-Mar-95",  "tp",   0.08, "mg/L",
  "lake1", "lakewatch", "13-Apr-95",  "tp",   0.06, "mg/L",
  "lake1", "lakewatch", "11-May-95",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "20-Jun-95",  "tp",   0.03, "mg/L",
  "lake1", "lakewatch", "25-Jul-95",  "tp",   0.03, "mg/L",
  "lake1", "lakewatch", "21-Aug-95",  "tp",   0.17, "mg/L",
  "lake1", "lakewatch", "12-Sep-95",  "tp",   0.15, "mg/L",
  "lake1", "lakewatch", "16-Oct-95",  "tp",    0.1, "mg/L",
  "lake1", "lakewatch", "14-Nov-95",  "tp",    0.1, "mg/L",
  "lake1", "lakewatch", "12-Dec-95",  "tp",   0.06, "mg/L",
  "lake1", "lakewatch", "23-Jan-96",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "15-Feb-96",  "tp",   0.07, "mg/L",
  "lake1", "lakewatch", "20-Mar-96",  "tp",   0.06, "mg/L"
)
tpexample <- tpexample %>% mutate(date = dmy(date),
                                  year = year(date),
                                  MidYear=make_date(year = year,6,1)) 
ggplot(tpexample,) +
  geom_point(mapping =aes(x=date, y=value,))+
  stat_summary(geom = "point",mapping=aes(x=MidYear,y=value),
               fun = function(z)exp(mean(log(z))),color="green")+
  stat_summary(geom = "line",mapping=aes(x=MidYear,y=value),
               fun = function(z)exp(mean(log(z))),color="green")

Created on 2020-11-21 by the reprex package (v0.3.0)

@FJCC --this is fantastic! thanks. I will be using this in the future. Thank you!

So from both a stylistic and behind the scene R stand point it's better to make the function outside the ggplot call, than in it. See Hadley's page about stylistic ways to write code and why here. Always break code up into smaller chunks as it's easier for the computer to run, easier to trouble shoot, debug, and read.

gm_mean <- function(x) {
  exp(mean(log(x)))
}

or with pipes which is easier to read but relies on magrittr to be loaded

gm_mean <- function(x) {
  x %>%
    log() %>% 
    mean() %>% 
    exp()
}

Next as pointed out the one thing I was missing that @FJCC caught is the mid_year column. I've added as such. You should never turn off warning conflicts as this helps you know what other packages have similar functions e.g. dplyr::select(). Select is a common name for many functions in many packages. When you load packages it's helpful to know that information, so you don't run into conlficts. When loading packages it's nice to load them in alphabetical order as it makes it easier to know which ones are loaded and which ones are missing.

# load packages ----
library(dplyr) 
library(ggplot2)
library(lubridate)
library(tibble)

# create dataframe  ----
tpexample <- tribble(
  ~lake,     ~source,       ~date, ~parm, ~value,  ~unit,
  "lake1", "lakewatch", "30-Jul-92",  "tp",   0.06, "mg/L",
  "lake1", "lakewatch", "18-Aug-92",  "tp",   0.07, "mg/L",
  "lake1", "lakewatch", "29-Sep-92",  "tp",   0.13, "mg/L",
  "lake1", "lakewatch", "29-Oct-92",  "tp",    0.1, "mg/L",
  "lake1", "lakewatch", "16-Nov-92",  "tp",   0.16, "mg/L",
  "lake1", "lakewatch", "16-Dec-92",  "tp",   0.13, "mg/L",
  "lake1", "lakewatch", "19-Jan-93",  "tp",   0.09, "mg/L",
  "lake1", "lakewatch", "10-Feb-93",  "tp",   0.09, "mg/L",
  "lake1", "lakewatch", "24-Mar-93",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "28-Apr-93",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "28-May-93",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "29-Jun-93",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "29-Jul-93",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "20-Aug-93",  "tp",   0.03, "mg/L",
  "lake1", "lakewatch", "27-Sep-93",  "tp",   0.03, "mg/L",
  "lake1", "lakewatch", "20-Oct-93",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "22-Nov-93",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "22-Dec-93",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "26-Jan-94",  "tp",   0.06, "mg/L",
  "lake1", "lakewatch", "23-Feb-94",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "23-Mar-94",  "tp",   0.04, "mg/L",
  "lake1", "lakewatch", "27-Apr-94",  "tp",   0.03, "mg/L",
  "lake1", "lakewatch", "28-Jun-94",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch",  "9-Jul-94",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch",  "5-Oct-94",  "tp",   0.08, "mg/L",
  "lake1", "lakewatch",  "1-Nov-94",  "tp",   0.09, "mg/L",
  "lake1", "lakewatch", "22-Dec-94",  "tp",   0.11, "mg/L",
  "lake1", "lakewatch", "31-Jan-95",  "tp",    0.1, "mg/L",
  "lake1", "lakewatch", "16-Feb-95",  "tp",   0.08, "mg/L",
  "lake1", "lakewatch", "14-Mar-95",  "tp",   0.08, "mg/L",
  "lake1", "lakewatch", "13-Apr-95",  "tp",   0.06, "mg/L",
  "lake1", "lakewatch", "11-May-95",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "20-Jun-95",  "tp",   0.03, "mg/L",
  "lake1", "lakewatch", "25-Jul-95",  "tp",   0.03, "mg/L",
  "lake1", "lakewatch", "21-Aug-95",  "tp",   0.17, "mg/L",
  "lake1", "lakewatch", "12-Sep-95",  "tp",   0.15, "mg/L",
  "lake1", "lakewatch", "16-Oct-95",  "tp",    0.1, "mg/L",
  "lake1", "lakewatch", "14-Nov-95",  "tp",    0.1, "mg/L",
  "lake1", "lakewatch", "12-Dec-95",  "tp",   0.06, "mg/L",
  "lake1", "lakewatch", "23-Jan-96",  "tp",   0.05, "mg/L",
  "lake1", "lakewatch", "15-Feb-96",  "tp",   0.07, "mg/L",
  "lake1", "lakewatch", "20-Mar-96",  "tp",   0.06, "mg/L"
)

# reformat the date column, add in year and add in mid_year column. 
# Pipes are meant to be entered afterwards not continued on forwards.  

tpexample <- tpexample %>%
  mutate(date = dmy(date),
         year = year(date), 
         mid_year = make_date(year = year, 6, 1))

# use glimpse() to assess the structure of the dataframe

glimpse(tpexample)

# plot -----
# I'll be honest, I missed a few things (commas and brackets) 
# considering I originally wrote that free hand outside of R in the submission box
# without actually running it on any data. 
# I apologize for that. Going forward see fixed plot that works with sample data
# I've commented a few lines as some of that code doesn't run with the reprex data

# group = year was wrong, sorry about that 
# but I left it commented as a note so you can see how group works
# facet_wrap is commented as for the reprex data 
# there is not ablitiy to facet as such 
 
graph3 <- tpexample %>%
  ggplot() +
  geom_point(aes(x = date, y = value, color = source)) +
  stat_summary(aes(x = mid_year, y = value, 
                   # group = year
                   ), 
               fun = gm_mean, 
               geom = "point",
               size = 1, 
               color = "red") +
  stat_summary(aes(x = mid_year, y = value, 
                   # group = year
                   ), 
               fun = gm_mean, 
               geom = "line", 
               size = 1, 
               color = "purple") + 
  # facet_wrap(.~ lake, ncol = 1, scales =  "free", 
  #            labeller = labeller(lake = lak.labs)
  #            ) +
  scale_x_date(date_breaks = "1 years", date_labels = "%Y") + 
  theme(legend.position = "bottom") +
  labs(x = "Year", y = expression(paste("TP (mg ", L^-1, ")")))

graph3

This produces the same graph whether you put the function inside the ggplot call or not. However, it is always better to do any extra things outside the ggplot call e.g. run and create a function or filter your data. Hope this helps, sorry for not taking a more detailed approach. Again I originally wrote all of that free hand from just looking at was already posted and didn't actually run any of it, that's my mistake, sorry. When making a reprex and updating it make sure that you continue to use the same data or data structure as some of the code doesn't work for the posted reprex data. This makes things difficult when trying to assist.

@FJCC Thank you!!!! This saved me, as I have to finish this up today. A few odd questions:

  1. I had to change from summarize to summarise. Maybe R knows what country I live in???
  2. My dates were brought in (from Excel) as ymd, so I had to change your mutate script. This was first time I have seen that, as usually it is brought in as dmy?
  3. The mean date works fine in this situation, but how would you set to a specific date, such as January 1 of that year?

Thanks again!
Cheers.

  1. I thought either spelling of summarise would work in any situation. I guess that isn't true but I have never looked into it. Both spellings work for me.
  2. From my limited experience importing from Excel, I would say the dates usually come in formatted as they are in Excel. If your dates came in as ymd, I would check whether they came in as characters or as a numeric dates. You might not have to transform them.
  3. Lubridate has a make_date() function. You could use the year column in GeoMean and make a new column with mutate.
GeoMean <- GeoMean %>% mutate(MyDate = make_date(year = year, month = 1, day = 1))

@FJCC--thanks again.

I am getting this error on your lubridate example--see below. I ran the script on my original data, and it added the correct Jan 1 column.

Geomean <- Geomean %>% 
  mutate(MyDate = make_date(year = year, month = 1, day = 1))
#> Error in Geomean %>% mutate(MyDate = make_date(year = year, month = 1, : could not find function "%>%"

Created on 2020-11-13 by the reprex package (v0.3.0)

The %>% pipe comes from the magrittr package and is automatically loaded with dplyr. Have you loaded one of those two packages?

This is also a stylistic thing, space after every operator and comma i.e. x = date ect. Set a margin in gobal options of 80 characters as it keeps the code from trailing off. Notice how I've entered and tabbed everything and things are in a left to right top to bottom order. This is much easier to read and write than code that continues to trail off forever.

Notice also how I've changed the order in which facet_wrap(), scale_x_date(), theme(), and labs() are being called...this reads much easier as you start the plot with the data you want plotted, you then move to making multiple plots with facet_wrap(), change the axis, then change the underlying themes, and lastly finish with labels.

You'll also notice that I changed your code for the label to mg L^-1 which will result in a subscript of -1 and looks way better than mg/l

graph3 <- ocoee %>%
  filter(parm == "tn", year > 2009) %>% 
  ggplot() +
  geom_point(aes(x = date, y = value, color = source)) +
  geom_point(data = GeomMean2, 
             aes(x = MyDate2, y = GMean2), 
             size = 1, 
             color = "red") +
  geom_line(data = GeomMean2, 
            aes(x = MyDate2, y = GMean2, linetype = GeoMean), 
            size = 1, 
            color = "purple") + 
  facet_wrap(.~ lake, ncol = 1, scales =  "free", 
             labeller = labeller(lake = lak.labs))
  scale_x_date(date_breaks = "1 years", date_labels = "%Y") + 
  theme(legend.position = "bottom") +
  labs(x = "Year", y = expression(paste("TN (mg ", L^-1), ")")))