Trying to figure out how to plot an age-length line / scatter graph in ggplot2?

Hi, I was wondering if anyone has code or suggestions for how to create this kind of graph in R using ggplot2:

I have been trying for a while to get this working with jitter and it's just not working. I have a feeling it's something simple, but I can't figure it out because I have never worked with line graphs of this type before.

My sample data is below:

Year  Total Length (mm)  Age
2017  68                 4
2017  35                 1
2017  37                 1
2017  36                 1
2017  37.5               1
2017  41                 2
2017  36                 2
2017  51                 2
2017  49                 2
2017  68                 4
2017  54                 3
2017  53                 3
2017  49                 3
2017  51                 3
2017  50                 4
2017  59                 4
2017  55                 3

Any suggestions on code for this kind of figure?

Thanks

Please ask your questions with a reproducible example (reprex) like this one:

df <- data.frame(
    year = c(2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
             2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
             2017L),
    total_length = c(68, 35, 37, 36, 37.5, 41, 36, 51, 49, 68, 54, 53, 49, 51,
                     50, 59, 55),
    age = c(4, 1, 1, 1, 1, 2, 2, 2, 2, 4, 3, 3, 3, 3, 4, 4, 3)
)

library(ggplot2)
library(dplyr)
df %>% 
    group_by(age) %>% 
    mutate(mean_length = mean(total_length)) %>% 
    ggplot(aes(x = age, y = total_length)) +
    geom_point(color = "blue", shape = 1) +
    geom_smooth(aes(y = mean_length), method = "loess") +
    geom_point(aes(y = mean_length), color = "red")

Created on 2019-02-21 by the reprex package (v0.2.1)

If you've never heard of a reprex before, you might want to start by reading this FAQ:


Sorry, still learning... Here's my reprex:

df <- data.frame(
  year = c(2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
           2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
           2017L),
  total_length = c(68, 35, 37, 36, 37.5, 41, 36, 51, 49, 68, 54, 53, 49, 51,
                   50, 59, 55),
  age = c(4, 1, 1, 1, 1, 2, 2, 2, 2, 4, 3, 3, 3, 3, 4, 4, 3),
  parasites = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.4.4
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.4.4
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df %>% 
  group_by(parasites) %>%
  group_by(age) %>% 
  mutate(mean_length = mean(total_length)) %>% 
  ggplot(aes(x = age, y = total_length)) +
  geom_point(aes(shape = as.factor(parasites), size = 2)) +
  geom_smooth(aes(y = mean_length), method = "loess") +
  geom_point(aes(y = mean_length), color = "black") + 
  theme_bw()
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : pseudoinverse used at 0.985
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : neighborhood radius 2.015
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : reciprocal condition number 4.2401e-017
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : There are other near singularities as well. 4.0602
#> Warning in predLoess(object$y, object$x, newx = if
#> (is.null(newdata)) object$x else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used
#> at 0.985
#> Warning in predLoess(object$y, object$x, newx = if
#> (is.null(newdata)) object$x else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
#> 2.015
#> Warning in predLoess(object$y, object$x, newx = if
#> (is.null(newdata)) object$x else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : reciprocal
#> condition number 4.2401e-017
#> Warning in predLoess(object$y, object$x, newx = if
#> (is.null(newdata)) object$x else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : There are other
#> near singularities as well. 4.0602

Created on 2019-02-22 by the reprex package (v0.2.1)

My data set will be bigger than this. What I am trying to do is plot the parasitized and non-parasitized points for each age class on the same graph, with a separate curve through the mean lengths of each group. Also, how would I use jitter to ensure points are not on top of each other?

Thanks

Do you mean something like this?

library(ggplot2)
library(dplyr)

df %>%
    mutate(parasites = as.factor(parasites)) %>%
    group_by(parasites, age) %>%
    mutate(mean_length = mean(total_length)) %>% 
    arrange(parasites, age) %>% 
    ggplot(aes(x = age, y = total_length, shape = parasites, color = parasites)) +
    geom_point(size = 2) +
    geom_smooth(aes(y = mean_length), method = "loess", show.legend = FALSE) +
    geom_point(aes(y = mean_length), shape = 9, show.legend = FALSE, color = "black") + 
    theme_bw()
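
The jitter part of the question is not covered by the code above, so here is one possible sketch. It assumes you only want to spread the points horizontally (height = 0) so the plotted lengths stay accurate, and the jitter width of 0.15 and the seed are arbitrary choices. It also swaps the loess smoother for a plain geom_line() through the group means, which are computed in a separate summary data frame (named means here just for illustration).

```r
library(ggplot2)
library(dplyr)

# Same sample data as in the reprex above (year omitted, it is constant)
df <- data.frame(
  total_length = c(68, 35, 37, 36, 37.5, 41, 36, 51, 49, 68, 54, 53, 49, 51,
                   50, 59, 55),
  age = c(4, 1, 1, 1, 1, 2, 2, 2, 2, 4, 3, 3, 3, 3, 4, 4, 3),
  parasites = factor(c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))
)

# One row per parasites/age combination, holding the mean length
means <- df %>%
  group_by(parasites, age) %>%
  summarise(mean_length = mean(total_length)) %>%
  ungroup()

p <- ggplot(df, aes(x = age, y = total_length,
                    color = parasites, shape = parasites)) +
  # Jitter only along x; width and seed are assumptions, tune to taste
  geom_point(size = 2,
             position = position_jitter(width = 0.15, height = 0, seed = 1)) +
  # Straight line segments through the group means (not a loess fit)
  geom_line(data = means, aes(y = mean_length)) +
  # Mark the means themselves, without jitter
  geom_point(data = means, aes(y = mean_length),
             shape = 9, color = "black", show.legend = FALSE) +
  theme_bw()

p
```

Drawing the means from their own data frame keeps them at the exact age values while only the raw observations are jittered. With so few distinct ages, a line through the means also avoids the loess singularity warnings shown earlier.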


Wow, unreal, this is exactly what I want!

Can I read all of my observations from an Excel file into a data frame like "df", and then use the same dplyr commands as shown in your example?

You can read data from Excel into a data frame using something like this:

df <- readxl::read_xlsx("path_to_your/file.xlsx")
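
If the spreadsheet keeps the headers from the original post ("Year", "Total Length (mm)", "Age"), the columns will need renaming before the plotting code works. A rough sketch, assuming those exact header names; standardise_columns() is a hypothetical helper, not part of readxl or dplyr, and a parasites column would need the same treatment under whatever name it has in your file:

```r
library(readxl)
library(dplyr)

# Hypothetical helper: map the spreadsheet-style headers to the
# column names used in the plotting code above.
standardise_columns <- function(raw) {
  raw %>%
    rename(
      year = Year,
      total_length = `Total Length (mm)`,  # backticks for a non-syntactic name
      age = Age
    )
}

# Usage, with the placeholder path from the reply above:
# df <- read_xlsx("path_to_your/file.xlsx") %>% standardise_columns()
```

After that, df can go straight into the dplyr and ggplot2 pipeline from the earlier answer.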

Thank you for all your help!
