Error: geom_text requires the following missing aesthetics: label

I have been working on an iteration with ggplot. I take a data frame and, for each unique Site ID level, create a scatter plot with a second-order polynomial regression line, then export everything to a single PDF with one plot per page. So the first page would have Site ID 1, the second page would have Site ID 2, and so on. I am having trouble displaying the regression line equation. I have been working with a user on Stack Overflow, but have not been able to replicate his solution.

The issue I am having is with the function stat_smooth_func. It returns the error "Error: geom_text requires the following missing aesthetics: label". I have tried looking up stat_smooth_func, but I cannot find it in the R help, and there is not much about it online either.

library(tidyverse)
library(purrr)

Infil_Data2 <- tibble::tribble(
  ~Time, ~Site_ID, ~Vol_mL, ~Sqrt_Time.x, ~Cal_Vol_cm,
     0L,     "H1",      63,            0,           0,
    30L,     "H1",      62,  5.477225575, 0.124339799,
    60L,     "H1",      60,  7.745966692, 0.373019398,
    90L,     "H1",      59,  9.486832981, 0.497359197,
   120L,     "H1",      58,  10.95445115, 0.621698996,
   150L,     "H1",      56,  12.24744871, 0.870378595,
   180L,     "H1",      54,  13.41640786, 1.119058194,
   210L,     "H1",    52.5,  14.49137675, 1.305567893,
   240L,     "H1",      50,  15.49193338, 1.616417391,
   270L,     "H1",    48.5,  16.43167673,  1.80292709,
   300L,     "H1",    46.5,  17.32050808, 2.051606688,
     0L,     "H2",      82,            0,           0,
    30L,     "H2",      77,  5.477225575, 0.621698996,
    60L,     "H2",      73,  7.745966692, 1.119058194,
    90L,     "H2",      68,  9.486832981,  1.74075719,
   120L,     "H2",      65,  10.95445115, 2.113776588,
   150L,     "H2",      51,  12.24744871, 3.854533778,
   180L,     "H2",      56,  13.41640786, 3.232834782,
   210L,     "H2",      52,  14.49137675, 3.730193979,
   240L,     "H2",    47.5,  15.49193338, 4.289723076,
   270L,     "H2",    42.5,  16.43167673, 4.911422072,
   300L,     "H2",    37.5,  17.32050808, 5.533121068,
     0L,     "H3",      69,            0,           0,
    30L,     "H3",      67,  5.477225575, 0.248679599,
    60L,     "H3",      65,  7.745966692, 0.497359197,
    90L,     "H3",      63,  9.486832981, 0.746038796,
   120L,     "H3",      61,  10.95445115, 0.994718394,
   150L,     "H3",      60,  12.24744871, 1.119058194,
   180L,     "H3",      58,  13.41640786, 1.367737792,
   210L,     "H3",      56,  14.49137675, 1.616417391,
   240L,     "H3",      54,  15.49193338, 1.865096989,
   270L,     "H3",    51.5,  16.43167673, 2.175946488,
   300L,     "H3",      49,  17.32050808, 2.486795986
  )
#> # A tibble: 33 x 5
#>     Time Site_ID Vol_mL Sqrt_Time.x Cal_Vol_cm
#>    <int> <chr>    <dbl>       <dbl>      <dbl>
#>  1     0 H1        63          0         0    
#>  2    30 H1        62          5.48      0.124
#>  3    60 H1        60          7.75      0.373
#>  4    90 H1        59          9.49      0.497
#>  5   120 H1        58         11.0       0.622
#>  6   150 H1        56         12.2       0.870
#>  7   180 H1        54         13.4       1.12 
#>  8   210 H1        52.5       14.5       1.31 
#>  9   240 H1        50         15.5       1.62 
#> 10   270 H1        48.5       16.4       1.80 
#> # ... with 23 more rows


plot_5 <-
  Infil_Data2 %>% 
  split(.$Site_ID) %>% 
  map2(names(.),
       ~ggplot(.x, aes(Sqrt_Time.x, Cal_Vol_cm)) + 
         geom_point() +
         labs(title = paste(.y)) +
         theme(plot.title = element_text(hjust = 0.5)) + 
         stat_smooth_func(geom="text", method = "lm", hjust=0, parse=TRUE) +
         stat_smooth(mapping = aes(x = Sqrt_Time.x, y = Cal_Vol_cm),
                     method = "lm", se = FALSE, 
                     formula = y ~ poly(x, 2, raw = TRUE),
                     color = "red") +
         theme(plot.margin = unit(c(1, 5, 1, 1), "cm")) +
         stat_smooth_func(geom = "text", method = "lm", hjust = 0, parse = TRUE))

pdf("allplots5.pdf", onefile = TRUE)
walk(plot_5, print)
dev.off()

Does this solution with the ggpmisc package work for you?
stat_poly_eq() is from ggpmisc.

library(tidyverse)
library(purrr)
library(ggpmisc)

Infil_Data2 <- tibble::tribble(
  ~Time, ~Site_ID, ~Vol_mL, ~Sqrt_Time.x, ~Cal_Vol_cm,
  0L,     "H1",      63,            0,           0,
  30L,     "H1",      62,  5.477225575, 0.124339799,
  60L,     "H1",      60,  7.745966692, 0.373019398,
  90L,     "H1",      59,  9.486832981, 0.497359197,
  120L,     "H1",      58,  10.95445115, 0.621698996,
  150L,     "H1",      56,  12.24744871, 0.870378595,
  180L,     "H1",      54,  13.41640786, 1.119058194,
  210L,     "H1",    52.5,  14.49137675, 1.305567893,
  240L,     "H1",      50,  15.49193338, 1.616417391,
  270L,     "H1",    48.5,  16.43167673,  1.80292709,
  300L,     "H1",    46.5,  17.32050808, 2.051606688,
  0L,     "H2",      82,            0,           0,
  30L,     "H2",      77,  5.477225575, 0.621698996,
  60L,     "H2",      73,  7.745966692, 1.119058194,
  90L,     "H2",      68,  9.486832981,  1.74075719,
  120L,     "H2",      65,  10.95445115, 2.113776588,
  150L,     "H2",      51,  12.24744871, 3.854533778,
  180L,     "H2",      56,  13.41640786, 3.232834782,
  210L,     "H2",      52,  14.49137675, 3.730193979,
  240L,     "H2",    47.5,  15.49193338, 4.289723076,
  270L,     "H2",    42.5,  16.43167673, 4.911422072,
  300L,     "H2",    37.5,  17.32050808, 5.533121068,
  0L,     "H3",      69,            0,           0,
  30L,     "H3",      67,  5.477225575, 0.248679599,
  60L,     "H3",      65,  7.745966692, 0.497359197,
  90L,     "H3",      63,  9.486832981, 0.746038796,
  120L,     "H3",      61,  10.95445115, 0.994718394,
  150L,     "H3",      60,  12.24744871, 1.119058194,
  180L,     "H3",      58,  13.41640786, 1.367737792,
  210L,     "H3",      56,  14.49137675, 1.616417391,
  240L,     "H3",      54,  15.49193338, 1.865096989,
  270L,     "H3",    51.5,  16.43167673, 2.175946488,
  300L,     "H3",      49,  17.32050808, 2.486795986
)
#> # A tibble: 33 x 5
#>     Time Site_ID Vol_mL Sqrt_Time.x Cal_Vol_cm
#>    <int> <chr>    <dbl>       <dbl>      <dbl>
#>  1     0 H1        63          0         0    
#>  2    30 H1        62          5.48      0.124
#>  3    60 H1        60          7.75      0.373
#>  4    90 H1        59          9.49      0.497
#>  5   120 H1        58         11.0       0.622
#>  6   150 H1        56         12.2       0.870
#>  7   180 H1        54         13.4       1.12 
#>  8   210 H1        52.5       14.5       1.31 
#>  9   240 H1        50         15.5       1.62 
#> 10   270 H1        48.5       16.4       1.80 
#> # ... with 23 more rows


plot_5 <-
  Infil_Data2 %>% 
  split(.$Site_ID) %>% 
  map2(names(.),
       ~ggplot(.x, aes(Sqrt_Time.x, Cal_Vol_cm)) + 
         geom_point() +
         labs(title = paste(.y)) +
         theme(plot.title = element_text(hjust = 0.5)) + 
         stat_smooth(mapping = aes(x = Sqrt_Time.x, y = Cal_Vol_cm),
                     method = "lm", se = FALSE, 
                     formula = y ~ poly(x, 2, raw = TRUE),
                     color = "red") +
         theme(plot.margin = unit(c(1, 5, 1, 1), "cm")) +
         stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
               label.x.npc = "left", label.y.npc = 0.90, #set the position of the eq
               formula = y ~ poly(x, 2, raw = TRUE), parse = TRUE, rr.digits = 3) 
   )

Yes, that works perfectly, is there a way to extract the coefficients from each equation back into a list?

There may be a way to get the fit coefficients out of the ggplot() output but I would just directly calculate them.

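library(broom)  # provides tidy()

# Fit a quadratic model per site and return one row per coefficient
FIT <- Infil_Data2 %>% group_by(Site_ID) %>% 
    do(tidy(lm(Cal_Vol_cm ~ I(Sqrt_Time.x^2) + Sqrt_Time.x, data = .)))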

> FIT
# A tibble: 9 x 6
# Groups:   Site_ID [3]
  Site_ID term             estimate std.error statistic     p.value
  <chr>   <chr>               <dbl>     <dbl>     <dbl>       <dbl>
1 H1      (Intercept)       0.0240   0.0478       0.501 0.630      
2 H1      I(Sqrt_Time.x^2)  0.00854  0.000557    15.3   0.000000327
3 H1      Sqrt_Time.x      -0.0322   0.0106      -3.04  0.0161     
4 H2      (Intercept)      -0.0577   0.359       -0.161 0.876      
5 H2      I(Sqrt_Time.x^2)  0.0154   0.00418      3.68  0.00618    
6 H2      Sqrt_Time.x       0.0516   0.0796       0.648 0.535      
7 H3      (Intercept)       0.0235   0.0553       0.425 0.682      
8 H3      I(Sqrt_Time.x^2)  0.00839  0.000644    13.0   0.00000114 
9 H3      Sqrt_Time.x      -0.00801  0.0123      -0.653 0.532 

I cribbed that method of calculation from the topic "Use dplyr to do grouped t-tests and get number of observations simultaneously".
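
If what you want is a plain list with one coefficient vector per site, rather than a tidy table, something along these lines might work. This is only a sketch in base R: the data frame toy and its columns x and y are made up for illustration and stand in for Infil_Data2, Sqrt_Time.x and Cal_Vol_cm.

```r
# Toy data (hypothetical): two sites with exactly quadratic responses
toy <- data.frame(
  Site_ID = rep(c("A", "B"), each = 5),
  x = rep(0:4, times = 2)
)
toy$y <- ifelse(toy$Site_ID == "A",
                1 + 2 * toy$x + 3 * toy$x^2,
                0.5 * toy$x^2)

# Fit one quadratic model per site and keep only the coefficients
coef_list <- lapply(
  split(toy, toy$Site_ID),
  function(d) coef(lm(y ~ x + I(x^2), data = d))
)

# Each element is a named numeric vector: (Intercept), x, I(x^2)
coef_list$A
```

The list is named by Site_ID, so coef_list[["A"]]["I(x^2)"] picks out a single coefficient for a single site.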

I have been messing around with the broom package to try and figure this out. Could you please explain why the I and ^2 are inserted here? I've seen it on a couple of other examples, but do not understand it.

Oh I see. That is the equation format. So I am getting the Intercept and x^2 coefficients?

The estimate column of the FIT object gives the fit coefficients. For each level of Site_ID you get a coefficient for the intercept, the variable and the variable squared. The variable squared is shown as I(Sqrt_Time.x^2). The I() function is used to force the interpretation of the ^ operator "as is". From the help on I():

"Function I has two main uses... In function formula. There it is used to inhibit the interpretation of operators such as "+", "-", "*" and "^" as formula operators, so they are used as arithmetical operators. This is interpreted as a symbol by terms.formula."
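
To see the effect of I() for yourself, a small experiment like the one below may help. It is a sketch on made-up data (the data frame d is hypothetical, not the infiltration data): it fits the same response with and without I(), and also with the poly(x, 2, raw = TRUE) form used in the plotting code.

```r
# Hypothetical toy data: y is exactly 1 + 2*x + 3*x^2
d <- data.frame(x = 0:5)
d$y <- 1 + 2 * d$x + 3 * d$x^2

# Without I(), ^ is a formula operator, so x^2 collapses to just x
fit_no_I <- lm(y ~ x^2, data = d)
names(coef(fit_no_I))   # "(Intercept)" "x"

# With I(), x^2 is computed arithmetically and becomes its own term
fit_I <- lm(y ~ x + I(x^2), data = d)
names(coef(fit_I))      # "(Intercept)" "x" "I(x^2)"

# poly(x, 2, raw = TRUE) describes the same model under different term names
fit_poly <- lm(y ~ poly(x, 2, raw = TRUE), data = d)
all.equal(unname(coef(fit_I)), unname(coef(fit_poly)))
```

So the model without I() silently drops the squared term, while the I() and raw poly() versions should give the same three coefficients.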
