regression line with geom_smooth for groups and coefficient estimate labels

Hi, I have data that I want to split into separate graphs by group, with a distinct regression line for each group.

library(ggplot2)

ggplot(data = cats, aes(x = life_years, y = happy_years, group = cat_type, color = cat_type)) +
  facet_wrap(~ cat_type) +
  geom_point() +
  geom_smooth(method = lm)

From this, it looks like the same regression line is being applied to each graph. Can anyone assist? Thank you!

Additionally, is it possible to label the coefficient of each line (e.g. 0.34 or 0.55)? Thank you.

Thanks for providing code. Could you kindly take further steps to make it easier for other forum users to help you? Share some representative data that will enable your code to run and show the problematic behaviour.

How do I share data for a reprex?

You might use tools such as the datapasta package, or the base function dput(), to share a portion of your data in code form, i.e. code that can be copied from the forum and pasted into an R session.
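A minimal sketch of the dput() route, using the built-in mtcars data as a stand-in for your own data frame (the column choice and row count are arbitrary):

```r
# Take a small, representative slice of the data (here: built-in mtcars)
small <- head(mtcars[, c("mpg", "wt")], 5)

# Print code that recreates `small`; paste this printed output into your post
dput(small)

# Sanity check: evaluating the deparsed text rebuilds an equivalent object
recreated <- eval(parse(text = deparse(small)))
all.equal(recreated, small)
```

Whoever helps you can then paste the printed structure(...) call into their own session and get the same data.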

Reprex Guide

Advice:
If you have code that processes many files iteratively, are the repetitions relevant to the problem, or would you see the issue with a single file? If the latter, your example should only concern a single file.
If your issue does not relate to file reading, i.e. you have no problem loading your raw data and your problem is manipulating/processing it, then modify your example to exclude all file-loading code and substitute example data prepared following the guide.


structure(list(life_years = c(33, 42, 21, 11, 11, 11, 32, 32,
32, 44, 44, 49), happy_years = c(11, 33, 14, 4, 4, 2, 23, 12,
11, 23, 33, 22), cat_type = c("orange", "orange", "black", "white",
"white", "white", "black", "orange", "white", "black", "black",
"white")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-12L))

Hi, here is my data sample.

Your code works fine. With the same scales for all three plots, the estimated lines are clearly not the same: the slope is about 0.6 for black cats and about 2.2 for orange cats.
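To check this numerically, here is a sketch that fits one lm() per cat_type with base R's split() and collects the slopes. It rebuilds the posted data as a plain data.frame rather than a tibble, which is an assumption that does not change the fits:

```r
# Posted sample data, recreated as a plain data.frame
cats <- data.frame(
  life_years  = c(33, 42, 21, 11, 11, 11, 32, 32, 32, 44, 44, 49),
  happy_years = c(11, 33, 14, 4, 4, 2, 23, 12, 11, 23, 33, 22),
  cat_type    = c("orange", "orange", "black", "white", "white", "white",
                  "black", "orange", "white", "black", "black", "white")
)

# Fit a separate simple linear regression for each cat_type
fits <- lapply(split(cats, cats$cat_type),
               function(d) lm(happy_years ~ life_years, data = d))

# Extract the slope (coefficient on life_years) from each fit
slopes <- sapply(fits, function(m) unname(coef(m)["life_years"]))
round(slopes, 2)  # roughly: black 0.59, orange 2.24, white 0.47
```

These are the same per-group OLS fits that geom_smooth(method = lm) draws in each panel.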

BTW, ggplot2 automatically groups the data when you map an aesthetic (color) to a discrete variable (cat_type), so group = cat_type is not needed.

library(tidyverse)

cats <- structure(list(life_years = c(33, 42, 21, 11, 11, 11, 32, 32,
32, 44, 44, 49), happy_years = c(11, 33, 14, 4, 4, 2, 23, 12,
11, 23, 33, 22), cat_type = c("orange", "orange", "black", "white",
"white", "white", "black", "orange", "white", "black", "black",
"white")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-12L))

ggplot(data = cats, aes(x = life_years, y = happy_years, color = cat_type)) +
  facet_wrap(~ cat_type) +
  geom_point() +
  geom_smooth(method = lm)
#> `geom_smooth()` using formula = 'y ~ x'

Created on 2022-12-22 with reprex v2.0.2


Thanks for that! Note that below is a reprex, the ideal format for seeking help on coding forums.

library(ggplot2)
cats <- structure(list(
  life_years = c(33, 42, 21, 11, 11, 11, 32, 32, 32, 44, 44, 49),
  happy_years = c(11, 33, 14, 4, 4, 2, 23, 12, 11, 23, 33, 22),
  cat_type = c("orange", "orange", "black", "white", "white", "white",
               "black", "orange", "white", "black", "black", "white")),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -12L))

ggplot(data = cats, aes(x = life_years, y = happy_years, group = cat_type, color = cat_type)) +
  facet_wrap(~ cat_type) +
  geom_point() +
  geom_smooth(method = lm)
#> `geom_smooth()` using formula = 'y ~ x'

Created on 2022-12-22 with reprex v2.0.2

The regression line for each panel appears different to me.


Additionally, is it possible to label the coefficient of each line (e.g. 0.34 or 0.55)? Thank you.

You can add arbitrary annotations via annotate() (see "Create an annotation layer" in the ggplot2 reference). I'm not sure of the best way to add labels specifically for each coefficient, though.
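For example, here is a sketch that fits one overall lm() and writes its slope onto an unfaceted plot with annotate(). The label position (x = 15, y = 30) is an arbitrary choice for this data. An annotate() layer carries no facet variable, so on a faceted plot the same text would appear in every panel:

```r
library(ggplot2)

cats <- data.frame(
  life_years  = c(33, 42, 21, 11, 11, 11, 32, 32, 32, 44, 44, 49),
  happy_years = c(11, 33, 14, 4, 4, 2, 23, 12, 11, 23, 33, 22),
  cat_type    = c("orange", "orange", "black", "white", "white", "white",
                  "black", "orange", "white", "black", "black", "white")
)

# One model over all cats; format its slope as text
fit <- lm(happy_years ~ life_years, data = cats)
slope_label <- sprintf("slope = %.2f", coef(fit)[["life_years"]])

# annotate() adds a single fixed label at the given data coordinates
p <- ggplot(cats, aes(x = life_years, y = happy_years)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ x) +
  annotate("text", x = 15, y = 30, label = slope_label)
p
```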

Winston Chang's R Graphics Cookbook (which I've found useful for ten years now!) has some guidance in section 5.9, "Adding Annotations with Model Coefficients" (R Graphics Cookbook, 2nd edition). But I feel there has got to be a helper package that makes this trivially easy; sorry, I just don't know it.
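Along the lines of that recipe, here is a sketch for the faceted case (my adaptation, not the book's code): build a small data frame with one slope label per cat_type and pass it to geom_text(). Because that data frame contains the facet variable, each label is drawn only in its own panel. The label position (x = 15, y = 30) is an arbitrary assumption you would tune for your data:

```r
library(ggplot2)

cats <- data.frame(
  life_years  = c(33, 42, 21, 11, 11, 11, 32, 32, 32, 44, 44, 49),
  happy_years = c(11, 33, 14, 4, 4, 2, 23, 12, 11, 23, 33, 22),
  cat_type    = c("orange", "orange", "black", "white", "white", "white",
                  "black", "orange", "white", "black", "black", "white")
)

# One row per cat_type: the facet variable, a position, and the slope text
groups <- split(cats, cats$cat_type)
slope_labels <- data.frame(
  cat_type = names(groups),
  x = 15,
  y = 30,
  label = unname(vapply(groups, function(d) {
    b <- coef(lm(happy_years ~ life_years, data = d))[["life_years"]]
    sprintf("slope = %.2f", b)
  }, character(1)))
)

p <- ggplot(cats, aes(x = life_years, y = happy_years, color = cat_type)) +
  facet_wrap(~ cat_type) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ x) +
  # inherit.aes = FALSE so the label layer does not look for happy_years etc.
  geom_text(data = slope_labels, aes(x = x, y = y, label = label),
            inherit.aes = FALSE)
p
```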


The ggpmisc package is one way to add the estimated linear equation for each group:

library(tidyverse)
library(ggpmisc)
#> Loading required package: ggpp
#> 
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#> 
#>     annotate

cats <- structure(list(life_years = c(33, 42, 21, 11, 11, 11, 32, 32,
32, 44, 44, 49), happy_years = c(11, 33, 14, 4, 4, 2, 23, 12,
11, 23, 33, 22), cat_type = c("orange", "orange", "black", "white",
"white", "white", "black", "orange", "white", "black", "black",
"white")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-12L))

ggplot(data = cats, aes(x = life_years, y = happy_years, color = cat_type)) +
  facet_wrap(~ cat_type) +
  stat_poly_line(formula = y ~ x) +
  stat_poly_eq(formula = y ~ x, aes(label = after_stat(eq.label))) +
  geom_point()

Created on 2022-12-22 with reprex v2.0.2

The equations are staggered vertically so that they do not overlap when all groups are drawn on one graph, as in the unfaceted version below. There is probably a way to change that.

library(tidyverse)
library(ggpmisc)
#> Loading required package: ggpp
#> 
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#> 
#>     annotate

cats <- structure(list(life_years = c(33, 42, 21, 11, 11, 11, 32, 32,
32, 44, 44, 49), happy_years = c(11, 33, 14, 4, 4, 2, 23, 12,
11, 23, 33, 22), cat_type = c("orange", "orange", "black", "white",
"white", "white", "black", "orange", "white", "black", "black",
"white")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-12L))

ggplot(data = cats, aes(x = life_years, y = happy_years, color = cat_type)) +
  # facet_wrap(~ cat_type) +  # commented out so all three groups share one panel
  stat_poly_line(formula = y ~ x) +
  stat_poly_eq(formula = y ~ x, aes(label = after_stat(eq.label))) +
  geom_point()

Created on 2022-12-22 with reprex v2.0.2
