Display the correlation coefficient for each value of the grouping (faceting) variable

Good afternoon colleagues,
I am currently working with a movies dataset; a sample is provided below:

# A tibble: 5 x 3
  content_rating gross_income imdb_score
  <chr>                 <dbl>      <dbl>
1 PG                      200        6.5
2 PG                      207        7  
3 PG                      209        8  
4 PG-13                   105        9  
5 PG-13                   110        4.5

The goal is to create a scatter plot faceted by the content rating variable and to display each panel's correlation coefficient in the top-right corner of that panel. Here is my attempt so far:

library(ggplot2)

ggplot(movie_select, aes(gross_income, imdb_score)) +
  geom_point() +
  facet_wrap(~content_rating)

But I am not sure how to add the code that will display the corresponding correlation coefficient in the top right of each scatter plot. Help is appreciated.


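Here is some code that does the heavy lifting in base R, with ggplot2 for the plot (wrapr is just there to give us a pipe operator and make defining the data easy). It puts each group's correlation in the facet strip label rather than in the top-right corner of the panel.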

library("wrapr")
library("ggplot2")

movie_select <- wrapr::build_frame(
  "content_rating", "gross_income", "imdb_score" |
  "PG"            , 200           , 6.5          |
  "PG"            , 207           , 7            |
  "PG"            , 209           , 8            |
  "PG-13"         , 105           , 9            |
  "PG-13"         , 110           , 4.5          )

cor_map <- movie_select %.>%
  split(., movie_select$content_rating) %.>%
  vapply(.,
         function(di) {
           cor(di$imdb_score, di$gross_income)
         }, numeric(1))

print(cor_map)

# Look up each row's group correlation and build a facet label from it
movie_select$cor <- cor_map[movie_select$content_rating]
movie_select$content_and_cor <- paste(movie_select$content_rating,
                                      round(movie_select$cor, 3))

ggplot(movie_select, aes(gross_income, imdb_score)) +
  geom_point() +
  facet_wrap(~content_and_cor, ncol = 1, labeller = "label_both") +
  ggtitle("Correlation between IMDb score and gross income, by content rating")
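A variation on the same idea, offered as a sketch rather than as part of the answer above: instead of adding a new column, pass a named lookup vector to ggplot2's `as_labeller()` so the strips show the original rating plus the statistics. It also reports the group size `n`, which matters here because the PG-13 group has only two movies, and a correlation computed from two points is always exactly 1 or -1. The names `groups`, `n_map`, and `strip_text` are just illustrative.

```r
library(ggplot2)

movie_select <- data.frame(
  content_rating = c("PG", "PG", "PG", "PG-13", "PG-13"),
  gross_income   = c(200, 207, 209, 105, 110),
  imdb_score     = c(6.5, 7, 8, 9, 4.5)
)

# One data frame per content rating, named by the rating
groups <- split(movie_select, movie_select$content_rating)

# Group size and Pearson correlation for each rating
n_map   <- vapply(groups, nrow, integer(1))
cor_map <- vapply(groups,
                  function(di) cor(di$imdb_score, di$gross_income),
                  numeric(1))

# Named character vector mapping each facet value to its strip text,
# e.g. "PG" -> "PG (n = 3, r = 0.88)"
strip_text <- setNames(
  sprintf("%s (n = %d, r = %.2f)", names(cor_map), n_map, cor_map),
  names(cor_map)
)

p <- ggplot(movie_select, aes(gross_income, imdb_score)) +
  geom_point() +
  facet_wrap(~content_rating, ncol = 1,
             labeller = as_labeller(strip_text))
print(p)
```

Keeping the facet variable as `content_rating` means the labels can change (rounding, extra statistics) without touching the data itself.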

I'm going to do a cheeky plug here and put out a link to my own package, stickylabeller, which makes this stuff a little easier. There's a section in the README on adding summary statistics to the facet labels; basically, I recommend creating a summary data frame with the correlations and then joining it back to your original data.
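The summarise-then-join workflow described above can be sketched with dplyr and plain ggplot2. This is not stickylabeller's API, just the same general approach without that package; it assumes dplyr 1.0.0 or later (for the `.groups` argument), and `cor_df`, `movie_labeled`, and `facet_label` are hypothetical names.

```r
library(dplyr)
library(ggplot2)

movie_select <- data.frame(
  content_rating = c("PG", "PG", "PG", "PG-13", "PG-13"),
  gross_income   = c(200, 207, 209, 105, 110),
  imdb_score     = c(6.5, 7, 8, 9, 4.5)
)

# 1. Summary data frame: one row per content rating with its correlation
cor_df <- movie_select %>%
  group_by(content_rating) %>%
  summarise(r = cor(gross_income, imdb_score), .groups = "drop")

# 2. Join it back to the original rows and build a label that includes r
movie_labeled <- movie_select %>%
  left_join(cor_df, by = "content_rating") %>%
  mutate(facet_label = sprintf("%s (r = %.2f)", content_rating, r))

# 3. Facet on the new label column
p <- ggplot(movie_labeled, aes(gross_income, imdb_score)) +
  geom_point() +
  facet_wrap(~facet_label)
print(p)
```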

If you're feeling brave, this feature branch also allows you to create facet strips that are overlaid on the facets, in case you want something less like a label strip and more like text overlaid on the plot.
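If you want text overlaid in the top-right corner of each panel, as asked in the original question, but would rather not use a feature branch, plain ggplot2 can do it with a small per-facet data frame and `geom_text()`. The sketch below is one way to do it, not the stickylabeller implementation: setting `x = Inf, y = Inf` pins each label to the panel's top-right corner, and `hjust`/`vjust` values above 1 pull it back inside the panel. The `cor_df` data frame and its `label` column are illustrative names.

```r
library(ggplot2)

movie_select <- data.frame(
  content_rating = c("PG", "PG", "PG", "PG-13", "PG-13"),
  gross_income   = c(200, 207, 209, 105, 110),
  imdb_score     = c(6.5, 7, 8, 9, 4.5)
)

# One row per facet; the content_rating column tells ggplot2
# which panel each label belongs to
groups <- split(movie_select, movie_select$content_rating)
cor_df <- data.frame(
  content_rating = names(groups),
  label = sprintf("r = %.2f",
                  vapply(groups,
                         function(di) cor(di$imdb_score, di$gross_income),
                         numeric(1))),
  row.names = NULL
)

p <- ggplot(movie_select, aes(gross_income, imdb_score)) +
  geom_point() +
  facet_wrap(~content_rating) +
  # x = Inf, y = Inf anchors the text at each panel's top-right corner;
  # inherit.aes = FALSE stops the layer looking for the plot-level aesthetics
  geom_text(data = cor_df, aes(x = Inf, y = Inf, label = label),
            hjust = 1.1, vjust = 1.5, inherit.aes = FALSE)
print(p)
```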


Thank you, colleagues, for the awesome suggestions. I was able to implement both of them successfully. Thanks!
