Overlaying an x-axis density plot with a 2D plot

KenOsborne · April 8, 2020, 8:15pm

I'll begin by loading some data for a reprex.

library(fivethirtyeight)
library(tidyverse)
data("biopics")

df <- biopics %>%
  filter(!is.na(box_office))

I'd like to have two plots appear in the same plot area (not faceted or aligned, but actually one atop the other).

The first being a density plot of year_release.
The second being a plot of log10(box_office) vs year_release as a scatter plot.

log10(box_office) has a range of ~2 to ~10
the density of year_release has a range of 0 to ~0.4

My attempts to plot the two on the same time plot have been using the secondary axis functionality.

df %>%
  ggplot() +
  geom_density(aes(x = year_release)) +
  geom_point(aes(x = year_release, y = log10(box_office))) +
  scale_y_continuous(sec.axis = sec_axis(~. * 20, name = "log10(box_office)"))

I chose to scale the secondary axis up by a factor of 20 because
0.5 * 20 = 10, which is around the maximum value of log10(box_office)

But this doesn't seem to produce anything like the desired effect.
Any thoughts for what I could do differently?

FJCC · April 8, 2020, 8:50pm

I think you need to divide your log10(boxoffice) values by 20 so they match well with the density values. Then you can scale the second y axis to have labels that are 20 times bigger than those on the primary axis. The plotting will be done with respect to the primary axis. The secondary axis is just for labeling.

KenOsborne · April 8, 2020, 9:15pm

I tried doing that, but it didn't change anything about the plot.
Also, since the density was the first plot, the secondary plot is normally scaled by the range of the values of the first plot.

Changing the order of the plots doesn't seem to matter though. No matter the order of the plots, the density plot looks like a tiny blip and the only thing that really shows up is the geom_point plot.

FJCC · April 8, 2020, 9:51pm

This isn't exactly what you are doing but it shows how to scale data with max values similar to yours to appear on a single plot with two y axes.

library(ggplot2)
suppressPackageStartupMessages(library(dplyr))
DF <- data.frame(Year = 2008:2020, 
                 Density_Year_Release = 2 * dnorm(seq(-6, 6, 1), 0, 2),
                 log10_Box = runif(13, min = 2, max = 10))
DF
#>    Year Density_Year_Release log10_Box
#> 1  2008          0.004431848  2.919972
#> 2  2009          0.017528300  2.031263
#> 3  2010          0.053990967  2.285790
#> 4  2011          0.129517596  7.423009
#> 5  2012          0.241970725  2.069733
#> 6  2013          0.352065327  5.963712
#> 7  2014          0.398942280  2.037093
#> 8  2015          0.352065327  5.933777
#> 9  2016          0.241970725  4.789759
#> 10 2017          0.129517596  5.975399
#> 11 2018          0.053990967  3.479170
#> 12 2019          0.017528300  9.727424
#> 13 2020          0.004431848  7.550284
DF <- DF %>% mutate(AdjBoxOffice = log10_Box/20)
ggplot(DF) + geom_line(aes(x = Year, y = Density_Year_Release)) +
  geom_point(aes(x = Year, y = AdjBoxOffice)) +
  scale_y_continuous(sec.axis = sec_axis(~. * 20, name = "log10(box_office)"))

^{Created on 2020-04-08 by the reprex package (v0.2.1)}

technocrat · April 9, 2020, 3:03am

Density ranges from almost 0 to almost 0.4, which is always going to be a blip on the radar unless box_office can be scaled down to match a similar range.

library(fivethirtyeight)
library(tidyverse)
data("biopics")

df <- biopics %>%
  filter(!is.na(box_office))

df %>%
  ggplot() +
  geom_density(aes(x = year_release)) +
  geom_point(aes(x = year_release, y = log10(box_office)/250)) +
  scale_y_continuous(sec.axis = sec_axis(~. * 20, name = "log10(box_office)/250"))

^{Created on 2020-04-08 by the reprex package (v0.3.0)}

Probably the better alternative is to compress using $10,000,000,000 bills or whatever denomination is required to squeeze into the interval

system · April 30, 2020, 3:03am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.