Add secondary horizontal axis with specific breaks

Code block:

my_diamonds <- diamonds %>% 
  mutate(log_price = log(price)) %>% 
  group_by(cut) %>% 
  mutate(scaled_log_price = scale(log_price) %>% as.numeric) %>% # scale within each group as opposed to overall
  nest() %>% 
  mutate(mean_log_price = map_dbl(data, ~ .x$log_price %>% mean)) %>% 
  mutate(sd_log_price = map_dbl(data, ~ .x$log_price %>% sd)) %>% 
  unnest %>% 
  select(cut, price, price_scaled:sd_log_price)

Looks like this:

my_diamonds
# A tibble: 53,940 x 7
# Groups:   cut [5]
   cut   price price_scaled log_price scaled_log_price mean_log_price sd_log_price
   <ord> <int>        <dbl>     <dbl>            <dbl>          <dbl>        <dbl>
 1 Ideal   326       -0.904      5.79            -1.87           7.64        0.992
 2 Ideal   340       -0.901      5.83            -1.82           7.64        0.992
 3 Ideal   344       -0.900      5.84            -1.81           7.64        0.992
 4 Ideal   348       -0.899      5.85            -1.80           7.64        0.992
 5 Ideal   403       -0.885      6.00            -1.65           7.64        0.992
 6 Ideal   403       -0.885      6.00            -1.65           7.64        0.992
 7 Ideal   403       -0.885      6.00            -1.65           7.64        0.992
 8 Ideal   404       -0.885      6.00            -1.65           7.64        0.992
 9 Ideal   404       -0.885      6.00            -1.65           7.64        0.992
10 Ideal   405       -0.884      6.00            -1.65           7.64        0.992

I'd like to use ggplot to visualize the distribution of scaled_log_price:

my_diamonds %>% 
  ggplot(aes(x = scaled_log_price)) +
  geom_density() +
  facet_wrap(vars(cut)) +
  scale_x_continuous(breaks = -3:3)

Result:

This shows the scaled log normal distribution for each cut. I would like to overlay, perhaps using geom_text(), the original price values that correspond to each Zscore unit.

For example, cut 'Ideal' has a mean log price of 7.64 and a standard deviation log price of 0.992. So, on the break for cut that is e.g. +2 I would like to show exp(7.64 + (2 * 0.992)) = 15,123.42. I.e. two log normal deviations above the mean for 'Ideal' diamonds is $15.1K.

Tried adding geom_text()

my_diamonds %>% 
  ggplot(aes(x = scaled_log_price)) +
  geom_density() +
  facet_wrap(vars(cut)) +
  scale_x_continuous(breaks = -3:3) +
  geom_text(mapping = aes(x = scaled_log_price, y = 1, label = price))

Result:

I'm not sure what's happening here, it looks like ggplot is perhaps trying to add each value of price between each Zscore.

Desired result would be 6 new labels per facet, underneath the existing x axis and at 90 degrees so as to fit comfortably. Also open to suggestions for better ways to present this.

More holistically, I am trying to visualize a log normal distribution and would like to know the actual price values for each Zscore break.

(Note, this post is similar to a post I made yesterday that was already given a solution. The difference here though is that I realized that since I am scaling, I must do this within the groups of cut whereas previously I scaled the entire data frame across all cuts. So it's an additional layer of complexity since I'm doing log transformations and scales within a group)

1 Like

The previous solution required only minor adjustments.

(diamonds2 <- mutate(group_by(diamonds, cut),
  scale_price = scale(price) %>% as.numeric()
))
lm1 <- lm(price ~ scale_price:cut, data = diamonds2)

text_labels <- expand_grid(
  cut = unique(diamonds2$cut),
  data.frame(
    scale_price = -3:3,
    y = 1))

text_labels$price_label <- round(
                            predict(lm1, newdata = text_labels),
                            0)

diamonds2 %>%
  ggplot(aes(x = scale_price)) +
  geom_density() +
  coord_fixed(xlim = c(-3, 3), clip = "off") +
  geom_text(
    data = text_labels,
    mapping = aes(x = scale_price,
                  y = y, 
                  label = price_label), vjust = 6,
    size = 3
  ) +
  geom_text(
    data = text_labels,
    mapping = aes(x = scale_price,
                  y = y, 
                  label = scale_price), vjust = 7.5,
    size = 3
  ) +
  facet_wrap(vars(cut), ncol = 2) +
  theme(
    panel.spacing = unit(2, "lines"),
    axis.line.x = element_blank(),
    axis.text.x = element_blank(),
    axis.title.x = element_blank()
  )

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

I am able to read this and follow. Once again thank you very much.