Code block:
my_diamonds <- diamonds %>%
mutate(log_price = log(price)) %>%
group_by(cut) %>%
mutate(scaled_log_price = scale(log_price) %>% as.numeric) %>% # scale within each group as opposed to overall
nest() %>%
mutate(mean_log_price = map_dbl(data, ~ .x$log_price %>% mean)) %>%
mutate(sd_log_price = map_dbl(data, ~ .x$log_price %>% sd)) %>%
unnest %>%
select(cut, price, price_scaled:sd_log_price)
Looks like this:
my_diamonds
# A tibble: 53,940 x 7
# Groups: cut [5]
cut price price_scaled log_price scaled_log_price mean_log_price sd_log_price
<ord> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Ideal 326 -0.904 5.79 -1.87 7.64 0.992
2 Ideal 340 -0.901 5.83 -1.82 7.64 0.992
3 Ideal 344 -0.900 5.84 -1.81 7.64 0.992
4 Ideal 348 -0.899 5.85 -1.80 7.64 0.992
5 Ideal 403 -0.885 6.00 -1.65 7.64 0.992
6 Ideal 403 -0.885 6.00 -1.65 7.64 0.992
7 Ideal 403 -0.885 6.00 -1.65 7.64 0.992
8 Ideal 404 -0.885 6.00 -1.65 7.64 0.992
9 Ideal 404 -0.885 6.00 -1.65 7.64 0.992
10 Ideal 405 -0.884 6.00 -1.65 7.64 0.992
I'd like to use ggplot to visualize the distribution of scaled_log_price:
my_diamonds %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
facet_wrap(vars(cut)) +
scale_x_continuous(breaks = -3:3)
Result:
This shows the scaled log normal distribution for each cut. I would like to overlay, perhaps using geom_text(), the original price values that correspond to each Zscore unit.
For example, cut 'Ideal' has a mean log price of 7.64 and a standard deviation log price of 0.992. So, on the break for cut that is e.g. +2 I would like to show exp(7.64 + (2 * 0.992))
= 15,123.42. I.e. two log normal deviations above the mean for 'Ideal' diamonds is $15.1K.
Tried adding geom_text()
my_diamonds %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
facet_wrap(vars(cut)) +
scale_x_continuous(breaks = -3:3) +
geom_text(mapping = aes(x = scaled_log_price, y = 1, label = price))
Result:
I'm not sure what's happening here, it looks like ggplot is perhaps trying to add each value of price between each Zscore.
Desired result would be 6 new labels per facet, underneath the existing x axis and at 90 degrees so as to fit comfortably. Also open to suggestions for better ways to present this.
More holistically, I am trying to visualize a log normal distribution and would like to know the actual price values for each Zscore break.
(Note, this post is similar to a post I made yesterday that was already given a solution. The difference here though is that I realized that since I am scaling, I must do this within the groups of cut whereas previously I scaled the entire data frame across all cuts. So it's an additional layer of complexity since I'm doing log transformations and scales within a group)