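    library(tidyverse)

    my_diamonds <- diamonds %>%
      # assumption: price_scaled (present in the output below but not created
      # in the original pipeline) is price scaled across all cuts
      mutate(price_scaled = scale(price) %>% as.numeric) %>%
      mutate(log_price = log(price)) %>%
      group_by(cut) %>%
      # scale within each group as opposed to overall
      mutate(scaled_log_price = scale(log_price) %>% as.numeric) %>%
      nest() %>%
      mutate(mean_log_price = map_dbl(data, ~ .x$log_price %>% mean)) %>%
      mutate(sd_log_price = map_dbl(data, ~ .x$log_price %>% sd)) %>%
      unnest(cols = data) %>%
      select(cut, price, price_scaled:sd_log_price)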
The result looks like this:
    my_diamonds
    # A tibble: 53,940 x 7
    # Groups: cut
       cut   price price_scaled log_price scaled_log_price mean_log_price sd_log_price
       <ord> <int>        <dbl>     <dbl>            <dbl>          <dbl>        <dbl>
     1 Ideal   326       -0.904      5.79            -1.87           7.64        0.992
     2 Ideal   340       -0.901      5.83            -1.82           7.64        0.992
     3 Ideal   344       -0.900      5.84            -1.81           7.64        0.992
     4 Ideal   348       -0.899      5.85            -1.80           7.64        0.992
     5 Ideal   403       -0.885      6.00            -1.65           7.64        0.992
     6 Ideal   403       -0.885      6.00            -1.65           7.64        0.992
     7 Ideal   403       -0.885      6.00            -1.65           7.64        0.992
     8 Ideal   404       -0.885      6.00            -1.65           7.64        0.992
     9 Ideal   404       -0.885      6.00            -1.65           7.64        0.992
    10 Ideal   405       -0.884      6.00            -1.65           7.64        0.992
I'd like to use ggplot to visualize the distribution of scaled_log_price:
    my_diamonds %>%
      ggplot(aes(x = scaled_log_price)) +
      geom_density() +
      facet_wrap(vars(cut)) +
      scale_x_continuous(breaks = -3:3)
This shows the distribution of the scaled log price for each cut. I would like to overlay, perhaps using geom_text(), the original price values that correspond to each z-score break.
For example, cut 'Ideal' has a mean log price of 7.64 and a standard deviation of log price of 0.992. So, at the +2 break in the 'Ideal' facet, I would like to show exp(7.64 + (2 * 0.992)) = 15,123.42. In other words, two standard deviations above the mean on the log scale corresponds to about $15.1K for 'Ideal' diamonds.
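To make the arithmetic above concrete, here is a minimal sketch of the same back-transformation applied to every break from -3 to 3. It uses only the rounded 'Ideal' summary values quoted in the post; the variable names `z` and `price_at_z` are made up for illustration.

```r
# Rounded summary statistics quoted above for cut == "Ideal"
mean_log_price <- 7.64
sd_log_price   <- 0.992

# z-score breaks used on the x axis
z <- -3:3

# Undo the scaling, then undo the log:
# price = exp(mean + z * sd)
price_at_z <- exp(mean_log_price + z * sd_log_price)

data.frame(z = z, price = round(price_at_z))
```

Because the inputs are rounded, the prices are approximate; the same expression applied to the unrounded per-cut means and standard deviations would give slightly different values.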
I tried adding geom_text():
    my_diamonds %>%
      ggplot(aes(x = scaled_log_price)) +
      geom_density() +
      facet_wrap(vars(cut)) +
      scale_x_continuous(breaks = -3:3) +
      geom_text(mapping = aes(x = scaled_log_price, y = 1, label = price))
I'm not sure what's happening here. It looks like ggplot is trying to draw a label for every value of price, including all the values that fall between the z-score breaks.
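A likely explanation, offered as a sketch rather than a diagnosis of the exact plot: a geom_text() layer that is not given its own `data` argument inherits the plot's data, so `aes(label = price)` asks for one label per row. The comparison below counts the labels such a layer would draw against the number the question is after; `labels_drawn` and `labels_wanted` are illustrative names, and `diamonds` is used because it has the same number of rows as my_diamonds.

```r
library(ggplot2)  # provides the diamonds data set

# A layer without its own `data` inherits the plot data,
# so it draws one label for every row
labels_drawn <- nrow(diamonds)

# What is wanted instead: one label per break in each facet
labels_wanted <- nlevels(diamonds$cut) * length(-3:3)

c(drawn = labels_drawn, wanted = labels_wanted)
```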
Desired result would be 6 new labels per facet, underneath the existing x axis and at 90 degrees so as to fit comfortably. Also open to suggestions for better ways to present this.
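One possible way to get labels like these, sketched under some assumptions: build a small data frame with one row per cut and break, and hand it to geom_text() through its `data` argument. The names `cut_stats`, `break_labels`, `my_scaled`, `z`, `price_at_z` and `label` are made up for this sketch, and the data is rebuilt from `diamonds` so the block runs on its own. Note that breaks = -3:3 yields seven labels per facet, and that the labels here sit just inside each panel, hanging below the zero line, rather than strictly underneath the axis.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Per-cut scaled log price, as in the question
my_scaled <- diamonds %>%
  mutate(log_price = log(price)) %>%
  group_by(cut) %>%
  mutate(scaled_log_price = as.numeric(scale(log_price))) %>%
  ungroup()

# Per-cut mean and standard deviation of log price
cut_stats <- my_scaled %>%
  group_by(cut) %>%
  summarise(
    mean_log_price = mean(log_price),
    sd_log_price   = sd(log_price),
    .groups = "drop"
  )

# One row per (cut, break): back-transform each z-score to a price
break_labels <- expand_grid(cut_stats, z = -3:3) %>%
  mutate(
    price_at_z = exp(mean_log_price + z * sd_log_price),
    label = paste0("$", format(round(price_at_z), big.mark = ",",
                               trim = TRUE, scientific = FALSE))
  )

p <- ggplot(my_scaled, aes(x = scaled_log_price)) +
  geom_density() +
  # The label layer gets its own data, so it draws 7 labels per facet
  geom_text(
    data = break_labels,
    aes(x = z, y = 0, label = label),
    angle = 90, hjust = 1.1, size = 3,
    inherit.aes = FALSE
  ) +
  facet_wrap(vars(cut)) +
  scale_x_continuous(breaks = -3:3) +
  # Extra room below y = 0 so the rotated labels are not cut off
  scale_y_continuous(expand = expansion(mult = c(0.4, 0.05)))

p
```

Because `break_labels` keeps the `cut` column, facet_wrap() routes each label to its own panel. The `hjust`, `size` and `expansion()` values are guesses that will probably need tuning for a given plot size.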
More broadly, I am trying to visualize a log-normal distribution and would like to know the actual price value at each z-score break.
(Note: this post is similar to one I made yesterday, which already has a solution. The difference here is that, because I am scaling, I realized I must do so within the groups of cut, whereas previously I scaled the entire data frame across all cuts. This adds a layer of complexity, since the log transformation and the scaling now happen within each group.)