Individual Scatter Boxplots for very large dataset marking price and product code


UPDATE: I figured out that subset() will extract the rows for a single product code into a new dataset, which can later be turned into a scatter boxplot:

AA22 <- subset(product.sales_tibble, Product.Code == "AA22")

But will I have to make a dataset manually like this for every one of the 15,000 unique product codes I have?!? :sweat:
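
One way to avoid building each subset by hand is base R's split(), which returns a named list with one data frame per unique code. This is only a minimal sketch: the small product.sales data frame below is a made-up stand-in for product.sales_tibble, using the same column names.

```r
# Toy stand-in for product.sales_tibble (same column names as in the post).
product.sales <- data.frame(
  Product.Code = c("AA22", "AA22", "BB33", "BB33", "CC23"),
  Product.Name = c("blah1", "blah1", "blah2", "blah2", "blah3"),
  Price        = c(2.12, 2.42, 5.54, 3.42, 100.23)
)

# split() returns a named list with one data frame per unique Product.Code.
by_code <- split(product.sales, product.sales$Product.Code)

names(by_code)      # the unique product codes
by_code[["AA22"]]   # same rows as subset(product.sales, Product.Code == "AA22")
```

Each element of by_code can then be passed to the plotting code one at a time, so nothing has to be typed per product code.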

Dataset sample:

Product.Code Product.Name Price

AA22 blah1 $2.12
AA22 blah1 $2.42
AA22 blah1 $4.00
BB33 blah2 $5.54
BB33 blah2 $3.42
BB33 blah2 $4.34
CC23 blah3 $100.23
CC23 blah3 $25.23
CC23 blah3 $105.25

Dataset (Tibble): product.sales_tibble
Variable: Product.Code
Variable: Product.Name
Variable: Price
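
The sample above shows prices with a leading dollar sign. If Price is read in as text in that form (an assumption; the real column may already be numeric), it would need converting before it can be plotted. A small sketch with made-up values:

```r
# Made-up price strings in the same style as the sample.
price_text <- c("$2.12", "$100.23", "$1,250.00")

# Strip dollar signs and thousands separators, then convert to numeric.
price_num <- as.numeric(gsub("[$,]", "", price_text))

price_num
```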

Request for help (anything helps, websites, tutorials, code):

I would like to create individual horizontal scatter boxplots with whiskers for each Product.Code and corresponding prices: (ex. scatter boxplot for AA22's prices)

Model graphs:

- How to create a legend for the average dot on the boxplot
- How to include descriptive statistics around the boxplot (mean, min, max)
- How to include one or more vertical red lines marking prices taken from an API connection and/or a regular data frame/tibble


Is this close to what you want?

df <- tibble::tribble(
    ~Product.Code, ~Product.Name, ~Price,
    "AA22",       "blah1",   2.12,
    "AA22",       "blah1",   2.42,
    "AA22",       "blah1",      4,
    "BB33",       "blah2",   5.54,
    "BB33",       "blah2",   3.42,
    "BB33",       "blah2",   4.34,
    "CC23",       "blah3", 100.23,
    "CC23",       "blah3",  25.23,
    "CC23",       "blah3", 105.25
)

library(ggplot2)

ggplot(df, aes(x = "", y = Price, fill = Product.Code)) +
    geom_boxplot() +
    geom_point(position = 'jitter') +
    facet_grid(vars(Product.Code)) +
    coord_flip() +
    labs(x = "")

Created on 2019-02-06 by the reprex package (v0.2.1)

We could give you better help if you provided a reproducible example. A reprex makes it much easier for others to understand your issue and figure out how to help.



I appreciate your help very much. I will check out this reprex and make sure to follow the guidelines from now on. I am working on tweaking this code for my purposes and trying to understand it. I'm much closer to where I need to be with your help. I will get back soon. Stay tuned...





# This is the finished product for phase 1. Amazing, thank you R community!
# I was not allowed to copy or attach the graph because I am a new user (I think).

df <- tibble::tribble(
  ~Product.Code, ~Product.Name, ~Price,
  "AA22",       "blah1",   2.12,
  "AA22",       "blah1",   2.42,
  "AA22",       "blah1",      4,
  "AA22",       "blah1",   3.50,
  "AA22",       "blah1",   5.35, 
  "BB33",       "blah2",   5.54,
  "BB33",       "blah2",   3.42,
  "BB33",       "blah2",   4.34,
  "CC23",       "blah3", 100.23,
  "CC23",       "blah3",  25.23,
  "CC23",       "blah3", 105.25
)

# df is already a tibble; as_tibble() is kept so the names below stay the same.
df_tibble <- tibble::as_tibble(df)

# I have so much data, and so many product codes, that I had too much for
# @andresrcs's originally suggested code, which facets every code into one figure.
# Instead I subset the data for a selected product code and graph it on its own.

AA22_t <- subset(df_tibble, Product.Code == "AA22")

AA22String <- "AA22"
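
The manual subset above could be generalized so that the same steps run for any product code. The following is only a sketch under assumptions: plot_product is a hypothetical helper name, the data frame is a cut-down copy of the example data, and the PNG file names in the commented-out line are just an illustration.

```r
library(ggplot2)

# Hypothetical helper: builds the horizontal box-and-scatter plot for one code.
plot_product <- function(data, code) {
  one <- data[data$Product.Code == code, ]
  ggplot(one, aes(x = "", y = Price)) +
    geom_boxplot(fill = "lightcyan") +
    geom_point(position = "jitter") +
    coord_flip() +
    labs(x = "", title = one$Product.Name[1], subtitle = code)
}

# Cut-down copy of the example data.
sales <- data.frame(
  Product.Code = c("AA22", "AA22", "AA22", "BB33", "BB33", "BB33"),
  Product.Name = c("blah1", "blah1", "blah1", "blah2", "blah2", "blah2"),
  Price        = c(2.12, 2.42, 4, 5.54, 3.42, 4.34)
)

# One plot object per unique code, stored in a named list.
codes <- unique(sales$Product.Code)
plots <- lapply(codes, function(code) plot_product(sales, code))
names(plots) <- codes

# To write one PNG per code (file names are only an example):
# for (code in codes) ggsave(paste0(code, ".png"), plots[[code]], width = 6, height = 3)
```

With thousands of codes it is probably more practical to save the plots to files, or to loop over a selected vector of codes, than to print them all in one session.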

#Boxplot code. You can change the color of the boxplot to your taste. Red dot is the average price.
#Next step is to come up with a legend for red dot representing average prices


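library(ggplot2)

ggplot(AA22_t, aes(x = "", y = Price, fill = Product.Code)) +
  geom_boxplot(fill = 'lightcyan') +
  # fun.y was renamed to fun in ggplot2 3.3.0; on older versions use fun.y = mean
  stat_summary(fun = mean, geom = "point", shape = 20, size = 7, color = "red", fill = "red") +
  geom_point(position = 'jitter') +
  coord_flip() +
  labs(x = "") +
  # unique() passes a single string instead of one copy per row
  labs(title = unique(AA22_t$Product.Name)) +
  labs(subtitle = unique(AA22_t$Product.Code))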
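
For the legend mentioned as the next step, and the vertical red price line asked about in the original question, one possible approach is sketched below. It assumes ggplot2 3.3.0 or later (for the fun argument), and reference_price is a made-up value standing in for whatever price comes back from the API or another data frame.

```r
library(ggplot2)

AA22_t <- data.frame(
  Product.Code = "AA22",
  Product.Name = "blah1",
  Price        = c(2.12, 2.42, 4, 3.50, 5.35)
)

# Hypothetical reference price, standing in for a value fetched elsewhere.
reference_price <- 3.75

p <- ggplot(AA22_t, aes(x = "", y = Price)) +
  geom_boxplot(fill = "lightcyan") +
  geom_point(position = "jitter") +
  # Mapping colour to a constant string inside aes() creates a legend entry.
  stat_summary(aes(colour = "Average price"), fun = mean, geom = "point", size = 4) +
  # A horizontal line on the Price axis appears vertical after coord_flip().
  geom_hline(yintercept = reference_price, colour = "red", linetype = "dashed") +
  scale_colour_manual(name = NULL, values = c("Average price" = "red")) +
  coord_flip() +
  labs(x = "", title = "blah1", subtitle = "AA22")

p
```

Several reference prices can be drawn at once by passing a numeric vector to yintercept.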

# pastecs provides the descriptive statistics function stat.desc()
library(pastecs)

# Print numbers with up to 5 significant digits (digits counts significant digits,
# not decimal places). Scientific notation is controlled separately by options(scipen = ...).

options(digits = 5)
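
For the mean, min and max per product code, a base R sketch is shown below as an alternative to calling stat.desc() on each subset. The data frame is a cut-down copy of the example data, and price_stats is just an illustrative name.

```r
# Cut-down copy of the example data.
sales <- data.frame(
  Product.Code = c("AA22", "AA22", "AA22", "BB33", "BB33", "BB33"),
  Price        = c(2.12, 2.42, 4, 5.54, 3.42, 4.34)
)

# One row per product code with the mean, minimum and maximum price.
price_stats <- aggregate(
  Price ~ Product.Code,
  data = sales,
  FUN  = function(x) c(mean = mean(x), min = min(x), max = max(x))
)

# aggregate() stores the three statistics in a matrix column; flatten it
# into ordinary columns named Price.mean, Price.min and Price.max.
price_stats <- do.call(data.frame, price_stats)

price_stats
```

The values for one code could then be pasted into a string and shown on the plot, for example through labs(caption = ...).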