Individual Scatter Boxplots for very large dataset marking price and product code

#1

UPDATE: I figured out that subset() can extract the rows for a single product code into a new dataset, which can later be turned into a scatter boxplot:

AA22 <- subset(product.sales_tibble, Product.Code == "AA22")

But will I have to make a dataset manually like this for every one of the 15,000 unique product codes I have?!?

Dataset sample:

Product.Code Product.Name Price

AA22 blah1 $2.12
AA22 blah1 $2.42
AA22 blah1 $4.00
BB33 blah2 $5.54
BB33 blah2 $3.42
BB33 blah2 $4.34
CC23 blah3 $100.23
CC23 blah3 $25.23
CC23 blah3 $105.25

Dataset (Tibble): product.sales_tibble
Variable: Product.Code
Variable: Product.Name
Variable: Price

Request for help (anything helps, websites, tutorials, code):

I would like to create an individual horizontal scatter boxplot with whiskers for each Product.Code and its corresponding prices (e.g. one scatter boxplot for AA22's prices).

Model graphs:



BONUS:
-How to create legend for the average dot on the box plot
-How to include descriptive statistics around the box plot (mean, min, high)
-How to include a vertical red line(s) depicting a price(s) derived from API connection and/or regular dataframe/tibble.


#2

Is this close to what you want?

df <- tibble::tribble(
    ~Product.Code, ~Product.Name, ~Price,
    "AA22",       "blah1",   2.12,
    "AA22",       "blah1",   2.42,
    "AA22",       "blah1",      4,
    "BB33",       "blah2",   5.54,
    "BB33",       "blah2",   3.42,
    "BB33",       "blah2",   4.34,
    "CC23",       "blah3", 100.23,
    "CC23",       "blah3",  25.23,
    "CC23",       "blah3", 105.25
)

library(ggplot2)

ggplot(df, aes(x = "", y = Price, fill = Product.Code)) +
    geom_boxplot() +
    geom_point(position = 'jitter') +
    facet_grid(vars(Product.Code)) +
    coord_flip() +
    labs(x = "")

Created on 2019-02-06 by the reprex package (v0.2.1)

We could give you better help if you provided a reproducible example. A reprex makes it much easier for others to understand your issue and figure out how to help.
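If one faceted plot gets too crowded with thousands of product codes, another option is to split the data by code and build one plot per code in a loop, so no subset has to be created by hand. This is only a minimal sketch using the sample data from this thread: `plot_one_code` is a hypothetical helper name, the jitter settings are my own choice, and I have not tried it on 15,000 codes.

```r
library(ggplot2)

# Sample data from this thread (stands in for product.sales_tibble)
df <- data.frame(
  Product.Code = rep(c("AA22", "BB33", "CC23"), each = 3),
  Product.Name = rep(c("blah1", "blah2", "blah3"), each = 3),
  Price = c(2.12, 2.42, 4, 5.54, 3.42, 4.34, 100.23, 25.23, 105.25),
  stringsAsFactors = FALSE
)

# Hypothetical helper: horizontal boxplot plus jittered points for one code
plot_one_code <- function(d) {
  ggplot(d, aes(x = "", y = Price)) +
    geom_boxplot(fill = "lightcyan") +
    # height = 0 keeps the prices exact; points are only spread across the box
    geom_point(position = position_jitter(width = 0.2, height = 0)) +
    coord_flip() +
    labs(x = "",
         title = unique(d$Product.Name),
         subtitle = unique(d$Product.Code))
}

# split() returns a named list with one data frame per product code
by_code <- split(df, df$Product.Code)

# One ggplot object per product code, named by code
plots <- lapply(by_code, plot_one_code)
```

Printing `plots[["AA22"]]` shows the plot for a single code. To write every plot to disk, you could loop over `names(plots)` and call `ggsave()` with a file name built from each code.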


#3

@Andresrcs,

I appreciate your help very much. I will check out this reprex and make sure to follow the guidelines from now on. I am working on tweaking this code for my purposes and trying to understand it. I'm much closer to where I need to be with your help. I will get back soon. Stay tuned...


#5

@andresrcs


#This is phase 1 finished product. Amazing, thank you R community! 
#I was not allowed to copy or attach the graph because I am a new user (I think). 

df <- tibble::tribble(
  ~Product.Code, ~Product.Name, ~Price,
  "AA22",       "blah1",   2.12,
  "AA22",       "blah1",   2.42,
  "AA22",       "blah1",      4,
  "AA22",       "blah1",   3.50,
  "AA22",       "blah1",   5.35, 
  "BB33",       "blah2",   5.54,
  "BB33",       "blah2",   3.42,
  "BB33",       "blah2",   4.34,
  "CC23",       "blah3", 100.23,
  "CC23",       "blah3",  25.23,
  "CC23",       "blah3", 105.25
)

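#tribble() already returns a tibble; the tibble:: prefix is needed because the package is not attached
df_tibble <- tibble::as_tibble(df)
rm("df")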

#I have so much data, and hence so many product codes, that I had to subset the data by product code
#and make individual graphs for selected product codes. I had too much data
#for @andresrcs's original suggested code.

#L0021
AA22_t <- subset(df_tibble, Product.Code == "AA22")
AA22_t

AA22String <- "AA22"
AA22String

#Boxplot code. You can change the color of the boxplot to your taste. The red dot is the average price.
#Next step is to come up with a legend for the red dot representing the average price.

library(ggplot2)

#Note: fun.y is deprecated since ggplot2 3.3.0; use fun = mean there (older versions need fun.y = mean).
#unique() passes a single title/subtitle string instead of one value per row.
ggplot(AA22_t, aes(x = "", y = Price, fill = Product.Code)) + 
  geom_boxplot(fill = 'lightcyan') +
  stat_summary(fun = mean, geom = "point", shape = 20, size = 7, color = "red", fill = "red") +
  geom_point(position = 'jitter') +
  coord_flip() +
  labs(x = "") +
  labs(title = unique(AA22_t$Product.Name)) +
  labs(subtitle = unique(AA22_t$Product.Code))
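For the bonus items from the first post (a legend for the average dot, statistics around the plot, and a vertical red line at a given price), here is one possible sketch. It assumes ggplot2 3.3.0 or later (for `fun = mean`), and `reference_price` is a made-up value standing in for whatever comes from an API or another data frame.

```r
library(ggplot2)

# AA22 rows from the sample data in this post
aa22 <- data.frame(
  Product.Code = "AA22",
  Product.Name = "blah1",
  Price = c(2.12, 2.42, 4, 3.50, 5.35)
)

# Hypothetical reference price; in practice this could come from an API call
reference_price <- 3.00

# Descriptive statistics shown as a caption under the plot
stats_caption <- sprintf("min %.2f | mean %.2f | max %.2f",
                         min(aa22$Price), mean(aa22$Price), max(aa22$Price))

p <- ggplot(aa22, aes(x = "", y = Price)) +
  geom_boxplot(fill = "lightcyan") +
  geom_point(position = position_jitter(width = 0.2, height = 0)) +
  # Mapping colour inside aes() is what creates the legend entry
  stat_summary(aes(colour = "Average price"), fun = mean,
               geom = "point", size = 4) +
  scale_colour_manual(name = NULL, values = c("Average price" = "red")) +
  # After coord_flip() the Price axis is horizontal,
  # so geom_hline() is drawn as a vertical line
  geom_hline(yintercept = reference_price, colour = "red",
             linetype = "dashed") +
  coord_flip() +
  labs(x = "", title = "blah1", subtitle = "AA22",
       caption = stats_caption)
```

Several reference prices can be drawn at once by passing a vector to `yintercept`.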


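#The pastecs package includes the descriptive statistics function stat.desc.
#install.packages only needs to be run once.
install.packages("pastecs")
library(pastecs)

#Turn off scientific notation; the commented line below restores the default:
options(scipen=999)
#options(scipen=0)

#Print descriptive statistics to 5 significant digits (digits sets significant digits, not decimal places)
options(digits = 5)
stat.desc(AA22_t$Price)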
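To get the same kind of summary for every product code at once, rather than one subset at a time, something like the following base R sketch might work. It skips pastecs and only computes a handful of statistics; the column names in `stats` are my own choice.

```r
# Sample data from this post
df <- data.frame(
  Product.Code = c(rep("AA22", 5), rep("BB33", 3), rep("CC23", 3)),
  Price = c(2.12, 2.42, 4, 3.50, 5.35,
            5.54, 3.42, 4.34,
            100.23, 25.23, 105.25),
  stringsAsFactors = FALSE
)

# split() gives one price vector per product code;
# each vector is reduced to a one-row data frame of statistics
per_code <- lapply(split(df$Price, df$Product.Code), function(p) {
  data.frame(n = length(p), min = min(p), mean = mean(p),
             median = median(p), max = max(p))
})

# Stack the rows; the row names are the product codes
stats <- do.call(rbind, per_code)
stats
```

A single code can then be looked up by row name, for example `stats["AA22", ]`.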
