Pivot table logic in R

Hello, what would your recommended chart type be for this number of variables? The requirement is to show, in terms of quantity, ALL products sold in Portugal. I have the query below to pull the results, but I do not know the best way to present circa 700 items!

library(dplyr)
data <- read.csv("onlineretail1.csv")
data_portugal <- data[data$Country == "Portugal", ]
# Total quantity sold per product description
first_plot <- data_portugal %>%
  group_by(Description) %>%
  summarise(Sold_Qty = sum(Quantity))

Appreciate your help in advance.

In my experience, not all "requirements" are created equal. I expect poor requirements, even when satisfied, to result in unsatisfactory outcomes. I always challenge requirements.

Part of being an expert is reflecting back your expertise to your stakeholders as to why they shouldn't want what they first said they want.

It's actually better to understand the rationale and motives behind the original requirements, as that allows collaboration towards agreeing on fresh requirements that are more likely to satisfy an actual need.

I completely agree, and following conversations with the business about exactly how unwieldy this visual will be, we have discussed additional data categories that can be drilled down into. However, in order to make the point, I have still been asked to create the visual, and I am at a loss, except for having a long, long bar chart.

Another idea would be to see if your top-selling products make up the lion's share of the sales. For example, you could list your top 25 products individually, and then group everything else under an "Other" label.
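The top-25-plus-"Other" idea could be sketched roughly as follows. This is only an illustration: the data here is synthetic, standing in for a summary table with the same Description and Sold_Qty columns as the query in the question, and the cut-off of 25 (top_n_products) and the object names lumped, top_share and p_top are assumptions made for the example.

```r
library(dplyr)
library(ggplot2)

# Synthetic stand-in for the per-product summary (assumed columns:
# Description, Sold_Qty); replace with the real summary table.
set.seed(42)
first_plot <- data.frame(
  Description = paste0("Product ", 1:700),
  Sold_Qty = round(rexp(700, rate = 1 / 50)) + 1,
  stringsAsFactors = FALSE
)

top_n_products <- 25  # hypothetical cut-off, adjust to taste

# Keep the top sellers by name and lump everything else into "Other"
lumped <- first_plot %>%
  arrange(desc(Sold_Qty)) %>%
  mutate(Label = if_else(row_number() <= top_n_products, Description, "Other")) %>%
  group_by(Label) %>%
  summarise(Sold_Qty = sum(Sold_Qty), .groups = "drop")

# Share of total quantity covered by the named top sellers
top_share <- sum(lumped$Sold_Qty[lumped$Label != "Other"]) / sum(lumped$Sold_Qty)

# Horizontal bar chart, bars ordered by quantity
p_top <- ggplot(lumped, aes(x = reorder(Label, Sold_Qty), y = Sold_Qty)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Quantity sold")
```

Checking top_share first tells you whether the top sellers really do dominate; if they do not, the "Other" bar will dwarf everything else and a different cut-off or grouping may read better.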

I would use a line chart if the individual points must be shown, and I suggest using a histogram if the range of values is not too large.

library(ggplot2)
DF <- data.frame(Prod = paste0("P", 1:700), Sales = runif(700, 100, 1000))
ggplot(DF, aes(Prod, Sales, group = 1)) + geom_line() +
  theme(axis.text.x = element_blank())


ggplot(DF, aes(Sales)) + geom_histogram(binwidth = 100, fill = "skyblue", color = "white")

Created on 2021-03-29 by the reprex package (v0.3.0)

On a similar line of thought, are you aware of any functions that could 'group' items based on common occurrences of a word? For example, 'mug' or 'cake'?

Oh man, I do! @julia's and @drob's super-cool tidytext package will let you separate the words in each description into their own record. So you can have a table with product_id and word variables, which you can then analyze to decide which product IDs you wish to group. See "The tidy text format" in Text Mining with R.
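As a rough sketch of that approach, tidytext's unnest_tokens() can split each description into one row per word, after which products can be tagged by keyword. The example table, the product_id column, and the keyword list ("mug", "cake") are made up for illustration; the real descriptions would come from the retail data.

```r
library(dplyr)
library(tidytext)

# Hypothetical example data; product_id and these descriptions are invented
products <- data.frame(
  product_id = 1:5,
  Description = c("RED SPOTTY MUG", "BLUE CAKE TIN", "CAKE STAND WHITE",
                  "PINK MUG SET", "WOODEN PHOTO FRAME"),
  stringsAsFactors = FALSE
)

# One row per (product_id, word); unnest_tokens() lowercases by default
words <- products %>%
  unnest_tokens(word, Description)

# Most frequent words, useful for deciding which keywords to group on
word_counts <- words %>%
  count(word, sort = TRUE)

# Assumed keywords of interest
keywords <- c("mug", "cake")

hits <- words %>%
  filter(word %in% keywords) %>%
  distinct(product_id, word)

# Tag each product with its keyword, or "other" when none matches.
# Note: a product matching several keywords would appear once per keyword.
grouped <- products %>%
  left_join(hits, by = "product_id") %>%
  mutate(Group = coalesce(word, "other")) %>%
  select(product_id, Description, Group)
```

Looking at word_counts before choosing keywords is probably the useful first step, since it shows which words actually recur across the descriptions.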

In reading up around this I have also come across k-means clustering. If anyone has any exposure to this, would it be suitable here, please?