Check if there are products that do not sell in some location?

[Screenshot of the dataframe, 2022-07-12]

Hi, I have a dataframe as shown above. I need to check whether there are products that do not sell in some regions, but I can't figure out which functions to use.
I thought of creating two lists of products, one with the unique values from product_name and another for each region, and then comparing them.
Can someone help me? Thanks in advance!

First, it would be easier to help if you provided a reprex.

Second, I assume that each row in this dataframe is a sale. Are you looking for products that did not have a single sale in a region, or is this real-life data where there could always be exceptions? In the first case, you can use table(df$region, df$product_id) to get exact counts, or, for a specific product, something like:

sum(df$product_id == "FUR-BO-10001789" & df$region == "West")
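To list every product/region pair with no sales at once, rather than checking one product at a time, one option is to reshape the table into a long data frame and keep the zero counts. This is only a minimal sketch on made-up data: the column names product_id and region come from the question, while the toy values and the names counts_long and never_sold are illustrative assumptions. The last three lines show the "compare two lists" idea from the question for a single region.

```r
# Toy data standing in for the real dataframe (values are made up)
df <- data.frame(
  product_id = c("A", "A", "B", "B", "C"),
  region     = c("West", "East", "West", "West", "East"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per region, one column per product
counts <- table(region = df$region, product_id = df$product_id)

# Long format: one row per region/product combination, with its count
counts_long <- as.data.frame(counts, responseName = "n_sales")

# Keep only the combinations that never occur in the data
never_sold <- counts_long[counts_long$n_sales == 0, ]
print(never_sold)

# The "compare two lists" idea from the question, for a single region
all_products  <- unique(df$product_id)
west_products <- unique(df$product_id[df$region == "West"])
setdiff(all_products, west_products)
```

Note that table() only knows about products and regions that appear at least once in the data, so a product with no sales anywhere will not show up in the result.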

In the second case (real-life, approximate data), I would suggest starting with a plot. Perhaps you could do:

library(ggplot2)

# Stacked bars: one bar per product, filled by region
ggplot(df) +
  geom_bar(aes(x = product_id, y = after_stat(count), fill = region),
           position = position_stack())

Note that for that approach, if you have enough data, it might be best to first separate your dataframe into an exploration set and a validation set: you can use the first one to make plots and formulate hypotheses, then the second one to check whether these hypotheses are true on independent data. You can do that with something like:

set.seed(123)  # any fixed seed, so the random split is reproducible
rows_exploratory_subset <- sample(nrow(df), floor(nrow(df)/2))

exploratory_df <- df[rows_exploratory_subset, ]
validation_df <- df[-rows_exploratory_subset, ]
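As a rough illustration of the explore-then-validate workflow, the sketch below forms a hypothesis on the exploratory half ("product C never sells in the West") and then checks it on the validation half. The toy data, the seed, and the helper count_sales() are hypothetical and only for illustration; they are not part of the original data.

```r
# Toy data (made up): product "C" is never sold in the West
df <- data.frame(
  product_id = rep(c("A", "B", "C"), times = c(40, 40, 20)),
  region     = c(rep(c("West", "East"), 40), rep("East", 20))
)

set.seed(123)  # arbitrary seed so the split is reproducible
rows_exploratory_subset <- sample(nrow(df), floor(nrow(df) / 2))
exploratory_df <- df[rows_exploratory_subset, ]
validation_df  <- df[-rows_exploratory_subset, ]

# Hypothetical helper: number of sales of one product in one region
count_sales <- function(data, product, region) {
  sum(data$product_id == product & data$region == region)
}

# Hypothesis formed on the exploratory half ...
count_sales(exploratory_df, "C", "West")
# ... then checked on the independent validation half
count_sales(validation_df, "C", "West")
```

If both counts are zero, the hypothesis survives the check; a non-zero count in the validation half would suggest the pattern seen during exploration was just chance.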
