plot aggregated numbers based on cat.

Hi R community

I have this data-frame

I want a bar plot of that sums Sales based on Item.group (na's will be deleted)

And I am using ggplot2, can someone help?

library(tidyverse)

df <- tibble(
  Item.group = c("Breast", "Breast", "Whole", "Whole", "Whole", "Breast", "Minced", "Thigh"),
  Customer = c("D17", "D17", "D17", "D21", "D7", "D17", "D17", "D17"),
  Date = rep(x = "2018-01-01", times = 8),
  Sales = c(255, 138, 368, NA_integer_, 335, 75, 188, 118),
  Campaign.flag = c(0, 0, 0, 0, 1, 0, 0, 0)
)

df %>%
  group_by(Item.group) %>%
  summarize(sum_sale = sum(Sales, na.rm = TRUE)) %>%
  ggplot() +
  geom_bar(aes(x = Item.group, y = sum_sale), stat = "identity")

Created on 2021-09-01 by the reprex package (v2.0.0)

Works like a charm cactusoxbird! can we order as well? :slight_smile:

Say that i want to format sales per item group how do I split the groups into breast, thigh, wholef etc. (1 column for each of the types) so that i can measure sales for each group?

i need a unique time stamp for each product.

Yep, if I understand correctly then this should take care of the ordering. See comments for info:

library(tidyverse)

df <- tibble(
  Item.group = c("Breast", "Breast", "Whole", "Whole", "Whole", "Breast", "Minced", "Thigh"),
  Customer = c("D17", "D17", "D17", "D21", "D7", "D17", "D17", "D17"),
  Date = rep(x = "2018-01-01", times = 8),
  Sales = c(255, 138, 368, NA_integer_, 335, 75, 188, 118),
  Campaign.flag = c(0, 0, 0, 0, 1, 0, 0, 0)
)

# Reordering the x-axis based on y values:
df %>%
  group_by(Item.group) %>%
  summarize(sum_sale = sum(Sales, na.rm = TRUE)) %>%
  ggplot() +
  # You can use reorder() for ascending sorting of values
  geom_bar(aes(x = reorder(Item.group, sum_sale),
               y = sum_sale), stat = "identity") +
  # You might want to change the axis labels as well
  xlab("Item Group") +
  ylab("Sum Sales")

df %>%
  group_by(Item.group) %>%
  summarize(sum_sale = sum(Sales, na.rm = TRUE)) %>%
  ggplot() +
  # Alternatively add desc() to reverse the order
  geom_bar(aes(x = reorder(Item.group, sum_sale, FUN = desc),
               y = sum_sale), stat = "identity") +
  xlab("Item Group") +
  ylab("Sum Sales")

Separate columns per group
I'm a little less certain I know what you're looking for here, so let me know if the output below isn't what you're going for. I assumed you wanted to aggregate within Item Groups and within Dates

# Separate columns per group:
df2 <- tibble(
  Item.group = c("Breast", "Breast", "Whole", "Whole", "Whole", "Breast", "Minced", "Thigh"),
  Customer = c("D17", "D17", "D17", "D21", "D7", "D17", "D17", "D17"),
  # Multiple dates this time
  Date = c(rep(x = "2018-01-01", times = 4), rep(x = "2018-01-02", times = 4)),
  Sales = c(255, 138, 368, NA_integer_, 335, 75, 188, 118),
  Campaign.flag = c(0, 0, 0, 0, 1, 0, 0, 0)
)

df2 %>%
  group_by(Date, Item.group) %>%
  # Get summaries per date
  summarize(sum_sale = sum(Sales, na.rm = TRUE)) %>%
  # Go from "long" to "wide" format, filling in 0s instead of NAs
  pivot_wider(names_from = "Item.group", values_from = "sum_sale", values_fill = 0)
#> `summarise()` has grouped output by 'Date'. You can override using the `.groups` argument.
#> # A tibble: 2 x 5
#> # Groups:   Date [2]
#>   Date       Breast Whole Minced Thigh
#>   <chr>       <dbl> <dbl>  <dbl> <dbl>
#> 1 2018-01-01    393   368      0     0
#> 2 2018-01-02     75   335    188   118

Created on 2021-09-01 by the reprex package (v2.0.0)

Hi Cactus

Looks good, if the purpose is to analyse (time series VAR model) the Sales based on item.group this would be the approach (unique time stamp for each product). But can i still add "customer" and "camp. flag." or does it mess up the dimensions?

And thanks a lot Cactus!

What if I want a nice table with the aggregated Sales pr customer? can i use tapply?

Wrote this:
nr<- aggregate(x = df$Sales,
by = list(unique.values = df$Customer),
FUN = length)

nr1 <- nr %>%
filter(x > 100)

But is there a nicer way?:slight_smile:

Yep! How about this?

# Counts per customer:
df %>%
  count(Customer)
#> # A tibble: 3 x 2
#>   Customer     n
#>   <chr>    <int>
#> 1 D17          6
#> 2 D21          1
#> 3 D7           1

# or...
count(df, Customer)
#> # A tibble: 3 x 2
#>   Customer     n
#>   <chr>    <int>
#> 1 D17          6
#> 2 D21          1
#> 3 D7           1


count(df, Customer) %>%
  filter(n > 100)
#> # A tibble: 0 x 2
#> # ... with 2 variables: Customer <chr>, n <int>

Created on 2021-09-04 by the reprex package (v2.0.0)

How does it know that it is in terms of "Sales" that I want to see the Customers?

The last question: I want to see the difference in "sales" based on the different "item groups" 3 month prior to COVID 19 and after COVID 19.

Let say "Date" from 2019-12-13 to 2020-03-13 compared to 2020-03-13 to 2020-06-13

2020-03-13 (YMD) was the day the government closed the country :slight_smile:

thx a lot :slight_smile:

haha i keep finding things that i want to do :slight_smile: u gave me this code:

df %>%
group_by(Customer) %>%
summarize(sum_sale = sum(Sales, na.rm = TRUE)) %>%

ggplot() +
geom_bar(aes(x = reorder(Customer,  sum_sale, FUN = desc), 
             y= sum_sale), stat = "identity") +
theme_bw() +
labs(title = "fordeling af items")

But is it possible to filter is somehow so that only customers with aggregated sales greater then fx. 100 show ? Customer is a factor with 40 levels :slight_smile:

any good ideas on how to do this? :smiley:

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.