plot aggregated numbers based on cat.

jak123 · September 1, 2021, 5:17pm

Hi R community

I have this data-frame

I want a bar plot of that sums Sales based on Item.group (na's will be deleted)

And I am using ggplot2, can someone help?

cactusoxbird · September 1, 2021, 5:37pm

library(tidyverse)

df <- tibble(
  Item.group = c("Breast", "Breast", "Whole", "Whole", "Whole", "Breast", "Minced", "Thigh"),
  Customer = c("D17", "D17", "D17", "D21", "D7", "D17", "D17", "D17"),
  Date = rep(x = "2018-01-01", times = 8),
  Sales = c(255, 138, 368, NA_integer_, 335, 75, 188, 118),
  Campaign.flag = c(0, 0, 0, 0, 1, 0, 0, 0)
)

df %>%
  group_by(Item.group) %>%
  summarize(sum_sale = sum(Sales, na.rm = TRUE)) %>%
  ggplot() +
  geom_bar(aes(x = Item.group, y = sum_sale), stat = "identity")

^{Created on 2021-09-01 by the reprex package (v2.0.0)}

jak123 · September 1, 2021, 5:46pm

Works like a charm cactusoxbird! can we order as well?

jak123 · September 1, 2021, 7:30pm

Say that i want to format sales per item group how do I split the groups into breast, thigh, wholef etc. (1 column for each of the types) so that i can measure sales for each group?

i need a unique time stamp for each product.

cactusoxbird · September 2, 2021, 12:47am

Yep, if I understand correctly then this should take care of the ordering. See comments for info:

library(tidyverse)

df <- tibble(
  Item.group = c("Breast", "Breast", "Whole", "Whole", "Whole", "Breast", "Minced", "Thigh"),
  Customer = c("D17", "D17", "D17", "D21", "D7", "D17", "D17", "D17"),
  Date = rep(x = "2018-01-01", times = 8),
  Sales = c(255, 138, 368, NA_integer_, 335, 75, 188, 118),
  Campaign.flag = c(0, 0, 0, 0, 1, 0, 0, 0)
)

# Reordering the x-axis based on y values:
df %>%
  group_by(Item.group) %>%
  summarize(sum_sale = sum(Sales, na.rm = TRUE)) %>%
  ggplot() +
  # You can use reorder() for ascending sorting of values
  geom_bar(aes(x = reorder(Item.group, sum_sale),
               y = sum_sale), stat = "identity") +
  # You might want to change the axis labels as well
  xlab("Item Group") +
  ylab("Sum Sales")

df %>%
  group_by(Item.group) %>%
  summarize(sum_sale = sum(Sales, na.rm = TRUE)) %>%
  ggplot() +
  # Alternatively add desc() to reverse the order
  geom_bar(aes(x = reorder(Item.group, sum_sale, FUN = desc),
               y = sum_sale), stat = "identity") +
  xlab("Item Group") +
  ylab("Sum Sales")

Separate columns per group
I'm a little less certain I know what you're looking for here, so let me know if the output below isn't what you're going for. I assumed you wanted to aggregate within Item Groups and within Dates

# Separate columns per group:
df2 <- tibble(
  Item.group = c("Breast", "Breast", "Whole", "Whole", "Whole", "Breast", "Minced", "Thigh"),
  Customer = c("D17", "D17", "D17", "D21", "D7", "D17", "D17", "D17"),
  # Multiple dates this time
  Date = c(rep(x = "2018-01-01", times = 4), rep(x = "2018-01-02", times = 4)),
  Sales = c(255, 138, 368, NA_integer_, 335, 75, 188, 118),
  Campaign.flag = c(0, 0, 0, 0, 1, 0, 0, 0)
)

df2 %>%
  group_by(Date, Item.group) %>%
  # Get summaries per date
  summarize(sum_sale = sum(Sales, na.rm = TRUE)) %>%
  # Go from "long" to "wide" format, filling in 0s instead of NAs
  pivot_wider(names_from = "Item.group", values_from = "sum_sale", values_fill = 0)
#> `summarise()` has grouped output by 'Date'. You can override using the `.groups` argument.
#> # A tibble: 2 x 5
#> # Groups:   Date [2]
#>   Date       Breast Whole Minced Thigh
#>   <chr>       <dbl> <dbl>  <dbl> <dbl>
#> 1 2018-01-01    393   368      0     0
#> 2 2018-01-02     75   335    188   118

^{Created on 2021-09-01 by the reprex package (v2.0.0)}

jak123 · September 2, 2021, 8:47am

Hi Cactus

Looks good, if the purpose is to analyse (time series VAR model) the Sales based on item.group this would be the approach (unique time stamp for each product). But can i still add "customer" and "camp. flag." or does it mess up the dimensions?

And thanks a lot Cactus!

jak123 · September 2, 2021, 4:55pm

What if I want a nice table with the aggregated Sales pr customer? can i use tapply?

jak123 · September 2, 2021, 5:15pm

Wrote this:
nr<- aggregate(x = df$Sales,
by = list(unique.values = df$Customer),
FUN = length)

nr1 <- nr %>%
filter(x > 100)

But is there a nicer way?

cactusoxbird · September 4, 2021, 6:07pm

Yep! How about this?

# Counts per customer:
df %>%
  count(Customer)
#> # A tibble: 3 x 2
#>   Customer     n
#>   <chr>    <int>
#> 1 D17          6
#> 2 D21          1
#> 3 D7           1

# or...
count(df, Customer)
#> # A tibble: 3 x 2
#>   Customer     n
#>   <chr>    <int>
#> 1 D17          6
#> 2 D21          1
#> 3 D7           1


count(df, Customer) %>%
  filter(n > 100)
#> # A tibble: 0 x 2
#> # ... with 2 variables: Customer <chr>, n <int>

^{Created on 2021-09-04 by the reprex package (v2.0.0)}

jak123 · September 6, 2021, 7:09am

How does it know that it is in terms of "Sales" that I want to see the Customers?

The last question: I want to see the difference in "sales" based on the different "item groups" 3 month prior to COVID 19 and after COVID 19.

Let say "Date" from 2019-12-13 to 2020-03-13 compared to 2020-03-13 to 2020-06-13

2020-03-13 (YMD) was the day the government closed the country

thx a lot

jak123 · September 6, 2021, 7:59am

haha i keep finding things that i want to do u gave me this code:

df %>%
group_by(Customer) %>%
summarize(sum_sale = sum(Sales, na.rm = TRUE)) %>%

ggplot() +
geom_bar(aes(x = reorder(Customer,  sum_sale, FUN = desc), 
             y= sum_sale), stat = "identity") +
theme_bw() +
labs(title = "fordeling af items")

But is it possible to filter is somehow so that only customers with aggregated sales greater then fx. 100 show ? Customer is a factor with 40 levels

jak123 · September 10, 2021, 5:21am

any good ideas on how to do this?

system · October 1, 2021, 5:22am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.