Help with aggregating values on many to many relationship

Hello!

I have a data where each part is associated with many models. Its like many to many relationship. I am trying to aggregate total sales of models per part without aggregating the sales of part itself as sales of part remains same for a given time. Example, Sales of Part A was 100 in Jan 2018 and Sales of Model 1& 2 was 1000 & 2000 respectively for the same time period and thus, total sales of models for part A was 3000 in Jan 2018.

How do I aggregate sales of models per part in a way that sales of part A still remains 100 in above example and not 200? Of course, I can create another table and perform aggregation separately. But would like to see if it can be done within the same table without creating several tables based on several other columns not shown in the example below.

Thanks for your help!

library(tidyverse)
library(lubridate)

vol <- data.frame(
  Date = c("2018 Jan","2018 Jan","2018 Jan","2018 Jan",
           "2018 Feb","2018 Feb","2018 Feb","2018 Feb",
           "2019 Jan","2019 Jan","2019 Jan","2019 Jan",
           "2019 Feb","2019 Feb","2019 Feb","2019 Feb"),
  
  Country = c("CA","CA","US","US", "CA","CA","US","US", "CA","CA","US","US", "CA","CA","US","US"),
  
  Part = c("A", "A", "B", "B",  "A", "A", "B", "B", "A", "A", "B", "B", "A", "A", "B", "B"),
  
 `Part Sales` = c(100, 100, 110, 110, 120, 120, 90, 90, 200, 200, 190, 190, 110, 110, 200, 200),
  Model = c("M1", "M2", "M1", "M3", "M1", "M3", "M2", "M3", "M1", "M2", "M1", "M3", "M1", "M3", "M2", "M3"),
  `Model Sales` = c(1000, 2000, 1000, 3000, 1120, 3120, 2090, 3120, 1200, 2200, 1200, 3120, 1110, 3110, 2200, 3110)
)

vol$Date <- ym(vol$Date)

I am not sure I understand your goal. Is this what you are after?

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

vol <- data.frame(
  Date = c("2018 Jan","2018 Jan","2018 Jan","2018 Jan",
           "2018 Feb","2018 Feb","2018 Feb","2018 Feb",
           "2019 Jan","2019 Jan","2019 Jan","2019 Jan",
           "2019 Feb","2019 Feb","2019 Feb","2019 Feb"),
  
  Country = c("CA","CA","US","US", "CA","CA","US","US", "CA","CA","US","US", "CA","CA","US","US"),
  
  Part = c("A", "A", "B", "B",  "A", "A", "B", "B", "A", "A", "B", "B", "A", "A", "B", "B"),
  
  `Part Sales` = c(100, 100, 110, 110, 120, 120, 90, 90, 200, 200, 190, 190, 110, 110, 200, 200),
  Model = c("M1", "M2", "M1", "M3", "M1", "M3", "M2", "M3", "M1", "M2", "M1", "M3", "M1", "M3", "M2", "M3"),
  `Model Sales` = c(1000, 2000, 1000, 3000, 1120, 3120, 2090, 3120, 1200, 2200, 1200, 3120, 1110, 3110, 2200, 3110)
)

vol$Date <- ym(vol$Date)

vol |> group_by(Date, Part, Part.Sales) |> summarize(Total = sum(Model.Sales))
#> `summarise()` has grouped output by 'Date', 'Part'. You can override using the `.groups` argument.
#> # A tibble: 8 x 4
#> # Groups:   Date, Part [8]
#>   Date       Part  Part.Sales Total
#>   <date>     <chr>      <dbl> <dbl>
#> 1 2018-01-01 A            100  3000
#> 2 2018-01-01 B            110  4000
#> 3 2018-02-01 A            120  4240
#> 4 2018-02-01 B             90  5210
#> 5 2019-01-01 A            200  3400
#> 6 2019-01-01 B            190  4320
#> 7 2019-02-01 A            110  4220
#> 8 2019-02-01 B            200  5310

Created on 2022-05-06 by the reprex package (v2.0.1)

Yes, this is the output on Part Sales & Total I am looking for. But not in a separate dataset. Wanted to see if I can still have same output in existing dataset, if at all possible to avoid maintaining different datasets.

Like this?

vol <- data.frame(
   Date = c("2018 Jan","2018 Jan","2018 Jan","2018 Jan",
            "2018 Feb","2018 Feb","2018 Feb","2018 Feb",
            "2019 Jan","2019 Jan","2019 Jan","2019 Jan",
            "2019 Feb","2019 Feb","2019 Feb","2019 Feb"),
   
   Country = c("CA","CA","US","US", "CA","CA","US","US", "CA","CA","US","US", "CA","CA","US","US"),
   
   Part = c("A", "A", "B", "B",  "A", "A", "B", "B", "A", "A", "B", "B", "A", "A", "B", "B"),
   
   `Part Sales` = c(100, 100, 110, 110, 120, 120, 90, 90, 200, 200, 190, 190, 110, 110, 200, 200),
   Model = c("M1", "M2", "M1", "M3", "M1", "M3", "M2", "M3", "M1", "M2", "M1", "M3", "M1", "M3", "M2", "M3"),
   `Model Sales` = c(1000, 2000, 1000, 3000, 1120, 3120, 2090, 3120, 1200, 2200, 1200, 3120, 1110, 3110, 2200, 3110)
 )
 
vol$Date <- ym(vol$Date)
 
vol <- vol |> group_by(Date, Part, Part.Sales) |> 
   mutate(Total = sum(Model.Sales))
vol
# A tibble: 16 x 7
# Groups:   Date, Part, Part.Sales [8]
   Date       Country Part  Part.Sales Model Model.Sales Total
   <date>     <chr>   <chr>      <dbl> <chr>       <dbl> <dbl>
 1 2018-01-01 CA      A            100 M1           1000  3000
 2 2018-01-01 CA      A            100 M2           2000  3000
 3 2018-01-01 US      B            110 M1           1000  4000
 4 2018-01-01 US      B            110 M3           3000  4000
 5 2018-02-01 CA      A            120 M1           1120  4240
 6 2018-02-01 CA      A            120 M3           3120  4240
 7 2018-02-01 US      B             90 M2           2090  5210
 8 2018-02-01 US      B             90 M3           3120  5210
 9 2019-01-01 CA      A            200 M1           1200  3400
10 2019-01-01 CA      A            200 M2           2200  3400
11 2019-01-01 US      B            190 M1           1200  4320
12 2019-01-01 US      B            190 M3           3120  4320
13 2019-02-01 CA      A            110 M1           1110  4220
14 2019-02-01 CA      A            110 M3           3110  4220
15 2019-02-01 US      B            200 M2           2200  5310
16 2019-02-01 US      B            200 M3           3110  5310

Yes, thank you @FJCC !

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.