How to calculate weighted average

I am going to start with an example:

tibble::tribble(
         ~Date, ~Material, ~Quantity,
  "2020-01-01",   "nails",       10L,
  "2020-01-01",   "nails",      100L,
  "2020-02-02",   "nails",       50L,
  "2020-02-02",   "nails",       70L,
  "2020-01-01",  "hammer",       20L,
  "2020-01-01",  "hammer",       80L,
  "2020-02-02",  "hammer",       20L,
  "2020-02-02",  "hammer",       50L
  )

This is the example data set. The original data set is way longer with 100+ items and dates.

  1. First I subset data for one day
  2. Sum all item quantity in that day
  3. Divide each item quantity with the sum(from previous step), lets call the result weight
  4. multiply each items quantity with the corresponding weight and sum the results for each item.
  5. Divide result from step 4 with the sum of all weights.
  6. Repeat for all days.

The result from the example should be:

  1. First all items in day 2020-01-01
  2. 10 + 100 + 20 + 80 = 210
  3. 10/210 = 0.048, 100/210 = 0.48, 20/210 = 0.095, 80/210 = 0.38
  4. 10 * 0.048 + 100 * 0.48 + 20 * 0.095 + 80 * 0.38 = 0.48 + 48 + 1.9 + 30.4 = 80.78
  5. 80.78/(0.048 + 0.48 + 0.095 + 0.38) = 80.78/1.003 = 80.54
  6. And the same process for 2020-02-02

I though that maybe its doable with a for loop, this is how far I got and now I am stuck:

inv <- read.csv("Book1.csv")
inv$Date <- as.Date(inv$Date)
n <- unique(inv$Date)
item <- vector()
length(n)
for (i in 1:length(n)){
  day <- subset(inv, subset = (Date == n[i]))
  day_total <- sum(day$Quantity)
  m <- unique(day$Material)
  for (j in 1:length(m)){
    material <- subset(day, subset = (Material == m[j]))
    material_total <- sum(material$Quantity)
    material_weight <- material_total/day_total
    item[j] <- material_total/day_total
    
    
  }
}

If you have any suggestion it will be greatly appreciated

I'm not sure I understood all the steps correctly. This gets through step 4. For step 5, isn't the sum of all the weights equal to one for each day?

library(dplyr)

DF <- tibble::tribble(
  ~Date, ~Material, ~Quantity,
  "2020-01-01",   "nails",       10L,
  "2020-01-01",   "nails",      100L,
  "2020-02-02",   "nails",       50L,
  "2020-02-02",   "nails",       70L,
  "2020-01-01",  "hammer",       20L,
  "2020-01-01",  "hammer",       80L,
  "2020-02-02",  "hammer",       20L,
  "2020-02-02",  "hammer",       50L
)

DFnew <- DF |> group_by(Date) |> #step 1
  mutate(DailySum = sum(Quantity), #Step 2
         Weight = Quantity/DailySum, #Step 3
         ScaledQuantity = Quantity * Weight) #Step 4a
Totals <- DFnew |> group_by(Date, Material) |> 
  summarize(TotalSclQuant = sum(ScaledQuantity)) #Step 4b
#> `summarise()` has grouped output by 'Date'. You can override using the `.groups`
#> argument.
Totals
#> # A tibble: 4 x 3
#> # Groups:   Date [2]
#>   Date       Material TotalSclQuant
#>   <chr>      <chr>            <dbl>
#> 1 2020-01-01 hammer            32.4
#> 2 2020-01-01 nails             48.1
#> 3 2020-02-02 hammer            15.3
#> 4 2020-02-02 nails             38.9

Created on 2022-05-12 by the reprex package (v2.0.1)

1 Like

FJCC once again you have come to my rescue :smiley: I can't thank you enough. Yes the weights add up to one for a date, an error I made. But from the code you provided I can easily adjust it to fix the problem. Thank you, Thank you, thank you!!!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.