sharing out a variable

Windows2000 · February 14, 2023, 9:22pm

Hello,

I need some help with a database manipulation. This might seem trivial but I really haven't been able to handle it.

I have a database df that follows the below pattern. It happens that I need to breakdown the "marketing" value in df$team into three components "tv", "social_media" and "podcast" , as in df2.

I need to apply a share factor to the budget value, that is, to break down the marketing value into 3 shares. This shares are region specific.

I really appreciate any advise on this.

df <- data.frame(Region   = rep(c("a","b","c","d","e"),2),
                 team   = c(rep(c("wages","marketing"),5)),
                 budget = rnorm(10,100,1)
                 )

df2 <-  data.frame(Region = c(rep("a",4), rep("d",4), rep("c",4), rep("d",4), rep("e",4)),
                   team   = c(rep(c("wages","tv","social_med","podcast"),5)),
                   budget = c(rep(c(rnorm(1,100,1),
                                    rnorm(3,33,3)))))
df2$budget <- jitter(df2$budget, factor = 1)

shares <- data.frame(
  s_a= rnorm(3,33.3,2),
  s_b= rnorm(3,33.3,3),
  s_c= rnorm(3,33.3,1),
  s_d= rnorm(3,33.3,0.5),
  s_e= rnorm(3,33.3,2))

Windows2000 · February 15, 2023, 3:43pm

What exactly does that mean?

scottyd22 · February 15, 2023, 4:47pm

Below is one approach I believe gets at what you're trying to accomplish. The example assumes "share factor" to be the portion of one marketing component relative the sum of all marketing components (by region). The sum of "share factors" for each region thus sum to 100%. I hope this is the correct interpretation.

library(tidyverse)
set.seed(123)

df <- data.frame(Region   = rep(c("a","b","c","d","e"),2),
                 team   = c(rep(c("wages","marketing"),5)),
                 budget = rnorm(10,100,1)
                 )

shares <- data.frame(
  team = 'marketing',
  component = c('tv', 'social media', 'podcast'),
  a= rnorm(3,33.3,2),
  b= rnorm(3,33.3,3),
  c= rnorm(3,33.3,1),
  d= rnorm(3,33.3,0.5),
  e= rnorm(3,33.3,2)
  ) %>%
  pivot_longer(cols = c(-'team', -'component'),
               names_to = 'Region',
               values_to = 'share_factor') %>%
  arrange(Region) %>%
  group_by(Region) %>%
  mutate(share_factor = share_factor/sum(share_factor)) %>%
  ungroup()

# combine
final = df %>%
  left_join(shares) %>%
  arrange(Region) %>%
  mutate(budget_new = ifelse(is.na(share_factor),
                             budget, 
                             budget * share_factor)
         ) %>%
  mutate(component = ifelse(is.na(component), team, component)) %>%
  arrange(Region, desc(component)) %>%
  select(Region, team = component, budget_new)

final
#>    Region         team budget_new
#> 1       a        wages   99.43952
#> 2       a           tv   35.00674
#> 3       a social media   33.31405
#> 4       a      podcast   33.39427
#> 5       b        wages  100.46092
#> 6       b           tv   32.28727
#> 7       b social media   30.36766
#> 8       b      podcast   37.11489
#> 9       c        wages  101.55871
#> 10      c           tv   33.66228
#> 11      c social media   31.20770
#> 12      c      podcast   33.86497
#> 13      d        wages   99.31315
#> 14      d           tv   33.41414
#> 15      d social media   33.11347
#> 16      d      podcast   33.54290
#> 17      e        wages  100.12929
#> 18      e           tv   32.69780
#> 19      e social media   33.31960
#> 20      e      podcast   33.53694

Windows2000 · February 16, 2023, 9:37pm

Hello!

I implemented that solution worked perfectly! And yes, the shares column is taken from a convex set of elements who must add up to 1.

Thank you for taking the time to answer, I was getting a bit frustrated with this one

system · February 23, 2023, 9:37pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.