Problem when looping through groups by ID

cotmb2 · September 21, 2020, 12:43pm

Hi, I am having trouble creating a for loop that goes systematically through each group in the data frame. I want the loop to add a new column holding the values of the sum formula computed inside the loop.

Consider the following structure:

df1 <- data.frame(N = c(1,2,3,4,5,6),
                  name = c("a","b","c","d","e","f"),
                  ID =c(1,1,1,2,2,2),
                  sales = c(100, 250, 300, 50, 600, 390),
                  t   = c(0.1,0.3,0.4,0.05,0.15,0.2),
                  n=c(1,2,3,1,2,3),
                  correct_result = c(-221.4,-27.8,69.1,-143.71,-19.11,43.19))

Each name and corresponding values have one unique ID code.

For a given name, say a (in the loop, i = 1), I want to take the sales of every other name with the same ID, divide each by 1-t of that name, multiply by the difference between a's t and that name's t, and sum the results.
The formula itself worked when I had only one ID in the dataset (i.e., in this case 3 rows).

However, my loop does not distinguish between the group IDs and thus sums over all rows except i. Ideally, it would sum over only the rows that share row i's ID, excluding i itself.

df1$output <- 0

for(i in 1:nrow(df1)){                                                                          
  if(df1$ID[i]== df1$ID[i]){
    
    for(k in df1$n){ 
      k = df1$n
      
      df1$output[i] <- sum((df1$sales[k!=i]/(1-df1$t[k!=i]))*(df1$t[i]-df1$t[k!=i])) 
  }}}

I also tried subsetting within the loop. This gave me the correct result, but only for ID = 2:

for(i in 1:nrow(df1)){
  p <- df1[df1$ID[i] == df1$ID,]
 
 for(i in 1:nrow(p)){
    if(p$ID[i] == p$ID[i]){
      for(k in p$n){
        k = p$n
        p$d1[i] <- sum((p$sales[k!=i]/(1-p$t[k!=i]))*(p$t[i]-p$t[k!=i])) 
      }}}}

Any help is appreciated,
Thanks!

FJCC · September 21, 2020, 1:05pm

Can you demonstrate how you get -221.4 for the correct_result in the first row? I do not understand your calculations.

nirgrahamuk · September 21, 2020, 1:08pm

I don't understand your calculations (dividing by 1-t, etc.), but assuming you know what you want to do within a group, the general approach would be to write a function that processes a group and then apply it to each ID. Here is an example:

library(tidyverse)

df1 <- data.frame(N = c(1,2,3,4,5,6),
                  name = c("a","b","c","d","e","f"),
                  ID =c(1,1,1,2,2,2),
                  sales = c(100, 250, 300, 50, 600, 390),
                  t   = c(0.1,0.3,0.4,0.05,0.15,0.2),
                  n=c(1,2,3,1,2,3),
                  correct_result = c(-221.4,-27.8,69.1,-143.71,-19.11,43.19))


sumforgoup <-function(forname , groupid){

 
  temp <- dplyr::filter(df1,
                 ID == groupid,
                 name!=forname)
  cat("processing" , forname, " of ", groupid,"\n",
      "the other values are:\n")
  print(temp)
  sum((temp$sales / (1-temp$t)))
}

# see what happens with just one 
sumforgoup(forname = "a",
           groupid = 1)

#do for all
mutate(rowwise(df1),
       result = sumforgoup(name,ID))

cotmb2 · September 21, 2020, 1:13pm

Yes; to get the correct value in the first row:
(df1$sales[2]/(1-df1$t[2]))*(df1$t[1]-df1$t[2]) + (df1$sales[3]/(1-df1$t[3]))*(df1$t[1]-df1$t[3])

cotmb2 · September 21, 2020, 1:21pm

(and for the second and third row:

(df1$sales[1]/(1-df1$t[1]))*(df1$t[2]-df1$t[1]) + (df1$sales[3]/(1-df1$t[3]))*(df1$t[2]-df1$t[3])
(df1$sales[1]/(1-df1$t[1]))*(df1$t[3]-df1$t[1]) + (df1$sales[2]/(1-df1$t[2]))*(df1$t[3]-df1$t[2])

)
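
The same pattern can be written once for any row i. The following is only a minimal sketch: `row_result` is a hypothetical helper name, and the data frame is a trimmed copy of df1 from the first post. For row i it picks the other rows with the same ID and sums sales/(1-t) times the difference in t.

```r
df1 <- data.frame(ID = c(1,1,1,2,2,2),
                  sales = c(100, 250, 300, 50, 600, 390),
                  t = c(0.1,0.3,0.4,0.05,0.15,0.2),
                  correct_result = c(-221.4,-27.8,69.1,-143.71,-19.11,43.19))

# row_result is a hypothetical helper (not from the thread):
# for row i, sum over the other rows j that share row i's ID
row_result <- function(i, df) {
  others <- setdiff(which(df$ID == df$ID[i]), i)
  sum(df$sales[others] / (1 - df$t[others]) * (df$t[i] - df$t[others]))
}

# apply the helper to every row index
df1$output <- sapply(seq_len(nrow(df1)), row_result, df = df1)
round(df1$output, 2)
```

On this sample data the output agrees with correct_result to within about 0.1.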

nirgrahamuk · September 21, 2020, 1:24pm

full solution

library(tidyverse)

df1 <- tibble(N = c(1,2,3,4,5,6),
              name = c("a","b","c","d","e","f"),
              ID = c(1,1,1,2,2,2),
              sales = c(100, 250, 300, 50, 600, 390),
              t = c(0.1,0.3,0.4,0.05,0.15,0.2),
              n = c(1,2,3,1,2,3),
              correct_result = c(-221.4,-27.8,69.1,-143.71,-19.11,43.19)) %>%
  mutate(ovt = sales/(1-t))

sumforgoup <- function(forname, groupid){

  # t of the row being processed
  key_t <- dplyr::filter(df1,
                         ID == groupid,
                         name == forname) %>%
    pull(t)

  # the other rows of the same group, with their difference to key_t
  temp <- dplyr::filter(df1,
                        ID == groupid,
                        name != forname) %>%
    mutate(diff_key_t = key_t - t)

  sum(temp$ovt*temp$diff_key_t)
}

mutate(rowwise(df1),
       result = sumforgoup(name,ID))
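
A possible variant, sketched here as an assumption rather than as part of the solution above: instead of filtering the whole data frame for every row, group by ID once and compute all rows of a group together. `others_sum` is a hypothetical helper name; the data is the sample df1 without the unused columns.

```r
library(dplyr)

df1 <- tibble(name = c("a","b","c","d","e","f"),
              ID = c(1,1,1,2,2,2),
              sales = c(100, 250, 300, 50, 600, 390),
              t = c(0.1,0.3,0.4,0.05,0.15,0.2))

# others_sum is a hypothetical helper (not from the thread):
# it receives the sales and t vectors of ONE group and returns,
# for each row i of that group, the sum over the other rows
others_sum <- function(sales, t) {
  ovt <- sales / (1 - t)
  sapply(seq_along(t), function(i) sum(ovt[-i] * (t[i] - t[-i])))
}

result <- df1 %>%
  group_by(ID) %>%                          # the helper sees one ID at a time
  mutate(result = others_sum(sales, t)) %>%
  ungroup()

result
```

Unlike `sumforgoup`, this does not rely on name to identify the current row, so it does not assume that names are unique within an ID.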

cotmb2 · September 21, 2020, 3:57pm

Really appreciate it,
Thank you!

cotmb2 · September 23, 2020, 4:31pm

Hi, I tried to apply this approach to a larger dataset with approximately 30,000 rows, and thus more unique IDs. However, the function seems to keep looping and does not return any numbers at all.
Maybe you have some recommendations on how to proceed?
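
One possible way to proceed, offered as a sketch on the sample data and not tested on the larger dataset: the row-wise solution filters the full data frame twice for every row, which grows quickly with the number of rows. Because the term for the row itself is zero (t[i] - t[i] = 0), the sum over the other rows equals the sum over the whole group, which simplifies to t[i] * sum(ovt) - sum(ovt * t) within each ID. That needs only two grouped sums, computed here with base R's `ave`. The variable names `group_sum_ovt` and `group_sum_ovt_t` are illustrative.

```r
df1 <- data.frame(ID = c(1,1,1,2,2,2),
                  sales = c(100, 250, 300, 50, 600, 390),
                  t = c(0.1,0.3,0.4,0.05,0.15,0.2))

# sales scaled by 1 / (1 - t), as in the ovt column of the full solution
ovt <- df1$sales / (1 - df1$t)

# per-ID sums, repeated for every row of the group
group_sum_ovt   <- ave(ovt, df1$ID, FUN = sum)
group_sum_ovt_t <- ave(ovt * df1$t, df1$ID, FUN = sum)

# sum over j of ovt[j] * (t[i] - t[j]); the j == i term is zero
df1$result <- df1$t * group_sum_ovt - group_sum_ovt_t
round(df1$result, 2)
```

There is no explicit loop over rows, so the cost should grow roughly linearly with the number of rows.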
