Problem when looping through groups by ID

Hi, I am having a problem writing a for loop that goes systematically through each group in the data frame. I want the loop to add a new column containing the values from the sum formula inside the loop.

Consider the following structure:

df1 <- data.frame(N = c(1,2,3,4,5,6),
                  name = c("a","b","c","d","e","f"),
                  ID =c(1,1,1,2,2,2),
                  sales = c(100, 250, 300, 50, 600, 390),
                  t   = c(0.1,0.3,0.4,0.05,0.15,0.2),
                  n=c(1,2,3,1,2,3),
                  correct_result = c(-221.4,-27.8,69.1,-143.71,-19.11,43.19))

Each name (with its corresponding values) belongs to exactly one ID.

The formula I am trying to calculate works like this: for a given name, say a (i = 1 in the loop), I take every other name with the same ID, divide its sales by 1 - t for that name, multiply by the difference between a's t and that name's t, and sum the results. The formula itself works when the dataset contains only one ID (i.e., three rows in this case).
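Written out, the quantity for row i (as implemented in the loop below and spelled out row by row later in the thread) is:

```latex
\text{output}_i = \sum_{\substack{j \neq i \\ \mathrm{ID}_j = \mathrm{ID}_i}} \frac{\text{sales}_j}{1 - t_j}\,\left(t_i - t_j\right)
```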

However, my loop cannot distinguish between the IDs, so it sums over all rows except i. The desired output is the sum over all rows that have the same ID as row i (and only those), excluding row i itself.

df1$output <- 0

for(i in 1:nrow(df1)){
  if(df1$ID[i] == df1$ID[i]){
    for(k in df1$n){
      k = df1$n
      df1$output[i] <- sum((df1$sales[k!=i]/(1-df1$t[k!=i]))*(df1$t[i]-df1$t[k!=i]))
    }
  }
}
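The comparison `df1$ID[i] == df1$ID[i]` in the loop above is always TRUE, so nothing restricts the sum to row i's group. A minimal base-R sketch of the same loop with that restriction added; `same` is a name introduced here, and the formula is assumed to be the one in the loop:

```r
df1 <- data.frame(N = c(1, 2, 3, 4, 5, 6),
                  name = c("a", "b", "c", "d", "e", "f"),
                  ID = c(1, 1, 1, 2, 2, 2),
                  sales = c(100, 250, 300, 50, 600, 390),
                  t = c(0.1, 0.3, 0.4, 0.05, 0.15, 0.2),
                  n = c(1, 2, 3, 1, 2, 3),
                  correct_result = c(-221.4, -27.8, 69.1, -143.71, -19.11, 43.19))

df1$output <- 0
for (i in seq_len(nrow(df1))) {
  # TRUE for rows in the same ID group as row i, excluding row i itself
  same <- df1$ID == df1$ID[i] & seq_len(nrow(df1)) != i
  df1$output[i] <- sum(df1$sales[same] / (1 - df1$t[same]) * (df1$t[i] - df1$t[same]))
}

round(df1$output, 2)
# approx. -221.43 -27.78 69.05 -143.71 -19.11 43.19
```

These agree with `correct_result` up to its rounding.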

I also tried subsetting within the loop. This gave me the correct result, but only for ID = 2:

for(i in 1:nrow(df1)){
  p <- df1[df1$ID[i] == df1$ID,]

  for(i in 1:nrow(p)){
    if(p$ID[i] == p$ID[i]){
      for(k in p$n){
        k = p$n
        p$d1[i] <- sum((p$sales[k!=i]/(1-p$t[k!=i]))*(p$t[i]-p$t[k!=i]))
      }
    }
  }
}
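In the subset attempt the numbers are right, but `p` is rebuilt on every pass of the outer loop and never written back to `df1`, so after the loop only the last group (ID 2) is left. A minimal sketch that keeps the per-group idea, processes each group once with base R's `split()`/`lapply()`, and stacks the results again; `group_result` is a hypothetical helper name introduced here:

```r
df1 <- data.frame(N = c(1, 2, 3, 4, 5, 6),
                  name = c("a", "b", "c", "d", "e", "f"),
                  ID = c(1, 1, 1, 2, 2, 2),
                  sales = c(100, 250, 300, 50, 600, 390),
                  t = c(0.1, 0.3, 0.4, 0.05, 0.15, 0.2),
                  n = c(1, 2, 3, 1, 2, 3),
                  correct_result = c(-221.4, -27.8, 69.1, -143.71, -19.11, 43.19))

# Hypothetical helper: computes the sum for every row of one ID group at once
group_result <- function(g) {
  ovt <- g$sales / (1 - g$t)     # sales / (1 - t) for each row of the group
  dt  <- outer(g$t, g$t, "-")    # dt[i, j] = t_i - t_j
  # row i of dt %*% ovt is sum_j (t_i - t_j) * ovt_j; the j == i term is 0,
  # so row i drops out without an explicit exclusion
  g$d1 <- as.vector(dt %*% ovt)
  g
}

# one data frame per ID -> process each -> stack them back together
res <- do.call(rbind, lapply(split(df1, df1$ID), group_result))
rownames(res) <- NULL
res[, c("name", "ID", "d1", "correct_result")]
```

The matrix builds an n-by-n table per group, which is fine for small groups; very large groups would be better served by a formula that avoids the pairwise table.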

Any help is appreciated,
Thanks!

Can you demonstrate how you get -221.4 for the correct_result in the first row? I do not understand your calculations.

I don't understand your calculations dividing by 1 - t, etc., but assuming you know what you want to do within a group, the general approach would be to write a function that processes one group and then apply it to each ID. Here is an example:

library(tidyverse)

df1 <- data.frame(N = c(1,2,3,4,5,6),
                  name = c("a","b","c","d","e","f"),
                  ID =c(1,1,1,2,2,2),
                  sales = c(100, 250, 300, 50, 600, 390),
                  t   = c(0.1,0.3,0.4,0.05,0.15,0.2),
                  n=c(1,2,3,1,2,3),
                  correct_result = c(-221.4,-27.8,69.1,-143.71,-19.11,43.19))


sumforgoup <- function(forname, groupid) {
  # all rows of the same group except the one being processed
  temp <- dplyr::filter(df1,
                        ID == groupid,
                        name != forname)
  cat("processing", forname, "of", groupid, "\n",
      "the other values are:\n")
  print(temp)
  sum(temp$sales / (1 - temp$t))
}

# see what happens with just one 
sumforgoup(forname = "a",
           groupid = 1)

#do for all
mutate(rowwise(df1),
       result = sumforgoup(name,ID))

Yes; to get the correct value in the first row:

(df1$sales[2]/(1-df1$t[2]))*(df1$t[1]-df1$t[2]) + (df1$sales[3]/(1-df1$t[3]))*(df1$t[1]-df1$t[3])

and for the second and third rows:

(df1$sales[1]/(1-df1$t[1]))*(df1$t[2]-df1$t[1]) + (df1$sales[3]/(1-df1$t[3]))*(df1$t[2]-df1$t[3])
(df1$sales[1]/(1-df1$t[1]))*(df1$t[3]-df1$t[1]) + (df1$sales[2]/(1-df1$t[2]))*(df1$t[3]-df1$t[2])
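As an arithmetic check, the first-row expression evaluated with the group-1 values (a: t = 0.1; b: sales 250, t = 0.3; c: sales 300, t = 0.4):

```latex
\frac{250}{1-0.3}\,(0.1-0.3) + \frac{300}{1-0.4}\,(0.1-0.4)
  \approx 357.14 \times (-0.2) + 500 \times (-0.3)
  \approx -71.43 - 150 = -221.43
```

which rounds to the -221.4 in `correct_result`.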

full solution

library(tidyverse)

df1 <- tibble(N = c(1,2,3,4,5,6),
              name = c("a","b","c","d","e","f"),
              ID = c(1,1,1,2,2,2),
              sales = c(100, 250, 300, 50, 600, 390),
              t = c(0.1,0.3,0.4,0.05,0.15,0.2),
              n = c(1,2,3,1,2,3),
              correct_result = c(-221.4,-27.8,69.1,-143.71,-19.11,43.19)) %>%
  mutate(ovt = sales / (1 - t))  # sales / (1 - t) for every row

sumforgoup <- function(forname, groupid) {
  # t of the row being processed
  key_t <- dplyr::filter(df1,
                         ID == groupid,
                         name == forname) %>% pull(t)

  # the other rows of the same group, each with t_i - t_j
  temp <- dplyr::filter(df1,
                        ID == groupid,
                        name != forname) %>%
    mutate(diff_key_t = key_t - t)

  sum(temp$ovt * temp$diff_key_t)
}

mutate(rowwise(df1),
       result = sumforgoup(name,ID))

Really appreciate it,
Thank you!

Hi, I tried applying this approach to a larger dataset with approximately 30,000 rows, and therefore many more unique IDs. However, the function seems to run indefinitely and never returns any numbers.
Do you have any recommendations on how to proceed?
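A likely cause (not verified on the real data) is that `sumforgoup()` runs two `filter()` calls over the entire data frame for every row, so the total work grows roughly with the square of the number of rows. One way around it, sketched here under the assumption that the formula is exactly the one in the full solution, is to drop per-row filtering altogether. Because the j = i term is ovt_i * (t_i - t_i) = 0, the sum over the other rows of a group equals t_i * sum(ovt) - sum(ovt * t) taken over the whole group, and both group sums need computing only once. This base-R version uses `ave()`; `S1` and `S2` are names introduced here:

```r
df1 <- data.frame(N = c(1, 2, 3, 4, 5, 6),
                  name = c("a", "b", "c", "d", "e", "f"),
                  ID = c(1, 1, 1, 2, 2, 2),
                  sales = c(100, 250, 300, 50, 600, 390),
                  t = c(0.1, 0.3, 0.4, 0.05, 0.15, 0.2),
                  n = c(1, 2, 3, 1, 2, 3),
                  correct_result = c(-221.4, -27.8, 69.1, -143.71, -19.11, 43.19))

# sales / (1 - t) for every row
df1$ovt <- df1$sales / (1 - df1$t)

# group totals, repeated on every row of the group by ave()
S1 <- ave(df1$ovt, df1$ID, FUN = sum)            # sum of ovt_j over the group
S2 <- ave(df1$ovt * df1$t, df1$ID, FUN = sum)    # sum of ovt_j * t_j over the group

# sum over j != i of ovt_j * (t_i - t_j) = t_i * S1 - S2 (the j == i term is 0)
df1$result <- df1$t * S1 - S2
df1[, c("name", "ID", "result", "correct_result")]
```

This is a single pass over the data, so 30,000 rows should not be a problem. The same identity can be written in dplyr as `group_by(ID) %>% mutate(result = t * sum(ovt) - sum(ovt * t))` once `ovt` has been added.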
