Understanding Loop behaviour

I am trying to understand how loops in R operate. I have a simple dataframe xx, which is as follows:

COMPANY_NUMBER NUMBER_OF_YEARS
#0070837 3
#0070837 3
#0070837 3
1000403 4
1000403 4
1000403 4
1000403 4
10029943 3
10029943 3
10029943 3
10037980 4
10037980 4
10037980 4
10037980 4
10057418 3
10057418 3
10057418 3
1009550 4
1009550 4
1009550 4
1009550 4

The code I have written is

while (i <= nrow(xx1)) {
  for (j in 1:xx1$NUMBER_OF_YEARS[i]) {
    xx1$I[i] <- i
    xx1$J[j] <- j
    xx1$NUMBER_OF_YEARS_j[j] <- xx1$NUMBER_OF_YEARS[j]
  }
  i <- i + (xx1$NUMBER_OF_YEARS[i])
}

After running the code I want my dataframe to look like

|COMPANY_NUMBER |NUMBER_OF_YEARS |I |J |
|---|---|---|---|
|#0070837 |3 |1 |1 |
|#0070837 |3 |1 |2 |
|#0070837 |3 |3 |3 |
|1000403 |4 |1 |1 |
|1000403 |4 |1 |2 |
|1000403 |4 |1 |3 |
|1000403 |4 |4 |4 |
|10029943 |3 |1 |1 |
|10029943 |3 |1 |2 |
|10029943 |3 |3 |3 |
|10037980 |4 |1 |1 |
|10037980 |4 |1 |2 |
|10037980 |4 |1 |3 |
|10037980 |4 |4 |4 |
|10057418 |3 |1 |1 |
|10057418 |3 |1 |1 |
|10057418 |3 |1 |1 |
|1009550 |4 |1 |1 |
|1009550 |4 |1 |2 |
|1009550 |4 |1 |3 |
|1009550 |4 |4 |4 |

I get the correct value of I, but in the wrong row. The value of J is correct in the first iteration, and then it goes to 1.

Any help will be greatly appreciated.

You don't really need a loop to do this. You can use vectorised operations. Check out the code below:

library(dplyr)

xx %>%
  group_by(COMPANY_NUMBER, NUMBER_OF_YEARS) %>%
  mutate(I = row_number(),
         J = row_number(),
         I = if_else(I < max(I), 1, I)) %>%
  ungroup()


I agree, but I get an error:

suppressPackageStartupMessages({
  library(dplyr)
})

xx <- data.frame(
   COMPANY_NUMBER = c(1000403L,1000403L,1000403L,
                      1000403L,10029943L,10029943L,10029943L,10037980L,
                      10037980L,10037980L,10037980L,10057418L,10057418L,10057418L,
                      1009550L,1009550L,1009550L,1009550L),
  NUMBER_OF_YEARS = c(4L,4L,4L,4L,3L,3L,3L,4L,
                      4L,4L,4L,3L,3L,3L,4L,4L,4L,4L)
)

xx  %>% 
  group_by(COMPANY_NUMBER, NUMBER_OF_YEARS)  %>% 
  mutate (I = row_number(),
          J = row_number(),
          I = if_else(I < max(I), 1, I))  %>% 
  ungroup()
#> Error: Problem with `mutate()` input `I`.
#> x `false` must be a double vector, not an integer vector.
#> ℹ Input `I` is `if_else(I < max(I), 1, I)`.
#> ℹ The error occurred in group 1: COMPANY_NUMBER = 1000403, NUMBER_OF_YEARS = 4.

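Anyone fresh to R from one of the procedural languages would expect this to work. R's for and while loops do not get an environment of their own: they run in the calling environment, so anything assigned in one trip through the loop is still there on the next. What does bite is that there is no i++ operator, and nothing initialises or advances the counter of a while loop for you. You have to set i before the loop and update it explicitly inside it, and each row has to be addressed by its absolute position in the data frame (the code above writes to xx1$J[j], which only ever touches the first few rows).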
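For anyone who still wants to see the loop version working, here is a minimal sketch rather than a definitive fix. It assumes that the rows for each company are contiguous, that NUMBER_OF_YEARS equals the number of rows in each block, and that the intended rule for I (read off the desired table) is "1 on every row except the last row of the block, which gets the block size". The small xx1 below is a stand-in for the real data, and the helper names n and row are mine.

```r
# Small stand-in for the poster's data (first two companies only).
xx1 <- data.frame(
  COMPANY_NUMBER  = c("#0070837", "#0070837", "#0070837",
                      "1000403", "1000403", "1000403", "1000403"),
  NUMBER_OF_YEARS = c(3L, 3L, 3L, 4L, 4L, 4L, 4L),
  stringsAsFactors = FALSE
)

# Pre-allocate the new columns so that every row has a slot.
xx1$I <- NA_integer_
xx1$J <- NA_integer_

i <- 1L                                   # the counter must exist before while()
while (i <= nrow(xx1)) {
  n <- xx1$NUMBER_OF_YEARS[i]             # size of the block starting at row i
  for (j in seq_len(n)) {
    row <- i + j - 1L                     # absolute row, not the within-block j
    xx1$J[row] <- j
    xx1$I[row] <- if (j < n) 1L else n    # assumed rule: 1 except on the last row
  }
  i <- i + n                              # jump to the first row of the next block
}

print(xx1)
```

The key difference from the original loop is the index: writing to row i + j - 1 puts each value in the row it belongs to, whereas indexing with j alone keeps overwriting the rows at the top of the data frame.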

Under the hood, R is implemented either functionally, chaining other functions, or through its limited procedural facility. Mostly, R is exposed to the user as functions.

Procedural language approaches beyond the simplest are candidates for Rcpp or reticulate, which allow objects and functions from C++ and Python, respectively. A package for Go has been rumored. Another approach is to make a system call to another interpreted or compiled language.

It's worth the pain to embrace the vectorized and other functional tools.
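To give a flavour of what "vectorized" looks like without any extra packages, here is a rough base R sketch. It assumes the same rule for I as in the desired table (1 on every row except the last row of each company) and uses a cut-down copy of the data. The name group_size is just an illustrative helper.

```r
xx <- data.frame(
  COMPANY_NUMBER  = c("1000403", "1000403", "1000403", "1000403",
                      "10029943", "10029943", "10029943"),
  NUMBER_OF_YEARS = c(4L, 4L, 4L, 4L, 3L, 3L, 3L),
  stringsAsFactors = FALSE
)

# J: position of each row within its company, computed group-wise by ave().
xx$J <- ave(seq_len(nrow(xx)), xx$COMPANY_NUMBER, FUN = seq_along)

# Number of rows per company, repeated on every row of that company.
group_size <- ave(seq_len(nrow(xx)), xx$COMPANY_NUMBER, FUN = length)

# I: 1 everywhere except the last row of each company, which keeps its position.
xx$I <- ifelse(xx$J < group_size, 1L, xx$J)

print(xx)
```

There is no explicit counter to initialise or advance here: ave() applies the function to each company's rows and puts the results back in the original row order.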

Change the last if_else as follows:

if_else(I < max(I), 1L, I)

Change the 1 to 1L to force it to be integer. That should work.
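For completeness, here is the whole pipeline with that change applied, as a sketch on a cut-down copy of the data. The reason for the error is that row_number() returns integers while a bare 1 is a double, and if_else() in the dplyr version that produced the error above insists that both branches have the same type. I have also computed J first and derived I from it, which is my own simplification of the original two-step assignment.

```r
library(dplyr)

xx <- data.frame(
  COMPANY_NUMBER  = c(1000403L, 1000403L, 1000403L, 1000403L,
                      10029943L, 10029943L, 10029943L),
  NUMBER_OF_YEARS = c(4L, 4L, 4L, 4L, 3L, 3L, 3L)
)

# A bare 1 is a double; the L suffix makes it an integer.
typeof(1)    # "double"
typeof(1L)   # "integer"

result <- xx %>%
  group_by(COMPANY_NUMBER, NUMBER_OF_YEARS) %>%
  mutate(J = row_number(),                    # integer position within the group
         I = if_else(J < max(J), 1L, J)) %>%  # both branches are integers now
  ungroup()

print(result)
```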

Vishal, thanks for proposing the solution, which looks very elegant. I ran it, but it didn't work. It gives the following error:

Error: Problem with mutate() input I.
x false must be a double vector, not an integer vector.
i Input I is if_else(I < max(I), 1, I).
i The error occurred in group 1: COMPANY_NUMBER = "#0070837", NUMBER_OF_YEARS = 8.
Run rlang::last_error() to see where the error occurred.

Vishal answered this in the post above.