How can I use calculated values by formula in a new column for other rows in new column in R?

MiladBagheri · September 26, 2020, 4:02pm

I have one column, A, that I need to use it to make a new column:

DT<- data.table(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3), B=0)

In a new column, B, I want to use a formula to separate every 1...9. So, the output will be like this:

I did in Excel with the below formula: = IF(A2=1,IF(A1>1,B1+1,B1),B1

I tried this formula in R with different methods, shift(), lag(), ...:

DT$B = ifelse(DT$A == 1, ifelse(shift(DT$A, 1L, type="lag")>1, 
              DT$B<- 1+shift(DT$B, 1L, type="lag"), 
              DT$B<-shift(DT$B, 1L, type="lag")),
              DT$B<-shift(DT$B, 1L, type="lag"))

The results for the above formula was:

I would appreciate if you could help me. Thanks, Milad

GreyMerchant · September 26, 2020, 6:16pm

Hello there,

The quickest way to achieve the desired output would simply be to make use of rep along with sorting it. Let me know if this solves your problem

library(data.table)

DT<- data.table(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3), B= sort(rep(1:3,9)))
#> Warning in as.data.table.list(x, keep.rownames = keep.rownames, check.names
#> = check.names, : Item 1 has 21 rows but longest item has 27; recycled with
#> remainder.

DT
#>     A B
#>  1: 1 1
#>  2: 2 1
#>  3: 3 1
#>  4: 4 1
#>  5: 5 1
#>  6: 6 1
#>  7: 7 1
#>  8: 8 1
#>  9: 9 1
#> 10: 1 2
#> 11: 2 2
#> 12: 3 2
#> 13: 4 2
#> 14: 5 2
#> 15: 6 2
#> 16: 7 2
#> 17: 8 2
#> 18: 9 2
#> 19: 1 3
#> 20: 2 3
#> 21: 3 3
#> 22: 1 3
#> 23: 2 3
#> 24: 3 3
#> 25: 4 3
#> 26: 5 3
#> 27: 6 3
#>     A B

^{Created on 2020-09-26 by the reprex package (v0.3.0)}

mkulkarni · September 26, 2020, 6:27pm

That works. Thanks, Man

MiladBagheri · September 26, 2020, 11:01pm

Thanks, That's great!
Could you please give me some advice if I have more than one replicate for each number in A? for example: [B is my expect output]
DT<- data.table(A=c(1,1,1,1,2,3,4,4,5,6,7,8,8,9,1,2,3,4,5,5,5,6,6,6,7,7,7,8,9,1,2,2,2,3))
The number of each value might be different.
The expected output [B] is:

A 1,1,1,1,2,3,4,4,5,6,7,8,8,9,1,2,3,4,5,5,5,6,6,6,7,7,7,8,9,1,2,2,2,3
B 1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3

Thanks again,
Milad

srbhgaur · September 27, 2020, 12:23am

After shifting the column(col1), calculate the difference in the columns. This is A- col1.
Negative values mark the beginning of new series. Mark those as 1 (col2 here) and take cumulative sum.

library(data.table)
DT<- data.table(A=c(1,1,1,1,2,3,4,4,5,6,7,8,8,9,1,2,3,4,5,5,5,6,6,6,7,7,7,8,9,1,2,2,2,3))
DT[,col1:=shift(A,1,fill = 1000)][,col2:=ifelse(A-col1<0,1,0)][,B:=cumsum(col2)]
DT
#>     A col1 col2 B
#>  1: 1 1000    1 1
#>  2: 1    1    0 1
#>  3: 1    1    0 1
#>  4: 1    1    0 1
#>  5: 2    1    0 1
#>  6: 3    2    0 1
#>  7: 4    3    0 1
#>  8: 4    4    0 1
#>  9: 5    4    0 1
#> 10: 6    5    0 1
#> 11: 7    6    0 1
#> 12: 8    7    0 1
#> 13: 8    8    0 1
#> 14: 9    8    0 1
#> 15: 1    9    1 2
#> 16: 2    1    0 2
#> 17: 3    2    0 2
#> 18: 4    3    0 2
#> 19: 5    4    0 2
#> 20: 5    5    0 2
#> 21: 5    5    0 2
#> 22: 6    5    0 2
#> 23: 6    6    0 2
#> 24: 6    6    0 2
#> 25: 7    6    0 2
#> 26: 7    7    0 2
#> 27: 7    7    0 2
#> 28: 8    7    0 2
#> 29: 9    8    0 2
#> 30: 1    9    1 3
#> 31: 2    1    0 3
#> 32: 2    2    0 3
#> 33: 2    2    0 3
#> 34: 3    2    0 3
#>     A col1 col2 B

^{Created on 2020-09-26 by the reprex package (v0.3.0)}

MiladBagheri · September 27, 2020, 1:08am

Thank you so much, @srbhgaur

system · October 4, 2020, 1:08am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.