first occurrence of a string value within a group

DuyTran16 · May 5, 2020, 1:42am

I have example data below in which I would like to flag, in a new variable, the first occurrence within each SUBJ where VAR does not equal "1". Can you help?

data <- read.table(header = TRUE, text = "
SUBJ	TIME	VAR
1	TIME1	1
1	TIME2	1
1	TIME3	1
1	TIME4	1
1	TIME5	1
1	TIME6	1
1	TIME7	5
1	TIME8	1
1	TIME9	3
1	TIME10	1
2	TIME1	1
2	TIME2	1
2	TIME3	6
2	TIME4	1
2	TIME5	3
2	TIME6	2
2	TIME7	1
2	TIME8	1
3	TIME1	1
3	TIME2	1
3	TIME3	1
3	TIME4	4
3	TIME5	2
3	TIME6	1
3	TIME7	1
3	TIME8	8
4	TIME1	1
5	TIME1	1
5	TIME2	1
5	TIME3	2
5	TIME4	1
5	TIME5	4
5	TIME6	1
")

FJCC · May 5, 2020, 2:38am

Here is one approach. For SUBJ 4, FIRST comes back as NA, since that subject has no case of VAR != 1. That is easily fixed if it is a problem.

data <- read.table(header = TRUE, text = "
SUBJ    TIME    VAR
                   1    TIME1   1
                   1    TIME2   1
                   1    TIME3   1
                   1    TIME4   1
                   1    TIME5   1
                   1    TIME6   1
                   1    TIME7   5
                   1    TIME8   1
                   1    TIME9   3
                   1    TIME10  1
                   2    TIME1   1
                   2    TIME2   1
                   2    TIME3   6
                   2    TIME4   1
                   2    TIME5   3
                   2    TIME6   2
                   2    TIME7   1
                   2    TIME8   1
                   3    TIME1   1
                   3    TIME2   1
                   3    TIME3   1
                   3    TIME4   4
                   3    TIME5   2
                   3    TIME6   1
                   3    TIME7   1
                   3    TIME8   8
                   4    TIME1   1
                   5    TIME1   1
                   5    TIME2   1
                   5    TIME3   2
                   5    TIME4   1
                   5    TIME5   4
                   5    TIME6   1
                   ")
# Return the position of the first element of X that is not 1
# (NA if every element equals 1)
FindFirst <- function(X){
  which(X != 1)[1]
}
library(dplyr)

data <- data %>% group_by(SUBJ) %>% 
  mutate(Index = FindFirst(VAR), ROW = row_number(), FIRST = Index == ROW)
head(data, 7)
#> # A tibble: 7 x 6
#> # Groups:   SUBJ [1]
#>    SUBJ TIME    VAR Index   ROW FIRST
#>   <int> <fct> <int> <int> <int> <lgl>
#> 1     1 TIME1     1     7     1 FALSE
#> 2     1 TIME2     1     7     2 FALSE
#> 3     1 TIME3     1     7     3 FALSE
#> 4     1 TIME4     1     7     4 FALSE
#> 5     1 TIME5     1     7     5 FALSE
#> 6     1 TIME6     1     7     6 FALSE
#> 7     1 TIME7     5     7     7 TRUE
data <- data %>% select(-Index, -ROW)

Created on 2020-05-04 by the reprex package (v0.3.0)
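
As for the NA mentioned above for subjects that never leave 1, one possible fix is sketched here. It is not part of the original reprex: the `toy` data frame and the `flagged` name are made up for illustration, and the sketch swaps `which(...)[1]` for base R's `match()` with `nomatch = 0L`, so a group without an event compares against 0 and gets FALSE instead of NA.

```r
library(dplyr)

# Hypothetical toy data (not the thread's data): subject 2 never leaves 1.
toy <- data.frame(
  SUBJ = c(1, 1, 1, 1, 2, 2),
  VAR  = c(1, 5, 1, 3, 1, 1)
)

flagged <- toy %>%
  group_by(SUBJ) %>%
  # match() gives the position of the first TRUE, or 0 when there is none,
  # so the comparison is FALSE (not NA) for groups without an event.
  mutate(FIRST = row_number() == match(TRUE, VAR != 1, nomatch = 0L)) %>%
  ungroup()

print(flagged)
```

Whether FALSE is the right value for such subjects depends on what the flag is used for later; if "no event" needs to stay distinguishable, keeping the NA may be preferable.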

DuyTran16 · May 5, 2020, 4:05am

My objective in finding the first occurrence where VAR does not equal "1" within each SUBJ is to keep only the TIME rows before this event, so that I can calculate the final TIME from TIME1. I revised your code to do this filtering:

data1 <- data %>% 
  group_by(SUBJ) %>% 
  mutate(Index = FindFirst(VAR), 
         ROW = row_number()) %>% 
  filter(ROW < Index)
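
One thing to watch with this filter: for a subject with no VAR != 1 (SUBJ 4 in the example data), Index is NA, `ROW < Index` is NA, and `filter()` drops rows where the condition is NA, so that subject disappears entirely. If those subjects should be kept, a possible variant is sketched below. The `toy` data and the `before_event` name are hypothetical, and keeping event-free subjects is an assumption about what is wanted.

```r
library(dplyr)

# Same helper as in the answer above
FindFirst <- function(X) which(X != 1)[1]

# Hypothetical toy data: subject 2 has no VAR != 1
toy <- data.frame(
  SUBJ = c(1, 1, 1, 1, 2, 2),
  TIME = c("TIME1", "TIME2", "TIME3", "TIME4", "TIME1", "TIME2"),
  VAR  = c(1, 1, 5, 1, 1, 1)
)

before_event <- toy %>%
  group_by(SUBJ) %>%
  mutate(Index = FindFirst(VAR),
         ROW = row_number()) %>%
  # is.na(Index) keeps every row of a subject that never has an event;
  # otherwise only the rows before the first event are kept.
  filter(is.na(Index) | ROW < Index) %>%
  ungroup()

print(before_event)
```

If subjects without an event should be excluded, the original `filter(ROW < Index)` already does that.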

Thank you!
