Counting "Grouped Sequences" of Events

omario · January 21, 2023, 7:11pm

Suppose there are two students - each student takes an exam multiple times (e.g.result_id = 1 is the first exam, result_id = 2 is the second exam, etc.). The student can either "pass" (1) or "fail" (0).

The data looks something like this:

library(data.table)

my_data = data.frame(id = c(1,1,1,1,1,1,2,2,2,2,2,2,2,2,2), results = c(0,1,0,1,0,0,1,1,1,0,1,1,0,1,0), result_id = c(1,2,3,4,5,6,1,2,3,4,5,6,7,8,9))

my_data = setDT(my_data)

  id results result_id
 1:  1       0         1
 2:  1       1         2
 3:  1       0         3
 4:  1       1         4
 5:  1       0         5
 6:  1       0         6
 7:  2       1         1
 8:  2       1         2
 9:  2       1         3
10:  2       0         4
11:  2       1         5
12:  2       1         6
13:  2       0         7
14:  2       1         8
15:  2       0         9

I am interested in counting the number of times that a student passes an exam, given that the student passed the previous two exams.

I tried to do this with the following code:

my_data$current_exam = shift(my_data$results, 0)
my_data$prev_exam = shift(my_data$results, 1)
my_data$prev_2_exam = shift(my_data$results, 2)

# Count the number of exam results for each record
out <- my_data[!is.na(prev_exam), .(tally = .N), by = .(id, current_exam, prev_exam, prev_2_exam)]

out = na.omit(out)

My code produces the following results:

> out
   id current_exam prev_exam prev_2_exam tally
1:  1            0         1           0     2
2:  1            1         0           1     1
3:  1            0         0           1     1
4:  2            1         0           0     1
5:  2            1         1           0     2
6:  2            1         1           1     1
7:  2            0         1           1     2
8:  2            1         0           1     2
9:  2            0         1           0     1

However, I do not think that my code is correct.

For example, with Student_ID = 2 :

My code says that "Current_Exam = 1, Prev_Exam = 1, Prev_2_Exam = 0" happens 1 time, but looking at the actual data - this does not happen at all

Can someone please show me what I am doing wrong and how I can correct this?

scottyd22 · January 21, 2023, 10:06pm

When shifting the results, the shifting should occur grouped by each student id. This update has been added below.

library(data.table)

my_data = data.frame(id = c(1,1,1,1,1,1,2,2,2,2,2,2,2,2,2), 
                     results = c(0,1,0,1,0,0,1,1,1,0,1,1,0,1,0), 
                     result_id = c(1,2,3,4,5,6,1,2,3,4,5,6,7,8,9))

my_data = setDT(my_data)

# shift results by id
my_data[, current_exam := shift(results, 0), by = id]
my_data[, prev_exam := shift(results, 1), by = id]
my_data[, prev_2_exam := shift(results, 2), by = id]

out <- my_data[!is.na(prev_exam), .(tally = .N), by = .(id, current_exam, prev_exam, prev_2_exam)]

out = na.omit(out)

out
#>    id current_exam prev_exam prev_2_exam tally
#> 1:  1            0         1           0     2
#> 2:  1            1         0           1     1
#> 3:  1            0         0           1     1
#> 4:  2            1         1           1     1
#> 5:  2            0         1           1     2
#> 6:  2            1         0           1     2
#> 7:  2            1         1           0     1
#> 8:  2            0         1           0     1

Looking at the original data, this appears to happen for student id 2 in lines 10 (previous 2), 11 (previous), and 12 (current)

my_data
#>     id results result_id
#>  1:  1       0         1
#>  2:  1       1         2
#>  3:  1       0         3
#>  4:  1       1         4
#>  5:  1       0         5
#>  6:  1       0         6
#>  7:  2       1         1
#>  8:  2       1         2
#>  9:  2       1         3
#> 10:  2       0         4
#> 11:  2       1         5
#> 12:  2       1         6
#> 13:  2       0         7
#> 14:  2       1         8
#> 15:  2       0         9

Created on 2023-01-21 with reprex v2.0.2.9000

system · January 28, 2023, 10:06pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.