Iterate string search and find related data

id     re_enter_id    date         score
101    NA             01.1.2021    60
102    101            12.1.2021    65
103    102            17.1.2021    70
104    103            18.1.2021    68
105    NA             25.1.2021    81
106    NA             03.1.2021    55
107    NA             10.1.2021    67
108    107            11.1.2021    69
109    NA             09.1.2021    85

About the data:
The data represents students who sat an examination multiple times. It has four columns: the student id, the id under which the student previously sat the examination (re_enter_id, NA if there was no earlier attempt), the date of the examination, and the score.

Consider the student with id 101: he sat the exam on 1.1.2021, then sat it again on 12.1.2021 under student id 102, and so on. So the first 4 rows of the data all belong to student 101.
Similarly, student 107 sat the exam on 10.1.2021 and re-entered on 11.1.2021, so 107 and 108 are the same student.
Other students, such as 105 and 106, sat the exam only once.
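
To make the chain structure concrete: each re_enter_id points at the id of the same student's previous attempt, so following those pointers back until an NA is reached gives the id of the first attempt. Below is a minimal base R sketch of that walk. The names `attempts` and `first_id` are made up for this illustration, and the sketch assumes the chains never loop back on themselves.

```r
# Copy of the id / re_enter_id columns from the table above.
attempts <- data.frame(
  id          = 101:109,
  re_enter_id = c(NA, 101L, 102L, 103L, NA, NA, NA, 107L, NA)
)

# Hypothetical helper: follow re_enter_id back to the first attempt.
# Assumes x is a valid id and that the chains contain no cycles.
first_id <- function(x, df = attempts) {
  prev <- df$re_enter_id[match(x, df$id)]
  while (!is.na(prev)) {
    x <- prev
    prev <- df$re_enter_id[match(x, df$id)]
  }
  x
}

first_id(104)  # 101
first_id(108)  # 107
first_id(105)  # 105
```

Two ids belong to the same student exactly when they lead back to the same first id.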

Topic:
I need to find all ids that belong to the same student and extract all the data related to that student.
I was trying to create a function f(id) that takes an id as its parameter. f(102) should return all data related to 101, i.e. the first 4 rows, so f(101) = f(102) = f(103) = f(104).
Similarly, f(107) = f(108).

How can I write this function so that it searches the ids iteratively and returns the data related to a student?
Thanks in advance.

my solution:

library(tidyverse)

example_df <- tibble::tribble(
  ~id, ~re_enter_id,       ~date, ~score,
  101L,           NA, "01.1.2021",    60L,
  102L,         101L, "12.1.2021",    65L,
  103L,         102L, "17.1.2021",    70L,
  104L,         103L, "18.1.2021",    68L,
  105L,           NA, "25.1.2021",    81L,
  106L,           NA, "03.1.2021",    55L,
  107L,           NA, "10.1.2021",    67L,
  108L,         107L, "11.1.2021",    69L,
  109L,           NA, "09.1.2021",    85L
)

# make a simpler table to work with
edf <- example_df %>% select(id, re_id = re_enter_id)

get_related_ids <- function(x) {
  # first pass: every id on a row where x is the id or the re-entry id
  last_scan <- filter(edf, id == x | re_id == x) %>%
    unlist() %>% unique() %>% na.omit() %>% as.integer()
  if (length(last_scan) == 0)
    stop("invalid id given")

  # rescan with the ids found so far until a pass adds nothing new
  changed <- TRUE
  while (changed) {
    new_scan <- filter(edf, id %in% last_scan | re_id %in% last_scan) %>%
      unlist() %>% unique() %>% na.omit() %>% as.integer()
    changed <- !identical(new_scan, last_scan)
    last_scan <- new_scan
  }
  return(last_scan)
}

#testing on all ids
map(101:109, ~get_related_ids(.)) %>% set_names(101:109)
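
Note that get_related_ids() returns the ids, while the question asks for the rows. Here is a rough sketch of one way to return the rows directly, written in base R so that it runs without the tidyverse. It uses the same rescan-until-nothing-changes idea as above. The names `exam` and `get_related_rows` are hypothetical, and the sketch assumes every id appears on exactly one row.

```r
# The example data as a plain data frame.
exam <- data.frame(
  id          = 101:109,
  re_enter_id = c(NA, 101L, 102L, 103L, NA, NA, NA, 107L, NA),
  date        = c("01.1.2021", "12.1.2021", "17.1.2021", "18.1.2021",
                  "25.1.2021", "03.1.2021", "10.1.2021", "11.1.2021",
                  "09.1.2021"),
  score       = c(60L, 65L, 70L, 68L, 81L, 55L, 67L, 69L, 85L)
)

get_related_rows <- function(x, df = exam) {
  if (!x %in% df$id) stop("invalid id given")
  ids <- x
  repeat {
    # Rows whose id or re_enter_id is already in the group.
    hit <- df$id %in% ids | df$re_enter_id %in% ids
    # Every id mentioned on those rows; sort() also drops the NAs.
    new_ids <- sort(unique(c(ids, df$id[hit], df$re_enter_id[hit])))
    # Stop once a pass adds no new ids.
    if (identical(as.integer(new_ids), as.integer(ids))) break
    ids <- new_ids
  }
  df[df$id %in% ids, ]
}

get_related_rows(102)  # the rows with ids 101 to 104
```

With the tidyverse version above, the equivalent would be filter(example_df, id %in% get_related_ids(102)).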