Fetching values from one column based on other columns in long-formatted dataset

Hi all,

I have a long format dataset of 100,000 + individuals, capturing clinic visits at 5 different time points (not chronological). I've included a toy dataset below that replicates the formatting of my data:

ID: participant ID
visit_number: order of the clinic visits in the original dataset
age_visit: age at the time of the visit
clinic_number: the identifier for the specific clinic location
age_sorted: For each ID, age sorted in ascending order across the 5 clinic visits
age_sorted_index: For each ID, the visit number corresponding to the sorted age

I would like to create a new column (clinic_number_extracted) that fetches the clinic identifier (clinic_number) corresponding to each sorted age (age_sorted value) for each participant. I was thinking that it might be possible to use the age_sorted_index and visit_number variables to do so (generating key-value pairs?), but am not quite sure how.

I've looked here and stack exchange for clues, but haven't been able to find exactly what I'm looking for (likely not using the correct search terms). Any feedback or advice on how to proceed would be greatly appreciated - thanks in advance! Please let me know if anything is unclear, I would be happy to clarify.

# set seed for reproducibility
set.seed(42)

# Initial dataset format
das <- data.frame(id = rep(letters[1:3], each = 5), 
                  visit_number = rep(1:5, times = 3),
                  age_visit = c(50, rep(NA_real_, times = 7), 34, 40, 72, rep(NA_real_, times = 3), 87), 
                  clinic_number = sample(30:50, 15, replace=TRUE), 
                  age_sorted = c(50, rep(NA_real_, times = 4), 34, 40,rep(NA_real_, times = 3), 72, 87, rep(NA_real_, times = 3)), 
                  age_sorted_index = c(rep(1:5), 4, 5, rep(1:3), 1, 5, 2, 3, 4)) 

# Print out dataset
das
#>    id visit_number age_visit clinic_number age_sorted age_sorted_index
#> 1   a            1        50            46         50                1
#> 2   a            2        NA            34         NA                2
#> 3   a            3        NA            30         NA                3
#> 4   a            4        NA            39         NA                4
#> 5   a            5        NA            33         NA                5
#> 6   b            1        NA            47         34                4
#> 7   b            2        NA            46         40                5
#> 8   b            3        NA            44         NA                1
#> 9   b            4        34            36         NA                2
#> 10  b            5        40            33         NA                3
#> 11  c            1        72            34         72                1
#> 12  c            2        NA            43         87                5
#> 13  c            3        NA            49         NA                2
#> 14  c            4        NA            47         NA                3
#> 15  c            5        87            44         NA                4

# End goal 
das_final <- cbind(das, 
                  clinic_number_extracted = c(46, rep(NA_real_, times = 4), 36, 33, rep(NA_real_, times = 3), 34, 44, rep(NA_real_, times = 3)))

# Print out final dataset
das_final 
#>    id visit_number age_visit clinic_number age_sorted age_sorted_index
#> 1   a            1        50            46         50                1
#> 2   a            2        NA            34         NA                2
#> 3   a            3        NA            30         NA                3
#> 4   a            4        NA            39         NA                4
#> 5   a            5        NA            33         NA                5
#> 6   b            1        NA            47         34                4
#> 7   b            2        NA            46         40                5
#> 8   b            3        NA            44         NA                1
#> 9   b            4        34            36         NA                2
#> 10  b            5        40            33         NA                3
#> 11  c            1        72            34         72                1
#> 12  c            2        NA            43         87                5
#> 13  c            3        NA            49         NA                2
#> 14  c            4        NA            47         NA                3
#> 15  c            5        87            44         NA                4
#>    clinic_number_extracted
#> 1                       46
#> 2                       NA
#> 3                       NA
#> 4                       NA
#> 5                       NA
#> 6                       36
#> 7                       33
#> 8                       NA
#> 9                       NA
#> 10                      NA
#> 11                      34
#> 12                      44
#> 13                      NA
#> 14                      NA
#> 15                      NA

Created on 2022-05-05 by the reprex package (v2.0.1)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.