How to subset all rows from a data frame for longitudinal study

I have a dataset consisting of repeated measures (4 waves), one data point per row. These data are from a longitudinal study, so at the moment not every subject has all data points.

The data look like this:

<subject_id waves Value>
1 wave1 23
1 Wave2 13
1 Wave3 12
2 Wave1 25
2 Wave2 31
2 Wave3 34
2 Wave4 25
3 Wave1 18
3 Wave2 34
4 Wave1 30
5 Wave1 19
5 Wave2 28
5 Wave3 25
5 Wave4 31
6 Wave1 50
6 Wave2 35
7 Wave1 33
7 Wave2 35
7 Wave3 41
7 Wave4 25

So,
subject 1 has three repeated measures
subjects 2, 5, and 7 have four repeated measures
subjects 3 and 6 have two repeated measures
subject 4 has one measure.

I would like to extract all rows for subjects who have all four repeated measures, or all rows for subjects who have exactly two.
The goal is to create a new data frame containing only the subjects who were measured in all waves, i.e. removing subjects 1, 3, 4, and 6.
Any thoughts, please?

I find dplyr::add_count() very useful for this sort of thing. It adds a column n holding the number of rows per subject_id, which you can then filter on.

library(dplyr, warn.conflicts = FALSE)

data <- tribble(~ subject_id, ~ waves, ~ Value,
                1, "wave1", 23,
                1, "Wave2", 13,
                1, "Wave3", 12,
                2, "Wave1", 25,
                2, "Wave2", 31,
                2, "Wave3", 34,
                2, "Wave4", 25,
                3, "Wave1", 18,
                3, "Wave2", 34,
                4, "Wave1", 30,
                5, "Wave1", 19,
                5, "Wave2", 28,
                5, "Wave3", 25,
                5, "Wave4", 31,
                6, "Wave1", 50,
                6, "Wave2", 35,
                7, "Wave1", 33,
                7, "Wave2", 35,
                7, "Wave3", 41,
                7, "Wave4", 25)


# Data points with all repeated measures.
data %>% 
  add_count(subject_id) %>% 
  filter(n == 4)
#> # A tibble: 12 x 4
#>    subject_id waves Value     n
#>         <dbl> <chr> <dbl> <int>
#>  1          2 Wave1    25     4
#>  2          2 Wave2    31     4
#>  3          2 Wave3    34     4
#>  4          2 Wave4    25     4
#>  5          5 Wave1    19     4
#>  6          5 Wave2    28     4
#>  7          5 Wave3    25     4
#>  8          5 Wave4    31     4
#>  9          7 Wave1    33     4
#> 10          7 Wave2    35     4
#> 11          7 Wave3    41     4
#> 12          7 Wave4    25     4

# Data points with only 2 repeated measures.
data %>% 
  add_count(subject_id) %>% 
  filter(n == 2)
#> # A tibble: 4 x 4
#>   subject_id waves Value     n
#>        <dbl> <chr> <dbl> <int>
#> 1          3 Wave1    18     2
#> 2          3 Wave2    34     2
#> 3          6 Wave1    50     2
#> 4          6 Wave2    35     2

Created on 2020-06-07 by the reprex package (v0.3.0)
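
If you would rather not carry the helper column n into the result, a grouped filter should return the same rows. This is only a minimal sketch, not part of the reprex above: the data frame `toy` and its values are made up for illustration, and the hard-coded 4 assumes each subject has at most one row per wave.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: subject 1 has all four waves, subject 2 has only two.
toy <- tribble(~ subject_id, ~ waves, ~ Value,
               1, "Wave1", 25,
               1, "Wave2", 31,
               1, "Wave3", 34,
               1, "Wave4", 25,
               2, "Wave1", 18,
               2, "Wave2", 34)

# Keep every row of a subject whose group has exactly four rows.
complete <- toy %>%
  group_by(subject_id) %>%
  filter(n() == 4) %>%
  ungroup()

print(complete)
```

Changing the condition to n() == 2 would keep the subjects with exactly two measures instead.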

Thank you so much for your suggestions. It still doesn't work, though. It could be because the Value column has many NAs. The dimensions of the data are 448 x 123. I was trying to remove the NAs using code.
Do you have any suggestions for creating a data frame with the NAs removed?

You could use dplyr::filter() to remove the NA values before add_count(), so that rows with a missing Value are not counted as measurements. Maybe something like filter(!is.na(Value)).
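
Put together, the pipeline might look like the sketch below. The data frame `toy_na` is made up for illustration (your real data has many more columns), and it assumes the measurement column is called Value as in the example above.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: subject 1 has a row for every wave, but one Value is NA.
toy_na <- tribble(~ subject_id, ~ waves, ~ Value,
                  1, "Wave1", 23,
                  1, "Wave2", NA,
                  1, "Wave3", 12,
                  1, "Wave4", 20,
                  2, "Wave1", 25,
                  2, "Wave2", 31,
                  2, "Wave3", 34,
                  2, "Wave4", 25)

complete_cases <- toy_na %>%
  filter(!is.na(Value)) %>%   # drop rows with a missing measurement first
  add_count(subject_id) %>%   # then count the remaining rows per subject
  filter(n == 4)              # keep subjects with four non-missing measures

print(complete_cases)
```

Without the first filter(), subject 1 would still have four rows and would be kept even though one of its measurements is missing.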

Thank you so much. Now it works.
