Return most frequent string for each group

Context: For each patient, return the most frequent Concessional_Status. If the frequency value is equal for a patient, use the first occurrence of Concessional_Status

I am unable to return the first occurrence of Concessional_Status if the frequency value is equal for a patient. I have tried @cderv solution which has a similar problem but I couldn't get the correct first occurrence : Replace values of a variable by the most frequent value

sample data:

sample <- tibble::tribble(
  ~Patient_ID, ~Concessional_Status,
           1L,                  "G",
           1L,                  "G",
           1L,                  "B",
           2L,                  "G",
           2L,                  "B",
           3L,                  "O",
           3L,                  "B",
           3L,                  "G",
           4L,                  "X",
           4L,                  "B"
  )

My solution: Which is not correct

sample %>% 
  mutate(Concessional_Status = fct_inorder(Concessional_Status)) %>% 
  group_by_all() %>% 
  tally(sort = T) %>% 
  slice(1) %>% 
  ungroup() %>% 
  select(-n, Majority = Concessional_Status) %>% 
  left_join(sample, by=c("Patient_ID"="Patient_ID")) %>% 
  select(Patient_ID, Concessional_Status, everything())

Expected solution:

Expected_solution <- tibble::tribble(
                     ~Patient_ID, ~Concessional_Status, ~Majority,
                              1L,                  "G",       "G",
                              1L,                  "G",       "G",
                              1L,                  "B",       "G",
                              2L,                  "G",       "G",
                              2L,                  "B",       "G",
                              3L,                  "O",       "O",
                              3L,                  "B",       "O",
                              3L,                  "G",       "O",
                              4L,                  "X",       "X",
                              4L,                  "B",       "G"
                     )

Any help would be greatly appreciated. Thanks

2 Likes

Did you try my own answer (the last one in the same tipic), should work in your case

Is there a tidyverse way to get the solution without writing up a function?

I think you can do this in this way

sample <- tibble::tribble(
  ~Patient_ID, ~Concessional_Status,
  1L,                  "G",
  1L,                  "G",
  1L,                  "B",
  2L,                  "G",
  2L,                  "B",
  3L,                  "O",
  3L,                  "B",
  3L,                  "G",
  4L,                  "X",
  4L,                  "B"
)

library(tidyverse)
sample %>%
  # add a column n with count by categories
  add_count(Patient_ID, Concessional_Status) %>%
  # select max or first occurrence by patient
  group_by(Patient_ID) %>%
  # keep only first TRUE
  mutate(Majority = Concessional_Status[n == max(n)][1]) %>%
  # do not keep temp var
  select(-n)
#> # A tibble: 10 x 3
#> # Groups:   Patient_ID [4]
#>    Patient_ID Concessional_Status Majority
#>         <int> <chr>               <chr>   
#>  1          1 G                   G       
#>  2          1 G                   G       
#>  3          1 B                   G       
#>  4          2 G                   G       
#>  5          2 B                   G       
#>  6          3 O                   O       
#>  7          3 B                   O       
#>  8          3 G                   O       
#>  9          4 X                   X       
#> 10          4 B                   X

Created on 2018-12-27 by the reprex package (v0.2.1)

Is this what you seek ?

2 Likes

Here is another way to do it (it's longer).

library(tidyverse)

foo_tbl <- tibble::tribble(
  ~Patient_ID, ~Concessional_Status,
  1L,                  "G",
  1L,                  "G",
  1L,                  "B",
  2L,                  "G",
  2L,                  "B",
  3L,                  "O",
  3L,                  "B",
  3L,                  "G",
  4L,                  "X",
  4L,                  "B"
)

foo_order <- foo_tbl %>% 
  mutate(id_row = row_number()) %>%
  group_by(Patient_ID, Concessional_Status) %>%
  add_count() %>% 
  group_by(Patient_ID)

foo_majority <- foo_order %>%
  filter(n == max(n) & id_row == min(id_row)) %>% # highest number AND appear first
  rename(Majority = Concessional_Status)

foo_tbl %>% 
  left_join(foo_majority) %>%
  select(Patient_ID, Concessional_Status, Majority)
#> Joining, by = "Patient_ID"
#> # A tibble: 10 x 3
#>    Patient_ID Concessional_Status Majority
#>         <int> <chr>               <chr>   
#>  1          1 G                   G       
#>  2          1 G                   G       
#>  3          1 B                   G       
#>  4          2 G                   G       
#>  5          2 B                   G       
#>  6          3 O                   O       
#>  7          3 B                   O       
#>  8          3 G                   O       
#>  9          4 X                   X       
#> 10          4 B                   X

Created on 2018-12-27 by the reprex package (v0.2.1)

1 Like

Great! this was exactly what I was looking for - simple, concise and without writing up a function. Thank you

You should switch solutions to guide future peeps with the same question to the better answer

What do you mean by switch solution ? Sorry, I didn't get it.

Hey mate, but I can't see your solution in this page. Are we talking about different things here - I am a bit lost? I have only marked cderv's answer as solution.

Probably me who is lost. I trashed my suggestion. Sorry for the trouble.

All good :slight_smile: But I appreciate your intention for helping me.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.