Replace values of a variable by the most frequent value


#1
ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
  
df <- tibble(ID, Value)

I have to group by the ID, keep the value of Value which is the most repeated
If the values are equal (ID == "B"), I choose the first value

The value variable should look like :

Value_output <- c("blue", "blue", "blue", "red", "red", NA)

Thanks for help


#2

It is surely not the shortest solution but it is one approach that works: Create a table with the value you want for each ID then join to the first table

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)

library(tidyverse)
df <- tibble(ID, Value)
# create a replacement table
replacement <- df %>%
  # create ordered factor as we want first after
  mutate(Value = fct_inorder(Value)) %>%
  # group_by all columns
  group_by_all() %>%
  # count the occurrence in last column and sort
  tally(sort = TRUE) %>%
  # keep only the first row in each remaining group
  # (the max because sorted)
  slice(1) %>%
  ungroup() %>%
  select(ID, Value_output = Value)
#> Warning: le package 'bindrcpp' a été compilé avec la version R 3.4.4

df %>% 
  inner_join(replacement)
#> Joining, by = "ID"
#> # A tibble: 6 x 3
#>   ID    Value  Value_output
#>   <chr> <chr>  <fct>       
#> 1 A     blue   blue        
#> 2 A     blue   blue        
#> 3 A     green  blue        
#> 4 B     red    red         
#> 5 B     orange red         
#> 6 c     <NA>   <NA>

Created on 2018-07-17 by the reprex package (v0.2.0).