Dplyr: Filtering out erroneous entries

Suppose you have a tibble t with a factor variable Fac with 2 levels and a character variable Var with 6 different string values. I want to filter Var within each factor level (A, B) so that only the majority value remains (here "foo" for A and "bar" for B). Think of the other values as errors. You can't filter them out simply by name, because my original dataset has far too many distinct values for that to be a clean solution. At the end, you should be able to group_by and summarise by Fac and get a single unique Var for each Fac.

t = tibble(Fac = as.factor(c(rep("A", 6), rep("B", 6))), Var = c(rep("foo", 4), "fii", "fa", "bor", rep("bar", 4), "bat"))
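To see which value is the majority in each level, it can help to tabulate Var within Fac first. A minimal sketch using dplyr's count() (which re-exports tibble(), so library(dplyr) is enough); the name counts is only illustrative:

```r
library(dplyr)

t <- tibble(Fac = as.factor(c(rep("A", 6), rep("B", 6))),
            Var = c(rep("foo", 4), "fii", "fa", "bor", rep("bar", 4), "bat"))

# Count how often each Var value occurs within each Fac level,
# most frequent combinations first
counts <- t %>% count(Fac, Var, sort = TRUE)
counts
```

The "foo"/A and "bar"/B rows each have n = 4, while every other combination has n = 1.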

So far I have only been able to come up with "dirty" solutions (such as creating intermediate summaries), but a clean solution eludes me.

What would be your tidyverse solution?

Do either of these two pass the "tidy" check?

library(dplyr)

dataset <- tibble(Fac = as.factor(x = rep(x = c("A", "B"), each = 6)),
                  Var = c(rep(x = "foo", times = 4), "fii", "fa", "bor", rep(x = "bar", times = 4), "bat"))

dataset %>%
    count(Fac, Var) %>%
    group_by(Fac) %>%
    filter(n == max(n)) %>%
    select(-n) %>%
    ungroup()
#> # A tibble: 2 x 2
#>   Fac   Var  
#>   <fct> <chr>
#> 1 A     foo  
#> 2 B     bar
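Note that filter(n == max(n)) keeps every value tied for the highest count, so a level with two equally common values returns two rows. If exactly one row per level is required, slice_max() with with_ties = FALSE is one option; it is available from dplyr 1.0.0 onwards, which is newer than the version used for this reprex. A sketch on a small made-up tie (the tibble tied is hypothetical):

```r
library(dplyr)  # slice_max() needs dplyr >= 1.0.0

# Hypothetical data: "bar" and "bat" occur equally often in level B
tied <- tibble(Fac = factor(rep("B", 6)),
               Var = c("bar", "bar", "bar", "bat", "bat", "bat"))

# filter(n == max(n)) keeps both tied values
all_modes <- tied %>%
    count(Fac, Var) %>%
    group_by(Fac) %>%
    filter(n == max(n)) %>%
    ungroup()

# slice_max(..., with_ties = FALSE) keeps exactly one row per group
one_mode <- tied %>%
    count(Fac, Var) %>%
    group_by(Fac) %>%
    slice_max(n, n = 1, with_ties = FALSE) %>%
    ungroup()
```

Which of the tied values survives is then an ordering detail, so it is worth deciding explicitly whether ties should be kept or broken.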

# https://stackoverflow.com/a/8189441
Mode <- function(observations)
{
    unique_observations <- unique(x = observations)
    unique_observations[which.max(x = tabulate(match(x = observations,
                                                     table = unique_observations)))]
}

dataset %>%
    group_by(Fac) %>%
    summarise(Var = Mode(Var))
#> # A tibble: 2 x 2
#>   Fac   Var  
#>   <fct> <chr>
#> 1 A     foo  
#> 2 B     bar
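Since the question asks to drop the erroneous rows rather than collapse straight to one row per level, the same Mode() helper can also be used inside filter() on grouped data. A sketch, with Mode() and dataset redefined so the block runs on its own; cleaned and unique_var are illustrative names:

```r
library(dplyr)

# Same helper as above (https://stackoverflow.com/a/8189441)
Mode <- function(observations) {
    unique_observations <- unique(observations)
    unique_observations[which.max(tabulate(match(observations, unique_observations)))]
}

dataset <- tibble(Fac = as.factor(rep(c("A", "B"), each = 6)),
                  Var = c(rep("foo", 4), "fii", "fa", "bor", rep("bar", 4), "bat"))

# Keep only the rows whose Var equals the most frequent Var of their Fac level
cleaned <- dataset %>%
    group_by(Fac) %>%
    filter(Var == Mode(Var)) %>%
    ungroup()

# One unique Var per Fac after the cleaning step
unique_var <- cleaned %>% distinct(Fac, Var)
```

Mode() breaks ties in favour of the value that appears first in the data, because which.max() returns the position of the first maximum.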

Created on 2019-09-30 by the reprex package (v0.3.0)

Like them both. Thank you very much!