Deleting less frequent rows in the df

Hi,
I have this df:

source <- data.frame(
  stringsAsFactors = FALSE,
  Resp = c(1,2,3,4,5,6,7,8,9,
           10,11,12,13),
  Neareststation = c("Abbey Road",
                     "Abbey Wood","Abbey Wood","Abbey Wood",
                     "Abbey Wood","Aber",
                     "Abercynon","Aberdare","Aberdare",
                     "Aberdare","Aberdare","Aberdare",
                     "Aberdare"),
  Postcodearea = c("E","DA","SE","DA",
                   "SE","XY","CF","CF","CG",
                   "CF","SA","CF","SA")
)

Respondents specified their Postcodearea, and some of the entries are incorrect.
Is it possible to remove the less frequent responses for each station and get something like this?

result <- data.frame(
  stringsAsFactors = FALSE,
  Neareststation = c("Abbey Road","Abbey Wood",
                     "Aber","Abercynon","Aberdare"),
  Postcodearea = c("E", "DA", "XY", "CF", "CF")
)

The rule is: keep the most frequent response for each station. If there is a tie (Abbey Wood has two DAs and two SEs), either of the two can be selected.
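This rule amounts to taking the mode (most frequent value) of Postcodearea within each Neareststation. A minimal base R sketch of it, assuming that breaking ties alphabetically is acceptable (`most_frequent` is a hypothetical helper name, not an existing function):

```r
# Same sample data as above
source <- data.frame(
  stringsAsFactors = FALSE,
  Resp = 1:13,
  Neareststation = c("Abbey Road", "Abbey Wood", "Abbey Wood", "Abbey Wood",
                     "Abbey Wood", "Aber", "Abercynon", "Aberdare", "Aberdare",
                     "Aberdare", "Aberdare", "Aberdare", "Aberdare"),
  Postcodearea = c("E", "DA", "SE", "DA", "SE", "XY", "CF", "CF", "CG",
                   "CF", "SA", "CF", "SA")
)

# Hypothetical helper: most frequent value of a character vector.
# table() sorts its names alphabetically and which.max() returns the
# first maximum, so a tie goes to the alphabetically first value.
most_frequent <- function(x) names(which.max(table(x)))

# Apply the helper to the postcode areas of each station
modes <- tapply(source$Postcodearea, source$Neareststation, most_frequent)

result <- data.frame(
  stringsAsFactors = FALSE,
  Neareststation = names(modes),
  Postcodearea = as.vector(modes)
)
result
```

With this tie rule, Abbey Wood gets DA because "DA" sorts before "SE".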

Is it easy to do?

You just have to count the combinations and arrange them by frequency within each station:

library(dplyr)

source <- data.frame(
    stringsAsFactors = FALSE,
    Resp = c(1,2,3,4,5,6,7,8,9,
             10,11,12,13),
    Neareststation = c("Abbey Road",
                       "Abbey Wood","Abbey Wood","Abbey Wood",
                       "Abbey Wood","Aber",
                       "Abercynon","Aberdare","Aberdare",
                       "Aberdare","Aberdare","Aberdare",
                       "Aberdare"),
    Postcodearea = c("E","DA","SE","DA",
                     "SE","XY","CF","CF","CG",
                     "CF","SA","CF","SA")
)

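source %>% 
    # one row per station/postcode combination, with its frequency in column n
    count(Neareststation, Postcodearea) %>% 
    group_by(Neareststation) %>% 
    # put the most frequent postcode area first within each station
    arrange(Neareststation, desc(n)) %>%
    # keep only that first (most frequent) value for each station
    summarise(Postcodearea = first(Postcodearea))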
#> # A tibble: 5 × 2
#>   Neareststation Postcodearea
#>   <chr>          <chr>       
#> 1 Abbey Road     E           
#> 2 Abbey Wood     DA          
#> 3 Aber           XY          
#> 4 Abercynon      CF          
#> 5 Aberdare       CF

Created on 2021-09-16 by the reprex package (v2.0.1)
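Since dplyr 1.0.0, `slice_max()` can replace the `arrange()` + `summarise(first())` steps. A sketch assuming dplyr >= 1.0.0; `with_ties = FALSE` keeps exactly one row per station, so for Abbey Wood only one of DA and SE survives:

```r
library(dplyr)

source <- data.frame(
  stringsAsFactors = FALSE,
  Resp = 1:13,
  Neareststation = c("Abbey Road", "Abbey Wood", "Abbey Wood", "Abbey Wood",
                     "Abbey Wood", "Aber", "Abercynon", "Aberdare", "Aberdare",
                     "Aberdare", "Aberdare", "Aberdare", "Aberdare"),
  Postcodearea = c("E", "DA", "SE", "DA", "SE", "XY", "CF", "CF", "CG",
                   "CF", "SA", "CF", "SA")
)

result <- source %>%
  # one row per station/postcode pair, with its frequency in column n
  count(Neareststation, Postcodearea) %>%
  group_by(Neareststation) %>%
  # keep the single row with the largest n in each station;
  # with_ties = FALSE drops any other rows tied for the maximum
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Neareststation, Postcodearea)

result
```

Dropping `with_ties = FALSE` would instead return both DA and SE for Abbey Wood, which can be useful for spotting ambiguous stations.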


Of course. Thank you!
