Combining rows that do not match filter

Apologies in advance for no reprex - apparently my clipboard is not available

Pretty sure I saw there was a way of combining rows after mutate/summarize
so that in the example below I could filter on pc >20 and end up with rows A, D and 'other' with a value of 23

reprex1

Any help appreciated

Try the github version... the one on CRAN has some issues.

devtools::install_github("tidyverse/reprex")

if this gives you an error saying that devtools is not available then use

install.packages("devtools")

I don't understand what you expect for output. what should have a value of 23 and what is 'other"?

Prose is a poor way to describe output, you should use R to build the output you expect as you did to show the input.

This is how to filter for pc >20

suppressPackageStartupMessages(library(dplyr))

x <- data.frame(a = c("A", "B", "C", "D"), b = c(8, 3, 2, 9))

x %>% mutate(pc = round(100*b/sum(b))) %>%
        filter(pc > 20)
#>   a b pc
#> 1 A 8 36
#> 2 D 9 41

Created on 2018-03-26 by the reprex package (v0.2.0).

This would be the output, I would want to see.

y <-data.frame(a = c("A", "D", "Others"), b = c(8, 9, 5),pc=c(36, 41, 23))

dput(y)
structure(list(a = structure(1:3, .Label = c("A", "D", "Others"
), class = "factor"), b = c(8, 9, 5), pc = c(36, 41, 23)), .Names = c("a",
"b", "pc"), row.names = c(NA, -3L), class = "data.frame")

print(y)
a b pc
1 A 8 36
2 D 9 41
3 Others 5 23

Would have some parameter in filter() e.g Combine = TRUE
I'm pretty sure I am not imagining it but cannot identify in documentation

Thanks for heads up!

I think this is what you are after:

library(dplyr)

x <- tibble(a = c("A", "B", "C", "D"), b = c(8, 3, 2, 9))

x %>% 
  mutate(pc = round(100*b/sum(b)), 
         a = ifelse(pc > 20, a, "Other")) %>% 
  group_by(a) %>% 
  summarise(pc = sum(pc))

2 Likes

It's a little tricky because column a of your data.frame is a factor

suppressPackageStartupMessages(library(dplyr))
x <- data.frame(a = c("A", "B", "C", "D"), b = c(8, 3, 2, 9))

# because column a is a factor "others" needs to be
# added so that value can be used in column a
levels(x$a) <- c(levels(x$a), "others")

x %>%
    # do computation
    mutate( pc = round(100*b/sum(b))) %>%
    # add "others" row
    rbind(data.frame(a = "others", 
                                     b = sum(filter(., pc <= 20)$b),
                                     pc = sum(filter(., pc <= 20)$pc))) %>%
    # keep others row regardless of pc value
    filter(pc >20 | a == "others")
#>        a b pc
#> 1      A 8 36
#> 2      D 9 41
#> 3 others 5 23

Created on 2018-03-26 by the reprex package (v0.2.0).

Another way to do this would be to group by the a variable and use mutate and case_when to create the labels, as following:

suppressPackageStartupMessages(library(dplyr))

x <- data.frame(a = c("A", "B", "C", "D"), b = c(8, 3, 2, 9))

x %>% 
  mutate(pc = round(100 * b / sum(b))) %>% # Compute %
  group_by(a) %>% 
  mutate(type = case_when( # Create variable where
    pc > 20 ~ as.character(`a`), # equal to `a` if `pc`> 20
    TRUE ~ "Other") # otherwise as `Other`
    ) %>% 
  group_by(type) %>% # Group by new variable
  summarise_if(is.numeric, sum) # Compute values
#> # A tibble: 3 x 3
#>   type      b    pc
#>   <chr> <dbl> <dbl>
#> 1 A      8.00  36.0
#> 2 D      9.00  41.0
#> 3 Other  5.00  23.0

Created on 2018-03-26 by the reprex package (v0.2.0).

1 Like

Thanks for all the suggestions, which have solved my problem

Obviously , for this example, it would have been more sensible to have started with a tibble but you have catered for an issue that might arise

1 Like

Wondering if when you originally thought you’d seen a way of doing this before, you were perhaps remembering forcats::fct_lump or forcats::fct_other? They don’t quite apply to this case, but fct_other could be useful in the above solution if you have a lot more categories to lump in your real data.

1 Like

That absolutely was what I meant. :grinning:
I will use that for real-life examples with more categories
I currently only use fct_reorder() for charting purposes but there is a lot more to the forcats package

There’s nothing worse than that Twilight Zone feeling when you swear you remember reading something in the docs and now (cue creepy music) it has vanished! :fearful: