How can I delete a subgroup within a group in dplyr?


#1

I group the data by Mac and then, within each Mac group, by UNIX_Time.x. I want to delete a whole UNIX_Time.x subgroup if TISD >= 254 for any of its rows, while keeping the other rows of the same Mac. I tried the following code, but it deletes the whole Mac group instead. Could you help me, please?

library(dplyr)

t %>%
  group_by(Mac, UNIX_Time.x) %>%
  arrange(Mac, UNIX_Time.x, UNIX_Time.y, TT) %>%
  # time difference to the previous row within each subgroup
  mutate(TISD = UNIX_Time.y - lag(UNIX_Time.y)) %>%
  filter_all(any_vars(is.na(TISD) | TISD <= 254)) %>%
  ungroup()

Data:

Mac UNIX_Time.x UNIX_Time.y Location TISD
A 1492674854 1492682179 P1P2 200
A 1492674854 1492682203 P1P2 400
A 1492699609 1492717562 P1P2 100
A 1492699609 1492717758 P1P2 196

Expected output:

Mac UNIX_Time.x UNIX_Time.y Location TISD
A 1492699609 1492717562 P1P2 100
A 1492699609 1492717758 P1P2 196

#2

Hi @Shahin,

I’m a little confused about your goal here, perhaps because the data you’ve included all shares the same value for Mac.

If you could turn this into a reprex (short for reproducible example), that would help us help you.

If you’ve never heard of a reprex before, you might want to start by reading the tidyverse.org help page.
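For example, the sample data from the question could be shared as code that anyone can paste into R. This is a minimal sketch that assumes the column types implied by the printed table (numeric times, character Mac and Location); for an existing data frame, `dput(t)` prints equivalent code automatically.

```r
# Sample data from the question, written as code so others can reproduce it
t <- data.frame(
  Mac = "A",
  UNIX_Time.x = c(1492674854, 1492674854, 1492699609, 1492699609),
  UNIX_Time.y = c(1492682179, 1492682203, 1492717562, 1492717758),
  Location = "P1P2",
  TISD = c(200, 400, 100, 196)
)
```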


#3

It looks like a misuse of filter_all and any_vars (though I’m not very familiar with them, they seem to be designed to operate on multiple columns at once).
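To illustrate what these verbs are meant for, here is a small sketch on hypothetical toy data (`df` and its columns `a` and `b` are made up): `filter_all()` applies one predicate to every column, `any_vars()` keeps a row if the predicate holds for at least one column, and `.` stands for the current column. The `if_any()` line is the spelling used by newer dplyr versions for the same filter.

```r
library(dplyr)

# Hypothetical toy data with NAs in different columns
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# Keep rows where ANY column is NA; `.` is replaced by each column in turn
with_na <- df %>% filter_all(any_vars(is.na(.)))

# Same filter with the newer if_any() helper
with_na2 <- df %>% filter(if_any(everything(), is.na))
```

Neither form looks at groups, which is why a per-subgroup rule needs a plain `filter()` instead.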

You probably need the simpler dplyr::filter() and one of the all() or any() base functions. Try:

dplyr::filter(!any(TISD >= 254 | is.na(TISD)))

(Also, have a look at the default argument of lag() and at the na.rm arg of any() if you want to handle NAs differently).
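A runnable sketch of this idea on the sample data from the question (rebuilt by hand; the given TISD column is used as-is rather than recomputed with `lag()`). Because `filter()` evaluates `any()` once per group, the whole UNIX_Time.x subgroup is kept or dropped together. This variant uses `na.rm = TRUE`, which is one of the NA-handling choices mentioned above.

```r
library(dplyr)

t <- data.frame(
  Mac = "A",
  UNIX_Time.x = c(1492674854, 1492674854, 1492699609, 1492699609),
  UNIX_Time.y = c(1492682179, 1492682203, 1492717562, 1492717758),
  Location = "P1P2",
  TISD = c(200, 400, 100, 196)
)

result <- t %>%
  group_by(Mac, UNIX_Time.x) %>%
  # Drop a whole subgroup if any of its TISD values is >= 254;
  # na.rm = TRUE ignores NA values instead of letting any() return NA
  filter(!any(TISD >= 254, na.rm = TRUE)) %>%
  ungroup()
```

Only the two rows with UNIX_Time.x 1492699609 remain, matching the expected output in the question.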


#4

@Aurele, thanks for helping! Now I need to delete the rows of a subgroup starting from the first row with TISD > 254 up to the end of that subgroup, whether the TISD values in the following rows are below, equal to, or above 254. Below is an example where I group by Mac and UNIX_Time.x; each subgroup corresponds to one UNIX_Time.x value. Your help is highly appreciated.

Data:
Mac  UNIX_Time.x  UNIX_Time.y  TISD
1    1492671281    1492671668     12
1    1492671281    1492671672     04
1    1492671281    1492771972    300
1    1492671281    1492671982     10
1    1492671281    1492671995     13
1    1492671288    1492671668     12
1    1492671288    1492671672     04
1    1492671288    1492771972    300
1    1492671288    1492671982     10
1    1492671288    1492671995     13

Expected output:
Mac  UNIX_Time.x  UNIX_Time.y  TISD
1    1492671281    1492671668     12
1    1492671281    1492671672     04
1    1492671288    1492671668     12
1    1492671288    1492671672     04


#5

One of many ways to do this would be:

filter(!cumsum(TISD > 254))
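A runnable sketch of this on the data from the follow-up question, rebuilt by hand as a data frame `d` (a hypothetical name), with the TISD column taken as given. Within each subgroup, `cumsum(TISD > 254)` counts how many rows so far exceeded 254; it is 0 only before the first such row, and `!0` is `TRUE`, so exactly those leading rows are kept.

```r
library(dplyr)

d <- data.frame(
  Mac = 1,
  UNIX_Time.x = rep(c(1492671281, 1492671288), each = 5),
  UNIX_Time.y = rep(c(1492671668, 1492671672, 1492771972, 1492671982, 1492671995), times = 2),
  TISD = rep(c(12, 4, 300, 10, 13), times = 2)
)

result <- d %>%
  group_by(Mac, UNIX_Time.x) %>%
  # Running count of rows with TISD > 254; keep rows while the count is 0
  filter(!cumsum(TISD > 254)) %>%
  ungroup()
```

This keeps the first two rows of each UNIX_Time.x subgroup (TISD 12 and 4), as in the expected output.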

#6

@Aurele, thanks a lot! I think I first had to produce NA where TISD > 254; then your code works. It also requires removing the NAs, which I did.
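When TISD is computed with `lag()` inside each subgroup, its first value is `NA`, and `cumsum()` carries that `NA` through the rest of the subgroup, so `filter()` drops every row. A minimal sketch of one way around this, on hypothetical data (the times below are made up): `coalesce()` turns the `NA` comparison into `FALSE` before the running count. Setting the `default` argument of `lag()` would be another option.

```r
library(dplyr)

# Hypothetical data: TISD is derived with lag(), so each subgroup starts with NA
d <- data.frame(
  Mac = "A",
  UNIX_Time.x = c(1, 1, 1, 1, 2, 2),
  UNIX_Time.y = c(100, 110, 500, 505, 200, 210)
)

result <- d %>%
  group_by(Mac, UNIX_Time.x) %>%
  arrange(UNIX_Time.y, .by_group = TRUE) %>%
  mutate(TISD = UNIX_Time.y - lag(UNIX_Time.y)) %>%
  # Without coalesce(), the leading NA would make cumsum() NA for the whole
  # subgroup and filter() would drop all of its rows
  filter(cumsum(coalesce(TISD > 254, FALSE)) == 0) %>%
  ungroup()
```

Subgroup 1 keeps its rows up to the jump of 390 (times 100 and 110), and subgroup 2 is kept whole.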