Different results from filter() with two conditions combined using AND

Hello, I've been trying to understand why filter gives me different outcomes. I tried to find the answer without luck; hopefully it's not a total novice mistake.

The dataset is FitBit Fitness Tracker Data on Kaggle.

I'm working with the file dailyActivity_merged.csv.

```r
library(dplyr)
dailyActivity_mergedTEST %>%
  filter(TotalSteps >= 50 & SedentaryMinutes >= 400) %>%
  View()
```

This code doesn't seem to consider both conditions together. It appears to filter out rows where TotalSteps is below 50 and, separately, rows where SedentaryMinutes is below 400.
It returns 844 rows.

```r
library(dplyr)
dailyActivity_merged %>%
  filter(!(TotalSteps <= 50 & SedentaryMinutes <= 400)) %>%
  View()
```

This code takes both conditions into account.
It returns 936 rows.

I've checked manually, and I know that the version returning 936 rows works as intended. I'm working in RStudio Cloud, and both columns are numeric.

My question is: why does this happen? What am I missing in my understanding here?


In the first case, each row has to meet both conditions:

```r
filter(TotalSteps >= 50 & SedentaryMinutes >= 400)
```

In the second case, each row only has to meet at least one of two conditions (hence more rows are returned):

```r
filter(!(TotalSteps <= 50 & SedentaryMinutes <= 400))
```

By De Morgan's law, this statement is equivalent to:

```r
filter(!TotalSteps <= 50 | !SedentaryMinutes <= 400)
```

which can also be written as `filter(TotalSteps > 50 | SedentaryMinutes > 400)`.
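The equivalence can be checked directly. The sketch below first checks De Morgan's law on all four TRUE/FALSE combinations, then applies both filters from the question to a handful of made-up rows (not the FitBit data). It assumes base R `subset()` as a stand-in for `dplyr::filter()` so it runs without extra packages; for these conditions both keep rows where the condition is TRUE and drop rows where it is FALSE or NA.

```r
# Every combination of two logical conditions p and q
p <- c(TRUE, TRUE, FALSE, FALSE)
q <- c(TRUE, FALSE, TRUE, FALSE)

# De Morgan's law: !(p & q) gives the same result as !p | !q
truth_table <- data.frame(
  p = p,
  q = q,
  not_p_and_q    = !(p & q),
  not_p_or_not_q = !p | !q
)
print(truth_table)
stopifnot(identical(truth_table$not_p_and_q, truth_table$not_p_or_not_q))

# Made-up daily rows (NOT taken from the FitBit data)
days <- data.frame(
  TotalSteps       = c(0, 30, 4000, 8000, 60),
  SedentaryMinutes = c(1440, 300, 200, 900, 50)
)

# First filter from the question: a row must pass BOTH tests
both <- subset(days, TotalSteps >= 50 & SedentaryMinutes >= 400)

# Second filter: a row is dropped only when it is low on BOTH measures
not_both_low <- subset(days, !(TotalSteps <= 50 & SedentaryMinutes <= 400))

# De Morgan rewrite of the second filter: an OR of two conditions
either_high <- subset(days, TotalSteps > 50 | SedentaryMinutes > 400)

nrow(both)          # 1
nrow(not_both_low)  # 4
stopifnot(identical(not_both_low, either_high))
```

Only one of the five toy rows passes the AND filter, while four pass the negated-AND filter, because the latter is really an OR. Note also that negating `<=` gives `>`, so the second filter is not simply the opposite of the first.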


Hey, thank you for your answer. I tried the code you wrote, and the explanation sounds right; however, the outcome was confusing.

> In the first case, each row has to meet both conditions.
> `filter(TotalSteps >= 50 & SedentaryMinutes >= 400)`

To check what is actually being removed, I negated the condition:

```r
filter(!(TotalSteps >= 50 & SedentaryMinutes >= 400))
```

It returned 96 rows that seem to contradict that assumption. As I understand it, the 844 rows I get from

```r
filter(TotalSteps >= 50 & SedentaryMinutes >= 400)
```

exclude those 96 rows, and that contradicts my understanding, because some of them look as if they were evaluated with OR even though I use AND.
I've attached a JPEG with my results. Take a look at the first 10 rows, which stand out. I understand that, in a truth table, AND returns TRUE only if both p and q are TRUE. Here, the first 10 entries do not fulfil that condition, yet they are removed by this command:

```r
filter(TotalSteps >= 50 & SedentaryMinutes >= 400)
```

Is my understanding flawed?


The first 10 rows in the image are removed by `filter(TotalSteps >= 50 & SedentaryMinutes >= 400)` because, although TotalSteps is greater than 50 in each of them, SedentaryMinutes is less than 400 in each of them. If you want only the records that meet both thresholds, these records fail the second condition, so the AND evaluates to FALSE and they are dropped.
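To see why such rows disappear, it can help to evaluate each comparison as its own column and combine them with `&` row by row. The sketch below does that on a few made-up rows shaped like the ones described (not the real screenshot values), again assuming base R `subset()` as a stand-in for `filter()`. It also checks that the kept and removed rows together account for every row, just as 844 + 96 adds up to 940 in the question. One caveat: both `subset()` and `dplyr::filter()` drop rows where the condition is `NA`, so with missing values in either column, `filter(cond)` and `filter(!cond)` would not add up to the full row count.

```r
# Made-up rows: plenty of steps, but fewer than 400 sedentary minutes
# in every row except the last one (values are hypothetical)
days <- data.frame(
  TotalSteps       = c(12000, 9500, 7000, 3000),
  SedentaryMinutes = c(350, 120, 399, 1000)
)

# Evaluate each test on its own, then combine them row by row with &
days$steps_ok <- days$TotalSteps >= 50
days$sed_ok   <- days$SedentaryMinutes >= 400
days$keep     <- days$steps_ok & days$sed_ok
print(days)

# Rows kept by the AND filter, and rows removed by it (the negated filter)
kept    <- subset(days, TotalSteps >= 50 & SedentaryMinutes >= 400)
dropped <- subset(days, !(TotalSteps >= 50 & SedentaryMinutes >= 400))

# With no NA values, every row lands in exactly one of the two subsets
stopifnot(nrow(kept) + nrow(dropped) == nrow(days))
```

Every row has `steps_ok` TRUE, but only the last has `sed_ok` TRUE, so only that row survives the AND.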
