Filter question

Hi, this may be a basic filter question, but my goal is essentially to remove rows that meet BOTH of the following two conditions:

  1. animal is not equal to "cat", and
  2. there cannot be an NA in the "animal_numbers" column

dataframe %>%
  filter(animal != "cat" & !is.na(animal_numbers))

Unfortunately, with that statement, rows where animal != "cat" OR rows where !is.na(animal_numbers) are being removed. I only want to remove rows that meet both conditions. Does this make sense? Thank you!

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
mtcars %>% filter(cyl != 8 & !is.na(drat))
#>                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Created on 2022-09-23 by the reprex package (v2.0.1)

I am looking for a way to keep a "cat" row if "animal_numbers" has a value rather than NA, though.

thank you

See the FAQ: "How to do a minimal reproducible example (reprex) for beginners".

Thank you, sir.

structure(list(animal = c("dog", "dog", "mouse", "cat", "dog",
"cat"), animal_numbers = c(4, 3, 21, 32, NA, 21)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))

In the example I copied and pasted, only rows 5 and 9 should be removed, since they meet both conditions: animal is not "cat" and animal_numbers is NA.

There is no row 9, but it appears you want to remove all rows where animal_numbers is NA, except for cats.

DF <- structure(list(animal = c(
  "dog", "dog", "mouse", "cat", "dog",
  "cat"
), animal_numbers = c(4, 3, 21, 32, NA, 21)), row.names = c(
  NA,
  -6L
), class = c("tbl_df", "tbl", "data.frame"))

# introduce a case in which cat is NA
DF[6,2] <- NA

DF[-which(DF[1] != "cat" & is.na(DF[2])),]
#> # A tibble: 5 × 2
#>   animal animal_numbers
#>   <chr>           <dbl>
#> 1 dog                 4
#> 2 dog                 3
#> 3 mouse              21
#> 4 cat                32
#> 5 cat                NA

The logic is easier in {base} in this case.
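
One caveat with the `-which()` pattern above: if no row matches the condition, `which()` returns an empty vector and the negative index selects zero rows instead of all of them. A minimal sketch of the same idea with a plain logical index, assuming the same six-row data (the names `drop`, `kept` and `none` are made up for illustration):

```r
# Same six-row data as in the thread, built with base R only
DF <- data.frame(
  animal = c("dog", "dog", "mouse", "cat", "dog", "cat"),
  animal_numbers = c(4, 3, 21, 32, NA, NA)
)

# Rows to drop: not a cat AND animal_numbers is NA
drop <- DF$animal != "cat" & is.na(DF$animal_numbers)

# Keep everything else by negating the logical vector
kept <- DF[!drop, ]
kept

# A condition that matches no row at all
none <- DF$animal == "bird"

# Negating the logical vector keeps every row ...
nrow(DF[!none, ])

# ... while -which() on an empty match selects zero rows
nrow(DF[-which(none), ])
```

Negating the logical vector gives the same rows as `-which()` whenever at least one row matches, and it still behaves sensibly when nothing matches.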

Keep rows where animal_numbers is not NA or animal is "cat"? As usual, cats get a free pass.

library(tidyverse)

DF <- structure(list(animal = c(
  "dog", "dog", "mouse", "cat", "dog",
  "cat"
), animal_numbers = c(4, 3, 21, 32, NA, 21)), row.names = c(
  NA,
  -6L
), class = c("tbl_df", "tbl", "data.frame"))

# introduce a case in which cat is NA
DF[6,2] <- NA

DF |> filter(animal == "cat" | !is.na(animal_numbers))
#> # A tibble: 5 × 2
#>   animal animal_numbers
#>   <chr>           <dbl>
#> 1 dog                 4
#> 2 dog                 3
#> 3 mouse              21
#> 4 cat                32
#> 5 cat                NA

Created on 2022-09-23 with reprex v2.0.2

I want to remove any row where animal is "cat" and animal_numbers is NA.

I just want the filter to require that both conditions hold for a given row to be removed.

DF |> filter(animal == "cat" | !is.na(animal_numbers))
#> # A tibble: 5 × 2
#>   animal animal_numbers
#>   <chr>           <dbl>
#> 1 dog                 4
#> 2 dog                 3
#> 3 mouse              21
#> 4 cat                32
#> 5 cat                NA
#> 6 dog                NA
#> 7 dog                NA

In the example above, I only want row 5 gone. Sorry I have not been clear.
I want a row with "cat" in animal gone if it also has NA in animal_numbers.
What I want:

#>   animal animal_numbers
#>   <chr>           <dbl>
#> 1 dog                 4
#> 2 dog                 3
#> 3 mouse              21
#> 4 cat                32
#> 5 dog                NA
#> 6 dog                NA

thank you all so much!!

Is this what you want?

library(tidyverse)

DF <- structure(list(animal = c(
  "dog", "dog", "mouse", "cat", "dog",
  "cat"
), animal_numbers = c(4, 3, 21, 32, NA, 21)), row.names = c(
  NA,
  -6L
), class = c("tbl_df", "tbl", "data.frame"))

# introduce a case in which cat is NA
DF[6,2] <- NA

DF
#> # A tibble: 6 × 2
#>   animal animal_numbers
#>   <chr>           <dbl>
#> 1 dog                 4
#> 2 dog                 3
#> 3 mouse              21
#> 4 cat                32
#> 5 dog                NA
#> 6 cat                NA

DF |> filter(animal != "cat" | !is.na(animal_numbers))
#> # A tibble: 5 × 2
#>   animal animal_numbers
#>   <chr>           <dbl>
#> 1 dog                 4
#> 2 dog                 3
#> 3 mouse              21
#> 4 cat                32
#> 5 dog                NA

Created on 2022-09-23 with reprex v2.0.2

It is. I am so dumb. I made that way more complicated. Thank you.

The logic of "and" and "or" when filtering is not easy. filter() keeps the rows for which the condition is TRUE. With "or" instead of "and", a row needs just one TRUE to be kept. All animals other than cat get a TRUE from the first part and will be kept. Cats get a FALSE for the first part, and those with an NA in animal_numbers get another FALSE for the second part and will be dropped.
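
To make the reasoning above concrete, here is a small sketch that writes each part of the condition into its own column, so you can read off per row why it is kept or dropped. It assumes the same six-row data as in the thread; the column names `not_cat`, `has_number`, `keep_or` and `keep_and` are made up for illustration.

```r
library(dplyr)

# Same data as in the thread, with the second cat row set to NA
DF <- tibble(
  animal = c("dog", "dog", "mouse", "cat", "dog", "cat"),
  animal_numbers = c(4, 3, 21, 32, NA, NA)
)

checks <- DF |>
  mutate(
    not_cat    = animal != "cat",          # first part of the condition
    has_number = !is.na(animal_numbers),   # second part of the condition
    keep_or    = not_cat | has_number,     # one TRUE is enough to keep the row
    keep_and   = not_cat & has_number      # both parts must be TRUE to keep the row
  )

checks
```

Only the cat row with NA is FALSE in `keep_or`, while `keep_and` is also FALSE for the cat with a number and the dog with NA, which is why the "and" version removed more rows than intended.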

See section 5.2.2 of R for Data Science for logical operators and filtering:

https://r4ds.had.co.nz/

Of course, in hindsight the clearest solution would have been to just drop any row with animal == "cat" AND an animal_numbers that is NA. From R for Data Science: by De Morgan's law, !(x & y) is the same as !x | !y. A useful thing to remember.

library(tidyverse)

DF <- structure(list(animal = c(
  "dog", "dog", "mouse", "cat", "dog",
  "cat"
), animal_numbers = c(4, 3, 21, 32, NA, 21)), row.names = c(
  NA,
  -6L
), class = c("tbl_df", "tbl", "data.frame"))

# introduce a case in which cat is NA
DF[6,2] <- NA

DF |> filter(!(animal == "cat" & is.na(animal_numbers)))
#> # A tibble: 5 × 2
#>   animal animal_numbers
#>   <chr>           <dbl>
#> 1 dog                 4
#> 2 dog                 3
#> 3 mouse              21
#> 4 cat                32
#> 5 dog                NA

Created on 2022-09-23 with reprex v2.0.2
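
If you want to convince yourself that the two ways of writing the filter are interchangeable, a quick check along these lines should do it. This is only a sketch on the same six-row data; the names `negated_and` and `or_of_negations` are made up for illustration.

```r
library(dplyr)

# Same data as in the thread, with the second cat row set to NA
DF <- tibble(
  animal = c("dog", "dog", "mouse", "cat", "dog", "cat"),
  animal_numbers = c(4, 3, 21, 32, NA, NA)
)

# "Drop rows that are cat AND NA", written as a negated "and"
negated_and <- DF |>
  filter(!(animal == "cat" & is.na(animal_numbers)))

# The De Morgan rewrite: "keep rows that are not cat OR not NA"
or_of_negations <- DF |>
  filter(animal != "cat" | !is.na(animal_numbers))

# Compare the two results
all.equal(negated_and, or_of_negations)
```

Which form to use is mostly a matter of taste: the negated "and" reads like the sentence "remove rows that are cat and NA", while the "or" form reads like a description of what is kept.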
