How do I subset rows of data based off of one variable?

Hi,

I have a data frame (df) and the columns are labelled with the following variables:

Image, id, sex, alt, brood

I want to create a subset that omits certain image ids, such as "19N0004", "19N0010", etc. There are around 30 that I want to remove.

Is there a way I can filter or remove these observations using these id numbers? I tried this but it didn't work:
subset.df <- df[df$id %in% -c( "19N0040", "19N0081", "19N0083", "19N0099", "19N0109", "19N0150", "19N0160",...

I'd recommend using the filter() function from the dplyr / tidyverse package.

subset.df <- df[!df$id %in% c( "19N0040", "19N0081", "19N0083", "19N0099", "19N0109", "19N0150", "19N0160",...

Hi, I tried this and got this error message:

Error in [.data.frame(df, !df$id %in% c("19N0040", "19N0081", "19N0083", :
undefined columns selected

Hi, I considered using filter, but I thought it only let you specify the information you want to keep, rather than what you want to omit. Can I use it to create a subset without the questionable data?

Yes.
Logical conditions can be negated: keeping rows where x > 3 is the same as throwing away rows where x <= 3.
filter, as a dplyr verb, describes what to keep. If you know what you want to throw away, just negate that condition.
The ! operator makes this convenient: !(x > 3) is equivalent to x <= 3.
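
As a rough sketch of the negation idea applied to the id question, assuming dplyr is installed: the toy data frame and the name ids_to_drop below are made up for illustration and stand in for the real df and the full list of around 30 ids.

```r
library(dplyr)

# Toy data standing in for the real df (values are invented)
df <- data.frame(
  id  = c("19N0004", "19N0010", "19N0040", "19N0081"),
  sex = c("F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical vector holding the ids to omit
ids_to_drop <- c("19N0040", "19N0081")

# filter() keeps rows where the condition is TRUE, so negating
# "id is in ids_to_drop" keeps everything except those ids
subset.df <- df %>% filter(!id %in% ids_to_drop)

subset.df
```

Storing the ids in their own vector keeps the filter() call short and makes the list easy to extend later.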

You need a comma after your c() vector, just before the closing square bracket, so that the condition selects rows rather than columns:

subset.df <- df[!df$id %in% c(...), ]
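
To illustrate why the comma matters, here is a small base R sketch. The toy data and the names ids_to_drop and keep are assumptions made for the example, not the original data. With a single index inside [ ], a data frame is indexed by column, so a logical vector with one entry per row can end up pointing at columns that do not exist.

```r
# Toy data standing in for the real df (values are invented)
df <- data.frame(
  id    = c("19N0004", "19N0040", "19N0081", "19N0010"),
  brood = c(1, 2, 1, 3),
  stringsAsFactors = FALSE
)

# Hypothetical vector holding the ids to omit
ids_to_drop <- c("19N0040", "19N0081")

# One TRUE/FALSE per row: TRUE means "keep this row"
keep <- !df$id %in% ids_to_drop

# No comma: keep is treated as a column selector. Its fourth entry is
# TRUE but df has only two columns, so this is expected to fail with an
# "undefined columns selected" error, which tryCatch() captures here.
no_comma <- tryCatch(df[keep], error = function(e) e)

# With the comma: keep selects rows, and the empty slot after the
# comma means "all columns"
subset.df <- df[keep, ]

subset.df
```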

Thank you that's worked a treat!

I didn't know that, thank you so much!
