How to filter a dataframe based on a list of values from one column

Hi everyone, I am new to RStudio. I am working with a dataframe that consists of 5 columns: SampleID; chr; pos; ref; mut. These are variant calls from a large cohort of samples (>900 unique SampleIDs).

I want to filter this dataframe and create a new dataframe that includes only the rows corresponding to a specific list of SampleIDs (~100 unique SampleIDs).

Is there a way I can import my list of desired SampleIDs, filter the original dataframe and create a new dataframe that consists only of the data from my list of SampleIDs?

Thank you so much in advance for your time.


### assuming the CSV looks like
# SampleID
# 342
# 377
# 899

# path to your CSV, in quote marks
sample_ids <- read.csv(file = "myfile.csv")

library(tidyverse)

my_second_df <- filter(my_first_df,
                       SampleID %in% sample_ids$SampleID)
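
If you would rather not load a package, the same filter can be sketched in base R. The data below is made up purely for illustration (the real data frames would come from your own files), and the object names simply mirror the ones used above.

```r
# Toy stand-in for the variant-call data frame (values are invented)
my_first_df <- data.frame(
  SampleID = c(342, 342, 377, 500, 899, 901),
  chr      = c("1", "2", "2", "7", "X", "3"),
  pos      = c(1000, 2000, 3000, 4000, 5000, 6000),
  ref      = c("A", "C", "G", "T", "A", "C"),
  mut      = c("G", "T", "A", "C", "T", "G")
)

# Toy stand-in for what read.csv() would return for the list of wanted IDs
sample_ids <- data.frame(SampleID = c(342, 377, 899))

# Logical vector with one TRUE/FALSE per row of my_first_df
keep <- my_first_df$SampleID %in% sample_ids$SampleID

# Keep the matching rows and all columns
my_second_df <- my_first_df[keep, ]

nrow(my_second_df)  # 4
```

Here `$` is needed because base R subsetting does not look up column names inside the data frame the way filter() does.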

Thank you so much @nirgrahamuk for the quick reply!

When I try this,

df2 <- filter(df1, df1$sampleID %in% sample_ids$sampleID)

I receive the error message,

Error: Problem with filter() input ..1.
x Input ..1 must be of size 5157308 or 1, not size 0.
i Input ..1 is df1$sampleID %in% sample_ids$sampleID.
Run rlang::last_error() to see where the error occurred.

This message implies that df1 does not contain a column named exactly sampleID (column names are case-sensitive), so the condition evaluates to a vector of size 0 instead of one value per row.

I recommend checking the column names like so:

names(df1)
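
To see where the "size 0" comes from, here is a small sketch with a made-up data frame whose ID column is spelled sampleId. The values and the `wanted` vector are hypothetical; only the behaviour of `$` and `%in%` matters.

```r
# Toy data frame; note the column is spelled "sampleId" (invented values)
df1 <- data.frame(sampleId = c(342, 377, 500), chr = c("1", "2", "X"))
wanted <- c(342, 377)

# Asking for a column that does not exist returns NULL, not an error
missing_col <- df1$sampleID
is.null(missing_col)                  # TRUE

# NULL %in% anything has length 0, which filter() cannot recycle to nrow(df1)
length(missing_col %in% wanted)       # 0

# Quick checks before filtering
has_exact <- "sampleID" %in% names(df1)
has_exact                             # FALSE

# Case-insensitive search for the name you meant
close_match <- grep("sampleid", names(df1), ignore.case = TRUE, value = TRUE)
close_match                           # "sampleId"
```

Referring to the column by its bare name inside filter(), as in the answer above, tends to give a more direct "object not found" error when the name is misspelled.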

This worked, I had mistakenly typed "sampleID" when it was really "sampleId" in df1.
Thank you so so much!!! :smile:

