I am still very new to R and looking for help with a problem I'm trying to solve. I have a large data set I'm working with and I then have a subset of that data, Subset A. I want to be able to subtract Subset A from the main data to create a second subset of what's left - so essentially I would have Subset A and Subset B, and when you put them together, you would have the whole data set.
To do this, I've merged the main dataset and Subset A to identify and filter out the duplicates (based on duplicate values in a specific column; the whole rows are not completely identical). However, after playing around with duplicated(), distinct(), and unique(), the problem I keep running into is that I can't filter out or account for ALL duplicates. The functions always leave me with one row to represent each duplicated value.
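The split described above can also be sketched without merging at all. This is a minimal sketch, not a definitive approach: `main`, `subset_a`, `subset_b`, and the matching column `id` are hypothetical stand-ins for the real data, and it assumes that a single column is enough to tell which rows belong to Subset A.

```r
# Hypothetical stand-ins for the real data: `main` is the full data set,
# `subset_a` is Subset A, and `id` is the column used to match rows.
main <- data.frame(id = c(1, 2, 3, 4, 5),
                   value = c("a", "b", "c", "d", "e"))
subset_a <- main[main$id %in% c(2, 4), ]

# Subset B: every row of `main` whose id does not appear in Subset A.
subset_b <- main[!(main$id %in% subset_a$id), ]

# Sanity check: together the two subsets account for every row of `main`.
nrow(subset_a) + nrow(subset_b) == nrow(main)
```

If dplyr is already loaded, `anti_join(main, subset_a, by = "id")` should give the same rows as `subset_b` under the same assumption.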
So for example, in the below data frame:
name = c("Mary", "Frank", "Tom", "Frank", "Mary")
color = c("red", "red", "red", "red", "red")
data = data.frame(name, color)
data
   name color
1  Mary   red
2 Frank   red
3   Tom   red
4 Frank   red
5  Mary   red
I would want to subset so that only the row with "Tom" remains.
  name color
1  Tom   red
I do NOT want a subset that returns:
   name color
1  Mary   red
2 Frank   red
3   Tom   red
Does that make sense? I should note that in this dataset, I have a lot of values to account for. Am I going about this the completely wrong way? Or am I just missing something? I know how I can do this in Excel, but I would like to learn how to do it in R.
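For reference, one possible way to get the "only Tom" result in base R is sketched below. It relies on the `fromLast` argument of `duplicated()`; the names `is_dup` and `only_unique` are made up for illustration, and the sketch assumes that duplicates are judged on the `name` column alone.

```r
name <- c("Mary", "Frank", "Tom", "Frank", "Mary")
color <- c("red", "red", "red", "red", "red")
data <- data.frame(name, color)

# duplicated() flags the second and later occurrences of a value;
# with fromLast = TRUE it flags every occurrence except the last one.
# Combining the two flags every row whose name occurs more than once.
is_dup <- duplicated(data$name) | duplicated(data$name, fromLast = TRUE)

# Keep only the rows whose name occurs exactly once.
only_unique <- data[!is_dup, ]
only_unique
#   name color
# 3  Tom   red
```

The surviving row keeps its original row label (3, not 1); `rownames(only_unique) <- NULL` would renumber it if that matters.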