Making a database from one uncleaned database minus one cleaned database


I'd like to know if there is a command in RStudio for the following task.
I have a big database and I cleaned it using certain indicators. So now I have two databases: the uncleaned one and the cleaned one.
I want to make a third database that is the big database minus the cleaned one. In the end, the new database will contain only the data that was not selected for the cleaned database.

Can someone help me?

Does your data have a "primary key" (unique row identifier)? If so you can simply filter out the "cleaned" rows by the key column.

What you are describing is an "anti join" operation; dplyr has a function for that as well, anti_join().
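As a rough sketch of what that could look like, assuming both data sets are already loaded as data frames and share a key column: the names `full_db`, `cleaned_db`, `id`, and `value` below are made up for illustration and are not from your data.

```r
library(dplyr)

# Hypothetical data: full_db is the uncleaned data, cleaned_db the cleaned subset.
full_db <- data.frame(id = 1:4, value = c("a", "b", "c", "d"))
cleaned_db <- data.frame(id = 3:4, value = c("c", "d"))

# anti_join() keeps the rows of full_db that have no match in cleaned_db
# on the column(s) named in `by`.
remaining <- anti_join(full_db, cleaned_db, by = "id")
remaining
```

If there is no single key column, `by` can also take several column names, in which case a row is dropped only when all of those columns match.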

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.


Thanks a lot for your time.

I don't have any "primary key". The cleaned database has no real filter. It was made line by line from an Excel database.
I just want to delete from the big Excel database all the lines that I put in the cleaned database.
For a simple example, if I have a database in Excel with 4 lines with 4 names (Mathieu, Isabella, Ronald, Marie), and from that database I have created another Excel database with just 2 lines (Ronald, Marie), I would like to make a database with just Mathieu and Isabella using a command.

Is it clear enough, or would you like more details?

The explanation was clear enough from the beginning. As I said, what you are describing is an anti join operation (and I already told you what function you can use for that).

As per your second example, the names would be the primary key since they uniquely identify each row and you can simply filter the first data frame by names not present in the second one.
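To illustrate with the names from your example, here is a minimal base R sketch. The data frame names `big_db` and `cleaned_db` and the column name `name` are assumptions, and it presumes the Excel sheets have already been read into R (for example with `readxl::read_excel()`).

```r
# Hypothetical data frames mirroring the example in the question.
big_db <- data.frame(
  name = c("Mathieu", "Isabella", "Ronald", "Marie"),
  stringsAsFactors = FALSE
)
cleaned_db <- data.frame(
  name = c("Ronald", "Marie"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose name does not appear in the cleaned data.
# drop = FALSE keeps the result a data frame even with a single column.
not_selected <- big_db[!(big_db$name %in% cleaned_db$name), , drop = FALSE]
not_selected
```

This only works as intended if the names really are unique; with duplicated names, every row sharing a name found in the cleaned data would be removed.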

If you need help with your specific code, please read the guide on the link I gave you and try to provide a proper reproducible example.


Thanks a lot! I am going to try with what you told me.