Hi All,
I'm creating a data frame and need to delete every row in which at least two columns have the same content (text). Empty cells (NA) should not count as duplicates. For example, in the following data frame, only the first and second rows should be deleted.
# NA is written unquoted so that R treats it as a missing value, not as the text 'NA'
df <- data.frame(A = c('a', 'b', 'c', 'd', NA),
                 B = c('a', 'c', 'd', 'e', 'g'),
                 C = c('e', 'b', 'a', 'f', NA),
                 stringsAsFactors = FALSE)
However, I have more than 10,000 rows, so I need code that detects the rows in which two or more cells have the same content and deletes them. How can I do this?
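To make the goal concrete, here is a minimal sketch of the row-wise check I have in mind. It assumes every column holds text and that missing cells are real NA values; `has_repeat`, `dup_rows` and `result` are just names I made up for illustration, and I have only reasoned it through on the small example.

```r
# Example data from this post, with real missing values (NA)
df <- data.frame(A = c("a", "b", "c", "d", NA),
                 B = c("a", "c", "d", "e", "g"),
                 C = c("e", "b", "a", "f", NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if any non-NA value occurs more than once in the row
has_repeat <- function(row) {
  values <- row[!is.na(row)]   # ignore empty cells
  anyDuplicated(values) > 0    # anyDuplicated() returns 0 when nothing repeats
}

# apply() over margin 1 hands each row to has_repeat() as a character vector
dup_rows <- apply(df, 1, has_repeat)

# Keep only the rows without a repeated value
result <- df[!dup_rows, , drop = FALSE]
```

On the example, this should flag rows 1 and 2 and keep the other three, including the row with two NA cells.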
Another solution could be to concatenate the contents of all 25 columns into one cell per row and have R delete the rows where the string in that cell contains the same name twice.
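A rough sketch of that concatenation idea, under the assumption that the separator (here "|", my own choice) never appears inside a cell; the variable names are only illustrative:

```r
# Example data from this post, with real missing values (NA)
df <- data.frame(A = c("a", "b", "c", "d", NA),
                 B = c("a", "c", "d", "e", "g"),
                 C = c("e", "b", "a", "f", NA),
                 stringsAsFactors = FALSE)

sep <- "|"  # assumed separator; it must not occur in the data

# One concatenated string per row, skipping NA cells
key <- apply(df, 1, function(row) paste(row[!is.na(row)], collapse = sep))

# Split each string back into its names and flag rows where a name repeats
parts <- strsplit(key, sep, fixed = TRUE)
dup_rows <- vapply(parts, function(p) anyDuplicated(p) > 0, logical(1))

# Keep only the rows whose string has no repeated name
result <- df[!dup_rows, , drop = FALSE]
```

Splitting on the separator compares whole names, so a name such as "ab" is not mistaken for a repeat of "a", which a plain substring search on the concatenated string could do.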
I hope this is as clear as possible; please ask if anything needs clarification.
Many thanks