How can I delete duplicate values in a data frame without losing information, grouping the values of another variable?

This is the data.frame I am working with.

I am trying to delete duplicate values from the conflict_id variable but without losing any information for the variable side_b. My only idea was to delete it this way.

data2 <- data[!duplicated(data$conflict_id),]

But the result deletes all the information in the side_b column. R keeps only the first row for each distinct conflict_id, without grouping the values of side_b.

My question is: how can I delete the duplicate values in conflict_id and group the values of side_b under each resulting conflict_id, without losing any information from that variable?

At the same time, how can I delete the values of side_b that are duplicated when they are stored together? For example, one of the 267 conflict_id is "EPDM, EPRP, TPLF", but there is no separation between them. How can I turn this into "EPDM", "EPRP", "TPLF" for all the values in the data.frame?

I haven't tested this because I don't have your actual data, but this may work to solve your second problem:

library(tidyverse)

data %>%
  mutate(text_list = str_split(side_b, ",")) %>% 
  unnest_longer(text_list)
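Since the real data is not available, here is a small sketch of what that pipeline does on made-up data. The values and the name split_data are hypothetical; the only deliberate change is splitting on a comma followed by optional whitespace, so the pieces do not keep a leading space.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical toy data standing in for the real data frame
data <- tibble(
  conflict_id = c(267, 267, 300),
  side_b = c("EPDM, EPRP, TPLF", "EPDM", "TPLF")
)

split_data <- data %>%
  # Split on "," plus any following whitespace, giving a list column
  mutate(text_list = str_split(side_b, ",\\s*")) %>%
  # Expand the list column so each piece gets its own row
  unnest_longer(text_list)

print(split_data)
```

Each row of the toy data is repeated once per piece of side_b, so duplicates across rows can then be dropped in a later step.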

If you run dput(head(data, 25)) and paste the results, it will be easier to help you :grinning_face_with_smiling_eyes:

As far as your first question goes, I'm really having trouble understanding what you need. But perhaps solving the second problem will help with the first!

Thanks for the answer, EeethB, I really appreciate it.

I know that my English is not good and I apologize for that. Anyway, I am going to try to explain it again.

I want to delete all the rows with the same conflict_id without losing the information in side_b. I mean, my final expected result should be all the different conflict_id values, each with all of its side_b values, without losing any information.

The second doubt is already solved with your code, which was honestly very helpful. Thanks.

So does distinct(data, conflict_id, side_b) do what you're looking for?
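A quick sketch of how that call behaves, again on made-up data (the year column and the name unique_pairs are hypothetical, only there to show what happens to the other variables):

```r
library(dplyr)

# Hypothetical toy data with a repeated conflict_id / side_b pair
data <- tibble(
  conflict_id = c(267, 267, 267, 300),
  side_b = c("EPDM", "EPDM", "TPLF", "TPLF"),
  year = c(1989, 1990, 1990, 1995)
)

# Keep one row per unique combination of conflict_id and side_b
unique_pairs <- distinct(data, conflict_id, side_b)

print(unique_pairs)
```

By default distinct() returns only the columns it was given, so year is dropped here. Passing .keep_all = TRUE keeps the remaining columns, taking their values from the first row of each combination.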

I named the column from yesterday's mutate side_b_separado instead of text_list. I used your distinct code, but I changed side_b to side_b_separado, and it looks like it works. I used the .keep_all = TRUE argument to keep all the variables. It works! Thanks, EeethB!
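For anyone reading later, the combined steps might look roughly like this. It is only a sketch on made-up data: the year column and the names result, collapsed and side_b_all are hypothetical, and the final summarise step is an optional extra that was not part of the thread.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical toy data standing in for the real data frame
data <- tibble(
  conflict_id = c(267, 267, 300),
  side_b = c("EPDM, EPRP, TPLF", "EPDM", "TPLF"),
  year = c(1989, 1990, 1995)
)

result <- data %>%
  # Split side_b into its pieces and give each piece its own row
  mutate(side_b_separado = str_split(side_b, ",\\s*")) %>%
  unnest_longer(side_b_separado) %>%
  # Drop repeated conflict_id / side_b_separado pairs, keeping the other columns
  distinct(conflict_id, side_b_separado, .keep_all = TRUE)

# Optional extra: collapse back to one row per conflict_id
collapsed <- result %>%
  group_by(conflict_id) %>%
  summarise(side_b_all = paste(side_b_separado, collapse = ", "))

print(result)
print(collapsed)
```

With .keep_all = TRUE the other columns come from the first row of each pair, so values such as year from the dropped rows are not retained.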

:tada: Woo-hoo! So glad I could help. Would you mind marking one of the answers as a solution so that others know the question is closed down? Thanks for the good question and your persistence in explaining what you needed!

Sorry for the delay, I will do it now.

Thanks for everything, @EeethB
