Hi, I have a tibble that looks like this.
       id title                                    genres
    <dbl> <chr>                                    <chr>
 1  19995 Avatar                                   Action
 2  19995 Avatar                                   Adventure
 3  19995 Avatar                                   Fantasy
 4  19995 Avatar                                   Science Fiction
 5    285 Pirates of the Caribbean: At World's End Adventure
 6    285 Pirates of the Caribbean: At World's End Fantasy
 7    285 Pirates of the Caribbean: At World's End Action
 8 206647 Spectre                                  Action
 9 206647 Spectre                                  Adventure
10 206647 Spectre                                  Crime
I want to create dummy variables based on genres. I tried dummy_cols from the fastDummies package, but I get this:
       id title genres genres_Action genres_Adventure genres_Fantasy `genres_Science~ genres_Crime genres_Drama
    <dbl> <chr> <chr>          <int>            <int>          <int>            <int>        <int>        <int>
 1  19995 Avat~ Action             1                0              0                0            0            0
 2  19995 Avat~ Adven~             0                1              0                0            0            0
 3  19995 Avat~ Fanta~             0                0              1                0            0            0
 4  19995 Avat~ Scien~             0                0              0                1            0            0
 5    285 Pira~ Adven~             0                1              0                0            0            0
 6    285 Pira~ Fanta~             0                0              1                0            0            0
 7    285 Pira~ Action             1                0              0                0            0            0
 8 206647 Spec~ Action             1                0              0                0            0            0
 9 206647 Spec~ Adven~             0                1              0                0            0            0
10 206647 Spec~ Crime              0                0              0                0            1            0
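For reference, here is a minimal sketch that rebuilds the ten sample rows and runs roughly the call I used. The object names `movies` and `dummies` are only for illustration, and `select_columns = "genres"` is my best reconstruction of the arguments. The genres_Drama column in the output above comes from rows of the full data that are not shown in this sample, so it will not appear here.

```r
library(tibble)
library(fastDummies)

# Rebuild the ten sample rows shown above (one row per movie/genre pair).
movies <- tribble(
  ~id,    ~title,                                     ~genres,
  19995,  "Avatar",                                   "Action",
  19995,  "Avatar",                                   "Adventure",
  19995,  "Avatar",                                   "Fantasy",
  19995,  "Avatar",                                   "Science Fiction",
  285,    "Pirates of the Caribbean: At World's End", "Adventure",
  285,    "Pirates of the Caribbean: At World's End", "Fantasy",
  285,    "Pirates of the Caribbean: At World's End", "Action",
  206647, "Spectre",                                  "Action",
  206647, "Spectre",                                  "Adventure",
  206647, "Spectre",                                  "Crime"
)

# Assumed call: one 0/1 column per distinct value of `genres`.
# The result still has one row per movie/genre pair, which is the problem.
dummies <- dummy_cols(movies, select_columns = "genres")

print(dummies)
```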
As you can see, each title is repeated, once per genre. How can I merge the rows for the same title, based on either id or title, so that each movie ends up on a single row? Or should I have done something before dummifying the column?
I have a feeling that this is really easy to solve, but I just can't remember anything that would help, and I couldn't find an answer by googling either.
Thank you.