Dear colleagues, I have a dataset with duplicated values in the first column; the other columns need not be duplicated. For example:
> data
# A tibble: 3 × 2
  name       votes
  <chr>      <dbl>
1 Avatar        26
2 special 26    24
3 Avatar        23
As we can see, the name Avatar is repeated. My goal is to keep only the first row whenever a name is repeated, i.e. only the first Avatar row:
> data
# A tibble: 2 × 2
  name       votes
  <chr>      <dbl>
1 Avatar        26
2 special 26    24
Can I kindly get some help here? Appreciated.
Will this work?
library(dplyr)

tbl <- tibble(name = c("Avatar", "special 26", "Avatar"),
              votes = c(26, 24, 23))

tbl <- tbl %>%
  group_by(name) %>%
  filter(row_number() == 1) %>%
  ungroup()
Here's another way using base R:
df <- data.frame(name = c("Avatar", "special 26", "Avatar", "Madmen", "Avatar", "Madmen"),
                 votes = c(26, 24, 23, 22, 21, 25))

df[!duplicated(df$name), ]
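To see why the base R one-liner keeps the first row per name, it may help to look at what `duplicated()` returns on its own. This is only a small sketch using the same example data; the variable names `dup` and `first_rows` are made up for illustration.

```r
# Same example data as in the base R answer
df <- data.frame(name = c("Avatar", "special 26", "Avatar", "Madmen", "Avatar", "Madmen"),
                 votes = c(26, 24, 23, 22, 21, 25))

# duplicated() marks every occurrence of a value AFTER its first one
dup <- duplicated(df$name)
print(dup)
# FALSE FALSE  TRUE FALSE  TRUE  TRUE

# Negating the vector keeps only the first occurrence of each name
first_rows <- df[!dup, ]
print(first_rows)
```

Because the first occurrence is never flagged, the rows that survive are rows 1, 2 and 4 (votes 26, 24 and 22), in their original order.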
Let's create some dummy data:
library(dplyr)

d <- tibble(my_lbl = rep(c('A', 'B', 'C'), c(3, 3, 3)),
            my_val = seq(1, 9))
d
# A tibble: 9 x 2
  my_lbl my_val
  <chr>   <int>
1 A           1
2 A           2
3 A           3
4 B           4
5 B           5
6 B           6
7 C           7
8 C           8
9 C           9
Then we can take the first row of each group like so:
d %>%
  group_by(my_lbl) %>%
  slice(1) %>%
  ungroup()
# A tibble: 3 x 2
  my_lbl my_val
  <chr>   <int>
1 A           1
2 B           4
3 C           7
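One more option, not mentioned above, avoids grouping altogether: dplyr's `distinct()` with `.keep_all = TRUE` keeps the first row for each distinct value of the chosen column. A minimal sketch on the original poster's data, assuming dplyr is installed (`first_rows` is just an illustrative name):

```r
library(dplyr)

# The data from the original question
tbl <- tibble(name = c("Avatar", "special 26", "Avatar"),
              votes = c(26, 24, 23))

# Keep the first row for each distinct name;
# .keep_all = TRUE retains the other columns (here, votes)
first_rows <- tbl %>%
  distinct(name, .keep_all = TRUE)

print(first_rows)
```

This should give the same two rows as the `group_by()` approaches (Avatar 26 and special 26 24) without needing an `ungroup()` afterwards.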
Thanks for the different solutions and ideas. It is always useful to know more than one way to solve a problem.
thanks