jayant
July 14, 2018, 3:41pm
1
Dear colleagues,
I have a dataset with duplicated values in the first column; the other columns need not be duplicated. For example:
> data
# A tibble: 3 × 2
name votes
<chr> <dbl>
1 Avatar 26
2 special 26 24
3 Avatar 23
As we can see, the name Avatar is repeated. My goal is to keep only the first row whenever a name is repeated, i.e.
> data
# A tibble: 2 × 2
name votes
<chr> <dbl>
1 Avatar 26
2 special 26 24
Can I kindly get some help here? Appreciated.
Will this work?
library(dplyr)
tbl <- tibble(name = c("Avatar", "special 26", "Avatar"),
              votes = c(26, 24, 23))
tbl <- tbl %>%
  group_by(name) %>%
  filter(row_number() == 1) %>%
  ungroup()
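A possibly shorter variant, not taken from the post above: dplyr's `distinct()` with `.keep_all = TRUE` should give the same result on this data. This is a minimal sketch that assumes only the first occurrence of each `name` is wanted; the variable name `first_rows` is made up for illustration.

```r
library(dplyr)

# Same example data as in the post above
tbl <- tibble(name = c("Avatar", "special 26", "Avatar"),
              votes = c(26, 24, 23))

# distinct() keeps the first row for each unique value of `name`;
# .keep_all = TRUE retains the remaining columns (here `votes`)
first_rows <- tbl %>%
  distinct(name, .keep_all = TRUE)

first_rows
```

No grouping is involved here, so there is no `ungroup()` step to remember.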
tcash
July 14, 2018, 4:32pm
3
Here's another way using base R:
df <- data.frame(name = c("Avatar", "special 26", "Avatar", "Madmen", "Avatar", "Madmen"),
                 votes = c(26, 24, 23, 22, 21, 25))
df[!duplicated(df$name), ]
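To see why this works, it may help to look at what `duplicated()` returns on its own. The sketch below reuses the same data; the intermediate names `dup` and `first_rows` are only for illustration.

```r
df <- data.frame(name = c("Avatar", "special 26", "Avatar", "Madmen", "Avatar", "Madmen"),
                 votes = c(26, 24, 23, 22, 21, 25))

# duplicated() flags every repeat of a value after its first occurrence
dup <- duplicated(df$name)
dup
# FALSE FALSE  TRUE FALSE  TRUE  TRUE

# Negating the flags selects only the first occurrence of each name
first_rows <- df[!dup, ]
first_rows
```

The trailing comma in `df[!dup, ]` matters: it selects rows and keeps all columns.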
Leon
July 14, 2018, 4:50pm
4
Let's create some dummy data:
library(dplyr)
d = tibble(my_lbl = rep(c('A', 'B', 'C'), c(3, 3, 3)), my_val = seq(1, 9))
d
# A tibble: 9 x 2
my_lbl my_val
<chr> <int>
1 A 1
2 A 2
3 A 3
4 B 4
5 B 5
6 B 6
7 C 7
8 C 8
9 C 9
Then we can keep the first row of each group like so:
d %>% group_by(my_lbl) %>% slice(1) %>% ungroup()
# A tibble: 3 x 2
my_lbl my_val
<chr> <int>
1 A 1
2 B 4
3 C 7
jayant
July 16, 2018, 1:08am
5
Thanks for the different solutions and ideas. It is always useful to know more than one way to solve a problem.
thanks