Choose First Instance of Data by Month

dlsweet · July 27, 2018, 1:44pm

I have data which looks like the following and I am trying to get the highest rank for each publisher each month:

Rank   Title   Publisher  Year  Month   Date
1        a1       a       2000  April  Apr 2000
2        b1       b       2000  April  Apr 2000
3        a2       a       2000  April  Apr 2000
1        a3       a       2000  May    May 2000

So I would want a new dataset that has every row except row three, since that is publisher a's second-highest-ranked book for that month.

I don't know how to go about doing this so any suggestions would be appreciated. Thank you all!

martin.R · July 27, 2018, 1:50pm

If the best Rank is always 1, then:

library(dplyr)

df %>% 
  group_by(Publisher) %>% 
  filter(Rank == 1) %>% 
  ungroup()

If the rank varies, then:

df %>% 
  group_by(Publisher) %>% 
  arrange(Rank) %>% 
  filter(row_number() == 1L) %>% 
  ungroup()

jonspring · July 27, 2018, 2:26pm

I think you will also want to group by month while you're at it, if you want to get each publisher's #1 separately for each month they're in the data set.

df %>% 
  group_by(Publisher, Date) %>% 
  filter(Rank == 1L) %>% 
  ungroup()
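
If a publisher's best rank in a given month is not always 1, the month grouping can be combined with a "take the lowest rank" step. A minimal sketch, assuming dplyr 1.0.0 or later for `slice_min()` (which postdates this thread) and rebuilding the sample table from the question by hand as `df`:

```r
library(dplyr)

# Sample table from the question, typed in by hand
df <- tibble(
  Rank      = c(1L, 2L, 3L, 1L),
  Title     = c("a1", "b1", "a2", "a3"),
  Publisher = c("a", "b", "a", "a"),
  Date      = c("Apr 2000", "Apr 2000", "Apr 2000", "May 2000")
)

# One row per publisher per month: the row with the lowest Rank.
# with_ties = FALSE keeps a single row even if two titles share a rank.
best <- df %>%
  group_by(Publisher, Date) %>%
  slice_min(Rank, n = 1, with_ties = FALSE) %>%
  ungroup()

best
```

On this sample, only title a2 should be dropped, since a1 already outranks it for publisher a in April.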

martin.R · July 27, 2018, 2:30pm

Correct, I overlooked that.

dlsweet · July 27, 2018, 5:38pm

This is the code that I tried, but it was just returning the #1 seller for each month, not the best ranking for each publisher for each month.

plt1.dat <- Overall.Sales %>%
  group_by(Publisher2, Date) %>%
  filter(Rank.in.Units == 1L) %>%
  ungroup()

EDIT:

I got it to work!! The `== 1L` comparison was throwing it off, since it only keeps rows ranked exactly 1. Comparing against each group's minimum rank works:

plt1.dat <- Overall.Sales %>%
  group_by(Publisher2, Date) %>%
  filter(Rank.in.Units == min(Rank.in.Units)) %>%
  ungroup()
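
For anyone comparing the two filters, here is a small sketch on the sample table from the first post. The column names `Rank`, `Publisher` and `Date` come from that sample, not from `Overall.Sales`, so treat them as stand-ins. It suggests the difference comes from comparing against the group minimum rather than from the `L` suffix, since `1 == 1L` is `TRUE` in R.

```r
library(dplyr)

# Sample table from the first post, typed in by hand
df <- tibble(
  Rank      = c(1L, 2L, 3L, 1L),
  Title     = c("a1", "b1", "a2", "a3"),
  Publisher = c("a", "b", "a", "a"),
  Date      = c("Apr 2000", "Apr 2000", "Apr 2000", "May 2000")
)

# Keeps only rows ranked exactly 1, so publisher b (best rank 2) disappears
only_rank_one <- df %>%
  group_by(Publisher, Date) %>%
  filter(Rank == 1L) %>%
  ungroup()

# Keeps each publisher's best-ranked row per month, whatever that rank is
best_per_group <- df %>%
  group_by(Publisher, Date) %>%
  filter(Rank == min(Rank)) %>%
  ungroup()

only_rank_one$Title   # "a1" "a3"
best_per_group$Title  # "a1" "b1" "a3"
```

Note that `filter(Rank == min(Rank))` keeps every row tied for the best rank within a group, which may or may not be what is wanted.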