Subset first row of each group

I have a dataframe of about 3 000 acquisition bids.

Variables:

  • Company ID (PERMNO)
  • Bid ID (ID)
  • Announcement date (ANDATE)
  • Bidding round (ROUND_ID)
structure(list(PERMNO = c(NA, 80094L, 61866L, 82285L, 83696L, 
84048L), ID = c(101652020, 103223020, 20100020, 706474020, 713118020, 
673419020), ANDATE = c("20/06/1988", "09/08/1988", "11/12/1987", 
"30/10/1997", "26/11/1997", "24/06/1997"), ROUND_ID = c("50000", 
"50000", "50000", "11401", "11401", "50003")), row.names = c(NA, 
6L), class = "data.frame")

Bids are in competition with each other and grouped by ROUND_ID.

Problem:
I would like to write a code (probably a for loop?) that runs through each ROUND_ID and saves the first observation (by ANDATE) in a separate dataframe. I.e. I would like to subset the first bid of each bidding round.

Can anyone help me with this?

I think I solved it myself, just creating a number sequence per ROUND_ID and then subsetting the ones with ID = 1.

library(data.table)

sample$ANDATE <- as.Date(sample$ANDATE, format = "%d/%m/%Y")
sample$ROUND_ID <- as.numeric(sample$ROUND_ID)

sample <- sample %>% arrange(ROUND_ID, ANDATE)

CC <- as.data.table(sample)
CC[, N := sequence(.N), by = ROUND_ID]

CC <- subset(CC, N == 1)
CC_first <- subset(sample, ID %in% CC$ID)

Assuming you assign your structure to DT, and with some help from lubridate, you can arrive at a more data.table centric way of doing it:

setDT(DT)
DT[,ANDATE:= lubridate::dmy(ANDATE)
   ][,lapply(.SD,min),.SDcols ="ANDATE", ROUND_ID]

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.