The following is a sample of the data set I am currently working on:
C_ID M_ID max_ad_date_id max_DIS_DATE SER_C_DATE_ID CL_CCT_IND
102472781 3874931 24/05/2015 28/05/2015 24MAY2015:00:00:00 N
102472781 3874931 24/05/2015 28/05/2015 24MAY2015:00:00:00 N
102472781 3874931 24/05/2015 28/05/2015 25MAY2015:00:00:00 N
102472781 3874931 24/05/2015 28/05/2015 25MAY2015:00:00:00 N
102472781 3874931 24/05/2015 28/05/2015 25MAY2015:00:00:00 N
103011920 3998668 28/01/2015 28/01/2015 28JAN2015:00:00:00 Y
I need to create a new dataset containing the distinct combinations of the first four columns, plus a new variable LOS. Where CL_CCT_IND = 'Y', LOS should be the count of distinct SER_C_DATE_ID values; otherwise, LOS should be the difference between max_ad_date_id and max_DIS_DATE.
I am new to R and tried the following code to do this:
# The sample dates are in dd/mm/yyyy form, so the format string is "%d/%m/%Y"
AllData$max_ad_date_id <- as.Date(AllData$max_ad_date_id, "%d/%m/%Y")
AllData$max_DIS_DATE <- as.Date(AllData$max_DIS_DATE, "%d/%m/%Y")
DummyData <- AllData %>%
  group_by(C_ID, M_ID, IND) %>%
  summarise(max_ad_date = max(max_ad_date_id),
            max_DIS = max(max_DIS_DATE),
            LOS = n_distinct(SER_C_DATE[CL_CCT_IND == "Y"]),
            LOS = (max_ad_date - max_ad_date)[CL_CCT_IND == "N"])
The code works fine until the difference calculation.
Error in summarise_impl(.data, dots) :
  Column LOS must be length 1 (a summary value), not 5
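As far as I can tell, the "not 5" comes from indexing with a per-row condition: the subset keeps one element per row of the group instead of a single value. A minimal sketch on a toy data frame (the names toy, g, flag and x are made up for illustration and are not from my data) shows the length involved:

```r
library(dplyr)

# Hypothetical toy data: one group with three rows
toy <- data.frame(g = c(1, 1, 1),
                  flag = c("N", "N", "N"),
                  x = c(10, 10, 10),
                  stringsAsFactors = FALSE)

# x[flag == "N"] keeps one value per matching row, so its length equals
# the number of matching rows in the group, not 1
lens <- toy %>%
  group_by(g) %>%
  summarise(n_vals = length(x[flag == "N"]))

print(lens)
```

Wrapping the subset in length() is only there so the call returns a single value per group and the size can be inspected.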
I understand that this is a coding error and that the difference calculation probably should not be part of the summarise call. I also tried the mutate function but could not get it to work, and I tried ifelse as well:
LOS = ifelse(CL_CCT_IND == 'Y', length(unique(SER_C_ID)), max_ad_date_id - max_DIS_DATE))
Could I get some help with this? Thanks in advance.
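The requirement described above can be sketched roughly as follows. This is one possible approach, not a definitive one, and it rests on several assumptions: the date columns are dd/mm/yyyy strings as in the sample, CL_CCT_IND is constant within each combination of the first four columns, and LOS for the 'N' case is taken as max_DIS_DATE minus max_ad_date_id in days (the direction of the difference is a guess). The data frame is rebuilt here from the sample rows so the block runs on its own.

```r
library(dplyr)

# Sample rows copied from the question
AllData <- data.frame(
  C_ID = c(rep(102472781, 5), 103011920),
  M_ID = c(rep(3874931, 5), 3998668),
  max_ad_date_id = c(rep("24/05/2015", 5), "28/01/2015"),
  max_DIS_DATE = c(rep("28/05/2015", 5), "28/01/2015"),
  SER_C_DATE_ID = c("24MAY2015:00:00:00", "24MAY2015:00:00:00",
                    "25MAY2015:00:00:00", "25MAY2015:00:00:00",
                    "25MAY2015:00:00:00", "28JAN2015:00:00:00"),
  CL_CCT_IND = c(rep("N", 5), "Y"),
  stringsAsFactors = FALSE
)

result <- AllData %>%
  # Assumption: both date columns are dd/mm/yyyy strings
  mutate(max_ad_date_id = as.Date(max_ad_date_id, format = "%d/%m/%Y"),
         max_DIS_DATE = as.Date(max_DIS_DATE, format = "%d/%m/%Y")) %>%
  # One output row per distinct combination of the first four columns
  group_by(C_ID, M_ID, max_ad_date_id, max_DIS_DATE) %>%
  summarise(
    # A plain if/else returns a single value per group, unlike the
    # vectorised ifelse(), which returns one value per row
    LOS = if (any(CL_CCT_IND == "Y")) {
      n_distinct(SER_C_DATE_ID[CL_CCT_IND == "Y"])
    } else {
      # Assumption: discharge date minus admission date, in days
      as.integer(first(max_DIS_DATE) - first(max_ad_date_id))
    }
  ) %>%
  ungroup()

print(result)
```

The key design choice is that each branch of the if/else collapses to one value per group, which is what summarise expects. If CL_CCT_IND can vary within a group, it would need to be added to group_by or the condition adjusted.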