I have a data set running into a few million rows which has daily data for each item. It is required to find the first date wherein the item has crossed a certain threshold and tag that date as "launch_date" and everything before that can be treated as just insignificant noise. On creating a launch_date for each item, we should be able to find the days_since_launch which gives an indication of the age of the item.
unfortunately, I am unable to find the launch_date for each item since my understanding of the if_else command seems lacking. I'm trying to find out how to find the oldest date for each particular item. The if_else command is performing a row-wise operation which is giving me an incorrect result. seeking advice from the forum on how to proceed with this.
reprex is as below.
test <- tibble::tribble(
~item, ~date, ~sales,
"a", "01-10-2020", 1L,
"b", "01-10-2020", 100L,
"c", "01-10-2020", 1000L,
"a", "02-10-2020", 10L,
"b", "02-10-2020", 200L,
"c", "02-10-2020", 1500L,
"a", "03-10-2020", 1000L,
"b", "03-10-2020", 300L,
"c", "03-10-2020", 400L,
"a", "04-10-2020", 1500L,
"b", "04-10-2020", 400L,
"c", "04-10-2020", 1500L,
"a", "05-10-2020", 2000L,
"b", "05-10-2020", 500L,
"c", "05-10-2020", 2000L
)
test$date <- as.Date(test$date, format = "%d-%m-%Y")
test <- test %>%
arrange(item,date) %>%
group_by(item,date) %>%
mutate(launch_date =
if_else (sales>500,
as.Date(date), NULL)) %>%
mutate(days_since_launch = date - launch_date) %>%
view()
unfortunately, the output is as below:
required output in excel needs to look something like this (of course, desired output is in excel)
edit: apologies in advance for any late reverts