Finding oldest date as per certain condition for each item

I have a data set running into a few million rows which has daily data for each item. It is required to find the first date wherein the item has crossed a certain threshold and tag that date as "launch_date" and everything before that can be treated as just insignificant noise. On creating a launch_date for each item, we should be able to find the days_since_launch which gives an indication of the age of the item.

unfortunately, I am unable to find the launch_date for each item since my understanding of the if_else command seems lacking. I'm trying to find out how to find the oldest date for each particular item. The if_else command is performing a row-wise operation which is giving me an incorrect result. seeking advice from the forum on how to proceed with this.

reprex is as below.

test <- tibble::tribble(
          ~item,        ~date, ~sales,
            "a", "01-10-2020",     1L,
            "b", "01-10-2020",   100L,
            "c", "01-10-2020",  1000L,
            "a", "02-10-2020",    10L,
            "b", "02-10-2020",   200L,
            "c", "02-10-2020",  1500L,
            "a", "03-10-2020",  1000L,
            "b", "03-10-2020",   300L,
            "c", "03-10-2020",   400L,
            "a", "04-10-2020",  1500L,
            "b", "04-10-2020",   400L,
            "c", "04-10-2020",  1500L,
            "a", "05-10-2020",  2000L,
            "b", "05-10-2020",   500L,
            "c", "05-10-2020",  2000L
          )
test$date <- as.Date(test$date, format = "%d-%m-%Y")

test <- test %>% 
  arrange(item,date) %>% 
  group_by(item,date) %>% 
  mutate(launch_date = 
           if_else (sales>500, 
                    as.Date(date), NULL)) %>%
  mutate(days_since_launch = date - launch_date) %>% 
  view()

unfortunately, the output is as below:
image

required output in excel needs to look something like this (of course, desired output is in excel)
image

edit: apologies in advance for any late reverts

Is this close enough to your desired output?

library(dplyr, warn.conflicts = FALSE)
test <- tibble::tribble(
  ~item,        ~date, ~sales,
  "a", "01-10-2020",     1L,
  "b", "01-10-2020",   100L,
  "c", "01-10-2020",  1000L,
  "a", "02-10-2020",    10L,
  "b", "02-10-2020",   200L,
  "c", "02-10-2020",  1500L,
  "a", "03-10-2020",  1000L,
  "b", "03-10-2020",   300L,
  "c", "03-10-2020",   400L,
  "a", "04-10-2020",  1500L,
  "b", "04-10-2020",   400L,
  "c", "04-10-2020",  1500L,
  "a", "05-10-2020",  2000L,
  "b", "05-10-2020",   500L,
  "c", "05-10-2020",  2000L
)
test <- test %>% mutate(date = lubridate::dmy(date))
test  <- test %>% 
  filter(sales >= 500) %>% 
  group_by(item) %>% 
  summarize(LaunchDate = min(date)) %>% 
  right_join(test, by ="item") %>% 
  mutate(Days_since_Launch = date - LaunchDate) %>% 
  arrange(item, date)
#> `summarise()` ungrouping output (override with `.groups` argument)
test
#> # A tibble: 15 x 5
#>    item  LaunchDate date       sales Days_since_Launch
#>    <chr> <date>     <date>     <int> <drtn>           
#>  1 a     2020-10-03 2020-10-01     1 -2 days          
#>  2 a     2020-10-03 2020-10-02    10 -1 days          
#>  3 a     2020-10-03 2020-10-03  1000  0 days          
#>  4 a     2020-10-03 2020-10-04  1500  1 days          
#>  5 a     2020-10-03 2020-10-05  2000  2 days          
#>  6 b     2020-10-05 2020-10-01   100 -4 days          
#>  7 b     2020-10-05 2020-10-02   200 -3 days          
#>  8 b     2020-10-05 2020-10-03   300 -2 days          
#>  9 b     2020-10-05 2020-10-04   400 -1 days          
#> 10 b     2020-10-05 2020-10-05   500  0 days          
#> 11 c     2020-10-01 2020-10-01  1000  0 days          
#> 12 c     2020-10-01 2020-10-02  1500  1 days          
#> 13 c     2020-10-01 2020-10-03   400  2 days          
#> 14 c     2020-10-01 2020-10-04  1500  3 days          
#> 15 c     2020-10-01 2020-10-05  2000  4 days

Created on 2020-10-27 by the reprex package (v0.3.0)

2 Likes

slaps table

right_join!!

that's a good solution. i've used it on the test data and it seems to be working correctly. Thank you so much @FJCC!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.