I have a large database of orders (one order may contain many articles). The Order table contains the following columns: Article_id, Order_Id, and Time (timestamp datatype).
I have to find all articles ordered within a specific period of ±300 s around the first Article_id that is greater than or equal to a given value.
More precisely:
Suppose we have only one order that contains 255 articles, with article IDs between 0 and 254. Each article is bought at a specific time t.
Suppose the value we are looking for is 66.5.
- First, we have to get the first value greater than or equal to 66.5 (here it is 67).
- Second, we take the timestamp t of the found value (67) and filter on it, so that we get all articles bought within ±300 s of t.
I use sparklyr (Spark with R).
I'm new to this. Thanks.
This is what I tried:
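library(dplyr)
# TIME_FOUND_VALUE is a placeholder for the timestamp of the found article;
# I do not know how to obtain it inside the pipeline.
order_tbl %>%
  group_by(order_id) %>%
  arrange(time) %>%
  filter(first(Article_id) >= 66.5) %>%
  filter(between(time, TIME_FOUND_VALUE - 300, TIME_FOUND_VALUE + 300))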
The data looks like this:
ARTICLE_ID, ORDER_ID, TIME
2567, 1112, 2019-01-16 20:40:00.0
2670, 1117, 2019-01-16 21:40:00.0
2569, 1112, 2019-01-16 20:45:00.0
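To make the intended logic concrete, here is a minimal sketch in plain dplyr on a local data frame built from the three sample rows above. It is not tested against Spark. It assumes that "first" means the smallest ARTICLE_ID at or above the threshold within each order, and that the window is applied per order. The names `orders`, `threshold`, `window_s`, `anchors`, `anchor_time`, and `result`, as well as the threshold value 2566.5, are made up for illustration.

```r
library(dplyr)

# Local stand-in for the Spark table, built from the three sample rows.
orders <- tibble(
  ARTICLE_ID = c(2567, 2670, 2569),
  ORDER_ID   = c(1112, 1117, 1112),
  TIME       = as.POSIXct(
    c("2019-01-16 20:40:00", "2019-01-16 21:40:00", "2019-01-16 20:45:00"),
    tz = "UTC"
  )
)

threshold <- 2566.5  # hypothetical search value (plays the role of 66.5)
window_s  <- 300     # half-width of the time window, in seconds

# Step 1: per order, find the smallest ARTICLE_ID >= threshold and take
# its timestamp (the earliest one if that ID occurs more than once).
anchors <- orders %>%
  filter(ARTICLE_ID >= threshold) %>%
  group_by(ORDER_ID) %>%
  filter(ARTICLE_ID == min(ARTICLE_ID)) %>%
  summarise(anchor_time = min(TIME))

# Step 2: keep every row of the same order whose TIME lies within
# +/- window_s seconds of that order's anchor timestamp (bounds inclusive).
result <- orders %>%
  inner_join(anchors, by = "ORDER_ID") %>%
  filter(abs(as.numeric(TIME) - as.numeric(anchor_time)) <= window_s) %>%
  select(-anchor_time)
```

The two-step shape (compute one anchor timestamp per order, then join it back and filter) replaces the TIME_FOUND_VALUE placeholder in my attempt with an actual column. Whether the same verbs translate cleanly to Spark SQL through sparklyr is something I have not verified.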