Fuzzy joining tables with multiple date ranges

I am trying to join two data frames to create a tidy dataset. One table is a time series of days, the other table has date ranges and corresponding prices. Ideally the output table would be a column of dates, and then prices for that day on different indices. I am pretty certain the fuzzyjoin package is the best way to do this, however it's resulting in multiple rows in my dataset for each day. Here's a reprex of what I am seeing:


#df1 is a vector of dates
dates <- as_date(c(as_date("2020-01-01"):as_date("2020-12-31")))
dates <- tibble(
    OPR_DATE = dates)

#df2 is a tibble of start dates, end dates, index names, and prices
prices <- tibble(startdate = as_date(c("2020-01-01", "2020-01-01", "2020-01-10")),
                 enddate = as_date(c("2020-01-31", "2020-01-09", "2020-01-31")),
                 indexname = c("A", "B", "B"),
                 price = c(2.25, 2.10, 2.15))

#pivoting and joining the tables to create a tidy dataset
wide_prices <- prices %>%
  pivot_wider(id_cols = c(startdate, enddate),
              names_from = indexname,
              values_from = price) %>%

#Join the now wide price table to the dates vector.  This should create a 3 column tibble with OPR_DATE, Index A, Index B with prices as the values
join <- dates %>%
  fuzzy_inner_join(y = wide_prices,
                  by = c("OPR_DATE" = "startdate", "OPR_DATE" = "enddate"),
                  match_fun = list(`>=`, `<=`)) %>%
  select(OPR_DATE, A, B)

It's similar to this link, but I want a nice tibble of 3 columns with a price value for each date covered. Instead I get a group method error.

Feeling pretty embarrassed I can't crack this nut.

Is it possible you're missing the rest of the tibble here? Right now the parentheses are unclosed, but you also reference variables that aren't in prices (if one tries to reproduce from this) in your subsequent steps. Using the reprex package will help you ensure that someone else can run your code in a fresh session, and help you out!

There's also a nice FAQ on how to do a minimal reprex for beginners, below:

What I have done in situations similar to you is to use tidyr::complete() to expand the wide_prices table (using something like tidyr::complete(wide_prices, start_date = tidyr::full_seq(start_date, 1)), followed by a tidyr::fill(wide_prices, everything(), .direction = "down")) (and ignored the enddate column), and then just a normal dplyr::left_join() with dates -- or even a wide_prices %>% dplyr::filter(startdate %in% dates)

You might also want to check out the funneljoin package, which certainly has some similarities to fuzzyjoin, but might work better - I meant to use it next time this situation arises, but I haven't yet had an opportunity.

Yeah it would have helped if I had copied the tibble I was building properly into the comment box.. That's been amended. The idea was to make this a reprex by making it a simple tibble.

I ran your code and didnt see a group method error, did you omit a further step perhaps ?

something like

 %>% group_by(OPR_DATE) %>% summarise_all(max,na.rm=TRUE) 

should finish it (or I'm missing some detail)

That did it! Thanks for your help.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.