Does furrr::future_map_dfr preserve order?

Hi.

I am wondering: if I pass in a list of requests (xhr), each of which returns one row of data for a data frame, does `future_map_dfr` guarantee that the rows of the final data frame are in the same order as the list passed in as the first argument?

I read the documentation (https://rdrr.io/cran/furrr/man/future_map.html) without finding an answer, and an internet search turned up the following, which didn't give me the confirmation I wanted:

I'd appreciate it if anyone knows the answer.

It seems to respect the ordering.

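```r
library(furrr)   # also attaches future, which provides plan() and multisession
library(purrr)
library(tictoc)  # tic()/toc() for simple timing
plan(multisession, workers = 2)

# 30 random job sizes (with replacement), printed so the input order is visible
(toDo <- sample.int(30, size = 30, replace = TRUE))

# Larger inputs sleep longer, so jobs finish at different times
slowcalc <- function(x) {
  Sys.sleep(x * x / 1000)
  x * x / 1000
}

# Sequential baseline
tic()
seq_res <- map_dbl(toDo, slowcalc)
toc()

# Parallel version in chunks of 5; chunks may complete out of order
tic()
par_res <- future_map_dbl(toDo, slowcalc,
                          .options = furrr_options(chunk_size = 5))
toc()

# TRUE if the parallel results line up with the inputs
identical(seq_res, par_res)
```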
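Since the question is specifically about `future_map_dfr()`, here is a hedged variant that mimics one row per request. `fake_request()` is a hypothetical stand-in for the xhr call: it sleeps for a random time and returns a one-row data frame recording its input id and the worker's process id. If rows were bound in completion order, the `id` column would not match the shuffled input.

```r
library(furrr)   # attaches future as well
library(dplyr)   # future_map_dfr() binds the rows with dplyr

plan(multisession, workers = 2)

# Hypothetical stand-in for an xhr request: random delay, one-row result
fake_request <- function(id) {
  Sys.sleep(runif(1, 0, 0.2))
  data.frame(id = id, worker_pid = Sys.getpid())
}

ids <- sample(1:20)  # deliberately shuffled input order

# seed = TRUE gives parallel-safe random numbers for runif() in the workers
res <- future_map_dfr(ids, fake_request,
                      .options = furrr_options(seed = TRUE))

print(res)
identical(res$id, ids)  # TRUE if rows follow the input order

plan(sequential)  # shut down the background workers
```

The `worker_pid` column should show two different process ids, which confirms that the rows really were produced by separate workers before being bound back together.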

Hi,

Thanks for your answer. If I understand correctly, this demonstrates that you see the same output order across repeated runs. The same thing can happen with a SQL query: because the implementation is set-based, there is no guaranteed order of results unless you specify ORDER BY. I guess I am wondering whether the same thing is possible here (not that it is set-based, but rather that the order can appear to be preserved until it isn't). Assuming I have understood you correctly, is there any definitive proof, please? I have just read here that "The default is for the processing strategy to be ‘sequential’ which results in library(furrr) working identically to library(purrr)", though I guess that does not mean the return order is the same.

We can see in the internals of furrr, in `furrr:::furrr_template`, that there is code that reorders the collated result:

```r
if (!is.null(order)) {
  order_inv <- vector("integer", length = n)
  idx <- seq2(1L, n)
  order_inv[.subset(order, idx)] <- idx
  out <- out[order_inv]
}
```
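To see why those lines restore the input order, here is a minimal base R sketch with a made-up permutation. It assumes, hypothetically, that `ord` (playing the role of `order`) records which input produced each collated result, so result `i` came from input `ord[i]`. This is an illustration of the inverse-permutation idea, not furrr's code; `seq_len()` stands in for `seq2()`.

```r
x   <- c(10, 20, 30, 40, 50)   # inputs in the order the caller supplied them
ord <- c(2L, 4L, 5L, 3L, 1L)   # hypothetical: result i came from input ord[i]
n   <- length(x)

# Results as they might be collated before reordering
out <- as.list(x[ord] * 2)

# Inverse permutation, following the same steps as the furrr snippet
order_inv <- vector("integer", length = n)
idx <- seq_len(n)
order_inv[ord[idx]] <- idx     # order_inv is c(5L, 1L, 4L, 2L, 3L)
out <- out[order_inv]

unlist(out)  # 20 40 60 80 100: back in input order
```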

My example also demonstrates that the ordering must happen at final collation. Collation can only happen once every chunk has been processed, so there are two possibilities: either the results come back in completion order, with the quickest chunks at the beginning and the slower ones at the end (which would make it easy to see that the output had become disordered), or an ordering is applied so that the results match the order in which they were requested. The output shows that the latter is the case.

I suppose I would rely on it, and if I found a bug, I would file it as an issue.


Thank you for that :slight_smile: It seems I also needed to change the plan call to `future::plan(future::multisession, workers = 2)`; perhaps something was masked.
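When a function such as `plan()` or `multisession` may be masked by another attached package, fully qualified calls avoid depending on what is attached. A minimal sketch, assuming only that future and furrr are installed:

```r
# Namespace-qualified calls work without library(future) or library(furrr)
future::plan(future::multisession, workers = 2)

res <- furrr::future_map_dbl(1:5, function(x) x^2)
print(res)  # [1]  1  4  9 16 25

future::plan(future::sequential)  # release the background workers
```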

Yes, the original ordering is maintained.


Thank you for your reply. Appreciate it.
