furrr future_map() is slower than purrr map()

I have a data frame grouped by gene (~20 rows). I ran group_by(gene) followed by nest() to get a list-column called data.

I recently learned about furrr and tried future_map(), but it runs slower than purrr's map(). After reading the furrr documentation, I ungrouped my input before calling future_map(), but the sequential plan is still faster than the multicore plan.
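To see what ungroup() actually does to a nested data frame, here is a small sketch that uses the built-in mtcars data, with cyl playing the role of gene (purely an illustrative assumption, not the real data). Because nest() leaves exactly one row per group, ungroup() afterwards only drops the grouping metadata; the per-group data frames in the list-column stay intact.

```r
library(dplyr)
library(tidyr)

# mtcars stands in for the real expression table; cyl plays the role of gene
nested <- mtcars %>%
  group_by(cyl) %>%
  nest()

group_vars(nested)       # still grouped by "cyl" after nest()

flat <- ungroup(nested)
group_vars(flat)         # character(0): only the grouping metadata is gone
nrow(flat)               # still one row per cyl value
sapply(flat$data, nrow)  # each nested data frame keeps all of its rows
```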

I am not really sure what's going on.

Sequential plan

library(tidyverse)
library(furrr)

# input: grouped data frame with a list-column `data` from a previous group_by(gene) + nest()

# sequential plan
# (plan() only affects future-based functions such as future_map(); purrr::map() ignores it)
plan(sequential)
t1 <- proc.time()

result_series <- exposure_response_tpm_sample %>%  # first 1000 rows of the original input (~20k rows)
  mutate(wilcox_result   = map(data, wilcox_baseline_tpm_reponse),
         log2FC_baseline = map(data, calculate_TPM_foldchange_baseline))

proc.time() - t1  # stop the timer
# t1 sequential time is 570 seconds

Multicore plan


# multicore plan
plan(multicore, workers = 8)
t2 <- proc.time()

result_series <- exposure_response_tpm_sample %>%
  ungroup() %>%  # the furrr docs say to ungroup first
  mutate(wilcox_result   = future_map(data, wilcox_baseline_tpm_reponse),
         log2FC_baseline = future_map(data, calculate_TPM_foldchange_baseline))

proc.time() - t2  # stop the timer
# t2 multicore time is 624 user seconds
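One thing worth checking before concluding that furrr is slower: as far as I know, when a proc.time() difference is printed, R folds the CPU time of finished child processes into the `user` column. With forked workers, `user` therefore roughly sums the work done by all workers and can exceed the sequential figure even when the job finishes sooner. The `elapsed` column is the wall-clock time that parallelism should reduce. The sketch below compares `elapsed` for both plans. The function slow_task(), the inputs 1:8, and workers = 2 are hypothetical choices for illustration. It also swaps in plan(multisession) because multicore (forking) is, to my knowledge, unavailable on Windows and disabled by default inside RStudio, where future falls back to sequential processing.

```r
library(furrr)  # attaches future as well, which provides plan()

# Hypothetical stand-in for an expensive per-gene function: it only sleeps,
# so it costs wall-clock time but almost no CPU time
slow_task <- function(x) {
  Sys.sleep(0.5)
  x^2
}

inputs <- 1:8

# Sequential baseline
plan(sequential)
t_seq <- system.time(res_seq <- future_map(inputs, slow_task))

# multisession runs background R sessions and works on every OS
plan(multisession, workers = 2)
t_par <- system.time(res_par <- future_map(inputs, slow_task))
plan(sequential)  # shut down the background sessions

# Wall-clock time is what a speed-up should reduce
t_seq[["elapsed"]]
t_par[["elapsed"]]

# Parallelism must not change the answer
identical(res_seq, res_par)
```

With eight half-second tasks, the sequential run should take roughly 4 seconds, and the two-worker run should take noticeably less, plus some start-up overhead.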

1st comment.
You can at least validate the correctness of the calculations by comparing the results of the two approaches, since they should be the same whether you see a speed benefit or a speed regression. It's probably wrong because you lost the groups. Ungrouping is probably a mistake insofar as it wasn't replaced by an alternative that can chunk your data into the required groupings. Nesting or splitting might be the way to go.
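The comparison suggested above can be sketched on made-up data. The sketch computes the same per-gene statistic with purrr's map_dbl() and furrr's future_map_dbl(), then checks the two results with all.equal(). The toy columns (gene, tpm, response) and the helper per_gene_p() are hypothetical stand-ins for the poster's data and wilcox_baseline_tpm_reponse(). The choice of plan(multisession, workers = 2) is arbitrary.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(furrr)

set.seed(1)
# Hypothetical toy data standing in for the real TPM table: 3 genes x 20 rows
toy_nested <- tibble(
  gene     = rep(c("GENE_A", "GENE_B", "GENE_C"), each = 20),
  tpm      = rexp(60, rate = 0.1),
  response = rep(c("responder", "non_responder"), times = 30)
) %>%
  group_by(gene) %>%
  nest()

# Hypothetical per-gene test, standing in for wilcox_baseline_tpm_reponse()
per_gene_p <- function(df) {
  wilcox.test(tpm ~ response, data = df)$p.value
}

# Sequential: purrr
seq_res <- toy_nested %>%
  mutate(p = map_dbl(data, per_gene_p))

# Parallel: furrr, on the ungrouped nested data
plan(multisession, workers = 2)
par_res <- toy_nested %>%
  ungroup() %>%
  mutate(p = future_map_dbl(data, per_gene_p))
plan(sequential)

# Same genes, same order, same p-values?
all.equal(seq_res$gene, par_res$gene)
all.equal(seq_res$p, par_res$p)
```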

2nd comment is a standard one that I will reproduce about the benefits of making a reprex, which makes your issue demonstrable so that forum users can help you with it.

Thanks for providing code. Could you kindly take further steps to make it easier for other forum users to help you? Share some representative data that will enable your code to run and show the problematic behaviour.

How do I share data for a reprex?

You might use tools such as the datapasta package or the base function dput() to share a portion of your data as code, i.e. in a form that can be copied from the forum and pasted into an R session.
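As a minimal sketch of the dput() route, the example below uses the built-in mtcars data as a stand-in for the real table. dput() prints a small slice as a structure(...) call. dget() reads that call back, so a helper can rebuild exactly the same object. The temporary file is only there to demonstrate the round trip. On the forum, you would paste the printed output instead.

```r
# Take a small, representative slice and print it as R code to paste into a post
small <- head(mtcars[, c("mpg", "cyl", "hp")], 3)
dput(small)

# Round trip: write the same code to a file and read it back
tf <- tempfile(fileext = ".R")
dput(small, file = tf)
recreated <- dget(tf)

identical(recreated, small)
```

For a nested data frame, dput() also serializes the list-column, but its output grows quickly. It may be easier to share a few un-nested rows together with the group_by()/nest() step.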

Reprex Guide
