bind_rows on a list of database connection tibbles

Hello,

I have a list of tibbles that can change in length over time. Each tibble has the same columns. If I wanted to bind_rows() on my list of connections I get the following error:

dplyr::union_all() would work if there are 2 tables, but I need it to work with a list of variable length. Any guidance on how to do this?

I wanted to try to avoid collecting each tibble into memory because this feels like it would be more efficient, although that is an assumption.

Appreciate the time!

Kyle

I was reading through other Qs and came across this post, Looping an addition of column with part of a file name - tidyverse - RStudio Community. I somehow have never come across purrr::reduce()!

Calling the following did the trick!

my_list %>% 
    purrr::map(...) %>%
    purrr::reduce(dplyr::union_all) %>% 
    dplyr::collect()

However, I tested the speed against collecting each dataset per iteration and it is slightly faster.

my_list %>% 
    purrr::map(~ {
    ... %>%
        dplyr::collect(my_data)
    }) 

Trying this on a list that has 28 DB tibbles resulted in 10.89 seconds for the reduce method on 9.7 seconds for the collecting per iteration. I think my assumption of binding rows on the DB side would be faster. Still not quite sure how its slower though. Apologies for the lack of reprex. I haven't had to make a DB-related reprex before :sweat_smile:

If there is a better method I would love to see it!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.