I frequently find myself wanting to `unnest` list-columns that contain vectors, because those vectors should really be their own columns. Often when we use `map` to iterate, we end up with a function that returns a vector, such as `quantile` below. We could imagine iterating over many different distributions with different parameters and getting the quantiles of each. However, to use `unnest` to get multiple columns out, each element needs to be a one-row data frame. The most "obvious" way of doing that with tidyverse functions that I could see was `enframe` followed by `spread`, since `enframe` is supposed to be the standard function for creating a tibble from a vector. However, `spread` is not fast, and calling it for every row quickly becomes undesirable.
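To make the use case concrete, here is a minimal sketch of that workflow, assuming the tidyverse is installed. The `params` tibble, its `mean`/`sd` columns and the `probs` vector are hypothetical. Each row's `quantile()` output is turned into a one-row tibble with `enframe()` + `spread()` so that `unnest()` can widen it into columns.

```r
library(tidyverse)

# Hypothetical distributions to summarise: one row per (mean, sd) pair
params <- tibble(mean = c(0, 1, 5), sd = c(1, 2, 0.5))
probs <- c(0.05, 0.5, 0.95)

set.seed(1)
result <- params %>%
  mutate(q = map2(mean, sd, function(m, s) {
    # quantile() returns a named numeric vector ("5%", "50%", "95%")
    qs <- quantile(rnorm(1000, mean = m, sd = s), probs)
    # the "obvious" conversion: long two-column tibble, then spread to one row
    enframe(qs) %>% spread(name, value)
  })) %>%
  # each element is a one-row tibble, so unnest() adds one column per quantile
  unnest(q)

result
```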
Below I benchmarked a few alternatives I could think of, mostly going through `matrix`. I'm not the best at profiling and am not too sure why saving one `names<-` call gives such a boost, but all of these options are much, much faster than the seemingly "neat" `enframe` + `spread` method.
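One thing the benchmark in this post does not separate out: the fourth and fifth expressions differ not only by the `names<-` call but also by a `%>%` call, and the magrittr pipe carries some per-call overhead of its own. The sketch below times the rename with and without the pipe; the labels `piped`, `direct` and `dimnames` are just names chosen here. It has not been run against the 2019 package versions, and magrittr 2.0 made the pipe considerably cheaper, so the numbers on current versions may look quite different.

```r
library(magrittr)  # provides %>%
library(bench)

set.seed(1)
named_vec <- quantile(rnorm(1000), c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95))

timings <- bench::mark(
  # option 4 from the benchmark: rename via the pipe
  piped = as.data.frame(matrix(named_vec, nrow = 1)) %>% `names<-`(names(named_vec)),
  # the same rename, calling `names<-` directly without %>%
  direct = `names<-`(as.data.frame(matrix(named_vec, nrow = 1)), names(named_vec)),
  # option 5: names supplied up front through dimnames
  dimnames = as.data.frame(matrix(named_vec, nrow = 1,
                                  dimnames = list(NULL, names(named_vec))))
)

timings
```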
The question is: Am I missing some other method that would be faster?
The discussion part is: Should this operation be made easier, or approached in some other manner?
```r
set.seed(1)
named_vec <- quantile(rnorm(1000), c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95))
named_vec
#>          5%         10%         25%         50%         75%         90%
#> -1.72695999 -1.33933368 -0.69737322 -0.03532423  0.68842795  1.32402975
#>         95%
#>  1.74398317

library(tidyverse)

bench::mark(
  enframe(named_vec) %>% spread(name, value),
  as_tibble(matrix(named_vec, nrow = 1, dimnames = list(NULL, names(named_vec)))),
  data.frame(matrix(named_vec, nrow = 1)) %>% `names<-`(names(named_vec)),
  as.data.frame(matrix(named_vec, nrow = 1)) %>% `names<-`(names(named_vec)),
  as.data.frame(matrix(named_vec, nrow = 1, dimnames = list(NULL, names(named_vec))))
)
#> # A tibble: 5 x 10
#>   expression      min     mean   median      max `itr/sec` mem_alloc  n_gc
#>   <chr>      <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl>
#> 1 enframe(n…   1.36ms   1.56ms   1.52ms   2.15ms      640.     634KB    10
#> 2 as_tibble… 262.67µs  300.5µs 287.38µs  663.3µs     3328.        0B    13
#> 3 data.fram…  138.9µs 166.06µs  162.3µs 402.71µs     6022.      280B    11
#> 4 as.data.f…  73.77µs  86.93µs  84.42µs 313.09µs    11503.      280B    14
#> 5 as.data.f…  16.22µs  19.55µs  18.92µs 120.47µs    51151.        0B     8
#> # … with 2 more variables: n_itr <int>, total_time <bch:tm>
```
Created on 2019-04-25 by the reprex package (v0.2.1)
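For reference, the fastest variant from the benchmark can be dropped into the same per-row workflow in place of `enframe()` + `spread()`. The helper name `vec_to_row()` and the `params`/`probs` objects are hypothetical, chosen only for illustration; this is a sketch, not a claim about the best possible approach.

```r
library(tidyverse)

# Hypothetical helper wrapping option 5 from the benchmark:
# a one-row matrix with column names, converted with as.data.frame()
vec_to_row <- function(x) {
  as.data.frame(matrix(x, nrow = 1, dimnames = list(NULL, names(x))))
}

# Hypothetical distributions to summarise
params <- tibble(mean = c(0, 1, 5), sd = c(1, 2, 0.5))
probs <- c(0.05, 0.5, 0.95)

set.seed(1)
result <- params %>%
  mutate(q = map2(mean, sd, ~ vec_to_row(quantile(rnorm(1000, .x, .y), probs)))) %>%
  # each element is a one-row data frame, so unnest() widens it into columns
  unnest(q)

result
```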