purrr equivalent to apply

mjandrews · August 8, 2019, 1:08pm

I know how purrr effectively replaces the {l,v,s,m}apply functionals, but I wonder about the apply function itself. For a 2d array, I can see how what apply does with MARGIN=2 could be done by purrr::map_dbl or even dplyr::summarize_all, and how what it does with MARGIN=1 could be done by purrr::pmap.

library(purrr)
library(tibble)
library(dplyr)
set.seed(123)
n <- 5
Df <- tibble(x = rnorm(n), y = rnorm(n), z = rnorm(n))
apply(Df, 2, sum)
#>          x          y          z 
#>  0.9678513 -0.2215949  1.5395087 
map_dbl(Df, sum)
#>          x          y          z 
#>  0.9678513 -0.2215949  1.5395087 
summarise_all(Df, sum)
#> # A tibble: 1 x 3
#>       x      y     z
#>   <dbl>  <dbl> <dbl>
#> 1 0.968 -0.222  1.54
apply(Df, 1, sum)
#> [1]  2.3786711  0.5905525  0.6944185 -0.5056617 -0.8722154
pmap_dbl(Df, sum)
#> [1]  2.3786711  0.5905525  0.6944185 -0.5056617 -0.8722154

But is there a purrr way to do the following for example?

> apply(Titanic, c(2, 3), sum)
        Age
Sex      Child Adult
  Male      64  1667
  Female    45   425

mishabalyasin · August 8, 2019, 1:36pm

Multi-dimensional data frames are quite exotic. The only use case that comes to mind is something like tensors, where you actually need more than 2 dimensions.

That being said, I don't have an answer to your question, but most of the time you'll probably flatten this object to have two more columns (one for Age and one for Sex) and then do all of the things you want to do using group_by + summarize.
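
A minimal sketch of that flattening approach, assuming base R's as.data.frame() method for tables, which gives one row per cell and puts the counts in a column named Freq. The object names titanic_long and sex_age_counts are only illustrative:

```r
library(dplyr, warn.conflicts = FALSE)

# Flatten the 4 x 2 x 2 x 2 table: each dimension becomes a factor column
# (Class, Sex, Age, Survived) and the cell counts go into Freq
titanic_long <- as.data.frame(Titanic)

# Sum the counts within each Sex x Age combination
sex_age_counts <- titanic_long %>%
  group_by(Sex, Age) %>%
  summarize(n = sum(Freq)) %>%
  ungroup()

sex_age_counts
```

The result is a long table with one row per Sex and Age combination rather than the 2 x 2 matrix that apply returns.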

mjandrews · August 8, 2019, 1:50pm

I agree. I should have made clear that my question was more of academic than practical interest. I don't use multidimensional arrays in R, and can't see any advantage of, e.g., the 4 x 2 x 2 x 2 Titanic table over a data frame with the same information.

CorradoLanera · August 18, 2019, 4:58am

I am not sure your operation is either functional or vectorized, so I would not suggest trying to use purrr for this computation. It seems to me to be more of a data manipulation problem, stated for an object that is not a data frame. Fortunately, dplyr can easily convert a multi-dimensional table into a suitable tibble. After that, the game is easy:

library(dplyr, warn.conflicts = FALSE)

as_tibble(Titanic) %>%
    group_by(Sex, Age) %>%
    summarize(n = sum(n))
#> # A tibble: 4 x 3
#> # Groups:   Sex [2]
#>   Sex    Age       n
#>   <chr>  <chr> <dbl>
#> 1 Female Adult   425
#> 2 Female Child    45
#> 3 Male   Adult  1667
#> 4 Male   Child    64

Created on 2019-08-18 by the reprex package (v0.3.0)

If for the final result you must have a matrix (or a matrix-like data frame) you can find many solutions here
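
One possible option, sketched here as an assumption and not taken from the linked answers, is base R's xtabs(), which cross-tabulates the summarized counts into a two-way table that is also a matrix. The names counts and sex_by_age are only illustrative:

```r
library(dplyr, warn.conflicts = FALSE)

# Same summary as above, with the grouping dropped afterwards
counts <- as_tibble(Titanic) %>%
  group_by(Sex, Age) %>%
  summarize(n = sum(n)) %>%
  ungroup()

# xtabs() sums n within each Sex x Age cell and returns a two-way table
sex_by_age <- xtabs(n ~ Sex + Age, data = counts)

# Individual cells can be looked up by name
sex_by_age["Male", "Child"]
```

Because as_tibble() stores Sex and Age as character columns, the rows and columns come out in alphabetical order, not in the order apply gives.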

cderv · August 18, 2019, 9:37am

In purrr, you can deal with arrays by using purrr::array_tree() or purrr::array_branch().

You would not get exactly the same format as with apply, but you can get a list or vector with the results:

library(purrr)

# create a tree with dimension following the margin
Titanic %>%
  array_tree(c(2, 3)) %>%
  map_depth(2, sum)
#> $Male
#> $Male$Child
#> [1] 64
#> 
#> $Male$Adult
#> [1] 1667
#> 
#> 
#> $Female
#> $Female$Child
#> [1] 45
#> 
#> $Female$Adult
#> [1] 425

# create a flat list from the dimension. The names are lost here
Titanic %>%
  array_branch(c(2, 3)) %>%
  map_dbl(sum)
#> [1]   64   45 1667  425

# ...unless you recreate them
Titanic %>% {
  tab <- .
  array_branch(tab, c(2, 3)) %>%
    set_names(nm = map_chr(cross(dimnames(tab)[c(2, 3)]), paste, collapse = "_"))
} %>%
  map_dbl(sum)
#>   Male_Child Female_Child   Male_Adult Female_Adult 
#>           64           45         1667          425

Created on 2019-08-18 by the reprex package (v0.3.0)

So this is possible with purrr, but not so easy. Using dplyr with a tbl_cube structure seems better.
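
If the matrix shape that apply returns is needed, a rough sketch is to put the array_branch() sums back into an array using the dimensions and dimnames of the margins that were kept. This assumes array_branch() yields the cells in column-major order, as the output above suggests. The names margins, cell_sums and sex_by_age are only illustrative:

```r
library(purrr)

margins <- c(2, 3)  # Sex and Age

# One sum per Sex x Age cell, as an unnamed numeric vector
cell_sums <- map_dbl(array_branch(Titanic, margins), sum)

# Rebuild a matrix from the sums, reusing the dim and dimnames
# of the margins that were kept
sex_by_age <- array(
  cell_sums,
  dim = dim(Titanic)[margins],
  dimnames = dimnames(Titanic)[margins]
)

# Compare with the base R result
all.equal(sex_by_age, apply(Titanic, margins, sum))
```

This is still more ceremony than the single apply call, which supports the point that purrr is not the most natural tool here.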
