I'm learning how to use purrr and thought it would be useful for keeping track of some calculations.
However, I'm not sure why I can't do a particular operation using purrr::pmap involving the following components:
1. List with each element of length n
2. Vector of length 1
3. Vector of length 1
4. Vector of length n
Items 1, 2, and 3 are all columns of the same data frame (named 'operations_df'). Item 4 is a separate vector outside the data frame, with the same length as each list element (the list elements all have the same length). So the function call basically multiplies each vector in 1 elementwise by 4 (or by its square), sums the products to a single value, and then combines that value with 2 and 3 by scaling and subtraction.
This works fine if I split the calculation into separate map2 calls. But how can I get it to work in a single pmap call?
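Written out, this is what the code below appears to compute for each row k, where p_{ki} is the i-th element of that row's `probability` vector and x_i is the i-th element of `data`. The symbols are my own shorthand and do not appear in the code:

```latex
\begin{aligned}
n_k &= \sum_{i=1}^{N} p_{ki} \\
\mu_k &= \frac{1}{n_k} \sum_{i=1}^{N} p_{ki}\, x_i \\
\mathrm{var}_k &= \frac{1}{n_k - 1} \sum_{i=1}^{N} p_{ki}\, x_i^{2} - \mu_k^{2}
\end{aligned}
```

Here N = 10, the length of `data` and of each `probability` vector.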
library(purrr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# generate data
data <- rbeta(n = 10, shape1 = 80, shape2 = 80)
prob_k1 <- rbeta(n = 10, shape1 = 80, shape2 = 10)
prob_k2 <- 1-prob_k1
# perform operations on prob_k and data in a data.frame
operations_df <- tibble(components = c('1', '2'),
                        probability = list(prob_k1, prob_k2)) %>%
  # sum over list column
  mutate(n = map_dbl(probability, sum)) %>%
  # mean for each row, using list column and a single 1-element vector
  mutate(mu = map2_dbl(probability, n, ~ (1/.y) * sum(data * .x)))
operations_df
#> # A tibble: 2 x 4
#> components probability n mu
#> <chr> <list> <dbl> <dbl>
#> 1 1 <dbl [10]> 8.93 0.504
#> 2 2 <dbl [10]> 1.07 0.506
# this doesn't work
# variance for each row, using list column, and two 1-element vectors
operations_df %>%
  mutate(var = pmap_dbl(probability, n, mu, ~ (1/(..2-1)) * sum(..1 * data^2) - ..3^2))
#> Result 1 must be a single double, not NULL of length 0
# this does work
(1/(operations_df$n[1]-1)) * sum(operations_df$probability[[1]] * data^2) - operations_df$mu[1]^2
#> [1] 0.0342961
(1/(operations_df$n[2]-1)) * sum(operations_df$probability[[2]] * data^2) - operations_df$mu[2]^2
#> [1] 3.800814
# breaking it up into two map2 calls works:
operations_df %>%
  mutate(var = map2_dbl(n, probability, ~ (1/(.x-1)) * sum(.y * data^2))) %>%
  mutate(var = map2_dbl(var, mu, ~ .x - .y^2))
#> # A tibble: 2 x 5
#> components probability n mu var
#> <chr> <list> <dbl> <dbl> <dbl>
#> 1 1 <dbl [10]> 8.93 0.504 0.0343
#> 2 2 <dbl [10]> 1.07 0.506 3.80
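One possible direction, as a minimal sketch rather than a definitive answer: unlike `map2()`, `pmap()` takes all of its inputs as a single list in its first argument (`.l`), followed by the function (`.f`). Wrapping the three columns in `list()` should therefore let the same formula run in one call. The `set.seed()` call and the names `with_pmap` and `with_map2` are assumptions added for illustration only.

```r
library(purrr)
library(dplyr)

set.seed(1)  # assumption: added only so the sketch is reproducible

# same setup as in the question
data <- rbeta(n = 10, shape1 = 80, shape2 = 80)
prob_k1 <- rbeta(n = 10, shape1 = 80, shape2 = 10)
prob_k2 <- 1 - prob_k1

operations_df <- tibble(components = c("1", "2"),
                        probability = list(prob_k1, prob_k2)) %>%
  mutate(n = map_dbl(probability, sum)) %>%
  mutate(mu = map2_dbl(probability, n, ~ (1 / .y) * sum(data * .x)))

# pmap_dbl(.l, .f): the inputs go in ONE list, the function comes second;
# ..1, ..2, ..3 refer to probability, n, and mu in that order
with_pmap <- operations_df %>%
  mutate(var = pmap_dbl(list(probability, n, mu),
                        ~ (1 / (..2 - 1)) * sum(..1 * data^2) - ..3^2))

# the two-step map2 version from the question, for comparison
with_map2 <- operations_df %>%
  mutate(var = map2_dbl(n, probability, ~ (1 / (.x - 1)) * sum(.y * data^2))) %>%
  mutate(var = map2_dbl(var, mu, ~ .x - .y^2))

# TRUE when both approaches give the same variances
all.equal(with_pmap$var, with_map2$var)
```

If this reading is right, the original call probably fails because `probability` alone is taken as `.l` and `n` is taken as `.f`. A numeric `.f` is treated as an extractor rather than a function, which would explain a `NULL` result for each element.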