Handling a varying number of function arguments in dplyr::mutate

timcole · November 21, 2019, 9:21pm

Regression models in the gamlss package have an associated distribution family such as NO or BCCG or BCT. Here is a BCCG model of height vs age and the fitted centiles:

suppressPackageStartupMessages({
library(dplyr)
library(tibble)
library(gamlss)
library(glue)
library(AGD)
})
data <-AGD::boys7482[, c('age', 'hgt')]
data <- na.omit(data[sample(nrow(data), 500), ])
object <- gamlss(hgt ~ pbm(age), sigma.form=~pb(age), family=BCCG,
                 data=data, trace=FALSE)
centiles(object, data$age, xlab='age', ylab='height')

#> % of cases below  0.4 centile is  0.4081633 
#> % of cases below  2 centile is  1.836735 
#> % of cases below  10 centile is  10.61224 
#> % of cases below  25 centile is  24.69388 
#> % of cases below  50 centile is  47.55102 
#> % of cases below  75 centile is  75.5102 
#> % of cases below  90 centile is  89.59184 
#> % of cases below  98 centile is  97.7551 
#> % of cases below  99.6 centile is  99.79592

The parameters for each model family are a subset of c('mu', 'sigma', 'nu', 'tau') and are extracted as follows:

newage <- tibble(age = 0:20)
(pars <- as_tibble(predictAll(object, newage, data=data)))
#> new prediction 
#> New way of prediction in pbm()  (starting from GAMLSS version 5.0-3) 
#> new prediction 
#> New way of prediction in pb()  (starting from GAMLSS version 5.0-3)
#> # A tibble: 21 x 3
#>       mu  sigma    nu
#>    <dbl>  <dbl> <dbl>
#>  1  53.5 0.0394 0.719
#>  2  77.1 0.0395 0.719
#>  3  88.9 0.0397 0.719
#>  4  97.8 0.0398 0.719
#>  5 105.  0.0399 0.719
#>  6 112.  0.0400 0.719
#>  7 120.  0.0401 0.719
#>  8 127.  0.0403 0.719
#>  9 133.  0.0404 0.719
#> 10 139.  0.0405 0.719
#> # … with 11 more rows

For BCCG the parameters are c('mu', 'sigma', 'nu'), i.e. names(pars).

There are then functions to process the parameters, with the general name [dpqr]family such as:

# qNO(p, mu, sigma)
# pBCCG(q, mu, sigma, nu)
# rBCT(n, mu, sigma, nu, tau)

but they all have different numbers of arguments. I'd like to write code to handle these cases generally in a dplyr thread. I can extract e.g. the qfamily function as follows, and this works for all families:

qfamily <- eval(parse(text = glue('gamlss.dist::q{object$family[[1]]}')))

Then I can process the data like this, so long as I explicitly name the parameters:

bind_cols(newage, pars) %>%
  mutate(p10 = qfamily(0.1, mu=mu, sigma=sigma, nu=nu))
#> # A tibble: 21 x 5
#>      age    mu  sigma    nu   p10
#>    <int> <dbl>  <dbl> <dbl> <dbl>
#>  1     0  53.5 0.0394 0.719  50.8
#>  2     1  77.1 0.0395 0.719  73.2
#>  3     2  88.9 0.0397 0.719  84.4
#>  4     3  97.8 0.0398 0.719  92.8
#>  5     4 105.  0.0399 0.719  99.8
#>  6     5 112.  0.0400 0.719 107. 
#>  7     6 120.  0.0401 0.719 114. 
#>  8     7 127.  0.0403 0.719 121. 
#>  9     8 133.  0.0404 0.719 126. 
#> 10     9 139.  0.0405 0.719 132. 
#> # … with 11 more rows

This works for a BCCG object but not for NO or BCT.

So what I'm looking for is a mutate incantation, possibly including vars(names(pars)), that will replace the mu=mu, sigma=sigma, nu=nu list in the call and generalise it to all families.

^{Created on 2019-11-22 by the reprex package (v0.3.0)}

technocrat · November 21, 2019, 9:32pm

Hi! This sounds like an interesting question, but it's hard to follow without a reproducible example, called a reprex. Could you post one?

timcole · November 22, 2019, 1:38pm

I've reposted it as a reprex.

Max · November 22, 2019, 2:02pm

Parsing text is almost always a bad idea. Perhaps there's a better one, but you could contruct the call and evaluate it using rlang::call2():

> q_fam <- function(p, mu, sigma, nu, obj) {
+     fam <- object$family[[1]]
+     quant <- paste0("q", fam)
+     q_call <-
+         rlang::call2(
+             .ns = "gamlss.dist",
+             .fn = quant,
+             p = p,
+             mu = mu,
+             sigma = sigma,
+             nu = nu
+         )
+     rlang::eval_tidy(q_call)
+ }
> 
> bind_cols(newage, pars) %>%
+     mutate(
+         p10 = qfamily(0.1, mu = mu, sigma = sigma, nu = nu),
+         p10_expr = q_fam(0.1, mu = mu, sigma = sigma, nu = nu, obj = object)
+     )
# A tibble: 21 x 6
     age    mu  sigma    nu   p10 p10_expr
   <int> <dbl>  <dbl> <dbl> <dbl>    <dbl>
 1     0  53.4 0.0364 0.127  51.0     51.0
 2     1  77.1 0.0355 0.127  73.6     73.6
 3     2  89.2 0.0348 0.127  85.3     85.3
 4     3  97.7 0.0348 0.127  93.5     93.5
 5     4 105.  0.0357 0.127 101.     101. 
 6     5 112.  0.0374 0.127 107.     107. 
 7     6 119.  0.0400 0.127 113.     113. 
 8     7 125.  0.0431 0.127 118.     118. 
 9     8 132.  0.0462 0.127 124.     124. 
10     9 138.  0.0487 0.127 129.     129. 
# … with 11 more rows

rlang::exec() is a similar option

timcole · November 22, 2019, 2:44pm

OK, get(glue('gamlss.dist::q{object$family[1]}')) is probably tidier.

Thanks for your q_fam suggestion. But it is still restricted to c(mu, sigma, nu) and won't work for a general subset of c(mu, sigma, nu, tau). But how might it look if instead of passing the individual parameters to the function, one passed the set of parameters as a character string, e.g. pars = names(pars)? How could you set up the call2 call with the elements in pars? That is the tricky bit.

Note that the pars string is also available in object as object$parameters, so if object is an arg then the pars arg is not needed.

Max · November 22, 2019, 8:32pm

Can you make a general qfamily function if the arguments will be different between families (since the argument names will have to change)? You would have to change the call inside of the mutate() to have the right parameters so why not just use the actual function?

technocrat · November 22, 2019, 11:23pm

Great.

For the general problem of mapping functions with variable numbers of arguments see Map over multiple inputs simultaneously.

However, for gamlss::predictAll, you don't need it, because unlike gamlss:predict, it lacks a what argument.

Modifying the example in the help:

suppressPackageStartupMessages(library(gamlss))
data(aids)
# changed example family from PO to BCT
a<-gamlss(y~poly(x,3)+qrt, family=BCT, data=aids) # 
#> GAMLSS-RS iteration 1: Global Deviance = 400.7461 
#> GAMLSS-RS iteration 2: Global Deviance = 399.9772 
#> GAMLSS-RS iteration 3: Global Deviance = 399.8287 
#> GAMLSS-RS iteration 4: Global Deviance = 399.7042 
#> GAMLSS-RS iteration 5: Global Deviance = 399.5705 
#> GAMLSS-RS iteration 6: Global Deviance = 399.4257 
#> GAMLSS-RS iteration 7: Global Deviance = 399.2902 
#> GAMLSS-RS iteration 8: Global Deviance = 399.1767 
#> GAMLSS-RS iteration 9: Global Deviance = 399.0939 
#> GAMLSS-RS iteration 10: Global Deviance = 399.0422 
#> GAMLSS-RS iteration 11: Global Deviance = 399.0137 
#> GAMLSS-RS iteration 12: Global Deviance = 398.9978 
#> GAMLSS-RS iteration 13: Global Deviance = 398.9896 
#> GAMLSS-RS iteration 14: Global Deviance = 398.9846 
#> GAMLSS-RS iteration 15: Global Deviance = 398.9825 
#> GAMLSS-RS iteration 16: Global Deviance = 398.9811 
#> GAMLSS-RS iteration 17: Global Deviance = 398.9807
newaids<-data.frame(x=c(45,46,47), qrt=c(2,3,4))
# comment to example this `predict` has mu as the default 'what`
ap <- predict(a, newdata=newaids, type = "response")
#> Warning in predict.gamlss(a, newdata = newaids, type = "response"): There is a discrepancy  between the original and the re-fit 
#>  used to achieve 'safe' predictions 
#> 
ap
#> [1] 398.9276 403.0023 396.8515
# now getting all the parameters
predictAll(a, newdata=newaids)
#> Warning in predict.gamlss(object, newdata = newdata, what = "mu", type = type, : There is a discrepancy  between the original and the re-fit 
#>  used to achieve 'safe' predictions 
#> 
#> $mu
#> [1] 398.9276 403.0023 396.8515
#> 
#> $sigma
#> [1] 0.1096488 0.1096488 0.1096488
#> 
#> $nu
#> [1] 0.8255283 0.8255283 0.8255283
#> 
#> $tau
#> [1] 2.191327 2.191327 2.191327
#> 
#> attr(,"family")
#> [1] "BCT"       "Box-Cox t"
# added to the example
cbind(newaids,ap)
#>    x qrt       ap
#> 1 45   2 398.9276
#> 2 46   3 403.0023
#> 3 47   4 396.8515

^{Created on 2019-11-22 by the reprex package (v0.3.0)}

timcole · November 24, 2019, 9:25am

Thanks @Max and @technocrat. I've realised I omitted to say something significant - I'm using gamlss for simulation, where I summarise objects in terms of their mu sigma nu tau values rather than the object itself.. That's why I need the [dpqr]{family} functions.

I think I now have the solution in function dpqrFAMILY. It works with models of varying complexity - see below - and is flexible enough in principle to handle the [dpqr] functions for all 114 (!) distribution families in gamlss.

dpqrFAMILY <- function(object, dpqr = c('d', 'p', 'q', 'r'), npq, data, ...) {
  stopifnot(is.gamlss(object))
  fun <- get(paste0(dpqr, object$family[[1]]))
  with(data,
       switch(length(object$parameters),
              fun(npq, mu = mu, ...),
              fun(npq, mu = mu, sigma = sigma, ...),
              fun(npq, mu = mu, sigma = sigma, nu = nu, ...),
              fun(npq, mu = mu, sigma = sigma, nu = nu, tau = tau, ...),
       )
  )
}

suppressPackageStartupMessages({
library(dplyr)
library(tibble)
library(gamlss)
library(glue)
library(AGD)
})
data <-AGD::boys7482[, c('age', 'hgt')]
data <- na.omit(data[sample(nrow(data), 500), ])

    families <- c('NO', 'BCCG', 'BCT')
    newage <- tibble(age = 0:20)
    setNames(
      lapply(families, function(family) {
        object <- gamlss(hgt ~ pbm(age), sigma.form = ~pb(age),
                         family = family, data = data, trace = FALSE)
        predictAll(object, newage, data = data) %>%
          as_tibble() %>%
          bind_cols(newage, .) %>%
          mutate(p10 = dpqrFAMILY(object, 'q', 0.1, .))
      }),
    families)
    #> new prediction 
    #> New way of prediction in pbm()  (starting from GAMLSS version 5.0-3) 
    #> new prediction 
    #> New way of prediction in pb()  (starting from GAMLSS version 5.0-3) 
    #> new prediction 
    #> New way of prediction in pbm()  (starting from GAMLSS version 5.0-3) 
    #> new prediction 
    #> New way of prediction in pb()  (starting from GAMLSS version 5.0-3) 
    #> new prediction 
    #> New way of prediction in pbm()  (starting from GAMLSS version 5.0-3) 
    #> new prediction 
    #> New way of prediction in pb()  (starting from GAMLSS version 5.0-3)
    #> $NO
    #> # A tibble: 21 x 4
    #>      age    mu sigma   p10
    #>    <int> <dbl> <dbl> <dbl>
    #>  1     0  53.9  2.48  50.7
    #>  2     1  76.8  2.76  73.3
    #>  3     2  89.8  3.07  85.8
    #>  4     3  98.0  3.40  93.6
    #>  5     4 106.   3.74 101. 
    #>  6     5 114.   4.08 109. 
    #>  7     6 120.   4.45 114. 
    #>  8     7 125.   4.86 119. 
    #>  9     8 131.   5.31 125. 
    #> 10     9 138.   5.80 130. 
    #> # … with 11 more rows
    #> 
    #> $BCCG
    #> # A tibble: 21 x 5
    #>      age    mu  sigma    nu   p10
    #>    <int> <dbl>  <dbl> <dbl> <dbl>
    #>  1     0  53.8 0.0397  1.27  51.0
    #>  2     1  76.8 0.0382  1.27  73.0
    #>  3     2  89.8 0.0372  1.27  85.5
    #>  4     3  98.0 0.0366  1.27  93.4
    #>  5     4 106.  0.0363  1.27 101. 
    #>  6     5 114.  0.0364  1.27 109. 
    #>  7     6 120.  0.0371  1.27 114. 
    #>  8     7 125.  0.0382  1.27 119. 
    #>  9     8 131.  0.0397  1.27 125. 
    #> 10     9 138.  0.0415  1.27 130. 
    #> # … with 11 more rows
    #> 
    #> $BCT
    #> # A tibble: 21 x 6
    #>      age    mu  sigma    nu   tau   p10
    #>    <int> <dbl>  <dbl> <dbl> <dbl> <dbl>
    #>  1     0  53.8 0.0365  1.36  20.9  51.1
    #>  2     1  76.9 0.0357  1.36  20.9  73.2
    #>  3     2  89.6 0.0353  1.36  20.9  85.4
    #>  4     3  97.9 0.0350  1.36  20.9  93.4
    #>  5     4 106.  0.0351  1.36  20.9 101. 
    #>  6     5 114.  0.0354  1.36  20.9 109. 
    #>  7     6 120.  0.0360  1.36  20.9 114. 
    #>  8     7 125.  0.0371  1.36  20.9 119. 
    #>  9     8 131.  0.0384  1.36  20.9 125. 
    #> 10     9 138.  0.0400  1.36  20.9 130. 
    #> # … with 11 more rows

<sup>Created on 2019-11-24 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>

system · December 1, 2019, 9:36am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.