Building a parsnip model for {circular}

Hello, I'm a relatively new R programmer. I've been using the circular package to create regression models with circular outcomes.

I wanted to add a parsnip model for this so it can be loaded with a package I wrote to help with my work. However, I find that even when the parameters are kept the same, the traditional and parsnip approaches do not produce remotely the same results.

The MWE below contains:

  1. Setting up the parsnip model and specifications
  2. The traditional circular::lm.circular() call, run with the same parameters on the toy dataset. This function doesn't use formulas; it takes a y vector and an x matrix for the outcome/predictors. It also requires an init vector whose length matches the number of columns of the predictor matrix.
  3. The circular_reg() approach (using the parsnip model specification) with the same parameters. The tidymodels documentation suggests I should be able to declare a formula and have fit() pre-process it appropriately.

### MINIMAL WORKING EXAMPLE

set.seed(1234)
library(circular)
library(tidyverse)
library(tidymodels)

# Start making new model
parsnip::set_new_model("circular_reg")

# Register the model's mode, engine, and package dependency (these calls would live in my package)
parsnip::set_model_mode(model = "circular_reg", mode = "regression")
parsnip::set_model_engine("circular_reg", mode = "regression", eng = "circular")
parsnip::set_dependency("circular_reg", eng = "circular", pkg = "circular")

# Arguments = type
parsnip::set_model_arg(
    model = "circular_reg",
    eng = "circular",
    parsnip = "pattern",
    original = "type",
    func = list(pkg = "circular", fun = "lm.circular"),
    has_submodel = FALSE
)

# Arguments = init
parsnip::set_model_arg(
    model = "circular_reg",
    eng = "circular",
    parsnip = "initial",
    original = "init",
    func = list(pkg = "circular", fun = "lm.circular"),
    has_submodel = FALSE
)

# Arguments = tol
parsnip::set_model_arg(
    model = "circular_reg",
    eng = "circular",
    parsnip = "tolerance",
    original = "tol",
    func = list(pkg = "circular", fun = "lm.circular"),
    has_submodel = FALSE
)

# Encoding: how predictors should be prepared before they reach the engine
parsnip::set_encoding(
    model = "circular_reg",
    eng = "circular",
    mode = "regression",
    options = list(
        predictor_indicators = "traditional",
        compute_intercept = TRUE,
        remove_intercept = FALSE,
        allow_sparse_x = TRUE
    )
)

# Fit: map parsnip's x/y to lm.circular()'s matrix interface
parsnip::set_fit(
    model = "circular_reg",
    eng = "circular",
    mode = "regression",
    value = list(
        interface = "matrix",
        protect = c("x", "y"),
        func = c(pkg = "circular", fun = "lm.circular"),
        defaults = list(verbose = TRUE)
    )
)

# Prediction
parsnip::set_pred(
    model = "circular_reg",
    eng = "circular",
    mode = "regression",
    type = "numeric",
    value = list(
        pre = NULL,
        post = NULL,
        func = c(fun = "predict"),
        args = list(
            object = quote(object$fit),
            new_data = quote(new_data),
            type = "numeric"
        )
    )
)
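
# Sanity check of mine (output omitted because it is long): confirm what has
# been registered so far for the new model
parsnip::show_model_info("circular_reg")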

# Official parsnip model spec
circular_reg <- function(mode = "regression", pattern = NULL, initial = NULL, tolerance = NULL) {

    # Check correct mode
    if(mode != "regression") {
        stop("`mode` should be 'regression'", call. = FALSE)
    }

    # Capture arguments
    args <- list(
        pattern = rlang::enquo(pattern),
        initial = rlang::enquo(initial),
        tolerance = rlang::enquo(tolerance)
    )

    # Model specs / slots
    parsnip::new_model_spec(
        "circular_reg",
        args = args,
        mode = mode,
        eng_args = NULL,
        method = NULL,
        engine = NULL
    )
}
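
# Another check of mine (output omitted): translate() shows how the spec's
# arguments are slotted into the lm.circular() call once an engine is set
circular_reg(pattern = "c-l", initial = rep(0, 3), tolerance = 1e-1) %>%
    parsnip::set_engine("circular") %>%
    parsnip::translate()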

### MODELING SETUP

head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
df <- mtcars[c("mpg", "wt", "hp")]

# Formula to eventually be used
f <- mpg ~ wt + hp

# Create the appropriate matrices. The x-matrix has an intercept
# MPG is going to be in "degrees" as a "circular" object
mat <- model.frame(f, data = mtcars)
x <- model.matrix(f, data = mat)
y <- circular::circular(mat[["mpg"]], units = "degrees")
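
# Quick check before fitting (output omitted): y should carry its circular
# attributes, with units = "degrees"; I believe circularp() reports them
class(y)
circular::circularp(y)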

### TRADITIONAL MODEL

m_trad <- circular::lm.circular(
    y = y, x = x, type = "c-l",
    init = rep(0, ncol(x)), tol = 1e-1, verbose = TRUE
)
#> Iteration  1 :    Log-Likelihood =  114.0438 
#> Iteration  2 :    Log-Likelihood =  114.0441

### PARSNIP MODEL

# Making sure the y vector is `circular` in class
df$mpg <- circular::circular(df$mpg, units = "degrees")

# Making it the parsnip way
set.seed(1234)
m_parsnip <-
  circular_reg(pattern = "c-l", initial = rep(0, 3), tolerance = 1e-1) %>%
  set_engine("circular") %>%
  fit(f, data = df)
#> Warning in as.circular(x): an object is coerced to the class 'circular' using default value for the following components:
#>   type: 'angles'
#>   units: 'radians'
#>   template: 'none'
#>   modulo: 'asis'
#>   zero: 0
#>   rotation: 'counter'
#> conversion.circularyradians0counter2pi
#> Iteration  1 :    Log-Likelihood =  2.478283 
#> Iteration  2 :    Log-Likelihood =  2.769604 
#> Iteration  3 :    Log-Likelihood =  3.066381 
#> Iteration  4 :    Log-Likelihood =  3.165462 
#> Iteration  5 :    Log-Likelihood =  3.205179
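
# My own hypothesis check, not part of the workflow: the coercion warning above
# suggests fit() hands lm.circular() a plain numeric outcome. Coercing the plain
# numeric mpg column reproduces the same warning, I believe.
y_plain <- mtcars$mpg                      # plain numeric, no circular attributes
y_back  <- circular::as.circular(y_plain)  # coerced with the default (radians) settings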

### COMPARE

print(m_trad)
#> 
#> Call:
#> lm.circular.cl(y = ..1, x = ..2, init = ..3, verbose = TRUE,     tol = 0.1)
#> 
#> 
#>  Circular-Linear Regression 
#> 
#>  Coefficients:
#>        Estimate Std. Error t value Pr(>|t|)    
#> [1,]  1.498e-01  1.334e-02  11.228  < 2e-16 ***
#> [2,] -3.394e-02  5.280e-03   6.429 6.42e-11 ***
#> [3,] -2.772e-04  7.517e-05   3.689 0.000113 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>  Log-Likelihood:  114 
#> 
#>  Summary: (mu in radians)
#>   mu:  20.09 ( 0.4582 )  kappa:  539.6 ( 134.8 )
#> p-values are approximated using normal distribution
print(m_parsnip)
#> parsnip model object
#> 
#> Fit time:  6ms 
#> 
#> Call:
#> lm.circular.cl(y = ..2, x = ..1, init = ..3, verbose = TRUE,     tol = ..4)
#> 
#> 
#>  Circular-Linear Regression 
#> 
#>  Coefficients:
#>      Estimate Std. Error t value Pr(>|t|)  
#> [1,] -1.79253    1.17066   1.531   0.0629 .
#> [2,]  1.38814    0.74510   1.863   0.0312 *
#> [3,] -0.01965    0.01238   1.587   0.0563 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>  Log-Likelihood:  3.205 
#> 
#>  Summary: (mu in radians)
#>   mu:  2.736 ( 0.4096 )  kappa:  0.6579 ( 0.2703 )
#> p-values are approximated using normal distribution

The specific questions and areas I'd like to explore are:

  1. Why do the two runs of the model (traditional vs. parsnip) give such different results?
  2. Does the conversion that fit() performs to get the data into the matrix format lm.circular() expects strip the attributes from the circular y vector (the response variable)?
  3. If I do not know the formula I will be using until fitting time, how can I pass the number of covariates to fit()? Is there a way to supply an expression such as initial = ncol(x) that is evaluated after x has been built during fitting? (A sketch of what I mean follows this list.)
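
To make question 3 concrete, this is the kind of thing I have in mind; the wrapper below is hypothetical (untested, and the name is mine) and would be registered via set_fit() in place of lm.circular() itself:

# Hypothetical wrapper: build init from the model matrix at fit time when the
# caller does not supply one, then hand everything on to circular::lm.circular()
lm_circular_wrapper <- function(x, y, init = NULL, ...) {
    if (is.null(init)) {
        init <- rep(0, ncol(x))  # one starting value per predictor column
    }
    circular::lm.circular(y = y, x = x, init = init, ...)
}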

I'd appreciate any help, from both a statistics and a parsnip perspective.
