Hello, I'm a relatively new R programmer. I've been using the circular
package to fit regression models with circular outcomes.
I wanted to add a parsnip model for this so it can be loaded with a package I wrote for my work. However, even when the parameters are kept identical, the traditional and parsnip approaches do not produce remotely similar results.
The MWE below contains:
- Setting up the parsnip model and specifications.
- The traditional `circular::lm.circular()` function run with the same parameters on the toy dataset. This function does not take a formula; it takes a y vector of outcomes and an x matrix of predictors. It also requires an `init` vector whose length matches the number of columns of the predictor matrix.
- The `circular_reg()` approach (the parsnip model specification) with the same parameters. Per the tidymodels fitting process, I should be able to declare a formula and have it pre-processed appropriately.
### MINIMAL WORKING EXAMPLE
set.seed(1234)
library(circular)
library(tidyverse)
library(tidymodels)
# Start making new model
parsnip::set_new_model("circular_reg")
# Add parsnip models to another package
parsnip::set_model_mode(model = "circular_reg", mode = "regression")
parsnip::set_model_engine("circular_reg", mode = "regression", eng = "circular")
parsnip::set_dependency("circular_reg", eng = "circular", pkg = "circular")
# Arguments = type
parsnip::set_model_arg(
model = "circular_reg",
eng = "circular",
parsnip = "pattern",
original = "type",
func = list(pkg = "circular", fun = "lm.circular"),
has_submodel = FALSE
)
# Arguments = init
parsnip::set_model_arg(
model = "circular_reg",
eng = "circular",
parsnip = "initial",
original = "init",
func = list(pkg = "circular", fun = "lm.circular"),
has_submodel = FALSE
)
# Arguments = tol
parsnip::set_model_arg(
model = "circular_reg",
eng = "circular",
parsnip = "tolerance",
original = "tol",
func = list(pkg = "circular", fun = "lm.circular"),
has_submodel = FALSE
)
# Encoding
parsnip::set_encoding(
model = "circular_reg",
eng = "circular",
mode = "regression",
options = list(
predictor_indicators = "traditional",
compute_intercept = TRUE,
remove_intercept = FALSE,
allow_sparse_x = TRUE
)
)
# Fit
parsnip::set_fit(
model = "circular_reg",
eng = "circular",
mode = "regression",
value = list(
interface = "matrix",
protect = c("x", "y"),
func = c(pkg = "circular", fun = "lm.circular"),
defaults = list(verbose = TRUE)
)
)
# Prediction
parsnip::set_pred(
model = "circular_reg",
eng = "circular",
mode = "regression",
type = "numeric",
value = list(
pre = NULL,
post = NULL,
func = c(fun = "predict"),
args = list(
object = quote(object$fit),
new_data = quote(new_data),
type = "numeric"
)
)
)
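# As a sanity check on the registration itself (just how I verified the pieces
# above were in place; not part of the problem), parsnip can print what it now
# knows about the new model:
parsnip::show_model_info("circular_reg")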
# Official parsnip model spec
circular_reg <- function(mode = "regression", pattern = NULL, initial = NULL, tolerance = NULL) {
# Check correct mode
if(mode != "regression") {
stop("`mode` should be 'regression'", call. = FALSE)
}
# Capture arguments
args <- list(
pattern = rlang::enquo(pattern),
initial = rlang::enquo(initial),
tolerance = rlang::enquo(tolerance)
)
# Model specs / slots
parsnip::new_model_spec(
"circular_reg",
args = args,
mode = mode,
eng_args = NULL,
method = NULL,
engine = NULL
)
}
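# To see how parsnip plans to map these arguments onto the engine call, the
# spec can be translated before fitting (again only a check on my wiring,
# assuming everything above has been run):
circular_reg(pattern = "c-l", initial = rep(0, 3), tolerance = 1e-1) %>%
  set_engine("circular") %>%
  translate()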
### MODELING SETUP
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
df <- mtcars[c("mpg", "wt", "hp")]
# Formula to eventually be used
f <- mpg ~ wt + hp
# Create the appropriate matrices. The x-matrix has an intercept
# MPG is going to be in "degrees" as a "circular" object
mat <- model.frame(f, data = mtcars)
x <- model.matrix(f, data = mat)
y <- circular::circular(mat[["mpg"]], units = "degrees")
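# Sanity check (added for clarity): the init vector passed to lm.circular()
# must have one entry per column of this x matrix, including the intercept,
# i.e. 3 here (intercept + wt + hp).
stopifnot(ncol(x) == 3)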
### TRADITIONAL MODEL
m_trad <- circular::lm.circular(y = y, x = x, type = "c-l", init = rep(0, ncol(x)), tol = 1e-1, verbose = TRUE)
#> Iteration 1 : Log-Likelihood = 114.0438
#> Iteration 2 : Log-Likelihood = 114.0441
### PARSNIP MODEL
# Making sure the y vector is `circular` in class
df$mpg <- circular::circular(df$mpg, units = "degrees")
# Making it the parsnip way
set.seed(1234)
m_parsnip <-
circular_reg(pattern = "c-l", initial = rep(0, 3), tolerance = 1e-1) %>%
set_engine("circular") %>%
fit(f, data = df)
#> Warning in as.circular(x): an object is coerced to the class 'circular' using default value for the following components:
#> type: 'angles'
#> units: 'radians'
#> template: 'none'
#> modulo: 'asis'
#> zero: 0
#> rotation: 'counter'
#> conversion.circularyradians0counter2pi
#> Iteration 1 : Log-Likelihood = 2.478283
#> Iteration 2 : Log-Likelihood = 2.769604
#> Iteration 3 : Log-Likelihood = 3.066381
#> Iteration 4 : Log-Likelihood = 3.165462
#> Iteration 5 : Log-Likelihood = 3.205179
### COMPARE
print(m_trad)
#>
#> Call:
#> lm.circular.cl(y = ..1, x = ..2, init = ..3, verbose = TRUE, tol = 0.1)
#>
#>
#> Circular-Linear Regression
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> [1,] 1.498e-01 1.334e-02 11.228 < 2e-16 ***
#> [2,] -3.394e-02 5.280e-03 6.429 6.42e-11 ***
#> [3,] -2.772e-04 7.517e-05 3.689 0.000113 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Log-Likelihood: 114
#>
#> Summary: (mu in radians)
#> mu: 20.09 ( 0.4582 ) kappa: 539.6 ( 134.8 )
#> p-values are approximated using normal distribution
print(m_parsnip)
#> parsnip model object
#>
#> Fit time: 6ms
#>
#> Call:
#> lm.circular.cl(y = ..2, x = ..1, init = ..3, verbose = TRUE, tol = ..4)
#>
#>
#> Circular-Linear Regression
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> [1,] -1.79253 1.17066 1.531 0.0629 .
#> [2,] 1.38814 0.74510 1.863 0.0312 *
#> [3,] -0.01965 0.01238 1.587 0.0563 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Log-Likelihood: 3.205
#>
#> Summary: (mu in radians)
#> mu: 2.736 ( 0.4096 ) kappa: 0.6579 ( 0.2703 )
#> p-values are approximated using normal distribution
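# To compare more directly than eyeballing the printouts, the coefficients can
# be pulled out of both objects (assuming the lm.circular.cl fit stores them in
# $coefficients, which the printout suggests; the parsnip engine fit sits one
# level deeper in $fit):
cbind(traditional = m_trad$coefficients,
      parsnip = m_parsnip$fit$coefficients)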
The specific questions and areas of explanation/exploration are:
- Why do the results differ between the two approaches when the parameters are the same?
- Does the conversion `fit()` performs to produce the matrix format `lm.circular()` expects strip the attributes from the circular `y` vector (the response variable)?
- If I do not know the formula I will be using until the time of fitting, how can I pass the number of covariates to the `fit()` function? Is there a way to include an expression, such as `initial = ncol(x)`, that is evaluated after `x` has been built during the fitting process? (A rough sketch of the kind of workaround I have in mind is below.)
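For the last question, this is roughly what I have in mind, written as an untested sketch (`fit_circular()` is just a hypothetical helper name, not anything from parsnip):
# Hypothetical wrapper: build the model matrix first so the init vector can be
# sized from it, then hand the same formula to fit() as usual
fit_circular <- function(formula, data) {
  x <- model.matrix(formula, data = data)
  circular_reg(pattern = "c-l", initial = rep(0, ncol(x)), tolerance = 1e-1) %>%
    set_engine("circular") %>%
    fit(formula, data = data)
}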
I'd appreciate any help, from both a statistics and a parsnip perspective.