While I don't have an answer, this is pretty interesting.
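First, always set the seed when running ksvm() with an RBF kernel and no specific value of sigma. In that case ksvm() estimates sigma with sigest(), which works on a random sample of the data, so the estimate can change from run to run. I thought that this would be the issue, but it is not.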
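For anyone wondering why the seed matters, here is a rough base-R sketch of the kind of heuristic sigest() uses, based on my understanding of it and not kernlab's exact code. It draws random pairs of rows, computes their squared distances, and inverts quantiles of those distances to get candidate sigma values. `sigma_heuristic()` is a hypothetical name made up for this illustration.

```r
# Hypothetical sketch of a sigest()-style heuristic; NOT kernlab's exact code.
# It samples random pairs of rows, computes their squared Euclidean distances,
# and inverts quantiles of those distances to get candidate sigma values.
sigma_heuristic <- function(x, frac = 0.5, probs = c(0.1, 0.5, 0.9)) {
  x <- as.matrix(x)
  n <- nrow(x)
  m <- max(2L, floor(n * frac))          # number of random pairs to draw
  i <- sample.int(n, m, replace = TRUE)  # first random draw of rows
  j <- sample.int(n, m, replace = TRUE)  # second random draw of rows
  d2 <- rowSums((x[i, , drop = FALSE] - x[j, , drop = FALSE])^2)
  d2 <- d2[d2 > 0]                       # ignore pairs that hit the same row
  # Large distances give small sigma, so take the quantiles in reverse order
  # to return the candidates in increasing order.
  1 / quantile(d2, probs = rev(probs), names = FALSE)
}

x <- scale(mtcars[, -1])  # predictors only; mpg is the outcome

set.seed(1)
a <- sigma_heuristic(x)
set.seed(1)
b <- sigma_heuristic(x)
identical(a, b)  # same seed, same random pairs, same estimate

set.seed(2)
a2 <- sigma_heuristic(x)  # a different seed draws different pairs
```

Because the estimate depends on which pairs get sampled, two fits are only comparable on sigma if they start from the same seed, which is why both fits below are preceded by set.seed(1).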
Here is my detective work that makes me think that it is a kernlab issue:
```r
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from
#>   required_pkgs.model_spec parsnip
tidymodels_prefer()
theme_set(theme_bw())

rec_svr <-
  recipe(mpg ~ ., data = mtcars)

spec_svr <-
  svm_rbf(mode = "regression") %>%
  set_engine(engine = "kernlab", scaled = TRUE)

rec_svr2 <-
  recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())

spec_svr2 <-
  svm_rbf(mode = "regression") %>%
  set_engine(engine = "kernlab", scaled = FALSE)

set.seed(1)
res_1 <-
  workflow(rec_svr, spec_svr) %>%
  fit(mtcars)

set.seed(1)
res_2 <-
  workflow(rec_svr2, spec_svr2) %>%
  fit(mtcars)

# Was the issue different sigma estimates?
# These should be the same now that we set the seeds:
res_1$fit$fit$fit@kernelf@kpar$sigma
#> [1] 0.07258094
res_2$fit$fit$fit@kernelf@kpar$sigma
#> [1] 0.07258094

# Are predictions equal?
all.equal(
  predict(res_1, mtcars),
  predict(res_2, mtcars)
)
#> [1] "Component \".pred\": Mean relative difference: 0.07870634"
# Nope!

# The data that go into the model have different numbers of rows.
res_1 %>% extract_fit_engine() %>% pluck("xmatrix") %>% dim()
#> [1] 28 10
res_2 %>% extract_fit_engine() %>% pluck("xmatrix") %>% dim()
#> [1] 32 10
# waldo::compare() shows that res_1 is missing rows 1, 2, 7, and 9 of the
# original data.

# Here are the data coming out of the recipe:
rec_svr2 %>% prep() %>% bake(new_data = NULL, all_predictors()) %>% dim()
#> [1] 32 10

# The data in the workflow objects have the same number of rows:
dim(res_1$pre$mold$predictors)
#> [1] 32 10
dim(res_2$pre$mold$predictors)
#> [1] 32 10

# At this point ¯\_(ツ)_/¯
```
Created on 2021-10-28 by the reprex package (v2.0.0)
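To pin down which rows differ without eyeballing waldo::compare() output, a small base-R helper could list them directly. `missing_rows()` is a hypothetical helper, not part of waldo or kernlab; it rounds before comparing on the assumption that the two scaling routes agree only up to floating-point noise.

```r
# Hypothetical helper (not part of waldo or kernlab): return the row numbers of
# `full` that have no exact match among the rows of `part`, after rounding to
# `digits` so tiny floating-point differences between scalings don't matter.
missing_rows <- function(full, part, digits = 8) {
  row_key <- function(m) {
    m <- round(as.matrix(m), digits)
    apply(m, 1, paste, collapse = "|")  # one string key per row
  }
  unname(which(!row_key(full) %in% row_key(part)))
}

# Toy example: keep rows 1 and 3 of a 4-row matrix.
full <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4)
part <- full[c(1, 3), ]
missing_rows(full, part)
#> [1] 2 4
```

With the reprex objects, calling it on the two extracted `xmatrix` slots, larger one first, should return the row numbers absent from the smaller one, assuming the larger matrix keeps the rows in their original order.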
tl;dr: The same 32 rows are given to ksvm(), and at some point after estimating sigma it drops four of them and arrives at different parameter estimates.
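The comparison above assumes that kernlab's `scaled = TRUE` and `step_normalize()` transform the predictors identically. To my understanding, ksvm() applies base R's scale() to the predictors when `scaled = TRUE` (for regression it may also scale the outcome, which this check does not cover), and step_normalize() subtracts the column mean and divides by the column standard deviation. A quick base-R check that those two predictor transformations agree on mtcars:

```r
# Compare base R's scale() with an explicit (x - column mean) / column sd,
# the centering and scaling step_normalize() is assumed to apply.
x <- as.matrix(mtcars[, -1])  # predictors only; mpg is the outcome

via_scale <- scale(x)
via_manual <- sweep(sweep(x, 2, colMeans(x), "-"), 2, apply(x, 2, sd), "/")

# Ignore the "scaled:center"/"scaled:scale" attributes that scale() attaches.
all.equal(via_scale, via_manual, check.attributes = FALSE)
#> [1] TRUE
```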