How do I get predictions for multiple outcomes from a knn model using tidymodels?

I am working to get more familiar with {tidymodels} and have run into an issue with a knn model using {kknn}. I posted this on StackOverflow yesterday, but it's not getting any interest over there (<20 views).

I would like to use knn regression to predict multiple outcomes, but I get a single output variable. Here's a reprex:

library(tidyverse)
library(tidymodels)

y1 <- rnorm(100, 5)
y2 <- rnorm(100, 6)
y3 <- rnorm(100, 7)

x1 <- rnorm(100, 8)
x2 <- rnorm(100, 12)
x3 <- rnorm(100, 6)

dat <- tibble(y1, y2, y3, x1, x2, x3)

data_split <- initial_split(dat)

train_data <- training(data_split)

form <- formula(y1 + y2 + y3 ~ x1 + x2 + x3)

# No prep() here: the workflow estimates the recipe itself when it is fit
rec <- recipe(form, data = dat) %>% 
  step_normalize(all_predictors())

model <-
  nearest_neighbor() %>%
  set_engine("kknn") %>%
  set_mode("regression") %>%
  set_args(dist_power = 2, neighbors = 5)


wflow <- workflow() %>% 
  add_model(model) %>% 
  add_recipe(rec)

fit1 <- fit(wflow, train_data)

predict(fit1, new_data = testing(data_split))

# Version info

── Attaching packages ──────────────────────────────────────────────── tidymodels 0.1.1 ──
✓ broom     0.7.2      ✓ recipes   0.1.14
✓ dials     0.0.9      ✓ rsample   0.0.8 
✓ infer     0.5.3      ✓ tune      0.1.1 
✓ modeldata 0.1.0      ✓ workflows 0.2.1 
✓ parsnip   0.1.4      ✓ yardstick 0.0.7 

I was attempting to predict y1, y2, y3 rather than a single output. Any suggestions?

Hello @jameshwade,

I suspect you're not getting any reaction over there because this isn't how KNN is usually set up. KNN typically works with a single dependent variable as the outcome. Whether that outcome is a multiclass factor, a continuous variable, etc. is up to you and your specific problem, so you aren't limited to predicting one class at a time, but you can't have multiple columns acting as your dependent variables.

Hmm... thanks for the feedback.

I am working with a coworker on a Shiny deployment of their Python model, which uses knn in a setup similar to the reprex. I will dig into that documentation to see what might be going on under the hood.

It depends on the underlying model. parsnip supports multivariate models but tune and other packages do not (yet, at least).

Here's an example:

library(tidyverse)
library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 0.1.1 ──
#> ✓ broom     0.7.0           ✓ recipes   0.1.14.9000
#> ✓ dials     0.0.9           ✓ rsample   0.0.8.9000 
#> ✓ infer     0.5.2           ✓ tune      0.1.1.9001 
#> ✓ modeldata 0.1.0           ✓ workflows 0.2.1      
#> ✓ parsnip   0.1.3.9000      ✓ yardstick 0.0.7
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> x scales::discard() masks purrr::discard()
#> x dplyr::filter()   masks stats::filter()
#> x recipes::fixed()  masks stringr::fixed()
#> x dplyr::lag()      masks stats::lag()
#> x yardstick::spec() masks readr::spec()
#> x recipes::step()   masks stats::step()

y1 <- rnorm(100, 5)
y2 <- rnorm(100, 6)
y3 <- rnorm(100, 7)

x1 <- rnorm(100, 8)
x2 <- rnorm(100, 12)
x3 <- rnorm(100, 6)

dat <- tibble(y1, y2, y3, x1, x2, x3)

data_split <- initial_split(dat)

train_data <- training(data_split)

form <- formula(y1 + y2 + y3 ~ x1 + x2 + x3)

# No prep() here: the workflow estimates the recipe itself when it is fit
rec <- recipe(form, data = dat) %>% 
   step_normalize(all_predictors())

model <-
   linear_reg() %>%
   set_engine("lm") %>%
   set_mode("regression") 


wflow <- workflow() %>% 
   add_model(model) %>% 
   add_recipe(rec)

fit1 <- fit(wflow, train_data)

predict(fit1, new_data = testing(data_split))
#> # A tibble: 25 x 3
#>    .pred_y1 .pred_y2 .pred_y3
#>       <dbl>    <dbl>    <dbl>
#>  1     4.95     6.02     6.88
#>  2     5.02     5.87     6.87
#>  3     4.55     5.78     7.29
#>  4     5.04     6.23     7.13
#>  5     5.30     5.94     6.89
#>  6     5.15     6.11     6.76
#>  7     5.32     5.87     7.21
#>  8     4.89     5.91     7.22
#>  9     4.75     6.26     6.83
#> 10     4.90     6.03     6.70
#> # … with 15 more rows

Created on 2020-10-30 by the reprex package (v0.3.0)

Thanks, Max. I'll need to dig into the underlying model to see why it only provides one prediction. In case future readers are curious, the functionality I was attempting to copy is scikit-learn's KNeighborsRegressor.

I don't see anything there that would indicate that the predictions for one outcome are changed by the other outcome columns. In other words, this would probably be equivalent to fitting a separate model for each outcome.
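
If the separate-models route is acceptable, a minimal sketch of it could look like the code below. It assumes the same simulated data as the reprex and that the kknn package is installed. The helper `fit_and_predict()` and the `outcomes` / `predictors` vectors are names made up for this illustration, not part of tidymodels, and the `.pred_y1`-style column names simply mimic the `lm` output shown earlier.

```r
library(tidymodels)  # the "kknn" engine also needs the kknn package installed

set.seed(2020)
dat <- tibble(
  y1 = rnorm(100, 5), y2 = rnorm(100, 6), y3 = rnorm(100, 7),
  x1 = rnorm(100, 8), x2 = rnorm(100, 12), x3 = rnorm(100, 6)
)

data_split <- initial_split(dat)
train_data <- training(data_split)
test_data  <- testing(data_split)

# Same model specification as in the original question
knn_spec <-
  nearest_neighbor(neighbors = 5, dist_power = 2) %>%
  set_engine("kknn") %>%
  set_mode("regression")

outcomes   <- c("y1", "y2", "y3")   # hypothetical helper vectors
predictors <- c("x1", "x2", "x3")

# Hypothetical helper: fit a single-outcome workflow and predict the test set
fit_and_predict <- function(outcome) {
  # Build e.g. y1 ~ x1 + x2 + x3; the recipe is left unprepped for the workflow
  rec <- recipe(reformulate(predictors, response = outcome), data = train_data) %>%
    step_normalize(all_predictors())

  fitted <- workflow() %>%
    add_recipe(rec) %>%
    add_model(knn_spec) %>%
    fit(data = train_data)

  preds <- predict(fitted, new_data = test_data)
  names(preds) <- paste0(".pred_", outcome)  # rename .pred to .pred_y1, etc.
  preds
}

# One model per outcome, predictions bound side by side
multi_preds <- bind_cols(lapply(outcomes, fit_and_predict))
multi_preds
```

Since every fit here has a single outcome, each one should also be usable with tune, which (per the reply above) does not handle multivariate models yet.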
