Recipes: Excluding dummy variables from centering

Hi,

Does anyone know of a good way to exclude certain derived variables from later steps in a recipe? My specific use case: I create dummy variables from a character variable, and then I want to center all numeric variables, but not the dummy variables. See the example below:

library(dplyr, warn.conflicts = FALSE)
library(recipes, warn.conflicts = FALSE)
library(nycflights13)

small_df <- nycflights13::flights %>%
  select(dep_delay, arr_delay, air_time, origin)

head(small_df)
#> # A tibble: 6 x 4
#>   dep_delay arr_delay air_time origin
#>       <dbl>     <dbl>    <dbl> <chr> 
#> 1         2        11      227 EWR   
#> 2         4        20      227 LGA   
#> 3         2        33      160 JFK   
#> 4        -1       -18      183 JFK   
#> 5        -6       -25      116 LGA   
#> 6        -4        12      150 EWR

rec <- recipe(air_time ~ ., data = small_df)

rec2 <- rec %>%
  step_dummy(origin) %>%
  step_center(all_predictors())

prepped_small <- prep(rec2, small_df) %>% juice()

head(prepped_small)
#> # A tibble: 6 x 5
#>   dep_delay arr_delay air_time origin_JFK origin_LGA
#>       <dbl>     <dbl>    <dbl>      <dbl>      <dbl>
#> 1    -10.6       4.10      227     -0.330     -0.311
#> 2     -8.64     13.1       227     -0.330      0.689
#> 3    -10.6      26.1       160      0.670     -0.311
#> 4    -13.6     -24.9       183      0.670     -0.311
#> 5    -18.6     -31.9       116     -0.330      0.689
#> 6    -16.6       5.10      150     -0.330     -0.311

Created on 2019-10-24 by the reprex package (v0.3.0)

The dummy columns should keep their 0/1 values. Instead, origin_JFK has been shifted to -0.33 and 0.67 (and origin_LGA to -0.31 and 0.69).

Is there a direct way to do it in recipes?

Thanks,

Hi @AJFm

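Yes, you can do this in recipes. One way is to flip the order of your steps: center only the numeric variables before creating the dummies, and exclude the outcome (though you may want to center the outcome as well). At the centering step, origin is still a character column, so all_numeric() does not select it, and the dummy variables created afterwards are left untouched.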

rec2 <- rec %>%
  step_center(all_numeric(), -all_outcomes()) %>%
  step_dummy(origin)

prepped_small <- prep(rec2, small_df) %>% juice()

head(prepped_small)
#> # A tibble: 6 x 5
#>   dep_delay arr_delay air_time origin_JFK origin_LGA
#>       <dbl>     <dbl>    <dbl>      <dbl>      <dbl>
#> 1    -10.6       4.10      227          0          0
#> 2     -8.64     13.1       227          0          1
#> 3    -10.6      26.1       160          1          0
#> 4    -13.6     -24.9       183          1          0
#> 5    -18.6     -31.9       116          0          1
#> 6    -16.6       5.10      150          0          0
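As a quick sanity check on the flipped order, a sketch like the following can confirm that the dummy columns keep their 0/1 values, the outcome air_time is untouched, and dep_delay has roughly zero mean on the training data. The stopifnot() checks are an illustrative addition, not part of the answer above, and the sketch assumes a recipes version where juice() is still available:

```r
library(dplyr, warn.conflicts = FALSE)
library(recipes, warn.conflicts = FALSE)
library(nycflights13)

small_df <- nycflights13::flights %>%
  select(dep_delay, arr_delay, air_time, origin)

# Center numeric predictors first, then create the dummy variables
rec2 <- recipe(air_time ~ ., data = small_df) %>%
  step_center(all_numeric(), -all_outcomes()) %>%
  step_dummy(origin)

prepped_small <- prep(rec2, small_df) %>% juice()

# Dummy columns should still contain only 0 and 1
stopifnot(all(prepped_small$origin_JFK %in% c(0, 1)))
stopifnot(all(prepped_small$origin_LGA %in% c(0, 1)))

# The outcome was excluded from centering, so it should be unchanged
stopifnot(isTRUE(all.equal(prepped_small$air_time, small_df$air_time)))

# Centered predictors should have (approximately) zero mean, ignoring NAs
stopifnot(abs(mean(prepped_small$dep_delay, na.rm = TRUE)) < 1e-6)
```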

If you need to create the dummy variables before normalization, you can instead exclude them from the centering step explicitly:

library(dplyr, warn.conflicts = FALSE)
library(recipes, warn.conflicts = FALSE)
library(nycflights13)

small_df <- nycflights13::flights %>%
  select(dep_delay, arr_delay, air_time, origin)

head(small_df)
#> # A tibble: 6 x 4
#>   dep_delay arr_delay air_time origin
#>       <dbl>     <dbl>    <dbl> <chr> 
#> 1         2        11      227 EWR   
#> 2         4        20      227 LGA   
#> 3         2        33      160 JFK   
#> 4        -1       -18      183 JFK   
#> 5        -6       -25      116 LGA   
#> 6        -4        12      150 EWR

rec <- recipe(air_time ~ ., data = small_df)

rec2 <- rec %>%
  step_dummy(origin) %>%
  step_center(all_predictors(), -starts_with("origin"))

prepped_small <- prep(rec2, small_df) %>% juice()

head(prepped_small)
#> # A tibble: 6 x 5
#>   dep_delay arr_delay air_time origin_JFK origin_LGA
#>       <dbl>     <dbl>    <dbl>      <dbl>      <dbl>
#> 1    -10.6       4.10      227          0          0
#> 2     -8.64     13.1       227          0          1
#> 3    -10.6      26.1       160          1          0
#> 4    -13.6     -24.9       183          1          0
#> 5    -18.6     -31.9       116          0          1
#> 6    -16.6       5.10      150          0          0

Created on 2019-10-24 by the reprex package (v0.3.0)
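To confirm which columns the centering step actually touched, one option is calling tidy() on the prepped recipe: for step_center() it lists each centered term with its learned mean. The sketch below also assumes a recipes version recent enough to support bake(new_data = NULL), which later releases recommend in place of the superseded juice(); on older versions, juice() gives the same result:

```r
library(dplyr, warn.conflicts = FALSE)
library(recipes, warn.conflicts = FALSE)
library(nycflights13)

small_df <- nycflights13::flights %>%
  select(dep_delay, arr_delay, air_time, origin)

# Dummies first, then center every predictor except the origin_* columns
rec2 <- recipe(air_time ~ ., data = small_df) %>%
  step_dummy(origin) %>%
  step_center(all_predictors(), -starts_with("origin"))

prepped_rec <- prep(rec2, training = small_df)

# Step 2 is step_center(); tidy() shows the terms it learned means for
centered_terms <- tidy(prepped_rec, number = 2)
print(centered_terms)

# Processed training set (equivalent to juice() on this prepped recipe)
prepped_small <- bake(prepped_rec, new_data = NULL)
head(prepped_small)
```

Only dep_delay and arr_delay should appear in centered_terms. The dummy columns are never part of the centering selection.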


Thanks @Max and @mattwarkentin! I appreciate your help! I had gotten so caught up in trying to use the role = argument in step_dummy() that I lost sight of simpler methods.
