Hi there,
I'm having trouble with the recipes::recipe() function when using a wide set of input predictor features: I get an error saying "cannot allocate vector of size XX Gb."
I've worked up a reproducible example below. Any suggestions or workarounds would be greatly appreciated!
library(AmesHousing)
library(tidyverse)
library(recipes)
## make a small tibble from the AmesHousing package
ames <-
  make_ames() %>%
  select(Sale_Price, Longitude, Latitude) %>%
  ## make the outcome a binary indicator of sale price
  ## being above $150,000
  dplyr::mutate(
    Sale_Price = factor(as.integer(Sale_Price > 150000)) %>%
      fct_inseq()
  )
## make a recipe with small p / few predictors
rec <- recipe(Sale_Price ~ ., data = ames) # works, no problem!
rec
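For completeness, inspecting the roles confirms the small recipe is built as expected (as I understand it, calling summary() on a recipe returns a tibble of variables, types, and roles):

```r
## inspect variable roles in the small recipe;
## Longitude/Latitude should show up as predictors and
## Sale_Price as the outcome
summary(rec)
```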
Up to this point everything runs smoothly, but if I try to add many more columns to the Ames data I can't get the same script to run:
## add a large p (n x p) matrix of noise columns to ames
p <- 500000
set.seed(32798)
big.dat <- matrix(runif(n = nrow(ames) * p),
                  nrow = nrow(ames), ncol = p) %>%
  as_tibble(.name_repair = "unique")
big.ames <- ames %>%
  bind_cols(big.dat)
## make a recipe with the large p ames dataset
rec <- recipe(Sale_Price ~ ., data = big.ames) ## this never completes!
## > Error: cannot allocate vector of size 3017.5 Gb
## > Execution halted
I'm running this on a machine with quite a lot of RAM, and the data itself should only occupy about 12 GB (2930 rows x 500,003 columns of doubles), so the requested 3017.5 Gb allocation makes me think recipe() is getting hung up somewhere unnecessarily, but I'm not sure where.
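In case it's useful context: one workaround I've been experimenting with is to skip the formula interface entirely and assign roles with update_role(), so that the `Sale_Price ~ .` formula never has to be expanded over all 500,000 columns. This is a sketch based on my reading of the recipes docs, and I haven't verified it at this scale:

```r
## possible workaround (untested at full scale): build the recipe
## without a formula, then assign roles directly via tidyselect
rec2 <- recipe(big.ames) %>%
  update_role(Sale_Price, new_role = "outcome") %>%
  update_role(-Sale_Price, new_role = "predictor")
```

If this is the intended pattern for very wide data, a pointer to it in the recipe() docs would be great.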