Stratification isn't defined directly on a continuous variable; it needs discrete groups (strata) to balance against.
You could choose one or more cutoffs to discretise Sale_Price into strata, and then aim for a sample that is balanced with respect to those strata. For example:
library(tidyverse)
library(tidymodels)
data("ames")

# flag sales above a 160,000 cutoff; this logical column is the stratum
ames2 <- mutate(ames, sp_gt_160000 = Sale_Price > 160000)

# no strata
set.seed(123)
# the argument is `prop`; 0.75 (the default) is the split the counts below reflect
ames_split1 <- initial_split(ames2, prop = 0.75)
ames_train1 <- training(ames_split1) %>% mutate(id = "train")
ames_test1 <- testing(ames_split1) %>% mutate(id = "test")
bind_df_no_strata <- bind_rows(ames_train1, ames_test1)

# set strata
set.seed(123)
ames_split2 <- initial_split(ames2, prop = 0.75, strata = sp_gt_160000)
ames_train2 <- training(ames_split2) %>% mutate(id = "train")
ames_test2 <- testing(ames_split2) %>% mutate(id = "test")
bind_df_strata <- bind_rows(ames_train2, ames_test2)

# balance in the source data, then in each split
table(ames2$sp_gt_160000)
table(bind_df_no_strata$id, bind_df_no_strata$sp_gt_160000)
table(bind_df_strata$id, bind_df_strata$sp_gt_160000)
> table(ames2$sp_gt_160000)
FALSE  TRUE
 1467  1463
> table(bind_df_no_strata$id, bind_df_no_strata$sp_gt_160000)
        FALSE TRUE
  test    349  384
  train  1118 1079
> table(bind_df_strata$id, bind_df_strata$sp_gt_160000)
        FALSE TRUE
  test    367  366
  train  1100 1097
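To make the counts above easier to compare, they can be converted into shares of TRUE per split and set against the share in the full data. This is only a small base R sketch; the numbers are typed in from the tables above, and the names `no_strata`, `with_strata` and `share_true` are made up for illustration.

```r
# Counts copied from the tables above (rows: test, train; columns: FALSE, TRUE)
labels <- list(c("test", "train"), c("FALSE", "TRUE"))
no_strata   <- matrix(c(349, 384, 1118, 1079), nrow = 2, byrow = TRUE, dimnames = labels)
with_strata <- matrix(c(367, 366, 1100, 1097), nrow = 2, byrow = TRUE, dimnames = labels)

# Share of TRUE in the full data set
source_share <- 1463 / (1467 + 1463)

# Share of TRUE within each row (test, train) of a count matrix
share_true <- function(m) m[, "TRUE"] / rowSums(m)

# Absolute distance of each split from the source balance
gap_no_strata   <- abs(share_true(no_strata) - source_share)
gap_with_strata <- abs(share_true(with_strata) - source_share)

print(round(rbind(no_strata = gap_no_strata, with_strata = gap_with_strata), 4))
```

On these counts the unstratified test set sits roughly 2.5 percentage points away from the source balance, while both stratified sets are practically on top of it.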
Notice that the source data set is almost evenly balanced on sp_gt_160000 (1467 FALSE vs 1463 TRUE).
The unstratified split drifts away from that balance, most visibly in the test set (349 vs 384),
whereas the stratified split reproduces it almost exactly (367 vs 366).
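The same idea extends to more than one cutoff. Below is a rough base R sketch that bins a price variable at its quartiles and then samples 75% within each bin. It runs on simulated, right-skewed prices rather than the real ames data, and `price`, `stratum` and `train_idx` are illustrative names only. As far as I know, rsample can do a similar binning for you when `strata` is numeric (see the `breaks` argument of `initial_split()`), but doing it by hand makes the mechanics explicit.

```r
# Simulated, right-skewed prices standing in for Sale_Price (NOT the real ames data)
set.seed(123)
price <- round(exp(rnorm(2000, mean = 12, sd = 0.4)))

# Discretise into four strata, using the quartiles as cutoffs
cutoffs <- quantile(price, probs = seq(0, 1, by = 0.25))
stratum <- cut(price, breaks = cutoffs, include.lowest = TRUE,
               labels = paste0("Q", 1:4))

# Sample 75% of the row indices within each stratum
rows_by_stratum <- split(seq_along(price), stratum)
train_idx <- unlist(lapply(rows_by_stratum, function(idx) {
  idx[sample.int(length(idx), size = floor(0.75 * length(idx)))]
}))
id <- ifelse(seq_along(price) %in% train_idx, "train", "test")

# Share of each stratum within train and within test; rows should look alike
shares <- prop.table(table(id, stratum), margin = 1)
print(round(shares, 3))
```

With quartile cutoffs each stratum holds about a quarter of the rows, so both the train and the test row of `shares` should come out close to 0.25 in every column.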