Internal resampling specifying a different proportion of the dichotomous dependent variable than in the original dataset.

Dear Braintrust,
I'm facing a specific challenge. I've built a logistic regression model to predict a dependent variable y.
I have a table with the predicted probability (table$pred) and the dependent variable, which is either absent (y = 0) or present (y = 1).
Here is a short example:
```r
table <- data.frame(pred = c(0.31009564, 0.63558793, 0.49152436, 0.55208678, 0.65151313, 0.61936015, 0.14106961, 0.16343966, 0.53500583, 0.12506695, 0.63486000, 0.21074987, 0.26063249, 0.53500583),
                    y = c(1,1,1,1,1,1,0,0,0,0,0,0,0,0))
```

I'd like to draw a random sample with replacement of the same size as the initial dataset (here n = 14), but with a specified proportion of the target condition (for example, 50% of cases with y = 1, or 3 of 14 cases with y = 1). I've started looking at various packages for internal resampling, but I'm still unsure how to specify the proportion of y = 1 that I want to obtain.
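Because a sample of fixed size n has to contain whole rows, a target proportion p of y = 1 ends up as a pair of class counts. Assuming the proportion is rounded to the nearest whole row (a choice, not something fixed by the question), the counts are:

$$
n_1 = \operatorname{round}(p \cdot n), \qquad n_0 = n - n_1
$$

With n = 14 and p = 0.5 this gives n_1 = 7 and n_0 = 7; asking for 3 of 14 cases corresponds to p = 3/14 ≈ 0.214 (n_1 = 3, n_0 = 11). For comparison, the example data contain 6 cases with y = 1, i.e. 6/14 ≈ 0.43.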

```r
table <- data.frame(pred = c(0.31009564, 0.63558793, 0.49152436, 0.55208678, 0.65151313, 0.61936015, 0.14106961, 0.16343966, 0.53500583, 0.12506695, 0.63486000, 0.21074987, 0.26063249, 0.53500583),
                    y = c(1,1,1,1,1,1,0,0,0,0,0,0,0,0))

# Row indices of each outcome class (wrapping in () also prints the result)
(y_rows1 <- which(table$y == 1))
(y_rows0 <- which(table$y == 0))

# If you want 14 rows, 7 of which are y == 1 and 7 of which are y == 0:
set.seed(42)
(y_rows_1_s <- sample(y_rows1, size = 7, replace = TRUE))
(y_rows_0_s <- sample(y_rows0, size = 7, replace = TRUE))

# Stack the sampled rows of both classes into the new table
(table_s <- table[c(y_rows_1_s, y_rows_0_s), ])

# If you want 14 rows, 3 of which are y == 1 and 11 of which are y == 0:
set.seed(42)
(y_rows_1_s <- sample(y_rows1, size = 3, replace = TRUE))
(y_rows_0_s <- sample(y_rows0, size = 11, replace = TRUE))

(table_s <- table[c(y_rows_1_s, y_rows_0_s), ])
```
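The two cases above differ only in the class sizes, so the same idea can be wrapped in a function that takes the desired proportion directly. The sketch below uses a hypothetical helper, `resample_with_prop()` (not part of any package or of the thread), and assumes the proportion is rounded to a whole number of rows. It indexes with `sample.int()` instead of `sample(rows, ...)`, because `sample()` treats a single positive number `x` as `1:x`, which would silently pick wrong rows if a class had only one observation.

```r
# Hypothetical helper (not from the thread): resample n rows with replacement
# so that a share prop1 of them have outcome == 1.
resample_with_prop <- function(df, prop1, outcome = "y", n = nrow(df)) {
  stopifnot(prop1 >= 0, prop1 <= 1)
  rows1 <- which(df[[outcome]] == 1)
  rows0 <- which(df[[outcome]] == 0)
  n1 <- round(prop1 * n)  # rows with outcome == 1, rounded to a whole row
  n0 <- n - n1            # remaining rows have outcome == 0
  # A class can only be drawn from if it has at least one row
  stopifnot(n1 == 0 || length(rows1) > 0, n0 == 0 || length(rows0) > 0)
  # sample.int() draws positions within each index vector, so it also works
  # when a class contains a single row
  pick1 <- rows1[sample.int(length(rows1), n1, replace = TRUE)]
  pick0 <- rows0[sample.int(length(rows0), n0, replace = TRUE)]
  df[c(pick1, pick0), , drop = FALSE]
}

table <- data.frame(pred = c(0.31009564, 0.63558793, 0.49152436, 0.55208678, 0.65151313, 0.61936015, 0.14106961, 0.16343966, 0.53500583, 0.12506695, 0.63486000, 0.21074987, 0.26063249, 0.53500583),
                    y = c(1,1,1,1,1,1,0,0,0,0,0,0,0,0))

set.seed(42)
half  <- resample_with_prop(table, prop1 = 0.5)     # 7 rows with y == 1, 7 with y == 0
three <- resample_with_prop(table, prop1 = 3 / 14)  # 3 rows with y == 1, 11 with y == 0
```

Repeating the call (for example inside `replicate()` or a loop) gives as many internal resamples as needed, each with the same class balance.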

Thank you very much for your reply.
In the meantime, I used a similar approach: I split the table into two tables, with and without the target condition, and then used the dplyr::sample_n() function.
Your solution works perfectly too.
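The split-then-sample approach described above might look like the sketch below. It assumes dplyr 1.0.0 or later and uses `slice_sample()`, which supersedes `sample_n()` in current dplyr; `sample_n(with_y, 3, replace = TRUE)` should behave the same way. The object names `with_y` and `without_y` are illustrative, not taken from the original post.

```r
library(dplyr)

table <- data.frame(pred = c(0.31009564, 0.63558793, 0.49152436, 0.55208678, 0.65151313, 0.61936015, 0.14106961, 0.16343966, 0.53500583, 0.12506695, 0.63486000, 0.21074987, 0.26063249, 0.53500583),
                    y = c(1,1,1,1,1,1,0,0,0,0,0,0,0,0))

# Split the data into the two outcome groups
with_y    <- filter(table, y == 1)
without_y <- filter(table, y == 0)

# Sample each group with replacement to the requested size
# (here 3 rows with y == 1 and 11 rows with y == 0), then stack them
set.seed(42)
table_s <- bind_rows(
  slice_sample(with_y,    n = 3,  replace = TRUE),
  slice_sample(without_y, n = 11, replace = TRUE)
)
```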