Dear Braintrust,
I'm facing a specific challenge. I've built a logistic regression model to predict a binary dependent variable y.
I have a table with the predicted probabilities (table$pred) and the dependent variable, which is either absent (y=0) or present (y=1).
Here is a short example:
table <- data.frame(pred = c(0.31009564, 0.63558793, 0.49152436, 0.55208678, 0.65151313, 0.61936015, 0.14106961, 0.16343966, 0.53500583, 0.12506695, 0.63486000, 0.21074987, 0.26063249, 0.53500583),
                    y = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0))
I'd like to draw a random sample with replacement of the same size as the initial dataset (here n=14), while specifying the proportion of the target condition (for example, 50% of cases with y=1, or 3 of the 14 cases with y=1). I've started looking at various packages for internal resampling, but I'm still unsure how to specify the proportion of y=1 I want to obtain.
Thank you very much for your reply.
In the meantime, I used a similar approach: I split the table into two tables, one with and one without the target condition, and then used the dplyr::sample_n() function on each.
Your solution works perfectly too.
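
For anyone finding this thread later, here is a minimal sketch of the split-and-resample idea in base R. It is only an illustration under a few assumptions: the helper name resample_fixed and its arguments are made up for this example, the data frame is called tab rather than table so it does not mask base::table(), and the desired number of y=1 rows is passed directly as a count (7 for 50% of 14, or 3 for 3 of 14). With dplyr, the two sample.int() steps could presumably be replaced by dplyr::sample_n(pos, n_pos, replace = TRUE) and the equivalent call for the y=0 rows.

```r
# Example data from the post (named `tab` here to avoid masking base::table)
tab <- data.frame(
  pred = c(0.31009564, 0.63558793, 0.49152436, 0.55208678, 0.65151313,
           0.61936015, 0.14106961, 0.16343966, 0.53500583, 0.12506695,
           0.63486000, 0.21074987, 0.26063249, 0.53500583),
  y = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Hypothetical helper: resample `n` rows with replacement so that exactly
# `n_pos` of them have y == 1 and the remaining n - n_pos have y == 0.
resample_fixed <- function(data, n_pos, n = nrow(data)) {
  stopifnot(n_pos >= 0, n_pos <= n)
  pos <- data[data$y == 1, , drop = FALSE]  # rows with the target condition
  neg <- data[data$y == 0, , drop = FALSE]  # rows without it
  # Draw row indices with replacement within each group
  pos_idx <- sample.int(nrow(pos), n_pos, replace = TRUE)
  neg_idx <- sample.int(nrow(neg), n - n_pos, replace = TRUE)
  out <- rbind(pos[pos_idx, , drop = FALSE], neg[neg_idx, , drop = FALSE])
  rownames(out) <- NULL
  out
}

set.seed(1)  # only to make the illustration reproducible
boot_half  <- resample_fixed(tab, n_pos = 7)  # 50% of 14 rows with y = 1
boot_three <- resample_fixed(tab, n_pos = 3)  # 3 of 14 rows with y = 1
table(boot_half$y)
table(boot_three$y)
```

Passing a count rather than a proportion avoids rounding questions when the requested proportion does not give a whole number of rows; if a proportion is more convenient, round(p * n) could be computed before the call.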