The information you provided is not sufficient for us to recreate the problem. It's best to build a reprex so we can take a closer look at what's causing the issue. A reprex consists of the minimal code and data needed to recreate the issue or question you're having. You can find instructions on how to build and share one here:
I don't know if it might help in your case, but a quick search online brought me to this forum post:
What data type is Personal_loan_master? According to the createDataPartition function, it should be a vector, so if it's something else, make sure to address that. It could be that you're providing another data type, like a table, and the function then tries to generate all possible combinations (when converting it into a vector), resulting in the error. Of course this is a guess, so please provide more info.
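As a quick way to check this, here is a minimal sketch in base R. The data below is a made-up stand-in for your real data, and the column name `response` is only an assumption, so replace both with your own:

```r
# Hypothetical stand-in for the real data; the column name `response` is assumed
Personal_loan_master <- data.frame(
  income   = c(40, 85, 60, 120, 30, 95),
  response = c(0, 1, 0, 1, 0, 0)
)

# Inspect the type of the whole object
class(Personal_loan_master)               # "data.frame", so not a vector

# A single column pulled out with $ is a plain vector
is.vector(Personal_loan_master$response)  # TRUE

# This vector, not the whole data frame, is what a function expecting
# a vector of outcomes should receive
outcome <- Personal_loan_master$response
length(outcome) == nrow(Personal_loan_master)
```

If class() reports a data frame or a table, pull out the outcome column first and pass only that.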
My data is of class data.frame. This is a list of customer surveys for a personal loan offer with results 0 and 1.
I want to split this table into two parts to prepare a training set and a testing set. I am trying to use createDataPartition to split my dataset so that both parts have an equal proportion of 1 and 0 responses from the customers.
Can you please help me resolve this in R?
You still did not provide me with a reprex I could work with, but here is a general example:
library(caret)
library(dplyr)
set.seed(1) #Just to make sure the outcome of random functions is reproducible for this example
#Create some data with outcome 0 - 1 (80% 0 , 20% 1)
myData = data.frame(x = 1:100, y = runif(100),
                    outcome = sample(0:1, 100, replace = TRUE, prob = c(0.8, 0.2)))
#Split the data into two sets (70 - 30%) keeping the outcome distribution
dataSplit = createDataPartition(myData$outcome, p = 0.7, list = FALSE)
#Assign training and testing set
trainingData = myData %>% slice(dataSplit)
testingData = myData %>% slice(-dataSplit)
#Check the distribution of the outcome
sum(trainingData$outcome) / nrow(trainingData) # % 1 in training
[1] 0.1857143
sum(testingData$outcome) / nrow(testingData) # % 1 in testing
[1] 0.1666667
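Note that the first argument of the createDataPartition function takes a vector, not a data frame (in this case myData$outcome). The function then ensures that the two datasets it creates have roughly the same outcome distribution as that vector. For a 0/1 outcome it is safer to pass it as a factor, for example factor(myData$outcome): according to the caret documentation, sampling is done within the levels of a factor, whereas a numeric vector is grouped by percentiles instead.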
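To illustrate what "roughly the same distribution" means, here is a minimal sketch of a stratified split done by hand in base R. This is not how caret implements it internally, just the general idea: sample about 70% of the rows from each outcome class separately. The helper names rowsByClass and trainIdx are made up for this example.

```r
set.seed(1)  # make the random sampling reproducible

# Same shape of data as in the example above
myData <- data.frame(x = 1:100, y = runif(100),
                     outcome = sample(0:1, 100, replace = TRUE, prob = c(0.8, 0.2)))

# Row indices grouped by outcome class (one element per class)
rowsByClass <- split(seq_len(nrow(myData)), myData$outcome)

# Draw about 70% of the rows from each class separately
trainIdx <- unlist(lapply(rowsByClass, function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}))

trainingData <- myData[trainIdx, ]
testingData  <- myData[-trainIdx, ]

# Share of each class in the two sets
prop.table(table(trainingData$outcome))
prop.table(table(testingData$outcome))
```

Because each class is sampled on its own, the share of 1s in the training set can only differ from the share in the full data by rounding.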