Error while executing the createDataPartition function

I am trying to run this command:

trainIndex <- createDataPartition(Personal_loan_Master, p = .6,list = TRUE,times = 1)

Personal_loan_master is the data object.

I am getting the error below when running this command:

Error in table(y) : attempt to make a table with >= 2^31 elements

What could be the cause, and how can I resolve it?

Hello,

The information you provided is not sufficient for us to recreate the problem. It's best to build a reprex so we can take a closer look at what's causing the issue. A reprex consists of the minimal code and data needed to recreate the issue or question you're having. You can find instructions on how to build and share one here:

I don't know if it might help in your case, but a quick search online brought me to this forum post:

What data type is Personal_loan_master? According to its documentation, createDataPartition expects a vector of outcomes as its first argument, so if you are passing something else, make sure to address that. It could be that you're providing it with another data type, such as a table, and the function then tries to tabulate all possible combinations of values, resulting in the error... Of course this is a guess, so please provide more info 🙂
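To illustrate that guess, here is a minimal base R sketch of what table() does when it is handed a whole data frame instead of a single vector. The data frame df and its columns are made up for this example. The error message mentions table(y), so it seems likely (though I have not checked it against your data) that something similar happens inside createDataPartition when y is a data frame:

```r
# Hypothetical data frame, only for illustration
df <- data.frame(id       = 1:50,
                 income   = seq(10, 500, by = 10),
                 response = rep(0:1, 25))

# table() on a data frame cross-tabulates ALL columns against each other
tab <- table(df)

# One dimension per column, one level per distinct value in that column
dim(tab)

# Total number of cells = product of the distinct-value counts per column
length(tab)
prod(sapply(df, function(col) length(unique(col))))
```

With only three columns this is already 50 * 50 * 2 = 5000 cells. A data frame with, say, five columns of 100 distinct values each would need 100^5 = 10^10 cells, which is more than 2^31, so R refuses to build the table.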

Hope this helps,
PJ

Thanks PJ 🙂

My data is of class data.frame. This is a list of customer surveys for a personal loan offer with results 0 and 1.

I want to split this table into two parts, a training set and a testing set. I am trying to use createDataPartition so that both sets contain an equal proportion of the customers' 1 and 0 responses.

Can you please help me with how to go about resolving this in R?

Hi,

You still did not provide me with a reprex I could work with, but here is a general example:

library(caret)
library(dplyr)

set.seed(1) # Just to make sure the outcome of random functions is reproducible for this example

# Create some data with outcome 0 - 1 (80% 0, 20% 1)
myData <- data.frame(x = 1:100, y = runif(100),
                     outcome = sample(0:1, 100, replace = TRUE, prob = c(0.8, 0.2)))

# Split the data into two sets (70 - 30%) keeping the outcome distribution
# With list = FALSE the result is a one-column matrix of row indices
dataSplit <- createDataPartition(myData$outcome, p = 0.7, list = FALSE)

# Assign training and testing set (take the column as a plain integer vector for slice)
trainingData <- myData %>% slice(dataSplit[, 1])
testingData <- myData %>% slice(-dataSplit[, 1])

# Check the distribution of the outcome
sum(trainingData$outcome) / nrow(trainingData) # proportion of 1s in training
#> [1] 0.1857143
sum(testingData$outcome) / nrow(testingData) # proportion of 1s in testing
#> [1] 0.1666667

Note that the first argument of createDataPartition takes a vector (in this case myData$outcome), not a data frame. The function then ensures that the two datasets it creates have roughly the same outcome distribution as that vector.
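Applied to your original call, that could look roughly like the sketch below. I don't know your column names, so Personal_Loan and Age are assumptions, and the data frame here is a randomly generated stand-in for your real data. I also convert the 0/1 outcome to a factor: per the caret documentation, a numeric y is grouped by percentiles, while a factor is sampled within each class, which is what you want for a 0/1 response.

```r
library(caret)

set.seed(42) # arbitrary seed, only so this sketch is reproducible

# Hypothetical stand-in for the survey data (column names are assumed)
Personal_loan_Master <- data.frame(
  Age           = sample(25:60, 200, replace = TRUE),
  Personal_Loan = sample(0:1, 200, replace = TRUE, prob = c(0.8, 0.2))
)

# Pass the outcome column (as a factor), not the whole data frame
trainIndex <- createDataPartition(as.factor(Personal_loan_Master$Personal_Loan),
                                  p = 0.6, list = TRUE, times = 1)

# With list = TRUE the result is a list; its first element holds the row indices
trainRows   <- trainIndex[[1]]
trainingSet <- Personal_loan_Master[trainRows, ]
testingSet  <- Personal_loan_Master[-trainRows, ]

# Proportion of 1s in each set should be close to each other
mean(trainingSet$Personal_Loan)
mean(testingSet$Personal_Loan)
```

Since you used list = TRUE, remember to pull the indices out of the list (trainIndex[[1]]) before using them to subset the rows.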

Hope this helps,
PJ
