Error in data splitting with R

# Data Preprocessing Template

# Importing the dataset
dataset = read.csv('Data.csv')

# Splitting the dataset into the Training set and Test set
install.packages('caTools')
library(caTools)
set.seed(101)
sample = sample.split(dataset$DependentVariable, SplitRatio = 0.75)
training_set = subset(dataset, sample == TRUE)
test_set = subset(dataset, sample == FALSE)

# Feature Scaling
training_set = scale(training_set)
test_set = scale(test_set)

I am getting an error:
test_set = subset(dataset, split == FALSE)
Error in split == FALSE :
comparison (1) is possible only for atomic and list types

test_set = scale(test_set)
Error in scale(test_set) : object 'test_set' not found

The code you posted makes sense, but the error you quote comes from a different line:

test_set = subset(dataset, split == FALSE)
Error in split == FALSE :
comparison (1) is possible only for atomic and list types

What is split? Shouldn't that be sample? If split is not a vector (for example, if it was never assigned and R falls back to the base function split()), that would account for the error.
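
To illustrate the point about split, here is a minimal sketch. It assumes split has not been assigned in the current session, which is only a guess about the asker's situation. The exact wording of the error message depends on the R version and locale, so the sketch captures it rather than hard-coding it.

```r
# Assumption: `split` has not been assigned in this session, so the name
# resolves to the base R function split().
stopifnot(is.function(split))

# Comparing a function with FALSE is an error; capture the message
# instead of stopping the script.
msg <- tryCatch(
  {
    split == FALSE
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Inside local(), assign a logical vector to the same name: the
# comparison now works and returns a logical vector.
local({
  split <- c(TRUE, FALSE, TRUE)
  print(split == FALSE)
})
```

If the first comparison errors while the second one prints a logical vector, the name used in subset() probably does not match the name the split was assigned to.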

# Data Preprocessing Template

# Importing the dataset
dataset = read.csv('Data.csv')

# Splitting the dataset into the Training set and Test set
install.packages('caTools')

library(caTools)
set.seed(123)
split = sample.split(dataset$DependentVariable, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

# Feature Scaling
training_set = scale(training_set)
test_set = scale(test_set)

I am getting the following error:

test_set = scale(test_set)
Fehler in scale(test_set) : Objekt 'test_set' nicht gefunden

Translated from German, the error message reads:
Error in scale(test_set) : object 'test_set' not found

This is strange, since you seem to create test_set earlier in the code.
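
One way this can happen, offered as a possibility rather than a diagnosis: if the subset() call itself fails, the assignment never completes, so test_set does not exist when scale() runs. A small sketch with hypothetical names (demo_df, demo_test) and made-up data:

```r
# Hypothetical data; not the asker's Data.csv.
demo_df <- data.frame(x = 1:4, y = c(2.5, 3.1, 4.8, 5.0))

# `split` is not assigned here, so the condition errors inside subset()
# and the assignment to demo_test never completes.
result <- try(demo_test <- subset(demo_df, split == FALSE), silent = TRUE)
print(inherits(result, "try-error"))

# The object was never created, so any later use fails with an
# "object not found" error.
print(exists("demo_test"))
scale_msg <- tryCatch(scale(demo_test), error = function(e) conditionMessage(e))
print(scale_msg)
```

Read this way, the first error in the console is the one to fix, and the "not found" error is a consequence of it.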

Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

There's also a nice FAQ on how to do a minimal reprex for beginners.

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.
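
Since a reprex has to carry its own data, one common base R approach is to share a few rows with dput(). The sketch below uses a hypothetical stand-in for the data read from Data.csv; the column names are assumptions, not the asker's real columns.

```r
# Hypothetical stand-in for the data read from Data.csv.
dataset <- data.frame(
  Feature = c(1.5, 2.0, 3.2, 4.1),
  DependentVariable = c(0, 1, 0, 1)
)

# dput() prints R code that recreates the object, so a few rows can be
# pasted into a post together with the failing code.
dput(head(dataset, 3))

# Round trip: parse the printed code and check it rebuilds the same rows.
dput_text <- paste(capture.output(dput(head(dataset, 3))), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
print(all.equal(rebuilt, head(dataset, 3)))
```

Pasting the dput() output at the top of the script lets others run exactly the same code without needing the CSV file.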

Now I am having this error:

Error in sample.split(dataset$DependentVariable, SplitRatio = 0.8) :
Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range

training_set = subset(dataset, split == TRUE)
Error in split == TRUE :
comparison (1) is possible only for atomic and list types

Please post a reproducible example as requested above by Mara. It is very difficult to debug code without data and the full actual code. Here is a reproducible example of the type of thing you are trying to do that works for me.

library(caTools)
#> Warning: package 'caTools' was built under R version 3.5.2
df <- data.frame(X = runif(100, 0, 5), 
                 DependentVar = rnorm(100))
split = sample.split(df$DependentVar, SplitRatio = 0.8)
training_set = subset(df, split == TRUE)
test_set = subset(df, split == FALSE)


training_set = scale(training_set)
test_set = scale(test_set)
head(training_set)
#>           X DependentVar
#> 1 -1.435215    0.7215806
#> 3  1.459590    0.2844358
#> 4 -1.505967    0.5092594
#> 6  0.811956    0.5417502
#> 7  1.219653    1.2789982
#> 8 -1.511057   -1.2552037

Created on 2019-05-14 by the reprex package (v0.2.1)
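
On the SplitRatio error quoted earlier: one thing worth checking, although the thread never confirms it, is whether the data actually contains a column literally named DependentVariable. In the template that name looks like a placeholder, and `$` silently returns NULL for a column that does not exist. A minimal sketch with hypothetical columns (Age, Purchased):

```r
# Hypothetical data frame: note there is no column called DependentVariable.
dataset <- data.frame(
  Age = c(44, 27, 30, 38),
  Purchased = c("No", "Yes", "No", "Yes")
)

# Check which columns are really there.
print(names(dataset))
has_col <- "DependentVariable" %in% names(dataset)
print(has_col)

# `$` returns NULL for a column that does not exist, without a warning,
# so the vector handed to a splitting function would have length 0.
y <- dataset$DependentVariable
print(is.null(y))
print(length(y))

# Guard before splitting: stop early if the outcome column is missing.
target <- "Purchased"  # assumed name of the real outcome column
stopifnot(target %in% names(dataset))
```

If has_col is FALSE for your data, replacing DependentVariable with the real outcome column name is the first thing to try.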

