Error in data splitting with R

trishita · May 13, 2019, 3:29pm

# Data Preprocessing Template

# Importing the dataset
dataset = read.csv('Data.csv')

# Splitting the dataset into the Training set and Test set
install.packages('caTools')
library(caTools)
set.seed(101)
sample = sample.split(dataset$DependentVariable, SplitRatio = 0.75)
training_set = subset(dataset, sample == TRUE)
test_set = subset(dataset, sample == FALSE)

# Feature Scaling
training_set = scale(training_set)
test_set = scale(test_set)

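I am getting an error:

test_set = subset(dataset, split == FALSE)
Fehler in split == FALSE :
Vergleich (1) ist nur für atomare und Listentypen möglich
[English: Error in split == FALSE : comparison (1) is possible only for atomic and list types]

test_set = scale(test_set)
Fehler in scale(test_set) : Objekt 'test_set' nicht gefunden
[English: Error in scale(test_set) : object 'test_set' not found]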

FJCC · May 13, 2019, 5:00pm

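The code you posted makes sense. But the error you quote comes from a line that is not in that code:

test_set = subset(dataset, split == FALSE)
Fehler in split == FALSE :
Vergleich (1) ist nur für atomare und Listentypen möglich
[English: Error in split == FALSE : comparison (1) is possible only for atomic and list types]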

What is split? Shouldn't that be sample? If split is not a vector, that would account for the error.
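
To illustrate the point about split: if no object named split has been created in the session, R resolves the name to the base function split(), and comparing a function with FALSE raises this kind of error. A minimal sketch, calling base::split explicitly so it behaves the same whether or not a split variable already exists in your workspace (the name msg is only for illustration):

```r
# Comparing a function with a logical value is an error in R.
# base::split is what the name `split` refers to when no such variable exists.
msg <- tryCatch(
  {
    base::split == FALSE
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # the comparison error message; wording depends on locale and R version

# Once `split` is a logical vector, the same comparison works element-wise.
split <- c(TRUE, FALSE, TRUE)
print(split == FALSE)
```

So the error suggests that, at the time the subset() line ran, no logical vector called split existed.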

trishita · May 14, 2019, 10:12am

# Data Preprocessing Template

# Importing the dataset
dataset = read.csv('Data.csv')

# Splitting the dataset into the Training set and Test set
install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$DependentVariable, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

# Feature Scaling
training_set = scale(training_set)
test_set = scale(test_set)

I am getting the following error:

test_set = scale(test_set)
Fehler in scale(test_set) : Objekt 'test_set' nicht gefunden

mara · May 14, 2019, 10:15am

Running the error message through translate, it says
Error in scale (test_set): Object 'test_set' not found

This is strange, since you seem to create test_set earlier in the code.

Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

There's also a nice FAQ on how to do a minimal reprex for beginners, below:

FAQ: How to do a minimal reproducible example (reprex) for beginners

A minimal reproducible example consists of the following items:

A minimal dataset, necessary to reproduce the issue

The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples:

Minimal Dataset (Sample Data)

You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue.

Let's say, as an example, that you are working with the iris data frame

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.…
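
One common way to produce the kind of minimal dataset the FAQ describes is dput(), which prints R code that recreates an object. A small sketch using the built-in iris data; the choice of five rows and the names small_iris and rebuilt are only for illustration:

```r
# Take a few rows of a built-in data frame as a stand-in for your own data.
small_iris <- head(iris, 5)

# dput() writes out R code that rebuilds the object; paste that output into your post.
dput(small_iris)

# Check that the printed code really recreates the same data frame.
rebuilt <- eval(parse(text = paste(capture.output(dput(small_iris)), collapse = "\n")))
print(all.equal(small_iris, rebuilt))
```

Anyone reading the post can then assign the pasted structure(...) call to a variable and work with the same data.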

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.

trishita · May 14, 2019, 10:23am

Now I am getting this error:

Fehler in sample.split(dataset$DependentVariable, SplitRatio = 0.8) :
Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range

training_set = subset(dataset, split == TRUE)
Fehler in split == TRUE :
Vergleich (1) ist nur für atomare und Listentypen möglich
[English: "Fehler in" means "Error in"; the last message reads "comparison (1) is possible only for atomic and list types"]
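
A possible reading of this sequence, which the thread does not confirm: sample.split() failed, so split was never assigned; the subset() call then compared the base function split() with TRUE; and as a result training_set and test_set were never created. The SplitRatio error appears to be raised when the ratio is not smaller than the length of the first argument, which would happen if dataset$DependentVariable were empty. DependentVariable looks like a template placeholder, so one thing worth checking is whether that column exists at all. The sketch below uses only base R and a made-up data frame (the columns Age and Purchased and the name target_name are assumptions, not the contents of Data.csv):

```r
# Hypothetical data frame standing in for Data.csv; column names are assumptions.
dataset <- data.frame(Age = c(25, 32, 47, 51), Purchased = c(0, 1, 0, 1))

# `$` on a missing column returns NULL instead of raising an error.
target <- dataset$DependentVariable
print(is.null(target))  # TRUE
print(length(target))   # 0

# Guard: stop early with a clear failure if the column you want to split on is absent.
target_name <- "Purchased"
stopifnot(target_name %in% names(dataset))
print(names(dataset))   # shows which columns are actually available
```

Running names(dataset) on the real data right after read.csv() would show which column name to use in place of the placeholder.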

FJCC · May 14, 2019, 4:49pm

Please post a reproducible example as requested above by Mara. It is very difficult to debug code without data and the full actual code. Here is a reproducible example of the type of thing you are trying to do that works for me.

library(caTools)
#> Warning: package 'caTools' was built under R version 3.5.2
df <- data.frame(X = runif(100, 0, 5), 
                 DependentVar = rnorm(100))
split = sample.split(df$DependentVar, SplitRatio = 0.8)
training_set = subset(df, split == TRUE)
test_set = subset(df, split == FALSE)


training_set = scale(training_set)
test_set = scale(test_set)
head(training_set)
#>           X DependentVar
#> 1 -1.435215    0.7215806
#> 3  1.459590    0.2844358
#> 4 -1.505967    0.5092594
#> 6  0.811956    0.5417502
#> 7  1.219653    1.2789982
#> 8 -1.511057   -1.2552037

Created on 2019-05-14 by the reprex package (v0.2.1)
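
For readers who cannot install caTools, a similar random split can be sketched in base R with sample(). This is a swapped-in technique, not what sample.split() does internally: sample.split() aims to keep the proportions of the labels in its first argument, whereas plain sample() ignores them. set.seed(123) is borrowed from the original post so the split is repeatable; the names train_idx, training_scaled and test_scaled are illustrative.

```r
set.seed(123)  # fixes the random draw so the split is repeatable
df <- data.frame(X = runif(100, 0, 5), DependentVar = rnorm(100))

# Draw 80% of the row indices at random for training; the rest form the test set.
train_idx <- sample(nrow(df), size = round(0.8 * nrow(df)))
training_set <- df[train_idx, ]
test_set <- df[-train_idx, ]

# scale() centres and scales each column and returns a numeric matrix, not a data frame.
training_scaled <- scale(training_set)
test_scaled <- scale(test_set)
print(dim(training_scaled))
print(dim(test_scaled))
```

Note that scale() only accepts numeric columns, so with real data any text or factor columns would need to be excluded first.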
