Problems when creating SVM Model with Linear Kernel

I am new to R and tried to create an SVM model with a linear kernel.

Here is the code:

library(e1071)

svm.narrow.margin <- svm(Diagnosis~., 
                 data = biomed,
                 type = "C-classification",
                 cost = 1.0,
                 kernel = "linear")

However, it returns this error message:

Error in if (any(as.integer(y) != y)) stop("dependent variable has to be of factor or integer type for classification mode.") :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In svm.default(x, y, scale = scale, ..., na.action = na.action) :
  NAs introduced by coercion

I ran the same code on RStudio Cloud and it works fine, which is confusing.

Help please!

As the error message states, your Diagnosis variable's data type is something other than factor or integer.

You can check its type by running class(biomed$Diagnosis) and then convert it to a type supported by the function.
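As a minimal sketch of that check and conversion, the snippet below uses a small made-up data frame (`toy`, with hypothetical labels and values) in place of the real biomed data. It also shows where the "NAs introduced by coercion" warning likely comes from: text labels cannot be coerced to integers.

```r
# Hypothetical stand-in for the biomed data: Diagnosis is stored as character
toy <- data.frame(
  Diagnosis = c("normal", "carrier", "normal", "carrier"),
  m1 = c(23.0, 30.5, 25.1, 41.2),
  stringsAsFactors = FALSE
)

class(toy$Diagnosis)   # "character"

# The error message shows svm() calling as.integer() on the response;
# for text labels that coercion yields NA (with a warning)
coerced <- suppressWarnings(as.integer(toy$Diagnosis))
anyNA(coerced)         # TRUE

# Convert the response to a factor before fitting
toy$Diagnosis <- as.factor(toy$Diagnosis)
class(toy$Diagnosis)   # "factor"
levels(toy$Diagnosis)  # "carrier" "normal"
```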

On my local RStudio the class is character, but on RStudio Cloud it is factor. Why is it different when the same data set is used?

Hard to answer that without seeing your code. Maybe the way in which the data was read into RStudio and RStudio Cloud was different?

One possibility is the latest R update. Are you using R 4.0.0 on your local system?

R 4.0.0 changed the default of stringsAsFactors from TRUE to FALSE, so functions such as read.csv() now keep strings as character unless told otherwise. Check the output of default.stringsAsFactors() in both sessions.
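To see the effect without the original file, here is a small sketch that reads a made-up inline CSV (the `csv_text` contents are an assumption, not the real biomed data). Setting the argument explicitly gives the same result everywhere, while omitting it leaves the outcome to the session default.

```r
# Hypothetical two-column CSV, passed inline via the text argument
csv_text <- "Diagnosis,m1\nnormal,23.0\ncarrier,30.5\n"

# Explicit settings behave the same regardless of the session default
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_chr$Diagnosis)  # "character"
class(as_fct$Diagnosis)  # "factor"

# With the argument omitted, the result depends on the session default
by_default <- read.csv(text = csv_text)
class(by_default$Diagnosis)

# The running R version helps explain which default is in effect
getRversion() >= "4.0.0"
```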

Yeah that was what I was thinking as well!

Here is my code:

# Forward slashes avoid backslash-escape errors in Windows paths
setwd("C:/Users/archa/Desktop/Data Analytics/Data")
biomed <- read.csv("biomed.csv", header = TRUE, na.strings = c('', '.', 'NA'))
View(biomed)

library(e1071)

class(biomed$Diagnosis)

svm.narrow.margin <- svm(Diagnosis~., 
                 data = biomed,
                 type = "C-classification",
                 cost = 1.0,
                 kernel = "linear")

Yes, I am using R 4.0.0 on my laptop. Hmm, but I did not set the stringsAsFactors argument in either case.

So @Yarnabrina's suspicion is probably right. If you add stringsAsFactors = TRUE to the read.csv() call on your laptop, your code should work fine (just as it does on RStudio Cloud).
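A rough end-to-end sketch of that fix follows. It writes a small made-up data set to a temporary file as a stand-in for biomed.csv (the column names `m1` and `m2` and all values are hypothetical), reads it back with stringsAsFactors = TRUE, and fits the model only if e1071 is installed.

```r
# Hypothetical stand-in for biomed.csv, written to a temporary file
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  Diagnosis = rep(c("normal", "carrier"), each = 4),
  m1 = c(20, 22, 25, 27, 40, 43, 45, 48),
  m2 = c(1.0, 1.2, 0.9, 1.1, 2.0, 2.3, 1.9, 2.2)
)
write.csv(toy, csv_path, row.names = FALSE)

# Reading with stringsAsFactors = TRUE makes Diagnosis a factor
biomed_toy <- read.csv(csv_path, header = TRUE,
                       na.strings = c("", ".", "NA"),
                       stringsAsFactors = TRUE)
class(biomed_toy$Diagnosis)

# Fit the same kind of model as in the question, if e1071 is available
if (requireNamespace("e1071", quietly = TRUE)) {
  fit <- e1071::svm(Diagnosis ~ ., data = biomed_toy,
                    type = "C-classification", cost = 1.0, kernel = "linear")
  print(fit)
}
```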

Yes, I ran default.stringsAsFactors() in both sessions: it returns TRUE on RStudio Cloud but FALSE on my local system.

Aside from always setting stringsAsFactors = TRUE in read.csv(), is there a more permanent way to have my strings read as factors?

You can set options(stringsAsFactors = TRUE) to make this the default for your entire session. But be cautious with this setting: it reduces the reproducibility of your code, because the same script will behave differently in a session where the option has not been set.

I'd resort to this approach only if my code had many functions where this option needed to be specified. If it's just one or two read.csv() calls, it's far safer to just add the stringsAsFactors = TRUE to the function calls.
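Another per-call option, offered here as a sketch rather than something suggested in the thread, is to convert only the response column with the colClasses argument of read.csv(). The inline CSV below is made up for illustration.

```r
# Hypothetical two-column CSV, passed inline via the text argument
csv_text <- "Diagnosis,m1\nnormal,23.0\ncarrier,30.5\n"

# Convert only the named column; other columns are detected as usual
biomed_toy <- read.csv(text = csv_text,
                       colClasses = c(Diagnosis = "factor"))
class(biomed_toy$Diagnosis)  # "factor"
class(biomed_toy$m1)         # "numeric"
```

This keeps any other text columns as character, which may be preferable when only the dependent variable needs to be a factor.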
