SVM implementation Issue

ashishsmse14 · July 16, 2020, 4:12am

We have completed the part below.

To download the dataset:

heart_dataframe<-read.csv(url("https://dataaspirant.com/wp-content/uploads/2017/01/heart_tidy.csv"))

After that, our task is:

Capture the attribute names of the dataset while executing the algorithm.
The training dataset should contain 60 percent data from the dataset.
The testing dataset should contain the remaining data from the dataset.

For the above, we did:

rows<-sample(nrow(heart_dataframe))

heart_dataframe<-heart_dataframe[rows, ]

split <- round(nrow(heart_dataframe) * .60)

train<-heart_dataframe[1:split,]

test<-heart_dataframe[(split+1):nrow(heart_dataframe),]


svm_train<-train(X63 ~.,data=train,method="svmLinear",
trControl=train_control,preProcess=c("center","scale"),tuneLength=10)

Ensure that both the arguments of the confusion matrix are 'factors'.
Store the result of the confusion matrix in a "cm" variable.

We are facing an issue with the factors and the confusion matrix. Please help us, and please also let us know whether the steps above are wrong.

After completing the above steps, execute the following code to store the output in the file:

total<-cm$table[1,1]+cm$table[1,2]+cm$table[2,1]+cm$table[2,2]
writeLines(toString(total),"output.txt")

nirgrahamuk · July 16, 2020, 3:42pm

Welcome to the forum.
I have some points for you.

Firstly, when setting up a workflow that randomly samples to split your data, you should be concerned with reproducibility. It is therefore common practice to set a random seed with the set.seed() function so that your results can be reproduced.

Secondly, it is not clear which SVM package you are using or wish to use.
As provided, your code simply fails to run, because nothing defines the train() function.
Did you use a library() or require() call for an SVM package but omit it from your post? Please include it.
Furthermore, train_control is passed as a parameter, but it is also undefined in your code.
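
To illustrate the reproducibility point, here is a minimal base R sketch of a seeded 60/40 split. The data frame `toy_df`, the helper `split_60_40` and the seed value are made up for illustration; they are not part of the original script.

```r
# Hypothetical toy data standing in for the real dataset.
toy_df <- data.frame(x = 1:10, y = rep(c(0, 1), 5))

# Hypothetical helper: shuffle the rows with a fixed seed, then cut at 60 percent.
split_60_40 <- function(df, seed) {
  set.seed(seed)                        # fix the RNG state so the shuffle is repeatable
  shuffled <- df[sample(nrow(df)), ]    # shuffle the rows
  cut <- round(nrow(shuffled) * 0.60)   # index of the last training row
  list(train = shuffled[1:cut, ],
       test  = shuffled[(cut + 1):nrow(shuffled), ])
}

first  <- split_60_40(toy_df, seed = 2000)
second <- split_60_40(toy_df, seed = 2000)

identical(first$train, second$train)  # TRUE: same seed, same split
nrow(first$train)                     # 6
nrow(first$test)                      # 4
```

Without the set.seed() call, each run would shuffle the rows differently, so the rows landing in the test set (and any metric computed on them) could change from run to run.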

ashishsmse14 · July 16, 2020, 3:55pm

Basically our task is

Capture the attribute names of the dataset while executing the algorithm.
The training dataset should contain 60 percent data from the dataset.
The testing dataset should contain the remaining data from the dataset.
Ensure that both the arguments of the confusion matrix are 'factors'.
Store the result of the confusion matrix in a "cm" variable.

Hence we tried the solution above, but could not get any further.

nirgrahamuk · July 16, 2020, 3:57pm

So... what do you make of what I wrote to you?

ashishsmse14 · July 16, 2020, 3:59pm

We are using

library(caret)

nirgrahamuk · July 16, 2020, 4:06pm

Very good. Having added that to my copy of your script, the other point I raised is now highlighted:

Error in train.default(x, y, weights = w, ...) : 
  object 'train_control' not found

ashishsmse14 · July 16, 2020, 4:08pm

train_control<-trainControl(method="repeatedcv",number=10,repeats=3)

We are using this as well.

We have tried all the steps, but nothing is working.

nirgrahamuk · July 16, 2020, 4:14pm

You shared snippets of code that mention 'cm', but no code that would create or extract cm from anywhere. Can you share the code you wrote after svm_train was fit (as that part seemed to 'work' fine)?

ashishsmse14 · July 16, 2020, 4:17pm

cm(testing_predictions,testing_data)

This is the code

nirgrahamuk · July 16, 2020, 4:18pm

That doesn't make sense to me. There is no cm function in caret.
There is a cm function in base R, but it converts inches to centimetres (hence the name) for graphics purposes.

I believe that caret provides a confusionMatrix() function.
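
As a rough sketch of what "both arguments are factors" means in practice, the example below builds two factors with identical levels and cross-tabulates them with base R's table(), used here as a stand-in. The label vectors are invented for illustration. The caret call at the end is guarded and only runs if caret is installed; the e1071 check is an assumption, since some caret versions need that package for confusionMatrix().

```r
# Hypothetical predicted and true labels, for illustration only.
lvls      <- c("0", "1")
predicted <- factor(c("0", "1", "1", "0", "1"), levels = lvls)
actual    <- factor(c("0", "1", "0", "0", "1"), levels = lvls)

# Both arguments are factors and share the same levels.
stopifnot(is.factor(predicted), is.factor(actual),
          identical(levels(predicted), levels(actual)))

# Base R cross-tabulation: rows are predictions, columns are true labels.
tab <- table(Prediction = predicted, Reference = actual)
tab
sum(tab)  # 5: the number of observations compared

# With caret available, the same two factors can be passed to
# confusionMatrix(data, reference) and the result stored in `cm`.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("e1071", quietly = TRUE)) {
  cm <- caret::confusionMatrix(predicted, actual)
  cm$table
}
```

Setting the levels explicitly on both factors avoids a mismatch when one class happens to be absent from the predictions.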

ashishsmse14 · July 16, 2020, 4:19pm

Our task was:

Store the result of the confusion matrix in a "cm" variable.

nirgrahamuk · July 16, 2020, 4:20pm

Results from functions are saved into variables with the <- operator.
Before you save something into cm, you have to get it, right?

ashishsmse14 · July 16, 2020, 4:21pm

So what would be the complete solution for our steps?

nirgrahamuk · July 16, 2020, 4:38pm

I would do this

test_pred <- predict(svm_train,newdata = test)

(avg_test_x63 <- mean(test$X63))
X63_fac_over_avg <- factor(x=test$X63>avg_test_x63)
(test_pred_fac_over_avg <-factor(x=test_pred>avg_test_x63))

(cm<-confusionMatrix(test_pred_fac_over_avg,test$X63_fac_over_avg))

ashishsmse14 · July 17, 2020, 2:52am

We are getting an error:

(cm<-confusionMatrix(test_pred_fac_over_avg,test$X63_fac_over_avg))
Error in confusionMatrix.default(test_pred_fac_over_avg, test$X63_fac_over_avg) :
the data cannot have more levels than the reference

nirgrahamuk · July 17, 2020, 6:38am

How many levels do you have in those objects? I have only two: TRUE and FALSE.

ashishsmse14 · July 17, 2020, 8:32am

> heart_dataframe<-read.csv(url("https://dataaspirant.com/wp-content/uploads/2017/01/heart_tidy.csv"))
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
> set.seed(2000)
> rows<-sample(nrow(heart_dataframe))
> heart_dataframe<-heart_dataframe[rows, ]
> split <- round(nrow(heart_dataframe) * .60)
> train<-heart_dataframe[1:split,]
> test<-heart_dataframe[(split+1):nrow(heart_dataframe),]
> train_control<-trainControl(method="repeatedcv",number=10,repeats=3)
> svm_train<-train(X63 ~.,data=train,method="svmLinear",
+ trControl=train_control,preProcess=c("center","scale"),tuneLength=10)
> test_pred <- predict(svm_train,newdata = test)
> (avg_test_x63 <- mean(test$X63))
[1] 54.45833
> X63_fac_over_avg <- factor(x=test$X63>avg_test_x63)
> (test_pred_fac_over_avg <-factor(x=test_pred>avg_test_x63))
   27    54   107   121    23   141   182    29   242    75   106   237    45
 TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE
  292   171     1    30    12   181   277   105   241    19    70   196   236
FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
  247   288    17   269   190    85   220   186   123    77   128    61    16
FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
  207   147    94   202   293   193   200   146   130    92   209   138   114
FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
   98    78    41    63   245   224    39    64   100    90   218    28   184
FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE
  257   131    93   267   140   243   197    52     7   210   187   201   113
FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE
   71   124   111   221   217   162   194   143   154   118    82    55   180
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
  263     5   249   125   120   231   198   189   256   127    73    22    72
FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
  246   129    50    13   226   160   132    31   173     3    43   104    81
FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
  136   135   109
 TRUE FALSE  TRUE
Levels: FALSE TRUE
>

nirgrahamuk · July 17, 2020, 8:54am

OK, test$X63_fac_over_avg probably doesn't even exist, as it was a typo in my code.
It should be just X63_fac_over_avg.
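
A small sketch of why that typo produces a "levels" error: in R, `$` on a data frame column that does not exist returns NULL without complaint, and NULL has no levels. The toy data frame below is hypothetical.

```r
# Hypothetical toy data frame with a single numeric column.
toy <- data.frame(X63 = c(50, 60, 70))

# A standalone factor, not a column of `toy`.
over_avg <- factor(toy$X63 > mean(toy$X63))

is.null(toy$over_avg)   # TRUE: `$` returns NULL for a missing column
levels(toy$over_avg)    # NULL, so there are no levels at all
levels(over_avg)        # "FALSE" "TRUE"
nlevels(over_avg)       # 2
```

Checking levels() on both arguments before building the confusion matrix makes this kind of mismatch easy to spot.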

ashishsmse14 · July 23, 2020, 7:12am

We are getting 120 as the output, which is wrong. We could not pass the test.

ashishsmse14 · July 23, 2020, 7:16am

Now we tried to split the data as follows:

split<-createDataPartition(y=heart_dataframe$V14, p=0.7, list=FALSE)
Error in createDataPartition(y = heart_dataframe$V14, p = 0.7, list = FALSE) :
y must have at least 2 data points

Is anything wrong?
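
One possible cause, offered as a hedged sketch: the column names earlier in this thread are of the form X63, which is what read.csv() produces when its default header = TRUE turns a first row of numbers into column names. In that case there is no V14 column, so heart_dataframe$V14 is NULL and createDataPartition() receives an empty y. Names such as V14 only appear when the file is read with header = FALSE. The inline CSV text below is invented for illustration.

```r
# Hypothetical three-row CSV with no header line.
csv_text <- "63,1,145\n67,0,160\n41,1,130"

with_header    <- read.csv(text = csv_text)                  # header = TRUE by default
without_header <- read.csv(text = csv_text, header = FALSE)

names(with_header)      # "X63" "X1" "X145": the first row became the column names
nrow(with_header)       # 2: one data row was consumed as the header
names(without_header)   # "V1" "V2" "V3"
nrow(without_header)    # 3

is.null(with_header$V3)     # TRUE: no such column, so `$` gives NULL
length(without_header$V3)   # 3
```

If the real file indeed has no header row (an assumption based on the X63 name), reading it with header = FALSE would give columns V1 to V14. Note also that p=0.7 does not match the 60 percent training share that the task asks for.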