We have completed the part below.
To download the dataset:
heart_dataframe<-read.csv(url("https://dataaspirant.com/wp-content/uploads/2017/01/heart_tidy.csv"))
After that, our task is:
Capture the attribute names of the dataset while executing the algorithm.
The training dataset should contain 60 percent data from the dataset.
The testing dataset should contain the remaining data from the dataset.
For the above, we did:
rows<-sample(nrow(heart_dataframe))
heart_dataframe<-heart_dataframe[rows, ]
split <- round(nrow(heart_dataframe) * .60)
train<-heart_dataframe[1:split,]
test<-heart_dataframe[(split+1):nrow(heart_dataframe),]
svm_train<-train(X63 ~.,data=train,method="svmLinear",
trControl=train_control,preProcess=c("center","scale"),tuneLength=10)
Ensure that both the arguments of the confusion matrix are 'factors'.
Store the result of the confusion matrix in a "cm" variable.
We are facing an issue with the factors and the confusion matrix. Please help us, and let us know whether the steps above are wrong.
After completing the above steps, execute the following code to store the output in the file:
total<-cm$table[1,1]+cm$table[1,2]+cm$table[2,1]+cm$table[2,2]
writeLines(toString(total),"output.txt")
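For reference, the value written to output.txt is the sum of the four cells of a 2x2 confusion table, which equals the number of test rows that were scored. A small sketch with made-up counts (the matrix below is hypothetical and is not output from this dataset):

```r
# Hypothetical 2x2 confusion table: rows are predictions, columns are actual values.
tab <- matrix(c(30,  5,
                 8, 37),
              nrow = 2, byrow = TRUE,
              dimnames = list(Prediction = c("0", "1"),
                              Reference  = c("0", "1")))

# Same arithmetic as the task's snippet, applied to the made-up table.
total <- tab[1, 1] + tab[1, 2] + tab[2, 1] + tab[2, 2]

# The four cells add up to the number of rows that were scored.
total == sum(tab)
```

So if the expected number differs from the one produced, the size of the test set (and therefore the way the data was read and split) is a likely place to look.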
Welcome to the forum.
I have some points for you.
Firstly, when you split your data by random sampling, you should be concerned with reproducibility. It is therefore common practice to set a random seed with the set.seed() function so that your results can be reproduced.
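A minimal sketch of a seeded 60/40 split follows. It uses the built-in mtcars data as a stand-in for heart_dataframe (an assumption made only so the snippet runs without a download), split_60_40 is a hypothetical helper name, and the seed value 2000 is arbitrary:

```r
# Stand-in data; substitute heart_dataframe in the real script.
df <- mtcars

# Hypothetical helper: shuffle the rows, then cut at 60 percent.
split_60_40 <- function(data, seed) {
  set.seed(seed)                          # fix the RNG state so sample() is repeatable
  shuffled <- data[sample(nrow(data)), ]  # shuffle the rows
  n_train  <- round(nrow(shuffled) * 0.60)
  list(train = shuffled[seq_len(n_train), ],
       test  = shuffled[(n_train + 1):nrow(shuffled), ])
}

first  <- split_60_40(df, seed = 2000)
second <- split_60_40(df, seed = 2000)

identical(first$train, second$train)              # same seed, same split
nrow(first$train) + nrow(first$test) == nrow(df)  # no rows lost or duplicated
```

Without the set.seed() call, each run would shuffle the rows differently, so the train and test sets (and any metric computed on them) would change from run to run.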
Secondly, it is not clear which SVM package you are using or wish to use.
As provided, your code simply fails to run, because the train() function is not defined anywhere.
Did you use a library() or require() call for an SVM package but omit it from your post? Please include it.
Furthermore, train_control is passed as a parameter, but it is also undefined in your code.
Basically our task is
Capture the attribute names of the dataset while executing the algorithm.
The training dataset should contain 60 percent data from the dataset.
The testing dataset should contain the remaining data from the dataset.
Ensure that both the arguments of the confusion matrix are 'factors'.
Store the result of the confusion matrix in a "cm" variable.
Hence we tried the solution above, but could not get any further.
So... what do you make of what I wrote to you?
Very good. Having added that to my copy of your script, the other point I raised now shows up:
Error in train.default(x, y, weights = w, ...) :
object 'train_control' not found
train_control<-trainControl(method="repeatedcv",number=10,repeats=3)
We are using this as well.
We tried all the steps and nothing is working.
ashishsmse14:
cm
You shared snippets of code that mention 'cm', but no code that would create or extract cm from anywhere. Can you share the code you wrote after svm_train was fit (as that part seemed to work fine)?
cm(testing_predictions,testing_data)
This is the code
That doesn't make sense to me. There is no cm function in caret.
There is a cm function in base R, but it converts inches to centimetres (hence the name) for graphics purposes.
I believe that caret provides a confusionMatrix() function.
Our task was:
Store the result of the confusion matrix in a "cm" variable.
Results from functions are saved into variables with the <- operator.
Before you save something into cm, you have to get it, right?
So what would be the complete solution for our steps?
I would do this:
test_pred <- predict(svm_train,newdata = test)
(avg_test_x63 <- mean(test$X63))
X63_fac_over_avg <- factor(x=test$X63>avg_test_x63)
(test_pred_fac_over_avg <-factor(x=test_pred>avg_test_x63))
(cm<-confusionMatrix(test_pred_fac_over_avg,test$X63_fac_over_avg))
How many levels do you have in those objects? I have only two: TRUE and FALSE.
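As a rough illustration of the "both arguments are factors" requirement, here is a base R sketch with made-up numbers standing in for test$X63 and the model's predictions. The values and the names actual, predicted, and lv are assumptions for illustration only:

```r
# Made-up numeric values standing in for test$X63 and the model's predictions.
actual    <- c(41, 67, 58, 49, 62, 55)
predicted <- c(45, 60, 52, 57, 66, 50)

threshold <- mean(actual)

# Give both factors the same explicit levels, even if one side never takes a value.
lv <- c(FALSE, TRUE)
actual_fac    <- factor(actual    > threshold, levels = lv)
predicted_fac <- factor(predicted > threshold, levels = lv)

# Both objects should report the same two levels.
identical(levels(actual_fac), levels(predicted_fac))

# Base R cross-tabulation; caret::confusionMatrix(predicted_fac, actual_fac)
# takes the same two factors (predictions first, reference second).
tab <- table(Prediction = predicted_fac, Reference = actual_fac)
tab
```

Setting the levels explicitly means the two factors line up even when one of them happens to contain only TRUE or only FALSE.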
> heart_dataframe<-read.csv(url("https://dataaspirant.com/wp-content/uploads/2017/01/heart_tidy.csv"))
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
> set.seed(2000)
> rows<-sample(nrow(heart_dataframe))
> heart_dataframe<-heart_dataframe[rows, ]
> split <- round(nrow(heart_dataframe) * .60)
> train<-heart_dataframe[1:split,]
> test<-heart_dataframe[(split+1):nrow(heart_dataframe),]
> train_control<-trainControl(method="repeatedcv",number=10,repeats=3)
> svm_train<-train(X63 ~.,data=train,method="svmLinear",
+ trControl=train_control,preProcess=c("center","scale"),tuneLength=10)
> test_pred <- predict(svm_train,newdata = test)
> (avg_test_x63 <- mean(test$X63))
[1] 54.45833
> X63_fac_over_avg <- factor(x=test$X63>avg_test_x63)
> (test_pred_fac_over_avg <-factor(x=test_pred>avg_test_x63))
27 54 107 121 23 141 182 29 242 75 106 237 45
TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE
292 171 1 30 12 181 277 105 241 19 70 196 236
FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE
247 288 17 269 190 85 220 186 123 77 128 61 16
FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
207 147 94 202 293 193 200 146 130 92 209 138 114
FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
98 78 41 63 245 224 39 64 100 90 218 28 184
FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
257 131 93 267 140 243 197 52 7 210 187 201 113
FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE
71 124 111 221 217 162 194 143 154 118 82 55 180
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE
263 5 249 125 120 231 198 189 256 127 73 22 72
FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE
246 129 50 13 226 160 132 31 173 3 43 104 81
FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
136 135 109
TRUE FALSE TRUE
Levels: FALSE TRUE
>
OK, test$X63_fac_over_avg probably doesn't even exist, as it was a typo in my code.
It should be just X63_fac_over_avg.
We are getting 120 as the output, which is wrong, so we could not pass the test.
Now we tried to split the data as:
split<-createDataPartition(y=heart_dataframe$V14, p=0.7, list=FALSE)
Error in createDataPartition(y = heart_dataframe$V14, p = 0.7, list = FALSE) :
y must have at least 2 data points
Is anything wrong?
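One possible cause, offered as a guess based on the column name X63 used earlier in the thread: read.csv() treats the first row as a header by default, so the columns would be named after the first data row, and heart_dataframe$V14 would be NULL. An empty y is what createDataPartition() complains about. A small sketch with made-up inline data (the three-column text below is hypothetical, not the real file):

```r
# Two made-up rows in the same shape as a header-less CSV.
csv_text <- "63,1,0\n67,0,1\n"

with_header    <- read.csv(text = csv_text)                  # header = TRUE is the default
without_header <- read.csv(text = csv_text, header = FALSE)

names(with_header)       # first data row was turned into column names
names(without_header)    # automatic names V1, V2, V3
is.null(with_header$V3)  # a column that does not exist comes back as NULL
nrow(with_header)        # one row was consumed as the header
nrow(without_header)     # both rows kept as data
```

If this is what happened, reading the file with header = FALSE would give columns V1 to V14 and keep the first row as data. Note also that p = 0.7 asks for a 70 percent training share, while the task states 60 percent.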