How to use the predict function with text categories


#1

Hello,

I have been having trouble with the predict function underestimating (or overestimating) the predictions for a new text category (that is, its class, such as sport, health, or politics).

First I import a term-document matrix (TDM) of my corpus, then I split the data into training and test sets and use them in my model with the k-NN algorithm.

That works fine.

Now I need to import new text of unknown category and predict its category, but I do not know how to do that.

I don't know how to use the predict function.

    library(tm)
    # KNN model 
    library(class) 
    # Stemming words 
    library(SnowballC) 
    # CrossTable 
    library('gmodels') 

    # Read CSV with columns: Document, Terms and Category
    PathFile <- read.csv(file.choose(), sep = ";", header = TRUE)
    # Read the CSV of documents with unknown category
    PathFilenameUnk <- read.csv(file.choose(), sep = ",", header = TRUE)
    # Structure of the CSV file
    str(PathFile) 

    # Split row numbers into a 70% training set and a 30% test set
    train <- sample(nrow(PathFile), ceiling(nrow(PathFile) * .70)) 
    test <- (1:nrow(PathFile))[- train] 

    ## Show training row indices
    train
    ## Show test row indices
    test

    # Isolate the class labels
    cl <- PathFile[, "Category"] 

    # Create model data and remove "category" 
    modeldata <- PathFile[,!colnames(PathFile) %in% "Category"] 

    # Create model: training set, test set, training set classifier 
    knn.pred <- knn(modeldata[train, ], modeldata[test, ], cl[train], 70) 
    knn.pred 
    # Confusion matrix 
    conf.mat <- table("Predictions" = knn.pred, Actual = cl[test]) 
    conf.mat 

    CrossTable(x = cl[test], y = knn.pred, prop.chisq = FALSE)

    predict(knn.pred,PathFilenameUnk) ### error here!!!! 

    # Accuracy 
    (accuracy <- sum(diag(conf.mat))/length(test) * 100) 

    # Create data frame with test data and predicted category 
    df.pred <- cbind(knn.pred, modeldata[test, ]) 
    write.table(df.pred, file="output.csv", sep=";")

And here is my CSV file:

I know I have to do the same steps for the unknown text and import it as a DTM matrix,

but I am doing something wrong.

Thanks in advance.

Note:
TDM_2018_05_09_225323.csv is the original file I use with this script.

Predict_TDM_2018_05_09_225025.csv is the file whose categories I need to predict. How do I use it with the predict function?

Here are my files.
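
For reference, here is a minimal sketch of one possible approach, not a confirmed fix. `knn()` from the `class` package returns the predicted labels directly as a factor rather than a fitted model, so there is nothing for `predict()` to act on. One option may be to call `knn()` again with the unknown documents as the `test` argument, after giving them the same term columns as the training data. The toy data, the variable names, and the helper `align_terms` are hypothetical and are not taken from the CSV files above.

```r
library(class)  # provides knn()

# Hypothetical toy term counts standing in for the training TDM (rows = documents).
train_terms <- data.frame(
  goal   = c(5, 4, 6, 0, 1, 0),
  match  = c(3, 5, 4, 0, 0, 1),
  doctor = c(0, 1, 0, 4, 5, 6),
  virus  = c(0, 0, 1, 5, 3, 4)
)
train_labels <- factor(c("sport", "sport", "sport", "health", "health", "health"))

# Hypothetical unknown documents: the term "virus" is missing and "vote" is extra.
unknown_terms <- data.frame(
  goal   = c(4, 0),
  doctor = c(0, 5),
  match  = c(6, 0),
  vote   = c(1, 0)
)

# Hypothetical helper: give new data exactly the training columns, in the same order.
# Terms absent from the new data get a count of 0; terms unseen in training are dropped.
align_terms <- function(new_data, train_cols) {
  for (col in setdiff(train_cols, colnames(new_data))) {
    new_data[[col]] <- 0
  }
  new_data[, train_cols, drop = FALSE]
}

unknown_aligned <- align_terms(unknown_terms, colnames(train_terms))

# knn() classifies the `test` rows directly; the result is a factor of predicted labels.
unknown_pred <- knn(train = train_terms, test = unknown_aligned,
                    cl = train_labels, k = 3)
print(unknown_pred)
```

In the script above, the equivalent would presumably be to pass the aligned `PathFilenameUnk` columns as the `test` argument in place of `modeldata[test, ]`, which assumes both files were built from the same vocabulary.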


#2

Any help? Could someone please look at this post?