I'm new to R and trying to predict the S&P 500 stock price from financial news using support vector machines (SVM). I have two datasets: the stock market data and a cleaned corpus of financial news. I converted the corpus into a document-term matrix (DTM) and ran sentiment analysis on it, once with the SentimentAnalysis package and once with the tidytext package. Now I'm stuck getting the model to run. I've found several approaches that use an SVM to predict stock prices, but none that use financial news. How can I combine the two datasets to build the model? My current code looks like this:
    library(tm)                 # Corpus, DirSource, DocumentTermMatrix, removeSparseTerms
    library(SentimentAnalysis)  # analyzeSentiment
    library(tidytext)           # tidy, get_sentiments
    library(dplyr)              # %>%, inner_join

    docs <- Corpus(DirSource(directory = "D:/Financial_News_Prediction/Edgar filings_full text/Form 8-K", recursive = TRUE))

    # Cleaning steps are not shown here

    # Creating DTM
    dtm <- DocumentTermMatrix(docs)
    dtm <- removeSparseTerms(dtm, 0.99)
    dtm <- as.matrix(dtm)

    # Sentiment analysis DTM
    dtm.sent <- analyzeSentiment(dtm)

    # Creating DTM in tidy format
    dtm.tidy <- DocumentTermMatrix(docs)
    dtm.tidy <- tidy(dtm.tidy)

    # Sentiment analysis tidy DTM
    sent.afinn <- dtm.tidy %>% inner_join(get_sentiments("afinn"), by = c(term = "word"))
    sent.bing <- dtm.tidy %>% inner_join(get_sentiments("bing"), by = c(term = "word"))
    sent.nrc <- dtm.tidy %>% inner_join(get_sentiments("nrc"), by = c(term = "word"))

    # Data split
    id_dtm <- sample(nrow(dtm), nrow(dtm) * 0.70)
    dtm.train <- dtm[id_dtm, ]
    dtm.test <- dtm[-id_dtm, ]

    id_sp500 <- sample(nrow(SP500.Data), nrow(SP500.Data) * 0.70)
    sp500.train <- SP500.Data[id_sp500, ]
    sp500.test <- SP500.Data[-id_sp500, ]
That is where I am right now. I would like to train the SVM on the two datasets described above, but I think I need to create class labels first. I have seen examples that label the price movement as -1 / +1 or something similar. My sentiment analysis sorted the terms into positive and negative classes, but I don't know how to put the two datasets together to build the model. I would be very grateful for any help. Thanks in advance!
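To make the question more concrete, here is a minimal sketch of the structure I imagine, using made-up toy data instead of my real data. The data frames `news` and `prices`, their column names, and the use of `svm()` from the e1071 package with a radial kernel are all assumptions for illustration. The idea is to aggregate sentiment to one value per day, label each day -1 / +1 by the direction of the next close, merge both tables on the date, split chronologically (rather than sampling the two datasets independently as in my code above), and then fit the classifier.

```r
library(e1071)  # provides svm()

set.seed(42)

# Hypothetical toy data standing in for the real inputs (all names are assumptions).
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 60)

# news: one row per document, with a date and a document-level sentiment score.
news <- data.frame(
  date      = sample(dates, 200, replace = TRUE),
  sentiment = rnorm(200)
)

# prices: one row per day with a closing price (a random walk here).
prices <- data.frame(
  date  = dates,
  close = 3000 + cumsum(rnorm(60, sd = 15))
)

# 1. Aggregate document-level sentiment to one value per day.
daily.sent <- aggregate(sentiment ~ date, data = news, FUN = mean)

# 2. Label each day by the direction of the next day's close (+1 up, -1 down).
#    The last day has no next close, so it gets NA and is dropped.
prices$direction <- c(ifelse(diff(prices$close) > 0, 1, -1), NA)
prices <- prices[!is.na(prices$direction), ]
prices$direction <- factor(prices$direction, levels = c(-1, 1))

# 3. Join features and labels on the date so each row pairs a day's
#    sentiment with that day's label.
model.data <- merge(daily.sent, prices[, c("date", "direction")], by = "date")
model.data <- model.data[order(model.data$date), ]

# 4. Chronological 70/30 split: train on earlier days, test on later ones.
n.train <- floor(0.7 * nrow(model.data))
train <- model.data[seq_len(n.train), ]
test  <- model.data[-seq_len(n.train), ]

# 5. Fit an SVM classifier and predict the direction for the test days.
fit  <- svm(direction ~ sentiment, data = train, kernel = "radial")
pred <- predict(fit, newdata = test)

# Confusion table of predicted vs. actual direction.
print(table(predicted = pred, actual = test$direction))
```

With random toy data the predictions are of course meaningless; the sketch is only meant to show how I think the two datasets would have to be aligned by date before modelling. In my case the `sentiment` column would come from `dtm.sent` or one of the tidytext results, and more columns (for example DTM term counts) could presumably be added as further features.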