I'm trying to do sentiment analysis on this dataset, containing 1.1 million newspaper headlines: https://www.kaggle.com/therohk/million-headlines
My machine has now been at it for 24 hours. The first attempt ended after a few minutes with RStudio aborting with a fatal error. I figured I had run out of RAM (there is 16 GB), so I repurposed an 80 GB SSD as a swap disk. After some time, R then complained that it could not allocate a vector of size 617.5 GB. I am now using a 2 TB HDD as swap, but I am afraid the job has stalled: it has been sitting at 1.2 TB of swap for around ten hours with barely any CPU activity, though I can hear the HDD working.
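My back-of-the-envelope guess is that the huge allocation comes from a dense document-term matrix; the vocabulary size below is purely my own assumption, not something the package reported:

# Rough size of a dense document-term matrix for ~1.1 million headlines,
# assuming a vocabulary of roughly 70,000 terms (my guess) and 8-byte doubles:
1.1e6 * 7e4 * 8 / 1e9   # ~616 GB, close to the 617.5 GB allocation R asked for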
I know that, in general, you cannot tell how long a computation will take without actually running it, but has anyone else tried something similar?
I have since found out that I actually only need sentiment analysis on 434,000 of the headlines, but I am afraid to abort the current job in case it is almost finished.
Also, is there really no way to speed things up? It seemed to use only one CPU core, and there is no GPU acceleration either, right?
news <- read.csv("abcnews.csv") news$sentiment <- analyzeSentiment(news, rules = list("SentimentLM"=list(ruleSentiment, loadDictionaryLM())))