We are currently working on an LSTM to classify tweets about the upcoming election in Germany.
For this, we have created our own dataset using the twitteR package. Now we are facing the challenge of preparing this dataset for further processing.
We need a dataset that represents the words in each tweet as numeric values based on their frequency.
We also need each tweet labeled as positive (1) or negative (0).
Both should be separate columns in the data frame, and the structure should be comparable to the widely used IMDB dataset, as shown in this YouTube video (LSTM Networks with R | Movie Review Sentiment Classification - YouTube).
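To make the target format concrete, here is a minimal sketch of what we have in mind. It is written in Python with only the standard library, purely to illustrate the structure; we would need the equivalent in R. The example tweets, the labels, and the helper names `build_vocab` and `encode` are made up for illustration, and we are assuming that "numeric values regarding their frequency" means replacing each word with its frequency rank (1 = most frequent word), which is roughly how the IMDB data is encoded.

```python
from collections import Counter

# Hypothetical example tweets with hand-assigned labels (1 = positive, 0 = negative).
tweets = [
    ("great speech by the candidate", 1),
    ("the debate was a disaster", 0),
    ("great debate great candidate", 1),
]


def build_vocab(texts):
    """Map each word to its frequency rank (1 = most frequent word)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}


def encode(text, vocab):
    """Replace each known word in a tweet with its integer rank."""
    return [vocab[word] for word in text.lower().split() if word in vocab]


texts = [text for text, _ in tweets]
vocab = build_vocab(texts)

# One row per tweet: the encoded word sequence and the sentiment label.
rows = [{"text": encode(text, vocab), "label": label} for text, label in tweets]

for row in rows:
    print(row)
```

Each row then holds an integer sequence in one column and the 0/1 label in the other. Real tweets would of course need more cleaning first (URLs, mentions, hashtags, punctuation), which this sketch leaves out.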
Can anyone recommend comparable projects, functions, or code we could use, or share general tips for the further processing?
Thank you for your advice!