Hey HanOostdijk,
thanks for your help, but unfortunately that is not the solution I am looking for.
My goal is to replace every word in the Twitter dataset (column Text) with its frequency rank across the whole dataset (15000 tweets).
I have already ranked all words by their frequency. The whole dataset contains 18420 distinct words, so every word has a rank based on its frequency.
Now I want to replace the words in each tweet with that rank, so every word becomes a number between 1 and 18420.
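To make the ranking step concrete, here is a minimal sketch in Python of how such a rank table could be built. It is not the code actually used for the dataset. The function names `tokenize` and `frequency_ranks`, the two toy tweets, and the tokenization rule (a word is any run of letters, digits, or underscores, so `#` and punctuation are dropped) are assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed rule: a word is a run of word characters (\w also matches umlauts),
    # so '#', quotes, and punctuation are dropped.
    return re.findall(r"\w+", text)

def frequency_ranks(tweets):
    # Count every word over the whole collection of tweets.
    counts = Counter(word for tweet in tweets for word in tokenize(tweet))
    # most_common() sorts by descending count; rank 1 is the most frequent word.
    # Words with equal counts keep the order in which they were first seen.
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}

# Toy data standing in for the 15000 tweets.
tweets = ["die CDU und die CSU", "die CDU ist die Mitte"]
ranks = frequency_ranks(tweets)
print(ranks["die"], ranks["CDU"])
```

Note that this sketch gives tied words different, consecutive ranks; whether ties should share a rank is a separate design choice.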
For example, the first tweet should look like this:
Original tweet:
Apropos #baerbockfails Von #CDU und #CSU ist die sogenannte "bürgerliche Mitte" Betrug und Trickserei gewöhnt.
Frequency rank of the words:
Apropos: 2890
baerbockfails: 2629
Von: 14
CDU: 8
und: 6
CSU: 48
ist: 13
die: 1
sogenannte: 1282
bürgerliche: 2460
Mitte: 972
Betrug: 1733
und: 6
Trickserei: 16698
gewöhnt: 11959
The transformed tweet should then look like this:
Words replaced by word frequency rank:
[2890] [2629] [14] [8] [6] [48] [13] [1] [1282] [2460] [972] [1733] [6] [16698] [11959]
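The replacement itself could be sketched like this in Python, assuming the ranks are available as a word-to-rank lookup table. The name `replace_with_ranks` is hypothetical, the dictionary holds only the ranks listed above rather than the full 18420-word table, and skipping words that are missing from the table is an assumption.

```python
import re

# Ranks copied from the example above (only the words of this one tweet).
ranks = {
    "Apropos": 2890, "baerbockfails": 2629, "Von": 14, "CDU": 8, "und": 6,
    "CSU": 48, "ist": 13, "die": 1, "sogenannte": 1282, "bürgerliche": 2460,
    "Mitte": 972, "Betrug": 1733, "Trickserei": 16698, "gewöhnt": 11959,
}

def replace_with_ranks(text, ranks):
    # Assumed tokenization: runs of word characters, so '#', quotes and the
    # trailing period are dropped before the lookup.
    words = re.findall(r"\w+", text)
    # Assumption: words without a rank are skipped instead of raising an error.
    return " ".join(f"[{ranks[w]}]" for w in words if w in ranks)

tweet = ('Apropos #baerbockfails Von #CDU und #CSU ist die sogenannte '
         '"bürgerliche Mitte" Betrug und Trickserei gewöhnt.')
result = replace_with_ranks(tweet, ranks)
print(result)
# [2890] [2629] [14] [8] [6] [48] [13] [1] [1282] [2460] [972] [1733] [6] [16698] [11959]
```

The lookup is case-sensitive here ("Von" and "von" would be different words), which matches the capitalized entries in the list above.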
I hope this clarifies my point. I am looking forward to any help!