How to create lexicons?

I have been learning sentiment analysis in R using Twitter data for almost a month, but I am still lost and have many unanswered questions. I hope to find answers here.
First, I know how to get the data and clean it of punctuation, hashtags, mentions, etc.
Second is the sentiment analysis step. I am working with Arabic, and there is no ready-made sentiment analysis resource for it like NRC, Bing, or AFINN.
So I need to build an Arabic lexicon. How can I do that, and what are the steps?
Imagine there were no sentiment analysis resources for English: how would you do it?
At the moment I am writing two Excel sheets, one with positive words and the other with negative words.
Then I match the tweets against the sheets. For example, if a tweet has 4 positive words and 7 negative words, the tweet is negative.
But I do not know whether this approach is right.
Also, there are many words I do not know where to put, like "company", "go", "man", etc.
Should they go in the positive file, the negative file, a separate file, or should I just ignore them?

I hope you understand me, and sorry for my poor English.

Hi @fatima_mb! Welcome!

You are right that the resources for doing non-English text mining are much sparser, but I think that’s been starting to change (disclaimer: I am far from an expert in this topic area!). Here are a couple of links that might be interesting/useful for you:

(The second one is obviously not about Arabic, but the answers may point you in some helpful directions)

While looking for the first link, I also noticed in passing quite a few YouTube videos about sentiment analysis of Arabic tweets, so you might want to try some web searches along the lines of “Arabic sentiment analysis” — people are definitely doing it!

In general, I’m afraid you will probably need to dig deeper into the conceptual and technical details of how this type of text mining works than if you were working in English (where more of the tool foundations have been laid).
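
To make the word-counting approach from the question concrete, here is a minimal sketch of lexicon-based scoring. It is only an illustration under some assumptions: the word lists are hypothetical English placeholders (you would load your own Arabic positive and negative lists instead), the function name `score_tweet` is made up for this example, and the tweet is assumed to be already cleaned so that splitting on whitespace is enough. I've written it in Python for brevity; the same counting logic can be written in R.

```python
from collections import Counter

# Hypothetical toy word lists; a real lexicon would be loaded from the
# positive and negative sheets described in the question.
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "terrible", "sad"}


def score_tweet(text, positive=POSITIVE, negative=NEGATIVE):
    """Count lexicon hits in an already-cleaned tweet and label it."""
    counts = Counter()
    for token in text.lower().split():
        if token in positive:
            counts["positive"] += 1
        elif token in negative:
            counts["negative"] += 1
        # Words in neither list (e.g. "company", "go", "man") are treated
        # as neutral here and simply ignored.

    net = counts["positive"] - counts["negative"]
    if net > 0:
        label = "positive"
    elif net < 0:
        label = "negative"
    else:
        label = "neutral"
    return {
        "positive": counts["positive"],
        "negative": counts["negative"],
        "label": label,
    }


print(score_tweet("the company had a great day but terrible sad news"))
```

In this sketch, words that appear in neither list do not need their own file: they contribute nothing to the score. A tweet with no hits, or with equal positive and negative hits, comes out as "neutral". Simple counting like this does not handle negation ("not good") or word forms that differ from the lexicon entry, so treat it as a starting point rather than a finished method.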

