Reading in Data that is in a Foreign Language (Armenian)

Hello! Does anyone have experience working with data in foreign languages? I have an Excel file with numeric columns and separate columns of Armenian words, but when I save it as a CSV and read it into RStudio, the Armenian characters turn into strange symbols like this: " �_______� _______"

Any advice on how to get R to read in the data and display the correct characters would be very helpful! Thank you in advance.

Very likely your data contains non-ASCII characters, so you have to specify the encoding of the file when you read it. Could you provide some sample data so we can give you more specific advice?
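
As a rough sketch of what specifying the encoding can look like with base R's read.csv: the example below writes a tiny UTF-8 CSV to a temporary file and reads it back. The file contents and column names are made up for illustration, and it assumes the CSV really is UTF-8; a file saved in another encoding would need that encoding's name passed via fileEncoding instead.

```r
# Hypothetical sample: a small CSV with one Armenian word, written as UTF-8 bytes.
armenian_word <- "\u0570\u0561\u0575"   # Unicode escapes keep this script ASCII-only
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,word", paste0("1,", armenian_word)), csv_path, useBytes = TRUE)

# encoding = "UTF-8" marks the strings that are read in as UTF-8 (no re-encoding).
# For a file in another encoding, fileEncoding = "<name>" asks R to convert it while reading.
df <- read.csv(csv_path, encoding = "UTF-8", stringsAsFactors = FALSE)

print(Encoding(df$word))
print(df$word == armenian_word)
```

If the characters still come out wrong, the file is probably not in the encoding you told R to expect, so it is worth checking how the CSV was exported.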

I had similar issues when reading in Czech characters, and I have found two ways to resolve them:

  • apply stringi::stri_encode(str, from, to) to the text once it is in R; it is good practice to set the "to" encoding to UTF-8 (if applicable; I am not familiar with Armenian)
  • use the Locale / Encoding buttons when importing manually from RStudio; you will need to pick the appropriate encoding for your language.
    You may need to experiment a little with the choices in the drop-down menu.
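
To illustrate the first option, here is a minimal sketch of stringi::stri_encode converting legacy-encoded bytes to UTF-8. It uses a Czech word and assumes the source encoding is windows-1250; both are example choices, and the right "from" value for Armenian data depends on how the file was actually saved.

```r
library(stringi)

# The Czech word "Česká" as raw bytes in windows-1250 (an assumed source encoding).
legacy_bytes <- as.raw(c(0xC8, 0x65, 0x73, 0x6B, 0xE1))

# Convert from the legacy encoding to UTF-8.
fixed <- stri_encode(legacy_bytes, from = "windows-1250", to = "UTF-8")

print(fixed)
print(Encoding(fixed))
```

If you are unsure which name to pass as "from", stringi::stri_enc_list() lists the encoding names stringi recognises, which can help narrow down the candidates.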

Thank you for the help! I ended up changing the way I encode the data and entering it manually rather than using stri_encode. Also, I had earlier converted my file from Excel to CSV; this time I tried again with the Excel file itself and it worked (though I don't think that was the problem).
