Arabic Language in my CSV Goes Crazy

I am trying to load data from a CSV file for analysis.
I used the following:
AOSD <- read.csv("AO Branch Sales.csv")

Some columns in this file are in Arabic. When I try to view AOSD, it shows jumbled characters, as in the image.

What should I do?

I saved the CSV with UTF-8 encoding.
No luck.

I changed the Global Options as follows:
Tools > Global Options > Code > Saving > Default text encoding to WINDOWS-1252
No luck.

Can anyone tell me what to do?

Have you tried setting the encoding while reading the file? Either way, this is definitely an encoding issue, so to try possible solutions we will need a sample file. Can you provide a link to one?
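
One way to narrow this down is to check whether the file on disk is actually valid UTF-8 and whether it starts with a byte order mark (BOM). Below is a minimal sketch in base R. The temporary file and its contents are made up for illustration and only stand in for the real CSV.

```r
# Hypothetical stand-in for the real CSV: a small file written as raw UTF-8 bytes.
path <- tempfile(fileext = ".csv")
writeLines(enc2utf8(c("City,Year", "\u0645\u0643\u0629,2019")), path, useBytes = TRUE)

# Read the lines as they are (no re-encoding) and check every line for valid UTF-8.
raw_lines <- readLines(path, warn = FALSE)
is_utf8 <- all(validUTF8(raw_lines))

# A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
first_bytes <- readBin(path, what = "raw", n = 3)
has_bom <- identical(first_bytes, as.raw(c(0xEF, 0xBB, 0xBF)))

print(is_utf8)
print(has_bom)
```

If `is_utf8` comes back FALSE for your own file, it was probably saved in some other encoding, and telling R it is UTF-8 will not help until the file is re-saved.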

Thank you for your reply.

How do I set the encoding while reading the file?

Sample file attached.

regards

This way:

sample_file <- read.csv("https://www.dropbox.com/s/dtm17dr1mbtbhzl/AO%20Branch%20Sales.csv?dl=1",
                        fileEncoding = "utf8", row.names = NULL)
head(sample_file)
#>              State      City           Region             Store.Name Year
#> 1  المنطقة الغربية     مكة 1          الحرم 1      413-أبراج البيت 5 2019
#> 2  المنطقة الغربية     مكة 1          الحرم 1      413-أبراج البيت 5 2018
#> 3  المنطقة الغربية     مكة 1          الحرم 1      413-أبراج البيت 5 2016
#> 4  المنطقة الغربية     مكة 1          الحرم 1      413-أبراج البيت 5 2017
#> 5  المنطقة الغربية     مكة 1          الحرم 1       394-أبراج البيت2 2019
#> 6 المنطقة الشمالية المدينه 1 المدينة المنورة3 844فندق الانصار الذهبي 2018
#>   Quantity Total.Sales
#> 1    36513 6950064 SAR
#> 2    36121 6681969 SAR
#> 3    46632 6536810 SAR
#> 4    39711 6024433 SAR
#> 5    22454 4375455 SAR
#> 6    30185 3885951 SAR

Created on 2020-01-20 by the reprex package (v0.3.0.9000)

Thank you for your support.

I did as you suggested:

AOSD <- read.csv("AOBranchSales.csv", fileEncoding = "utf8", row.names = NULL)

But I get the following messages:

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
invalid input found on input connection 'AOBranchSales.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'AOBranchSales.csv'

Please note that I am not using the Dropbox link. My CSV is in my project folder, and the file name is AOBranchSales.csv.

If I use your code exactly as written, I get the same warnings:

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
invalid input found on input connection 'https://www.dropbox.com/s/dtm17dr1mbtbhzl/AO%20Branch%20Sales.csv?dl=1'
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'https://www.dropbox.com/s/dtm17dr1mbtbhzl/AO%20Branch%20Sales.csv?dl=1'

How about this? It worked for me when the other one didn't.

library(tidyverse)
sample_file <- read_csv(file="https://www.dropbox.com/s/dtm17dr1mbtbhzl/AO%20Branch%20Sales.csv?dl=1", locale = locale(encoding = "UTF-8"))
head(sample_file)

> head(sample_file)
# A tibble: 6 x 7
  State            City      Region           `Store Name`            Year Quantity `Total-Sales`
  <chr>            <chr>     <chr>            <chr>                  <dbl>    <dbl> <chr>        
1 المنطقة الغربية  مكة 1     الحرم 1          413-أبراج البيت 5       2019    36513 6950064 SAR  
2 المنطقة الغربية  مكة 1     الحرم 1          413-أبراج البيت 5       2018    36121 6681969 SAR  
3 المنطقة الغربية  مكة 1     الحرم 1          413-أبراج البيت 5       2016    46632 6536810 SAR  
4 المنطقة الغربية  مكة 1     الحرم 1          413-أبراج البيت 5       2017    39711 6024433 SAR  
5 المنطقة الغربية  مكة 1     الحرم 1          394-أبراج البيت2        2019    22454 4375455 SAR  
6 المنطقة الشمالية المدينه 1 المدينة المنورة3 844فندق الانصار الذهبي  2018    30185 3885951 SAR  

The column names may be a bit off, though.
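
On the column names: the difference between the two outputs above looks like base R's name cleaning, which `read.csv()` applies by default (`check.names = TRUE`) and `read_csv()` does not. A small sketch, using the header names as they appear in the tibble output:

```r
# Column names as they appear in the read_csv() output above.
csv_names <- c("State", "City", "Region", "Store Name", "Year",
               "Quantity", "Total-Sales")

# make.names() is what read.csv() uses to turn headers into syntactic names:
# characters that are not allowed in a name are replaced with a dot.
base_names <- make.names(csv_names)
print(base_names)

# To get the same names on a tibble, something like this should work:
# names(sample_file) <- make.names(names(sample_file))
```

Going the other way, `read.csv(..., check.names = FALSE)` should keep the headers exactly as they are in the file.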

Thanks William,
It's displaying the required text with tidyverse.

@andresrcs, can you explain why the first approach (your solution) is not working for me? This will help me in my learning process.

sample_file <- read.csv("xxx", fileEncoding = "utf8", row.names = NULL)

My solution works for me (as you can see in the reprex) because I tested it on a Linux system. I didn't notice you were on Windows. Encoding problems are very system dependent.
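
To make the system dependence a bit more concrete, here is a rough sketch, not a guaranteed fix. As far as I know, `fileEncoding` converts the text into the session's native encoding, which on many Windows setups is not UTF-8 and cannot represent Arabic. The `encoding` argument instead only marks the strings as UTF-8 without converting them. The temporary file below is a made-up stand-in for AOBranchSales.csv.

```r
# Hypothetical stand-in for AOBranchSales.csv, written as raw UTF-8 bytes.
path <- tempfile(fileext = ".csv")
expected <- "\u0627\u0644\u062d\u0631\u0645"
writeLines(enc2utf8(c("Region,Year", paste0(expected, ",2019"))),
           path, useBytes = TRUE)

# What the session considers its native character encoding.
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()[["UTF-8"]])  # TRUE when the session runs in a UTF-8 locale

# encoding = "UTF-8" marks the strings as UTF-8 rather than converting them
# into the native encoding, which is what fileEncoding would do.
dat <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)
print(Encoding(dat$Region))
```

If `l10n_info()[["UTF-8"]]` is FALSE on your machine, that would be consistent with `fileEncoding` failing there while working on my Linux system.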
