Trying to convert Hindu/Arabic numbers to real numeric variables


#1

I am trying to convert Hindu/Arabic numbers to real numeric variables in R.

See the example

a="10"
as.numeric(a)
[1] 10
aarabic="١٠"
as.numeric(aarabic)
[1] NA
Warning message:
NAs introduced by coercion

I know that R does not recognize the number 10 when it is written in Arabic digits, like "١٠", so it gives NA.

The reason I ran into this issue is that I am trying to use strptime on dates that are written in Arabic, for example:

١٠‏/٩‏/٢٠١٨، ١:٢٢:٤٩ ص


#2

Nice question. I'm not familiar with Arabic dates and I'm not sure whether Chrome is displaying the characters correctly. One possible method would be to convert the characters to a list of integers and then parse each element:

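arabic2ints <- function(file) {
  x <- readr::read_lines(file) # for UTF-8
  lapply(x, function(xi) {
    ints <- utf8ToInt(xi) # Unicode code points of each character
    # Keep only the Arabic-Indic digits U+0660 to U+0669 (1632 to 1641)
    # and shift them to 0-9
    ints[ints >= 1632L & ints <= 1641L] - 1632L
  })
}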
arabic2ints("~/sandbox/aarabic.txt")
[[1]]
[1] 1 0

[[2]]
 [1] 1 0 9 2 0 1 8 1 2 2 4 9

You can see 10 is easily deduced but I seem to have mangled the date. (The file aarabic.txt contains your aarabic object and the last line of your post as copied and pasted.)
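
If a single numeric value is wanted rather than a vector of digits, one option is to translate the digits with base R's chartr() and then call as.numeric(). This is only a minimal sketch, assuming a UTF-8 session and assuming the digits are the Arabic-Indic ones (U+0660 to U+0669). The name arabic_to_ascii is a hypothetical helper of mine, not a function from any package:

```r
# The ten Arabic-Indic digits, built from their code points so that the
# example does not depend on copy and paste from a web page.
arabic_digits <- intToUtf8(1632:1641)

# Hypothetical helper: replace each Arabic-Indic digit with its ASCII
# counterpart and leave every other character untouched.
arabic_to_ascii <- function(x) {
  chartr(arabic_digits, "0123456789", x)
}

aarabic <- "\u0661\u0660" # the string from the question, i.e. 10
as.numeric(arabic_to_ascii(aarabic))
# [1] 10
```

Because chartr() only swaps characters, separators such as "/" and ":" survive, which should help with the date later on.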

If you like, you can provide a link to a .txt file containing the dates in the form you need to convert, rather than relying on copying and pasting from a webpage. Also, how are dates written in Arabic? (For example, is it mm-dd-yyyy but with Arabic digits, or is anything else different?)


#3

I'm having fun learning how to write dates in Arabic, but @hughparsonage is right: the format is just as important as the numbers.

I suggest creating a function that uses regular expressions or strsplit() to break the dates into pieces, translates the pieces into Hindu-Arabic numerals (0-9), recombines them into a date string, and then parses that string.
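
Those steps could be sketched roughly as follows. This is not a tested solution for real data, and it rests on several assumptions of mine: the fields come in day/month/year hour:minute:second order, the letter U+0635 (as in the example) marks AM and U+0645 marks PM, and the session handles UTF-8. The name parse_arabic_datetime is hypothetical:

```r
# Hypothetical helper. ASSUMPTIONS: day/month/year field order, and
# U+0645 marking PM (anything else is treated as AM).
parse_arabic_datetime <- function(x, tz = "UTC") {
  # Translate Arabic-Indic digits (U+0660 to U+0669) to ASCII digits.
  ascii <- chartr(intToUtf8(1632:1641), "0123456789", x)
  # Break each string into its runs of digits.
  parts <- regmatches(ascii, gregexpr("[0-9]+", ascii))
  is_pm <- grepl(intToUtf8(1605L), x, fixed = TRUE)
  # Recombine the pieces into an unambiguous 24-hour date-time string.
  out <- vapply(seq_along(parts), function(i) {
    p <- as.integer(parts[[i]])
    if (length(p) != 6L) return(NA_character_) # unexpected layout
    hour <- p[4] %% 12L + if (is_pm[i]) 12L else 0L
    sprintf("%04d-%02d-%02d %02d:%02d:%02d",
            p[3], p[2], p[1], hour, p[5], p[6])
  }, character(1))
  # Parse the rebuilt string.
  as.POSIXct(out, format = "%Y-%m-%d %H:%M:%S", tz = tz)
}

# The date from the question, written with \u escapes.
x <- "\u0661\u0660/\u0669/\u0662\u0660\u0661\u0668\u060c \u0661:\u0662\u0662:\u0664\u0669 \u0635"
format(parse_arabic_datetime(x), "%Y-%m-%d %H:%M:%S")
# [1] "2018-09-10 01:22:49"
```

Extracting only the digit runs sidesteps the Arabic comma and any invisible directional marks in the string, but it also means a date with a different layout comes back as NA rather than raising an error.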

I've tried regex splitting, but have no idea if I'm doing it right, since I'm not familiar with Arabic numerals:

library(stringi)
library(magrittr)

"١٠/٩/٢٠١٨، ١:٢٢:٤٩ ص" %>%
  stri_match_first_regex("(.*?)/(.*?)/(.*?)\\s+(.*?)\\:(.*?)\\:(.*?)") %>%
  cat(sep = "\n")
# ١٠/٩/٢٠١٨، ١:٢٢:
# ١٠
# ٩
# ٢٠١٨،
# ١
# ٢٢