Hi, there.
I have a .csv file with some Cyrillic text. When I used readr::read_delim() to load it into R, I ran into an encoding issue: the text is shown as <c0><e1>..... When loaded into Google Sheets, it works just fine and the same bytes are displayed as Cyrillic characters.
After some googling, I found that these hex codes match neither ASCII nor MARC-8. Furthermore, R already thinks the text is UTF-8.
Interestingly, readr::read_delim() and read.csv() read this file differently.
suppressWarnings(library(tidyverse))
rates <- suppressMessages(read_delim("rates.csv", skip = 2, delim = ";"))
x <- rates$SHORTNAME[[2]]
# x should be "АСКО" (Cyrillic)
x
#> [1] "<c0><d1><ca><ce>"
Encoding(x)
#> [1] "UTF-8"
rates2 <- read.csv("rates.csv", skip = 2, sep = ";")
y <- rates2$SHORTNAME[[2]]
# y also should be "АСКО" (Cyrillic)
y
#> [1] "ÀÑÊÎ"
Encoding(y)
#> [1] "unknown"
So, my question is how to convert this text into something readable?
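One quick check that may help narrow this down. It is only a sketch, and it assumes the file is encoded as Windows-1251 (a guess based on the bytes in the output above, not something confirmed for the actual file). The code rebuilds the four bytes by hand, so it runs without rates.csv. Read as Latin-1, the same bytes give "ÀÑÊÎ", which would explain what read.csv() shows.

```r
# The four bytes that readr printed as <c0><d1><ca><ce>
bytes <- as.raw(c(0xc0, 0xd1, 0xca, 0xce))
x <- rawToChar(bytes)

# 0xC0 can never start a valid UTF-8 sequence, so the "UTF-8" mark is wrong
validUTF8(x)
#> [1] FALSE

# Reinterpret the same bytes as Windows-1251 (assumed) and convert to UTF-8
decoded <- iconv(x, from = "windows-1251", to = "UTF-8")

# Code points of the result: U+0410 U+0421 U+041A U+041E, i.e. "АСКО"
utf8ToInt(decoded)
```

If the code points come out as Cyrillic letters, the file is most likely Windows-1251 and the problem is only that R was never told so.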
@HanOostdijk, thank you for the suggestion. I don't know why, but it returns a data.frame with just one row, and only the first column has a non-NA value (which was in English in the original file). Without encoding = ... it loads everything.
file1 <- file("rates.csv", encoding="windows-1251")
rates <- read.csv(file1, skip=2, sep=";")
#> Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
#> invalid input found on input connection 'rates.csv'
#> Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
#> incomplete final line found by readTableHeader on 'rates.csv'
close(file1)
#> Error in close.connection(file1): invalid connection
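For comparison, here is a sketch of declaring the encoding on the readr side instead of on a file() connection. As far as I understand, file(encoding = ...) re-encodes the input into the session's native encoding, so it may fail when that encoding cannot represent Cyrillic, but I can't confirm that this is what happens here. readr converts to UTF-8 regardless of the session locale. The temporary file and its contents below are made up to stand in for rates.csv, and Windows-1251 is still an assumption.

```r
library(readr)

# Hypothetical stand-in for rates.csv: two junk lines, a header, one data row.
# The Cyrillic value is written as raw Windows-1251 bytes.
path <- tempfile(fileext = ".csv")
writeBin(
  c(charToRaw("generated report\nsecond line\nID;SHORTNAME\n1;"),
    as.raw(c(0xc0, 0xd1, 0xca, 0xce)),
    charToRaw("\n")),
  path
)

# Tell readr the source encoding; strings come back as UTF-8
rates <- suppressMessages(read_delim(
  path,
  delim = ";",
  skip = 2,
  locale = locale(encoding = "windows-1251")
))

Encoding(rates$SHORTNAME)
# Code points of the first value: U+0410 U+0421 U+041A U+041E, i.e. "АСКО"
utf8ToInt(rates$SHORTNAME[[1]])
```

With the real file, the same call would be read_delim("rates.csv", delim = ";", skip = 2, locale = locale(encoding = "windows-1251")).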
I found a different solution. Essentially, load the file as is and convert the values with iconv() afterwards: