What is the state of UTF-8 support on Windows 10 with R 4.0?

There was an R blog post announcing UTF-8 support on Windows 10, starting with R 4.0.

It says:

In the experimental build of R, UTF-8 is the native encoding, so RGui will not use any \u, \U escapes when sending text to R and R will not embed any UTF-8 strings, because the native encoding is already UTF-8.

Since I had just stumbled on one more UTF-8 related problem, I decided to upgrade to R 4.0.2 and the new toolchain, hoping the problems would go away.

However, after installing, the default locale on my system (Windows 10) is:

Sys.getlocale()
# [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

And attempts to set the locale either in session or from .Rprofile via calls like

Sys.setlocale(category = "LC_CTYPE", locale = "en_US.UTF8")
Sys.setlocale(category = "LC_CTYPE", locale = "English_United States.utf8")
Sys.setlocale(category = "LC_COLLATE", locale = "English_United States.utf8")
Sys.setlocale(category = "LC_COLLATE", locale = "en_US.UTF8")

result in

Warning message:
In Sys.setlocale(category = "LC_CTYPE", locale = "en_US.UTF8") :
  OS reports request to set locale to "en_US.UTF8" cannot be honored

The problem arises in both RStudio and RGui.
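For anyone who wants to confirm what their own session reports, here is a minimal sketch using only base R. It assumes that the `UTF-8` element returned by `l10n_info()` is a fair indicator of whether the session's native encoding is UTF-8; the variable names `info` and `is_utf8_native` are just for illustration.

```r
# Report what the current R session considers its native encoding.
info <- l10n_info()

# `UTF-8` is TRUE only when R believes the session locale is a UTF-8 locale.
is_utf8_native <- isTRUE(info[["UTF-8"]])

cat("R version:      ", R.version.string, "\n")
cat("LC_CTYPE:       ", Sys.getlocale("LC_CTYPE"), "\n")
cat("UTF-8 native?   ", is_utf8_native, "\n")

# The `codepage` element is only present on Windows builds of R.
if (!is.null(info[["codepage"]])) {
  cat("Active codepage:", info[["codepage"]], "\n")
}
```

With the locale shown above (code page 1252), I would expect this to report that UTF-8 is not the native encoding.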

Maybe I was searching in the wrong places, but aside from the blog, the only other reference to UTF-8 on Windows and R 4.0 I found is the Stack Overflow question "UTF-8 support in R on Windows", where a user has the same problem as me and the only suggestion is to use specific locale categories (i.e. LC_CTYPE) instead of LC_ALL, which unfortunately makes no difference for me. All the other resources I could find refer to older R versions.

Did I misunderstand the blog post, and is native UTF-8 support yet to come in a future version? Or am I missing some step needed to make UTF-8 work for me?

Thanks for any hints.


I think what you found on the R blog is just an experimental project for now. There is a newer post about it, dated 30/07/2020:
https://developer.r-project.org/Blog/public/2020/07/30/windows/utf-8-build-of-r-and-cran-packages/index.html

I think it is not included in R 4.0.2 for now and is planned for a future version (but you can try the experimental binaries at your own risk :wink:).

As for your issue, you can open another question in this forum to describe it, and maybe we can look into it with you.


Thanks very much for the link. That is unfortunate, but I can work the way I always did...

The issues I have are the usual ones: special characters cause problems when used in code in R Markdown files or in the console, and AFAIK all the information I found on the topic was basically "this can't be solved". My current workaround is to load all UTF-8 strings from files, which tends to work and is not very annoying.
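To illustrate the workaround, here is a rough sketch of what I mean by loading UTF-8 strings from a file. The file path and the sample strings are made up for the example; in practice the file would be a hand-maintained UTF-8 text file. The sample text is built from \u escapes only so that the script itself stays pure ASCII.

```r
# Hypothetical file standing in for a hand-maintained UTF-8 text file.
path <- tempfile(fileext = ".txt")

# Sample strings written with \u escapes, so this script contains no
# non-ASCII characters that the native code page could mangle.
original <- c("Gr\u00fc\u00dfe", "\u010de\u0161tina")

# Write the UTF-8 bytes as they are, without re-encoding to the native code page.
writeLines(enc2utf8(original), path, useBytes = TRUE)

# Read the lines back and mark them as UTF-8 instead of native-encoded.
labels <- readLines(path, encoding = "UTF-8", warn = FALSE)

# Inspect the declared encoding and the character counts of what was read.
print(Encoding(labels))
print(nchar(labels, type = "chars"))
```

The important part is `encoding = "UTF-8"` in `readLines()`: it does not convert anything, it only tells R how to interpret the bytes it has read.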

Are you working in a UTF-8 encoded file?
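If you are not sure, one way to check from R is something like the sketch below. `is_utf8_file()` is a hypothetical helper, not part of base R; it relies on the base function `validUTF8()`, and the two temporary files are only there to demonstrate it. Note that a pure ASCII file also counts as valid UTF-8, so this can only rule UTF-8 out, not prove the editor saved it that way.

```r
# Hypothetical helper: TRUE if every line of the file is valid UTF-8.
is_utf8_file <- function(path) {
  lines <- readLines(path, warn = FALSE)
  all(validUTF8(lines))
}

# Demo file 1: "Gruesse" with u-umlaut and sharp s, encoded as UTF-8 bytes.
utf8_path <- tempfile(fileext = ".txt")
writeBin(
  c(charToRaw("Gr"), as.raw(c(0xc3, 0xbc, 0xc3, 0x9f)), charToRaw("e\n")),
  utf8_path
)

# Demo file 2: the same word encoded as Latin-1 / Windows-1252 bytes.
latin1_path <- tempfile(fileext = ".txt")
writeBin(
  c(charToRaw("Gr"), as.raw(c(0xfc, 0xdf)), charToRaw("e\n")),
  latin1_path
)

print(is_utf8_file(utf8_path))
print(is_utf8_file(latin1_path))
```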
