What is the state of UTF-8 support on Windows 10 with R 4.0?

There was an R blog post announcing UTF-8 support on Windows 10, starting with R 4.0.

It says:

In the experimental build of R, UTF-8 is the native encoding, so RGui will not use any \u , \U escapes when sending text to R and R will not embed any UTF-8 strings, because the native encoding is already UTF-8.

Since I had just stumbled on one more UTF-8 related problem, I decided to upgrade to R 4.0.2 and the new toolchain, hoping the problems would go away.

However, after installing, the default locale on my system (Windows 10) is:

# [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

And attempts to set the locale either in session or from .Rprofile via calls like

Sys.setlocale(category = "LC_CTYPE", locale = "en_US.UTF8")
Sys.setlocale(category = "LC_CTYPE", locale = "English_United States.utf8")
Sys.setlocale(category = "LC_COLLATE", locale = "English_United States.utf8")
Sys.setlocale(category = "LC_COLLATE", locale = "en_US.UTF8")

result in

Warning message:
In Sys.setlocale(category = "LC_CTYPE", locale = "en_US.UTF8") :
  OS reports request to set locale to "en_US.UTF8" cannot be honored

The problem arises in both RStudio and RGui.
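
For anyone trying to reproduce this, here is a minimal sketch of how the session's encoding state can be inspected. `try_locale` is a hypothetical helper, not part of R; it uses only `Sys.getlocale()`, `Sys.setlocale()` and `l10n_info()` from base R, and assumes that `Sys.setlocale()` returns an empty string when the OS refuses the request.

```r
# Show the current character-type locale and whether R treats it as UTF-8.
cat("LC_CTYPE:", Sys.getlocale("LC_CTYPE"), "\n")
cat("Native encoding is UTF-8:", l10n_info()[["UTF-8"]], "\n")

# Hypothetical helper: test whether a locale name is accepted for a category,
# then restore the previous setting so the session is left unchanged.
try_locale <- function(locale, category = "LC_CTYPE") {
  old <- Sys.getlocale(category)
  res <- suppressWarnings(Sys.setlocale(category, locale))
  ok <- is.character(res) && nzchar(res)  # "" signals that the request failed
  if (ok) invisible(Sys.setlocale(category, old))
  ok
}

# Candidate names taken from the attempts above.
candidates <- c("en_US.UTF8", "English_United States.utf8")
print(vapply(candidates, try_locale, logical(1)))
```

If every candidate comes back `FALSE`, the running R build most likely cannot use a UTF-8 locale at all, whatever name is passed.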

Maybe I was searching in the wrong places, but aside from the blog, the only other reference to UTF-8 on Windows with R 4.0 that I found is this Stack Overflow question: https://stackoverflow.com/questions/62726261/utf-8-support-in-r-on-windows. There, a user has the same problem as me, and the only suggestion is to set specific locale categories (e.g. LC_CTYPE) instead of LC_ALL, which unfortunately makes no difference for me. All the other resources I could find refer to older R versions.

Did I misunderstand the blog post, and is native UTF-8 support yet to come in a future version? Or am I missing some step needed to make UTF-8 work for me?

Thanks for any hints.

I think what you found on the R blog is just an experimental project for now. There is a newer post about it, dated 30/07/2020.

I think it is not included in R 4.0.2 and is intended for a future version (but you can try the experimental binaries at your own risk :wink:).

As for your issue, you can open another question in this forum to explain it, and maybe we can look into it with you.

Thanks very much for the link. That is unfortunate, but I can work the way I always did...

The issues I have are the usual ones: special characters causing problems when used in code in RMarkdown files or in the console. AFAIK, all the information I found on the topic basically said "this can't be solved". My current workaround is to load all UTF-8 strings from files, which tends to work and is not very annoying.
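
A minimal sketch of that workaround, assuming the strings are stored one per line in a UTF-8 encoded text file. `read_utf8_lines` is a hypothetical helper name; it relies on the `encoding` argument of base R's `readLines()`, which marks the strings as UTF-8 rather than converting them to the native code page.

```r
# Hypothetical helper: read lines from a UTF-8 file and mark them as UTF-8,
# so they are not reinterpreted in the native (e.g. Windows-1252) encoding.
read_utf8_lines <- function(path) {
  readLines(path, encoding = "UTF-8", warn = FALSE)
}

# Demonstration with a temporary file holding the UTF-8 bytes of "café".
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9, 0x0a)), tmp)

x <- read_utf8_lines(tmp)
print(Encoding(x))               # encoding mark carried by the string
print(nchar(x, type = "chars"))  # counted in characters, not bytes
unlink(tmp)
```

Passing `encoding = "UTF-8"` to `readLines()` is used here instead of `file(path, encoding = "UTF-8")` because the latter re-encodes the input into the native encoding, which can lose characters the code page cannot represent.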

Are you working in a UTF-8 encoded file?
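
One rough way to check this from R, sketched under the assumption that the file is line-oriented text: read it without any re-encoding and test the bytes with base R's `validUTF8()`. `is_utf8_file` is a hypothetical helper name. Note that a pure-ASCII file also passes, since ASCII is a subset of UTF-8.

```r
# Hypothetical helper: TRUE if every line of the file is a valid UTF-8 byte sequence.
is_utf8_file <- function(path) {
  lines <- readLines(path, warn = FALSE)  # bytes are read as-is, not converted
  all(validUTF8(lines))
}

# Demonstration: the same word saved as UTF-8 and as Latin-1 / Windows-1252.
utf8_file   <- tempfile()
latin1_file <- tempfile()
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9, 0x0a)), utf8_file)    # "café" in UTF-8
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), latin1_file)        # "café" in Latin-1

print(is_utf8_file(utf8_file))
print(is_utf8_file(latin1_file))
unlink(c(utf8_file, latin1_file))
```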

