RStudio can't handle file names with Unicode characters

I've suddenly started having a problem in RStudio with file and folder names that contain Unicode characters. I've been running the exact same code for months, but it stopped working this morning.

For example, when I run list.files() on a file path containing "Ø", it returns character(0). However, if I first change my working directory to that folder, list.files() works:

# RUNNING IN RSTUDIO
list.files("S:/Spildevand/Lille ØU-sag")
#> character(0)


setwd("S:/Spildevand/Lille ØU-sag")
list.files()
#>  [1] "~$P WW analysis.docx"                                                         
#>  [2] "Anmeldelse til Compliance"                                                    
#>  [3] "Budgetter_2021.03.22.xlsx"
#>  etc

But it works fine if I run it from the R GUI:

# RUNNING IN R GUI
list.files("S:/Spildevand/Lille ØU-sag")
#>  [1] "~$P WW analysis.docx"                                                         
#>  [2] "Anmeldelse til Compliance"                                                    
#>  [3] "Budgetter_2021.03.22.xlsx"
#>  etc

Similarly, saving an RDS file whose name contains a special character works in the R GUI, but not in RStudio:

x <- 1
saveRDS(x, "æ rstudio.RDS")  # or "æ rgui.RDS"


I don't understand why this has suddenly started happening, nor why it would matter whether I run it in RStudio. The session info is slightly different: RStudio's has two extra items, "system code page: 65001" and "tools_4.1.2". I don't know what they mean or whether they're relevant.
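To compare the two front ends more directly, something like the following could be run once in RStudio and once in the R GUI. It's only a sketch: `encoding_report()` is a hypothetical name, and I'm assuming the `codepage` and `system.codepage` entries of `l10n_info()` are present only on Windows (the latter only in recent R versions), so they simply come back as NULL elsewhere.

```r
# Hypothetical helper: summarise the encoding-related state of this R session.
encoding_report <- function() {
  info <- l10n_info()
  list(
    r_version = as.character(getRversion()),
    lc_ctype  = Sys.getlocale("LC_CTYPE"),
    utf8      = isTRUE(info[["UTF-8"]]),
    # Assumed Windows-only entries; NULL on other platforms
    codepage        = info[["codepage"]],
    system_codepage = info[["system.codepage"]]
  )
}

str(encoding_report())
```

If the two sessions differ only in the code page entries, that would match the extra "system code page: 65001" line in the RStudio session info.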

RStudio: 
--------
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.1.2 tools_4.1.2   



R GUI:
------
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.1.2

I also posted this on Stack Overflow ("RStudio can't deal with file names with unicode characters").

Any recent updates (i.e. in the last 24 hours) to Windows? Windows is somewhat notorious for being hostile to non-ASCII characters and to spaces in file names and paths.

I tried both of your examples in Ubuntu Linux with no problems.

Not that I'm aware of, but there could have been without my realising; it's a work computer.

But even so, I'm not sure why it would be different in RStudio than in the R GUI. A detail I forgot to mention is that it worked fine in RStudio when I was rendering the reprex. I don't know what that means.

Me neither, but I am not a very sophisticated user and I have not used Windows in at least 10 to 12 years.

Maybe it's cured itself? Are you getting the same errors when you rerun the original analysis?

Checking my sessionInfo(), I see I use LC_CTYPE=en_US.UTF-8. Might it be worth changing yours to a UTF-8 setting? Mind you, I have no idea how to do that, especially in Windows, but it can be done.

That's the problem: the code page is used to map characters outside the ASCII range (code points 0 to 127). R on Windows doesn't support UTF-8 as its native encoding until R 4.2. If you want to use the 65001 code page, you can try the daily builds of R and RStudio; otherwise, change your code page to one specific to your locale, and that should get things working with your current configuration.
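A quick way to test whether a session is in the combination described here (UTF-8 system code page on an R older than 4.2) might look like the sketch below. The 4.2.0 cutoff comes from the reply above; the function name is hypothetical, and reading the code page from `l10n_info()[["system.codepage"]]` is my assumption about where R exposes the value that sessionInfo() prints, so please check it in your own session.

```r
# Hypothetical helper: TRUE if the system code page is 65001 (UTF-8)
# while R is older than 4.2.0.
utf8_codepage_mismatch <- function(version = getRversion(),
                                   system_codepage = l10n_info()[["system.codepage"]]) {
  old_r <- numeric_version(as.character(version)) < "4.2.0"
  # Assumed to be NULL outside Windows or on older R versions
  utf8_cp <- !is.null(system_codepage) &&
    identical(as.integer(system_codepage), 65001L)
  old_r && utf8_cp
}

utf8_codepage_mismatch("4.1.2", 65001)  # TRUE
utf8_codepage_mismatch("4.2.0", 65001)  # FALSE
utf8_codepage_mismatch("4.1.2", 1252)   # FALSE
```

Calling it with no arguments checks the current session.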

Alright, but why would it be fine if I use the R GUI, while it breaks in RStudio? Does RStudio choose which code page R uses? If that question even makes sense, I have no idea what I'm talking about.

I have also seen exactly this problem in the last few days, with sessionInfo() likewise reporting "system code page: 65001"
(Also a Danish user/language setting, Windows 10, company/work computer.)
I'm unsure how to change this and what to change it to.

It's not only a problem when reading file names with special characters like æøå, but also when running SQL chunks with æøå in variable names.

I haven't found a workaround besides renaming my local files.
The similar problem with Unicode characters in SQL column names is, for now, still unsolved.
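Since renaming is the only workaround so far, a small sketch for finding the files and folders that would need renaming could help. Both function names are hypothetical, and the check simply treats any character above code point 127 as non-ASCII. Given the original report, it's probably safest to run the listing from the R GUI, where list.files() still worked on these paths.

```r
# Hypothetical helper: TRUE for each name containing a character outside 7-bit ASCII.
has_non_ascii <- function(x) {
  vapply(enc2utf8(x), function(s) any(utf8ToInt(s) > 127L),
         logical(1), USE.NAMES = FALSE)
}

# Hypothetical helper: list files and folders under `path` whose names are not pure ASCII.
find_non_ascii_files <- function(path = ".") {
  files <- list.files(path, recursive = TRUE, all.files = TRUE, include.dirs = TRUE)
  files[has_non_ascii(files)]
}

has_non_ascii(c("Budgetter_2021.03.22.xlsx", "Lille \u00d8U-sag", "\u00e6 rstudio.RDS"))
# FALSE TRUE TRUE
```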

@Oliver1 Did you find a solution to this? :blush:

Brgds Henrik

Info: I have access to another Windows 10 PC that also has RStudio installed.

On this PC, sessionInfo() doesn't (yet) report code page 65001, and Unicode characters don't cause any issues in RStudio, either in file names or in SQL chunks.

No, I've also just renamed all my files for now, but I see that that won't work so easily with the SQL problem. Strange that it seems to be Denmark-specific so far.

Is it the same RStudio version? I can imagine mine might have updated via my organisation without me realising, but I have no idea what version I was on before. My current one with the problem is:

RStudio 2021.09.2+382 "Ghost Orchid" Release (fc9e217980ee9320126e33cdf334d4f4e105dc4f, 2022-01-04) for Windows
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.8 Chrome/69.0.3497.128 Safari/537.36

This could be a similar issue:

/Henrik

No, not exactly the same.

Updates are applied automatically, so I don't know when the "problem PC" was upgraded.

The PC that works with Unicode: RStudio 2021.09.1, build 372
The PC that doesn't work with Unicode: RStudio 2021.09.2, build 382

I downgraded to 2021.09.1, build 372.

sessionInfo() still reports code page 65001, and the Unicode issue affects the previous version (build 372) in the same way.

It seems code page 65001 is the problem. My guess is that a recent Windows update changed it to something RStudio doesn't like.

Updated

I found a solution that works, at least on my PC :slight_smile:

In the Windows Control Panel, under Region settings, change the current system locale to "Latin (World)" and make sure "Beta: Use Unicode UTF-8...." is not checked.
I had to restart the PC and log in with admin rights to make the change.

After this, RStudio works as before, and sessionInfo() no longer reports code page 65001.

Brgds. Henrik

Good news! I don't have admin rights but I've put in a request to IT to try your solution, fingers crossed.

Like this in a Danish Windows 10 installation. After this, RStudio works just like it used to :slight_smile:

I had the same encoding problems with RStudio-2021.09.2-382.

But @hbach's solution didn't work for me.

So I fell back to an older version of RStudio (version 1.4.1717-3).
With this version, the sessionInfo() output no longer has the "system code page: 65001" line, and the encoding troubles are gone.

I switched work computers a few days ago, installed RStudio-2021.09.2-382, and had the same problem (but for Slovenian: čšž). My new computer is running Windows build 10.0.19043.

Turning off the UTF-8 beta in Windows only changed the behaviour in RGui: with the beta on, the error happened in both RGui and RStudio; with it off, only in RStudio.

Setting the locale with Sys.setlocale("LC_CTYPE", ".1252") did change the output of, say, list.files(), but it was just wrong in a different way. I tried 1252, 1250, 1251, and some others from Microsoft's "Code Page Identifiers" list (Win32 apps documentation); some of them even stalled RStudio and I had to force quit. Just adding this in case it helps anyone find the issue faster.
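For experiments like this, it may help to switch LC_CTYPE for a single call only and restore the previous value afterwards. This is a sketch with a hypothetical helper name that uses only Sys.getlocale() and Sys.setlocale(); it won't stop a particular code page from stalling RStudio, it only undoes the change once the call returns or fails.

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to `locale`,
# then restore the previous LC_CTYPE, even if `expr` throws an error.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the OS rejects the request
  if (identical(Sys.setlocale("LC_CTYPE", locale), "")) {
    stop("could not set LC_CTYPE to ", locale)
  }
  expr  # lazily evaluated here, after the locale switch
}

# On Windows, mirroring the attempt above:
# with_ctype(".1252", list.files("S:/Spildevand/Lille ØU-sag"))
```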

For me the only solution that works for now is reverting to an older version (1.4.1717).

I had the same problem after updating RStudio, when using a script that reads data from a MySQL database. Before the update there were no problems.
When I downgraded to build 372, everything was fine again.
IMHO this is something that changed in RStudio rather than in a Windows update (I am using Windows 11).
Would be great if this could be resolved as I work with multiple languages (French, German, Italian and English) in my code.
Cheers
Renger

Same issue here working with Finnish, with the added complication that "User" is "Käyttäjä" in Finnish, so RStudio also loses access to Users/AppData/Local/Rstudio (or rather Käyttäjä/AppData/Local/Rstudio), and anything that relies on that folder doesn't work.

@hbach's beta solution did not work for me. I'm trying a downgrade, but there are features added after version 1.4.1717-3 that I'd like to use, so I would prefer a solution that works with the new version.

It doesn't really seem to work for others? It was worth a try...

It still works on my PC just as before this problem started.
I don't have the courage to change settings back again :joy:
Hope a better solution can be found.

My settings/setup:
Windows 10, Danish; non-Unicode programs set to use code page Latin (World); Unicode beta unchecked
[note: using Latin (World) is important]
RStudio 2021.09.1, build 372

/Henrik