nchar(homeDir) : invalid multibyte string, element 1

TL;DR:
RStudio cannot be used on Windows if you have multibyte characters in file paths. Reports of the exact same bug have gone unanswered for more than a year or been nonchalantly brushed aside.

see exact same error

PROBLEM:
Non-ASCII Latin characters (such as æøå) are not interpreted correctly by RStudio and core functions in R.

This is unique to RStudio on Windows. I do not know if it is related to R itself. Rgui has no problem printing æøå out of the box when installed. I should test R on the command line.
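To compare RStudio, Rgui and R on the command line, the same small report could be run in each and the results compared. This is only a sketch: `encoding_report` is a hypothetical helper name, and it assumes nothing beyond base R. It looks at the raw bytes of the home path, so it should not trip over strings that R considers invalid.

```r
# Hypothetical helper: collect the settings that decide how R encodes "~".
encoding_report <- function() {
  home <- path.expand("~")
  bytes <- charToRaw(home)
  list(
    locale      = Sys.getlocale(),                    # current locale string
    utf8_native = isTRUE(l10n_info()[["UTF-8"]]),     # is the native encoding UTF-8?
    r_user      = Sys.getenv("R_USER", unset = NA),   # NA if not set
    home_bytes  = bytes,                              # raw bytes of the expanded "~"
    home_ascii  = all(as.integer(bytes) < 128L)       # TRUE if the path is pure ASCII
  )
}

print(encoding_report())
```

If `home_ascii` is FALSE and `utf8_native` differs between the front ends, that would point at the locale setup of the front end rather than at R's file functions.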

**sessionInfo()**
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.2.2 tools_4.2.2

I am actually on Windows 11, but hey ...

Mind you, the problems also persist under other locale settings.

Consider a completely fresh install of RStudio 22.07.2 Build 576 with R 4.2.2 (2022-10-31 ucrt):

normalizePath("~")
 [1] "C:\\Users\\userpath\\OneDrive - organisation name with�\\Dokumenter"
 Warning message:
 In normalizePath(path.expand(path), winslash, mustWork) : path[1]="C:/Users/userpath/OneDrive - organisation name with�/Dokumenter": The system cannot find the path specified

normalizePath("C:/Users/userpath/OneDrive - organisation name withø/Dokumenter")
 Error in normalizePath("C:/Users/userpath/OneDrive - organisation name with�/Dokumenter") : 
   file name conversion problem -- name too long?

path.expand("~")
 [1] "C:/Users/userpath/OneDrive - organisation name with\xf8/Dokumenter"
### or, depending on the locale settings:
 [1] "C:/Users/userpath/OneDrive - organisation name with�/Dokumenter"

print("æøå")
[1] "���"

REAL problems arise:

df <- read.csv2("C:/Users/userpath/OneDrive - organisation name withø/Dokumenter/csvtest.csv")
 Error in file(file, "rt") : 
   invalid input 'C:/Users/userpath/OneDrive - organisation name withr�/csvtest.csv' in 'utf8towcs'

But thankfully, the above works when the locale is set to .UTF8. Not so if you try to use ~, e.g. to make things a bit more compact:

df <- read.csv2("~/csvtest.csv")
 Error in file(file, "rt") : cannot open the connection
 In addition: Warning message:
 In file(file, "rt") : cannot open file 'C:/Users/userpath/OneDrive - organisation name withr�/Dokumenter/csvtest.csv': Illegal byte sequence

Setting R_USER to a path without funny characters directly in the Windows environment variables helps. Then I can place .Rprofile in a "safe" place, so that I can set the locale safely. It still does not help with resolving directories outside ~ that contain multibyte characters.

Basically, Norwegian is my language, and language is an effin big part of the REALITY of a large part of the human population.

I am a bit sad and desperate in my tone because I find again and again that the common answer to questions about why encoding gets messy is: "Do not use those characters" or "Encode them to ASCII". I also know that naming folders with funny o-s or other exciting characters is an unsafe habit, but 1) we all have to deal with other people and with reality, and 2) they are allowed according to ISO.

This last year, support for printing my language-specific characters has deteriorated significantly. There have always been problems, e.g. having to use two different procedures to get a parameter containing æøå processed correctly in markdown, depending on whether it is rendered interactively or through a script. Now, however, I cannot continue using RStudio on Windows.

That sucks, as I have no alternative to Windows at work. R under WSL2 is a pain in the behind.

So, either this gets fixed or I have to ditch R altogether, or fall back on VS Code for the most necessary scripting.

  • edited to be more to the point

You appear to be lumping many issues together here.

Do any of them relate to the RStudio IDE, rather than R? If so, you could check VS Code to see whether the same problems persist or only some of them.

Bugs in tidyverse packages should be reported on GitHub, which shouldn't be much of a hoop to jump through.

What problems do you have dealing with Norwegian characters in csv files or databases (rather than file paths) which are not resolved by setting the locale or encoding format?

I probably won't be able to help resolve your problems, but you've got a better chance of people responding if you clearly separate the various issues. There are many people using non-English languages on Windows on here.

Thank you for answering. I have edited my posting.

Still, you come across as pretty dismissive. I would love it if you also had a look at the other issue linked.

Further, I try to make clear my environment. I cannot exclude all other possible combinations of IDE, R and OS.

It is not a problem on WSL2, which is basically the same as a Linux environment. Encoding is more straightforward there.

VS Code does not have these issues.

That RStudio cannot even print non-ASCII characters in a fresh install, and that we have to go through all kinds of detours and special procedures that might possibly work, or else get told off, is pretty telling of what non-English users meet in such threads.

Sorry for being crass.

I was actually trying to help you get help from other forum members here who might have the appropriate experience, but it seems you'd rather rant. Maybe somebody else fancies being berated by you instead.

I am sorry you feel that way. I am not here to berate anyone, just to possibly get some helpful answers. I take you at your word that you tried to help and I am very sorry for having hurt you.

I guess I have to trust that others might have some answers to the actual problem being, however badly, posed.

After yet some more digging around:

I have gotten rid of the errors by:

  1. in Windows "Edit environment variables for your user", setting R_USER to C:\Users\userpath\R
  2. editing .Rprofile in said place to include invisible(Sys.setlocale("LC_ALL", "nb-NO.UTF-8"))

Regarding 1: it seems to be no use setting the variable in Renviron.site or .Rprofile (the latter probably because it is read only after RStudio has already located the home directory), and setting it both in the Windows environment variables and in those files does not work either.
Regarding 2: this is the format I have had the most success with. Other formats did not seem to work properly (NO.UTF8, Norwegian_Bokmål.UTF8, .UTF8).
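The trial and error over locale names in point 2 could be automated in .Rprofile. A minimal sketch, assuming only the documented behaviour that `Sys.setlocale()` warns and returns an empty string when the OS rejects a name; the helper name `first_working_locale` and the candidate order are illustrative and not something R or RStudio provides.

```r
# Hypothetical helper: try locale names in order and keep the first one accepted.
first_working_locale <- function(candidates, category = "LC_ALL") {
  for (loc in candidates) {
    # Sys.setlocale() returns "" (with a warning) if the name is rejected.
    res <- suppressWarnings(Sys.setlocale(category, loc))
    if (nzchar(res)) {
      return(loc)
    }
  }
  NA_character_  # none of the candidates was accepted
}

# Candidate names taken from this thread; adjust for your own language.
chosen <- first_working_locale(c("nb-NO.UTF-8", ".UTF8"))
if (is.na(chosen)) {
  warning("No UTF-8 locale could be set")
}
```

Which names are accepted depends on the OS and C runtime, so a result of `NA` on one machine says nothing about another.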

As of this writing, the locale names accepted by Windows (UCRT, the Universal C Runtime library in Windows 10 and 11) are documented at:
Locale names, languages, and country-region strings

I mark this as a solution, even though the core trouble still remains:

If your user home path includes funny characters, you must point R_USER (the directory R treats as ~) to another place, which means messing with your OS. Also, RStudio's locale settings do not work out of the box, and it can be quite the journey to get them right (the setting has to be in .Rprofile, and that does not get read if you do not have a safe place for your R home directory).
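To see up front whether a given setup is affected, the relevant paths could be checked for bytes outside 7-bit ASCII. This is a sketch under the assumption that any non-ASCII byte in the path is a risk; `has_non_ascii` is a hypothetical helper, and it works on raw bytes so it does not depend on the current locale.

```r
# Hypothetical helper: TRUE for each path containing a byte outside 7-bit ASCII.
has_non_ascii <- function(paths) {
  vapply(
    paths,
    function(p) any(as.integer(charToRaw(p)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Check the home directory, R_USER (may be empty) and the R installation directory.
has_non_ascii(c(path.expand("~"), Sys.getenv("R_USER"), R.home()))
```

Any `TRUE` in the result marks a path that may need a workaround such as the R_USER change described above.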

Also - I am so sorry for having offended you, @martin.R. There is a lot of non-help out there, and this so-called solution I post here is just a stopgap measure. I know you meant well.


You didn't offend me. It was simply that your approach was unlikely to elicit any responses from other forum members.
