Unicode replacement character (�) issue in RStudio only within R markdown files

Hi, I'm running RStudio Version 1.1.442 on a Windows 10 machine with a system default of ISO-8859-1. This is more a matter of curiosity, as it doesn't really affect my experience within RStudio, but for some reason the ‘ and ’ characters that are used, for instance, in warning messages such as:

Warning messages:
1: package ‘ggplot2’ was built under R version 3.4.4 

show up as the Unicode replacement character (the question-mark symbol), and line breaks don't show up either, but only in R Notebooks and R Markdown files (regardless of the encoding selected in 'Reopen with encoding'):

package �ggplot2� was built under R version 3.4.4package �tibble� was built under R version 3.4.4package �tidyr� was built under R version 3.4.4package �readr� was built under R version 3.4.4package �stringr� was built under R version 3.4.4package �forcats� was built under R version 3.4.4

So loading the tidyverse in an .R script or simply in the console looks like this:

But loading the tidyverse in an .Rmd file looks like this:

I'm posting this in the RStudio IDE section, assuming that is where this belongs, rather than the R Markdown section.

Thanks in advance!

I suspect that if you changed your locale to UTF-8 (rather than ISO-8859-1), you wouldn't have the issue.


Reason why:

Latin-1 encodes just the first 256 code points of the Unicode character set, whereas UTF-8 can be used to encode all code points

(from https://stackoverflow.com/questions/7048745/what-is-the-difference-between-utf-8-and-iso-8859-1)
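To make the quoted point concrete: the curly quotes in the warning message are U+2018 and U+2019, well outside the first 256 code points. A minimal R sketch; `fits_latin1()` is a hypothetical helper written only for this illustration:

```r
# Hypothetical helper: TRUE when every character in `x` has a Unicode
# code point below 256, i.e. falls in the range Latin-1 can encode.
fits_latin1 <- function(x) {
  all(utf8ToInt(enc2utf8(x)) < 256L)
}

fits_latin1("ggplot2")              # TRUE, plain ASCII
fits_latin1("\u00e9")               # TRUE, é is U+00E9
fits_latin1("\u2018ggplot2\u2019")  # FALSE, the curly quotes are U+2018 / U+2019

# Show the code points of the curly quotes in U+XXXX notation
sprintf("U+%04X", utf8ToInt("\u2018\u2019"))  # "U+2018" "U+2019"
```

Note that Windows code page 1252 is not identical to ISO-8859-1: it does assign bytes 0x91 and 0x92 to these two quotes.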


Isn't the Windows 10 system locale explicitly only used when displaying text in processes that don't support Unicode? My question was based on the understanding that there seems to be a discrepancy in how non-Unicode text is handled within RStudio, depending on whether it is inside an .Rmd file or not. Perhaps I'm mistaken?

Initially I thought it might have something to do with knitr::purl and source being used behind the scenes but that doesn't seem to be the case. Sourcing an .Rmd file in the following manner:

source(knitr::purl(here::here("foo.Rmd")))

Comes out fine and dandy, regardless of the encoding of the file.

I should add that I'm in the position of not being able to change the default system locale to UTF-8, or any other locale for that matter (company-wide IT thingamabob).

Could you explain how to do that? Thanks.

@kevinushey did a great write-up on string encoding and R that I can't recommend highly enough (it's short, too!)
https://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/


The problem really seems to be the new RStudio version: in one of my files saved as UTF-8, all German letters are lost and replaced by "?".
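If the file itself is intact, one way to check is to read it back with its encoding declared explicitly instead of relying on the native code page. A minimal sketch using a temporary file; `path_to_script` in the last comment is a placeholder, not a real path:

```r
# "Grüße aus München", written with escapes so this script is pure ASCII
text <- "Gr\u00fc\u00dfe aus M\u00fcnchen"

# Write the UTF-8 bytes as-is (useBytes = TRUE skips re-encoding to the native code page)
path <- tempfile(fileext = ".txt")
writeLines(text, path, useBytes = TRUE)

# Read the file back, declaring that its contents are UTF-8
lines <- readLines(path, encoding = "UTF-8")
identical(lines, text)  # TRUE

# For an R script saved as UTF-8, the analogous call would be:
# source(path_to_script, encoding = "UTF-8")
```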

I have this question too. I just installed RStudio 1.1.456 on a Windows 10 machine and right out of the box I see Unicode replacement characters in package loading messages, like so:

�tseries� version: 0.10-45

@mara, can you please help me solve this issue?

On Windows, this is often caused by attempts to run R in a different locale than the system locale. (You can check R's notion of the locale with Sys.getlocale()).
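For reference, a few base R calls that show the locale R is running in (base functions only; the exact strings depend on the machine):

```r
# R's locale for every category, as a single string
Sys.getlocale()

# Only the character-type category, which decides how bytes are read as characters
Sys.getlocale("LC_CTYPE")

# Summary of the session's character handling; on Windows it also
# reports code-page information
info <- l10n_info()
info[["UTF-8"]]    # TRUE if the session locale is UTF-8
info[["Latin-1"]]  # TRUE if R treats it as a Latin-1 locale
```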

A simple fix for all locales is to disable so-called fancy quotes, with e.g.

options(useFancyQuotes = FALSE)

and this option can become part of your .Rprofile or similar so that it's applied to all R sessions.
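A sketch of the effect in base R: with the option off, sQuote() and dQuote() fall back to plain ASCII quotes, which every code page can represent. That library() builds its "built under R version" warning with sQuote() is my reading of the base R sources, so treat it as an assumption:

```r
# Turn fancy quotes off, keeping the previous value so it can be restored
old <- options(useFancyQuotes = FALSE)

# sQuote()/dQuote() now emit ASCII quotes
sQuote("ggplot2")  # "'ggplot2'"
dQuote("ggplot2")  # "\"ggplot2\""

options(old)  # restore the previous setting

# To make it permanent, add this line to your user .Rprofile,
# typically found at path.expand("~/.Rprofile"):
# options(useFancyQuotes = FALSE)
```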


Thanks @kevinushey. What are the steps to be taken to align the R locale with the system locale? That is, if I don't want to do the simple fix, but the more proper fix.

I have the same problem as @samuel and the same question: I'd like to know how to align my R locale with my system locale, please.
I don't think I should change my Windows locale (which all seems to be correctly set up for my language and region) even if I could (I don't have admin privileges). I guess my system is set to ISO-8859-1.
Sys.getlocale()
returns

"LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"

It seems like the right thing to do is to work with RStudio & R Markdown in UTF-8.
Is there any other way of fixing the misalignment of the two character sets on my system?
In the meantime, I'll apply @kevinushey's helpful fix for the fancy quotes.

If you're not explicitly adjusting the R locale with Sys.setlocale(), it should just use whatever default you have for your system. It looks like that's true in your case so I'm a bit surprised you're still seeing issues here.
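One way to check this, as a sketch: Sys.setlocale() with an empty locale string asks R to reset a category to the system default, so comparing the value before and after shows whether R had drifted from it. Note that this changes the locale of the running session:

```r
# Locale R currently uses for character handling
before <- Sys.getlocale("LC_CTYPE")

# An empty string tells R to take this category from the system default
# (environment variables on Unix-alikes, the OS settings on Windows)
after <- Sys.setlocale("LC_CTYPE", "")

# If the two agree, R was already running in the system locale
identical(before, after)
```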

Ultimately, we'll need a reproducible example to best understand the problem as it's possible something else is doing things with encodings / locales under the hood.


I bow to your greater knowledge and understanding, but it seems to me that my Windows locale (as reported by Sys.getlocale()) uses code page 1252 (which is not Unicode?), whereas I believe that in RStudio I am working in UTF-8, and that's where the issue arises.
I don't know very much about this stuff and am slightly guessing in the dark here really!
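A small illustration of how a code-page/UTF-8 mismatch can produce the � symbol: decode the same byte both ways. This is a sketch of the general mechanism, not a claim about what RStudio does internally with .Rmd output:

```r
# A single byte, 0x91, which code page 1252 maps to the left single quote
x <- rawToChar(as.raw(0x91))

# Decoded as CP1252 and converted to UTF-8, it becomes U+2018 ("‘")
utf8 <- iconv(x, from = "CP1252", to = "UTF-8")
utf8ToInt(utf8)  # 8216

# On its own the byte is not valid UTF-8, so a UTF-8 decoder cannot map it
# to a character; decoders typically substitute U+FFFD, the "�" symbol
validUTF8(x)  # FALSE
```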