Unicode replacement character (�) issue in RStudio only within R markdown files

rmarkdown
rstudio

#1

Hi, I'm running RStudio version 1.1.442 on a Windows 10 machine with a system default encoding of ISO-8859-1. This is more a matter of curiosity than a real problem, since it doesn't affect my work in RStudio, but for some reason the ‘ and ’ characters that are used, for instance, in warning messages such as:

Warning messages:
1: package ‘ggplot2’ was built under R version 3.4.4 

show up as the Unicode replacement character (the question mark symbol), and line breaks don't show up, but only in R Notebooks and R Markdown files (regardless of the encoding selected in 'Reopen with Encoding'):

package �ggplot2� was built under R version 3.4.4package �tibble� was built under R version 3.4.4package �tidyr� was built under R version 3.4.4package �readr� was built under R version 3.4.4package �stringr� was built under R version 3.4.4package �forcats� was built under R version 3.4.4

So loading the tidyverse in an .R script or simply in the console looks like this:

But loading the tidyverse in an .Rmd file looks like this:

I'm posting this in the RStudio IDE section, assuming that is where this belongs, rather than the R Markdown section.

Thanks in advance! :)


#2

I suspect that if you changed your locale to UTF-8 (rather than ISO-8859-1), you wouldn't have the issue.


#3

Reason why:

Latin-1 encodes just the first 256 code points of the Unicode character set, whereas UTF-8 can be used to encode all code points

(from https://stackoverflow.com/questions/7048745/what-is-the-difference-between-utf-8-and-iso-8859-1)
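
To make the quoted point concrete, here is a small R sketch (an illustration added for readers, not something posted in the thread). It assumes the opening quote in the warning is U+2018 and that the local iconv recognises the names "ISO-8859-1" and "CP1252". Treating Windows code page 1252 as the encoding behind the reported "ISO-8859-1" default is also an assumption; the thread does not confirm it.

```r
# The left curly quote from the warning message, U+2018, as a UTF-8 string.
lq <- enc2utf8("\u2018")

# UTF-8 needs three bytes for it.
utf8_bytes <- charToRaw(lq)

# ISO-8859-1 has no slot for U+2018, so a strict conversion is expected
# to fail (iconv() yields NA for unconvertible input when sub = NA).
as_latin1 <- iconv(lq, from = "UTF-8", to = "ISO-8859-1")

# Windows code page 1252 does have one: the single byte 0x91.
# (Assumption: this is the code page behind the "ISO-8859-1" default.)
as_cp1252 <- iconv(lq, from = "UTF-8", to = "CP1252", toRaw = TRUE)[[1]]

# That lone byte is not valid UTF-8, so anything decoding it as UTF-8
# has nothing to draw except the replacement character U+FFFD.
lone_byte_is_utf8 <- validUTF8(rawToChar(as_cp1252))

print(utf8_bytes)
print(as_latin1)
print(as_cp1252)
print(lone_byte_is_utf8)
```

If the bytes R emits and the encoding the viewer expects disagree in this way, a � in place of each curly quote is what one would expect to see. Whether that is exactly what happens inside the R Markdown output pane is not established here.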


#4

Isn't the Windows 10 system locale explicitly only used when displaying text in processes that don't support Unicode? My question was based on the understanding that there seems to be a discrepancy in how non-Unicode text is handled within RStudio, depending on whether it is inside an .Rmd file or not. Perhaps I'm mistaken?

Initially I thought it might have something to do with knitr::purl and source being used behind the scenes but that doesn't seem to be the case. Sourcing an .Rmd file in the following manner:

source(knitr::purl(here::here("foo.Rmd")))

Comes out fine and dandy, regardless of the encoding of the file.


#5

I should add that I'm in the position of not being able to change the default system locale to UTF-8, or any other locale for that matter (company-wide IT thingamabob).


#6

Could you explain how to do that? thanks


#7

@kevinushey did a great write-up on string encoding and R that I can't recommend highly enough (it's short, too!)
https://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/


#8

The problem really seems to be the new RStudio version: in one of my files saved as UTF-8, all German letters are lost and replaced by "?".


#9

I have this question too. I just installed RStudio 1.1.456 on a Windows 10 machine and right out of the box I see Unicode replacement characters in package loading messages, like so:

�tseries� version: 0.10-45

Can you please help me @mara on how to solve this issue?


#10

On Windows, this is often caused by attempts to run R in a different locale than the system locale. (You can check R's notion of the locale with Sys.getlocale().)
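
As a rough sketch of that check (added for illustration, using only base R), the following collects what R reports about its locale and encoding. The variable names are arbitrary, and the output differs from machine to machine, so none is shown.

```r
# Full locale string R is running under (all categories).
full_locale <- Sys.getlocale()

# LC_CTYPE is the category that governs character encoding.
ctype <- Sys.getlocale("LC_CTYPE")

# l10n_info() reports whether R considers the session UTF-8 or Latin-1
# (and, on Windows, the code page in use).
info <- l10n_info()

print(full_locale)
print(ctype)
str(info)
```

Comparing these values with the Windows system locale should show whether the two actually differ.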

A simple fix for all locales is to disable so-called fancy quotes, with e.g.

options(useFancyQuotes = FALSE)

and this option can become part of your .Rprofile or similar so that it's applied to all R sessions.
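
A minimal sketch of what the option changes, added for illustration: base R's sQuote() and dQuote() are the helpers that consult useFancyQuotes, and with the option set to FALSE they fall back to plain ASCII quotes. The variable names are arbitrary.

```r
# Remember the current setting so it can be restored afterwards.
old <- options(useFancyQuotes = FALSE)

# With fancy quotes disabled, both helpers use plain ASCII quote marks.
plain_single <- sQuote("ggplot2")
plain_double <- dQuote("ggplot2")

print(plain_single)
print(plain_double)

# Restore whatever was set before.
options(old)
```

Messages that build their quotes through these helpers then contain only ASCII, which displays the same in every locale. In an .Rprofile only the options() line is needed, without the restore step.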


#11

Thanks @kevinushey. What are the steps to be taken to align the R locale with the system locale? That is, if I don't want to do the simple fix, but the more proper fix.