Problem rendering foreign languages in Rmd

rmarkdown

#1

Based on the xaringan demo presentation, I know that Chinese and other foreign characters should work. However, I can't get the output of code chunks to display correctly in the rendered slides. I updated xaringan to the development version and updated RStudio to a version from yesterday, but the problem persists.

The console output within RStudio doesn't have any problems. Is this a xaringan or RStudio issue (or a problem with my locale)? Could this be Windows related?

Any guidance would be much appreciated! :slight_smile:

````md
---
title: "Title"
subtitle: "Subtitle"
date: "`r Sys.Date()`"
output:
  xaringan::moon_reader:
    lib_dir: libs
    nature:
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false
---
"中文", "עברית", "english"
```{r}
c("中文", "עברית", "english")
```
````

#2

Hi Irene!

To determine whether RStudio is involved at all, what happens if you call rmarkdown::render() on your sample .Rmd outside of RStudio?

To determine whether this is limited to xaringan, what happens if you render your sample document to a different format? xaringan doesn’t use pandoc, but html_document does, so it would be interesting to know what you get rendering to that format.
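For reference, here is a minimal sketch of both checks, run from a plain R session (Rgui.exe, or `Rscript -e` in a terminal) so RStudio is not involved. It assumes the sample file is saved as `document.Rmd` in the working directory (a placeholder name) and that xaringan is installed. The two `output_file` names are arbitrary, chosen only so the renders don't overwrite each other.

```r
# Run outside RStudio (e.g. in Rgui.exe) to take the IDE out of the picture.
# "document.Rmd" is a placeholder for the sample file from the first post.
rmd <- "document.Rmd"

if (file.exists(rmd) && rmarkdown::pandoc_available()) {
  # The xaringan slides, using the options from the YAML front matter
  rmarkdown::render(rmd, output_format = "xaringan::moon_reader",
                    output_file = "document-xaringan.html")

  # The same source rendered with rmarkdown's html_document, for comparison
  rmarkdown::render(rmd, output_format = "html_document",
                    output_file = "document-html.html")
}
```

If both HTML files show the same garbled chunk output, the problem is probably upstream of xaringan.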

Other info that would be helpful to know:

  • OS version
  • sessionInfo() after rendering
  • RStudio version (I know you said you downloaded it yesterday, but that could mean you’re running any of: the latest 1.1 release, the latest 1.2 Preview release, or one of the Dailies)
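Most of those details can be collected from the R side in one go. This is a sketch that assumes the rstudioapi package is installed for the RStudio version lookup. That step is skipped when R isn't running inside RStudio.

```r
# Locale and encoding details. On Windows, a locale ending in ".1252" means
# R's native encoding is a single-byte code page rather than UTF-8.
print(Sys.getlocale())
print(l10n_info())

# R version, OS version ("Running under:") and loaded package versions
print(sessionInfo())

# RStudio version, only when running inside RStudio with rstudioapi installed
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  print(rstudioapi::versionInfo()$version)
}
```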

(I will admit that text encoding problems make me :dizzy_face:! So I’m mostly trying to gather info so that somebody with a stronger stomach for them might be able to help :sweat_smile: )


#3

Thanks for the guidance, @jcblum!

Rendering to html_document gives the same problem, so I guess it's not just xaringan. Knitting to a Word document gives the same result, and knitting to PDF fails.

My session info after knitting to html:

```
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] beepr_1.3             reprex_0.2.1         
[3] devtools_1.13.6       conflicted_0.1.0.9000

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18    audio_0.1-5.1   rprojroot_1.3-2
 [4] packrat_0.4.9-3 digest_0.6.15   crayon_1.3.4   
 [7] withr_2.1.2     backports_1.1.2 magrittr_1.5   
[10] evaluate_0.11   rlang_0.2.2     stringi_1.2.4  
[13] rstudioapi_0.7  fs_1.2.6        rmarkdown_1.10 
[16] tools_3.5.1     stringr_1.3.1   yaml_2.2.0     
[19] compiler_3.5.1  htmltools_0.3.6 memoise_1.1.0  
[22] knitr_1.20    
```

Running rmarkdown::render() directly gives me worse results.

I'm using RStudio version 1.2.1099 (downloaded today, but I think this is a daily from yesterday? ...not totally clear on the terminology)


#4

Thanks for the info!

To clarify one point: how exactly did you generate this document? E.g., were you running R via Rgui.exe?

Also, I’m assuming you are creating your files in RStudio — is that right? If you create a new sample file and explicitly save it with a UTF-8 encoding, are the rendered results any different?
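One way to take the editor out of the equation entirely is to write the test file from R itself. Here is a minimal sketch that writes to a temporary path (a hypothetical location). The non-ASCII text is written with `\u` escapes so the script stays ASCII-only. `enc2utf8()` combined with `useBytes = TRUE` writes the UTF-8 bytes without first translating them to the (possibly non-UTF-8) native encoding.

```r
# Hypothetical test file in a temporary directory
path <- file.path(tempdir(), "document.Rmd")

rmd_lines <- c(
  "---",
  'title: "Title"',
  "output: html_document",
  "---",
  "",
  "```{r}",
  'c("\u4e2d\u6587", "\u05e2\u05d1\u05e8\u05d9\u05ea", "english")',
  "```"
)

# enc2utf8() makes sure every string is UTF-8; useBytes = TRUE writes those
# bytes as-is instead of translating them to the native encoding first.
writeLines(enc2utf8(rmd_lines), path, useBytes = TRUE)

# Read the file back, declaring it as UTF-8, and check nothing was mangled
back <- readLines(path, encoding = "UTF-8")
print(all(validUTF8(back)))
print(identical(back[7], enc2utf8(rmd_lines[7])))
```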

It gets confusing, and it’s not always black or white! I call it a Daily if it came from https://dailies.rstudio.com/, and “the latest Preview release” if it came from https://www.rstudio.com/products/rstudio/download/preview/. As I understand it (from the outside!), things make it to the Preview release page if they’re thought to be in pretty stable shape, while anything goes with the dailies. Anyway, in the end the actual version number is the most informative thing, and unambiguous.


#5

I generated the rmarkdown document both in RStudio via rmarkdown::render() and outside of RStudio via Rgui.exe, with the same results.

Explicitly saving the Rmd with UTF-8 encoding lets me knit to PDF, but the output is still garbled in PDF, Word, and HTML.

Great, thanks for the explanation about dailies/previews! It looks like I downloaded a daily, but it seems like a preview release is probably the better way to go in most cases.


#6

I think this is a problem with the knitr package, which rmarkdown uses under the hood. You can reproduce it with a similar document:

````md
---
title: "Title"
output: html_document
---

"中文", "עברית", "english"

```{r}
c("中文", "עברית", "english")
```
````

Then, if you render this with:

```r
knitr::knit("document.Rmd", encoding = "UTF-8")
```

a file called document.md will be generated. If you open that file (as UTF-8), you'll see Unicode escape codes (such as <U+4E2D>) in place of the characters.
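A self-contained variant of that check knits just the chunk from a character vector, using `knit()`'s `text` argument so no file is written, and then searches the result for `<U+XXXX>` escapes. This sketch assumes knitr is installed. In a UTF-8 locale it should report no escapes. On the Windows-1252 setup in this thread, going by the results above, it would be expected to find them.

```r
# The chunk from the minimal example; \u escapes keep this script ASCII-only
rmd_text <- c(
  "```{r}",
  'c("\u4e2d\u6587", "\u05e2\u05d1\u05e8\u05d9\u05ea", "english")',
  "```"
)

# When `text` is supplied, knit() returns the knitted Markdown as a character
# vector instead of writing an output file
md <- knitr::knit(text = rmd_text, quiet = TRUE)

# TRUE if the output contains <U+XXXX> escapes instead of the characters
has_escapes <- any(grepl("<U\\+[0-9A-F]{4}>", md))
print(has_escapes)

# Whether R's native encoding is UTF-8 in this session
print(isTRUE(l10n_info()[["UTF-8"]]))
```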

I'll see if I can learn a bit more and if it is indeed something that could be handled on the knitr side, I'll file an issue upstream.


#7

It looks like this is ultimately an issue with R's sink() function, which knitr uses under the hood (through evaluate) to capture output. For example:

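```r
code <- c("中文", "עברית", "english")

# Divert printed output into a character vector called `output`
conn <- textConnection("output", open = "w", local = TRUE)
sink(conn)
print(code)

# End the diversion before closing the connection it writes to
sink(NULL)
close(conn)

print(output)
```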

Running this, I get:

```
> print(output)
[1] "[1] \"<U+4E2D><U+6587>\" \"<U+05E2><U+05D1><U+05E8><U+05D9><U+05EA>\" \"english\""
```

which means the escapes are being produced when R writes the text to the sink. Unfortunately, I can't think of a good way around this.
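Here is a base-R sketch that separates the strings themselves from what comes out of the capture step. It uses `capture.output()`, which diverts printing through `sink()` into a text connection just like the snippet above, and reports whether the session's native encoding is UTF-8. The sample strings are written as `\u` escapes for portability.

```r
x <- c("\u4e2d\u6587", "\u05e2\u05d1\u05e8\u05d9\u05ea", "english")

# The character data itself is fine: no escape sequences in the strings
strings_ok <- !any(grepl("<U+", x, fixed = TRUE))

# capture.output() sinks printed output into a text connection, the same
# mechanism used in the snippet above
captured <- capture.output(print(x))
captured_escaped <- any(grepl("<U+", captured, fixed = TRUE))

# Whether R's native encoding is UTF-8; in this thread the escapes appeared
# under an English_United States.1252 locale, where it is not
utf8_native <- isTRUE(l10n_info()[["UTF-8"]])

print(c(strings_ok = strings_ok, captured_escaped = captured_escaped,
        utf8_native = utf8_native))
```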


#9

Ah okay, I found the related issue: https://github.com/r-lib/evaluate/issues/59. At least it looks like there's some work in progress! Thanks @kevinushey for getting to the source of it.