Create pdf from Rnw: missing just one font character from eastern European language


#1

I have a problem producing one Croatian character, capital Đ, from an Rnw file. The same characters render correctly from Rmd, but I need Sweave functionality to produce nicer presentations than are possible with Rmd presentations. I need PDF output, and I compile to PDF with Sweave and pdflatex.

  1. The example test.croatian.fromRnw.Rnw produces a PDF in which all Croatian characters but one are correct. The missing one is capital Đ, which appears in the PDF as a black square.
\documentclass[utf8]{article}
\usepackage[utf8]{inputenc}
%\usepackage[T1]{fontenc}
\usepackage[OT2]{fontenc}
%\usepackage{currvita}
% default font cmr10 supports Croatian characters, I checked



\begin{document}
\SweaveOpts{concordance=TRUE}

<<>>=
library(reprex)
library(rgdal)
Sys.getlocale("LC_CTYPE")
getCPLConfigOption("SHAPE_ENCODING")
setCPLConfigOption("SHAPE_ENCODING", "UTF-8")
version
@

\dj \DJ \v{s} \v{S} \v{c} \v{C} \'c \'C \v{z} \v{Z}

đ Đ
š Š 
č Č
ć Ć

\end{document}
#> Error: <text>:2:1: unexpected input
#> 1: 
#> 2: \
#>    ^

Created on 2018-10-14 by the reprex package (v0.2.1)

sessionInfo()
#> R version 3.4.3 (2017-11-30)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 16.04.3 LTS
#> 
#> Matrix products: default
#> BLAS: /usr/lib/libblas/libblas.so.3.6.0
#> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=hr_HR.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=hr_HR.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=hr_HR.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=hr_HR.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.4.3  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
#>  [5] tools_3.4.3     htmltools_0.3.6 yaml_2.1.16     Rcpp_0.12.15   
#>  [9] stringi_1.1.6   rmarkdown_1.10  knitr_1.19      stringr_1.2.0  
#> [13] digest_0.6.15   evaluate_0.10.1
  2. Producing the same characters from Rmd gives a correct PDF, even though this reprex output shows some errors that I don't understand:
title: "test.croatian"
#> Warning: NAs introduced by coercion
#> Error in title:"test.croatian": NA/NaN argument
output: pdf_document
#> Error in eval(expr, envir, enclos): object 'output' not found

đ Đ
š Š 
č Č
ć Ć
#> Error: <text>:2:4: unexpected symbol
#> 1: 
#> 2: đ Đ
#>       ^

Created on 2018-10-14 by the reprex package (v0.2.1)

  3. I am new to reprex() and the Community forum, so excuse me if the formatting of the question is a little bit messy.

Regards,
Melita


#2

Hi Melita,

No worries about the formatting: since reprex uses rmarkdown::render(), it's nigh impossible to reprex a Sweave error (well, for me it would be).

Would it be possible for you to upload the .Rnw file as a gist or into a GitHub repo? I'm trying to reproduce your error, but since I don't use Sweave all that often, am having trouble getting it out of the chunks above.

FWIW, this is my personal favourite quick-ish guide to string encoding in R. I get the same thing you did when running Encoding('Đ') through a reprex, though I get #> "UTF-8" locally.

Encoding('Đ')
#> [1] "unknown"
text <- '\u0110'
text
#> [1] "Đ"
pryr::bits('Đ')
#> [1] "11000100 10010000"

Created on 2018-10-14 by the reprex package (v0.2.1.9000)
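
A check that does not depend on the locale or on how the script file was saved is to build the character from its Unicode escape and look at its raw bytes. This is only a minimal sketch in base R; the variable names are mine and are not part of the original example.

```r
# Build capital D with stroke (U+0110) from its Unicode escape, so the
# result does not depend on the file encoding or the session locale.
dj_upper <- "\u0110"

# Strings created from \u escapes are marked as UTF-8 by R.
dj_encoding <- Encoding(dj_upper)
dj_encoding

# The UTF-8 byte sequence of the character (two bytes: c4 90),
# matching the bit pattern reported by pryr::bits() above.
dj_bytes <- charToRaw(dj_upper)
dj_bytes
```

If the bytes match here but the PDF still shows a black square, the string itself is fine and the problem is more likely on the LaTeX side (font encoding or the weave engine).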


#3

I was able to reproduce this behavior and am not sure yet why it is happening, but it does appear to be something with Sweave:

  1. Creating the corresponding tex document (removing the R chunks) and compiling it renders the characters correctly for me.

  2. Compiling with Sweave produces the black box for me.

  3. You can use knitr::Sweave2knitr('test.croatian.fromRnw.Rnw') and run pdflatex on the resulting tex file. This rendered the characters correctly, but I wonder if this provides you with the functionality you're looking for?
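
For reference, the third option could be scripted roughly as below. This is a sketch under assumptions: knitr is installed, pdflatex is on the PATH, and the Rnw file sits in the working directory under the name used in the original post. Note that Sweave2knitr() writes a converted .Rnw file rather than a .tex file, so a knit step is still needed before pdflatex; knit2pdf() does both.

```r
# Sketch only: file name taken from the original post; adjust as needed.
rnw_file <- "test.croatian.fromRnw.Rnw"

# Name of the converted file (this mirrors the default output name
# that knitr::Sweave2knitr() uses: "<name>-knitr.Rnw").
knitr_file <- sub("[.]Rnw$", "-knitr.Rnw", rnw_file)

# Only run when knitr and the input file are actually available.
if (requireNamespace("knitr", quietly = TRUE) && file.exists(rnw_file)) {
  # Rewrite Sweave-specific syntax (e.g. \SweaveOpts) into knitr syntax.
  knitr::Sweave2knitr(rnw_file, output = knitr_file)

  # Knit the converted file to .tex and compile it with pdflatex.
  knitr::knit2pdf(knitr_file, compiler = "pdflatex")
}
```

This still runs the conversion as a separate step, so it does not avoid the extra pass; it only bundles it into one script.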


#4

Hi Mara, thanks for the effort, here is a gist link to both rnw and rmd examples:

Hi jrlewi, thanks for the comment on the tex compiler. It could be a solution and I did test it, but you are right, I would like to avoid double compiling.

Regards,
Melita


#5

I don't have time to dig deep at the moment, but I can confirm that I'm getting the same result as you and @jrlewi. There are a couple of possible workarounds here (non-ideal):

And it looks to me like you've used the suggested steps in this post, but just in case:


#6

Hi,

Looking at your suggestions and testing all the possibilities, including encodings and Rnw (Sweave) compilers, I can conclude that the font encoding has to be T1. The problem is indeed in selecting Sweave to weave the Rnw file. When knitr is selected (in Tools/Global Options/Sweave), all characters are rendered correctly.

Thanks for the effort and the comments that helped in solving this issue.
Sincerely,
Melita


#7

has to be deleted from script also