Cannot print histograms with skimr


#1

I’m a happy Windows user of skimr 1.0.1. The spark graph (histogram) plot now works nicely:

> skim(mtcars)
Skim summary statistics
 n obs: 32 
 n variables: 11 

Variable type: numeric 
 variable missing complete  n   mean     sd    p0    p25 median    p75   p100     hist
       am       0       32 32   0.41   0.5   0      0      0      1      1    ▇▁▁▁▁▁▁▆
     carb       0       32 32   2.81   1.62  1      2      2      4      8    ▆▇▂▇▁▁▁▁
      cyl       0       32 32   6.19   1.79  4      4      6      8      8    ▆▁▁▃▁▁▁▇
     disp       0       32 32 230.72 123.94 71.1  120.83 196.3  326    472    ▇▆▁▂▅▃▁▂
     drat       0       32 32   3.6    0.53  2.76   3.08   3.7    3.92   4.93 ▃▇▁▅▇▂▁▁
     gear       0       32 32   3.69   0.74  3      3      4      4      5    ▇▁▁▆▁▁▁▂
       hp       0       32 32 146.69  68.56 52     96.5  123    180    335    ▃▇▃▅▂▃▁▁
      mpg       0       32 32  20.09   6.03 10.4   15.43  19.2   22.8   33.9  ▃▇▇▇▃▂▂▂
     qsec       0       32 32  17.85   1.79 14.5   16.89  17.71  18.9   22.9  ▃▂▇▆▃▃▁▁
       vs       0       32 32   0.44   0.5   0      0      0      1      1    ▇▁▁▁▁▁▁▆
       wt       0       32 32   3.22   0.98  1.51   2.58   3.33   3.61   5.42 ▃▃▃▇▆▁▁▂

However, when I combine skim with R Markdown, I lose the ability to show the histograms. According to the skimr vignettes, for Markdown output I should combine skim with kable, using the chunk option results='asis'. I tried the vignette suggestion, as well as kable(skim()) without the chunk option and skim without kable, but none of these prints the histograms:

---
title: "test"
subtitle: "Andrea Panizza"
author: "`r Sys.Date()`"
date: "_reading time: ? minutes_"
output:
  html_document:
   keep_md: true
   fig_caption: yes
params: 
  output_dir: "../output"
---
  
```{r setup, include=FALSE}
library(knitr)
library(skimr)
library(dplyr)
opts_chunk$set(warning = FALSE,
               message = FALSE,
               echo    = FALSE,
               fig.align  = "center",
               fig.width = 7.25,
               fig.height = 6)
```

## test 1
```{r}
skim(mtcars)
```

## test 2
```{r}
kable(skim(mtcars))
```

## test 3
```{r, results='asis'}
kable(skim(mtcars))
```

I attach the results as images. You can also recreate them yourself by running my R Markdown example:



As you can see, I can’t get the histograms. Am I doing something wrong? Should I open an issue in the skimr repo?


#2

It’s likely because the font being used doesn’t have the Unicode block element glyphs. You could set a font with CSS, though there may be a more direct route.


#3

Thanks for the tip! My CSS skills are more or less non-existent. Could you please show me how to do it? PS I guess that if this is a font issue, it wouldn’t make sense to open an issue in the skimr repository, right?


#4

Preface: Fonts may not be the issue; I can’t reproduce it, as the block elements display fine on my computer. Not all OSes handle Unicode equally well, though.

How CSS is interpreted seems to vary depending on how it’s inserted (as a separate stylesheet file, as a <style> tag, or as a style attribute to a <div> tag), though it’d take more digging to figure out why. Regardless, I managed to change the font by inserting

<style type="text/css"> 
code, pre { 
    font-family: Fira Code, Iosevka, Hack, monospace; 
} 
</style>

right before the code chunk.

The big caveat to any such approach is that anyone viewing the resulting HTML must have one of the listed fonts available, and I’m not sure if there’s one that everyone has that has block elements. I’m sure consistent display is possible via WOFF, but it may take non-negligible work to do so.


#5

The following is not working on my laptop (“not working” meaning that I cannot see the histograms in the HTML output):

---
title: "test"
subtitle: "Andrea Panizza"
author: "`r Sys.Date()`"
date: "_reading time: ? minutes_"
output:
  html_document:
   keep_md: true
   fig_caption: yes
params: 
  output_dir: "../output"
---
  
```{r setup, include=FALSE}
library(knitr)
library(skimr)
library(dplyr)
opts_chunk$set(warning = FALSE,
               message = FALSE,
               echo    = FALSE,
               fig.align  = "center",
               fig.width = 7.25,
               fig.height = 6)

```

<style type="text/css"> 
code, pre { 
    font-family: Fira Code, Iosevka, Hack, monospace; 
} 
</style>
## test 3
```{r, results='asis'}
kable(skim(mtcars))
```

Info on my session:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252   
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C                  
[5] LC_TIME=Italian_Italy.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2         dplyr_0.7.4          skimr_1.0.1          knitr_1.17          
[5] RevoUtils_10.0.7     RevoUtilsMath_10.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13     bindr_0.1        magrittr_1.5     tidyselect_0.2.2 R6_2.2.2        
 [6] rlang_0.1.2      highr_0.6        stringr_1.2.0    tools_3.4.3      htmltools_0.3.6 
[11] yaml_2.1.14      rprojroot_1.2    digest_0.6.12    assertthat_0.2.0 tibble_1.3.4    
[16] tidyr_0.7.2      purrr_0.2.3      glue_1.1.1       evaluate_0.10.1  rmarkdown_1.8   
[21] stringi_1.1.6    compiler_3.4.3   pander_0.6.1     backports_1.1.1  pkgconfig_2.0.1

#6

Do you have one of those fonts? Is the font changing? If yes to both, it may be something more pernicious like an encoding issue.


#7

I am having the same issue as Andrea. The skimr documentation talks about being able to load the “DejaVu Sans” font using the extrafont package, but this doesn’t seem to work for me either.

They recommend to run:

# install.packages(c("extrafont"))
extrafont::font_install("DejaVu Sans")

Then their YAML header looks like this:

---
title: "Untitled"
mainfont: DejaVu Sans
output:
  html_document: default
  pdf_document:
    latex_engine: xelatex
  word_document: default
font-family: Times New Roman
---

I tried this and the solutions here. Still no luck.


#8

I reproduced this document without issue on my Mac (10.13.2). In the output HTML, the histogram glyphs fell all the way back to monospace inside the code block (which, in Chrome for my Mac, is Courier). I also tested Courier New using the developer tools and it still looks fine.

I also checked out the histograms in the final test (outside a code block) and they printed in Helvetica Neue. Helvetica, Arial, Times New Roman and sans-serif (which falls back to Helvetica Neue on my Mac) all looked good.

Given @Andrea is getting good histograms in console output but is getting the problematic Windows Unicode output in the rendered code block, it seems like something is going wrong with the Unicode output in the chunk :confused:

@Andrea or @ejlatour, are you able to host your rendered HTML file (based on the initial post) somewhere so we can take a peek at it with the Developer Tools? Let’s verify whether the right glyphs are ending up in the output document, so that we can definitively cross the font issue off.
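The same check can be done without the Developer Tools. Here is a minimal sketch, assuming the rendered file is UTF-8 encoded: it counts the real block-element glyphs (U+2581 to U+2588) and the `<U+xxxx>` escape sequences that R prints when it cannot represent a character. The function name `count_histogram_glyphs` and the file name in the usage comment are hypothetical, not part of skimr or knitr.

```r
# Count real block-element glyphs versus "<U+xxxx>" escapes in a rendered file.
count_histogram_glyphs <- function(path) {
  # Read the file as UTF-8 and collapse it into a single string.
  text <- paste(readLines(path, encoding = "UTF-8", warn = FALSE), collapse = "\n")

  # Real glyphs: code points U+2581..U+2588 (the bars skimr uses for histograms).
  code_points <- utf8ToInt(text)
  glyphs <- sum(code_points >= 0x2581 & code_points <= 0x2588)

  # Escapes: literal "<U+2587>" text, either raw or HTML-entity encoded.
  hits <- gregexpr("(<|&lt;)U\\+258[1-8](>|&gt;)", text)[[1]]
  escapes <- sum(hits > 0)

  c(glyphs = glyphs, escapes = escapes)
}

# Usage (hypothetical file name):
# count_histogram_glyphs("test.html")
```

Many glyphs and no escapes would point at fonts; escapes and no glyphs would point at how R wrote the characters.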


#9

@rensa thanks for the answer and apologies for the delay, but I’ve been very busy these days. Here is a Google Drive link with the HTML file, let me know if you can access it:

https://drive.google.com/file/d/1eQjmV_rVswnGGHJJvCKXtcb_pPXE5U7X/view?usp=sharing


#10

No sweat, @andrea :slight_smile:

Okay, yeah: going by your HTML output, it looks like the Unicode characters are being improperly printed by R, and then this is being put straight in the HTML file and the browser is interpreting the characters as HTML tags because they’re surrounded by angled brackets:


Since you can output the histograms in the console without a problem, it seems like something’s going wrong in the RMarkdown build process. Ista Zahn has documented the problems with Unicode and R on Windows, saying that:

On Windows there is a bug in print.data.frame that causes data.frame's with UTF-8 encoded columns to be displayed incorrectly in non UTF-8 locales.

My thinking is that maybe your locale is set correctly in your regular R sessions but the RMarkdown build session is using a different locale, causing your document to be built with this printing bug.

There are two options we could try, @Andrea and @ejlatour:

  1. Adding a df_print option to the document front matter at the top, so that instead of the (buggy) print.data.frame, we use something else. I don’t know if skimr works with kable or the other options, though, and this could have other side effects.
  2. Adding a chunk at the top of the document (after the front matter) that changes the locale with Sys.setlocale(). I can’t tell you what to set it to because I don’t have access to a Windows machine, but you can call Sys.getlocale() to see a list of available ones. Are there any UTF-8 ones? (Ideally, it’d be something like "en_US.UTF-8")
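Both options depend on which encoding the knitting session actually uses, so it may help to print that from a chunk at the top of the document. A minimal sketch using base R only; the second half emulates a non-UTF-8 target with latin1 as a stand-in for a code page such as 1252, and assumes an R version whose `iconv()` supports `sub = "Unicode"`.

```r
# Report the encoding facts that matter for the Unicode histograms.
info <- l10n_info()
cat("LC_CTYPE:     ", Sys.getlocale("LC_CTYPE"), "\n")
cat("UTF-8 locale: ", info[["UTF-8"]], "\n")

# Emulate what happens to one histogram bar when the target encoding
# cannot represent it. latin1 is only a stand-in for the Windows code page.
bar <- "\u2587"
escaped <- iconv(bar, from = "UTF-8", to = "latin1", sub = "Unicode")
print(escaped)
```

If the second `cat()` line reports FALSE inside the knitted document, the session is not in a UTF-8 locale, and escapes like the one printed at the end are what would land in the output.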

#11

That did get something to happen… I am working off a document similar to @Andrea right now thinking to just get one version of this markdown working with skimr.

A while back, the skimr documentation recommended changing the locale with Sys.setlocale("LC_CTYPE", "Chinese"). When I made this change along with the df_print option, I saw histograms in one of @Andrea’s tests but weird characters in the other. Luckily, it doesn’t change the body text to Chinese or anything.

A SO discussion about Sys.setlocale() recommended using Sys.setlocale(category = "LC_CTYPE", locale = "English_United States.1252"). I tried this and it is the same as before.

Another RStudio community discussion mentions some known issues (toward the bottom) with Windows encoding.

Here is a link to my html:
https://drive.google.com/file/d/1uJraEG0IIkNNQEz-jfgcDZ83w_V9Jg17/view?usp=sharing


#12

Yeah, 1252 is an ASCII-like Windows encoding—I don’t think that’s gonna help with Unicode characters :confused: I saw the documentation recommending switching to Chinese. I’m not sure that’s a fantastic idea, since it might have other unexpected effects (like how dates or currencies get formatted).

Still, I’m encouraged that we’re getting somewhere! :smiley: What other locales are listed when you run Sys.getlocale(), @ejlatour? Whoops, that just gets the current one.

The more I read, the more it seems like on Windows each language is associated with a single encoding, and that it might just not be possible to switch to UTF-8 without changing the language. Here’s a list of valid Windows locale strings, and here’s info on code pages in Windows that seems relevant.

Aha!

The set of available languages, country/region codes, and code pages includes all those supported by the Win32 NLS API except code pages that require more than two bytes per character, such as UTF-7 and UTF-8. If you provide a code page like UTF-7 or UTF-8, setlocale will fail, returning NULL.
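Whether a given locale string is accepted can be tested from R itself, since `Sys.setlocale()` returns an empty string (with a warning) when the request cannot be honored. A minimal sketch; `try_locale` is a hypothetical helper name, and it restores the previous setting so the session is left unchanged.

```r
# Return TRUE if the locale can be set for the given category, FALSE otherwise.
try_locale <- function(locale, category = "LC_CTYPE") {
  old <- Sys.getlocale(category)
  # Always restore the original locale, whether or not the attempt succeeds.
  on.exit(Sys.setlocale(category, old), add = TRUE)
  result <- suppressWarnings(Sys.setlocale(category, locale))
  nzchar(result)
}

# Usage: locale names are platform-specific, so these are only examples.
# try_locale("en_US.UTF-8")
# try_locale("English_United States.1252")
```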

How do you go if you leave the locale as is and just add df_print: kable or tibble?


#13

Here is a link to just df_print: kable and this one with just df_print: tibble

I did not change the Sys.setlocale() at all in these versions.


#14

Bum :frowning: The only other suggestion I have is from the skimr README, which mentions this issue precisely as a limitation: instead of using df_print: kable in the header, try wrapping output statements in knitr::kable(). But if that doesn’t work, I’m stumped :frowning: If you get really stuck you might have to use RStudio cloud or a cheap/free cloud compute instance to render your document :frowning:

Oh! If you’re on Windows 10, one other option is to enable the Windows Subsystem for Linux, install the Linux version of R on that and render your RMarkdown documents through it. Honestly, if I had a Windows machine, I’d probably take the (reportedly fairly small) performance penalty and use Linux R through the WSL over the native Windows R full-time, since ‘R is primarily written for Unix-alikes and is not therefore ‘Unicode’ in the Windows sense.’


#15

Hi @rensa @ejlatour thanks for the help. I got a little lost with the list of suggestions: if I understand correctly, everything suggested so far didn’t work and the only remaining possibility is the WSL for Windows 10, right? I don’t think I can do that (it seems I can’t install Ubuntu on my work laptop) but I’ll try.

PS If one of the suggestions was to use kable(skim(mtcars)) without changing the locale, that’s exactly what I did in my initial post: doesn’t work.


#16

Footnote to this whole discussion: Those Unicode characters in the tiny histograms cause problems in other contexts, too. A co-worker using skimr inside RMarkdown tried to produce a PDF file. The software tools (e.g., LaTeX) choked and crashed on the Unicode, saying “Invalid character”. The only solution, of course, was to remove the tiny histograms from the table. Then everything ran OK.


#17

Yeah, sorry @Andrea: I don’t know what else to suggest :frowning:

@pteetor, if your only problem rendering them is in PDFs, you could try adding the following to your front matter:

output:
  pdf_document:
    latex_engine: xelatex

If xelatex uses XeTeX, that LaTeX engine is designed to add Unicode support. Maybe that’ll help?


#18

@rensa Thanks so much for your efforts digging into this!! I don’t know that I will go to the extent that you mention with Linux. I’ve tried with this package and many of the updates that they have put out. And after this latest endeavor, here are my parting thoughts (for now)

  1. Without the histogram, skimr does give a nice set of summary information (i.e. a big-picture overview) of a data frame. It’s pipe-able and it gives you the summary stats in a tibble/data.frame. This is really useful.

  2. I am able to run commands like skimr::skim(iris), then take the output and copy-paste it to a .txt file. This seems to be a fair compromise. The skimr output isn’t something that I would use for a report, but it’s still a powerful piece of documentation. So documenting it in a .txt file isn’t too terribly off-putting. It’s not an easily run .Rmd and requires a manual step, but I’m sort of okay with this for the time being.

  3. The package has come a long way from the first iteration that I played with. And I am pleased that the developers have addressed things to get them to this point. So I am coming to terms with 2) above.

  4. Bin width selection for histograms is a fraught topic. And so these histograms are not definitive and may mislead simply due to the way the data get binned. They’re great for a first look, but by no means definitive and should be viewed cautiously.
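The binning point in item 4 is easy to see with base R. A minimal sketch on made-up data (the vector `x` and the helper `counts_for` are illustrative only, not how skimr picks its bins): the same ten values look like one lopsided lump with two bins and like two separate clusters with eight.

```r
# Made-up data with two clusters, around 2-3 and around 8.
x <- c(1, 2, 2, 3, 3, 3, 7, 8, 8, 9)

# Bin counts for a fixed number of equal-width bins spanning the data range.
counts_for <- function(x, n_bins) {
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  hist(x, breaks = breaks, plot = FALSE)$counts
}

print(counts_for(x, 2))  # coarse bins hide the gap between the clusters
print(counts_for(x, 8))  # finer bins expose the empty middle
```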

It’s a great package that does so much so well!! It’s a bummer that there are the Windows issues. I really appreciate your efforts looking into this. And thanks to @Andrea for bringing up something that I’ve toiled with for a bit.

Best!


#19

@rensa no problem, I appreciate your support! :grinning: @pteetor I didn’t know about the PDF issues - please let us know if @rensa’s fix works for you.
@ejlatour I agree with your words of appreciation for skimr - I will keep using it, I think it’s simpler and more tidyverse-friendly than most of the alternatives.

I will let you know if I find a solution. Bye for now,

Andrea


#20

Hi all,

latest update. Thanks to the support of elinw (I don’t know her real name), I was able to make some progress but not to solve the issue. I performed the following steps:

My .Rmd is now:

---
title: "test"
mainfont: DejaVu Sans
author: "`r Sys.Date()`"
date: '_reading time: ? minutes_'
subtitle: Anon
output:
  html_document:
    fig_caption: yes
    keep_md: yes
font-family: Times New Roman
params:
  output_dir: ../output
---
  
```{r setup, include=FALSE}
library(knitr)
library(skimr)
library(dplyr)
library(extrafont)
opts_chunk$set(warning = FALSE,
               message = FALSE,
               fig.align  = "center",
               fig.width = 7.25,
               fig.height = 6)

```

## Sanity checks

Let's check we have indeed installed Deja Vu Sans 
```{r check_fonts}
fonts <- extrafont::fonttable()
any(grepl("Deja", fonts$FontName))
```

What about locale? Do we have a real UTF-8 locale, such as Chinese?
```{r check_locale}
sessionInfo()
```


## test 
```{r, results='asis'}
kable(skim(mtcars))
```

This results in this HTML file and the following “histograms”:

At this point, I think it’s just not possible to make this happen.