However, when I combine skim with R Markdown, I lose the ability to show the histograms. According to skimr vignettes, for Markdown output I should combine skim with kable, with a chunk option of results='asis'. I tried the vignette suggestion, as well as kable(skim()) without the chunk option and skim without kable, but none of these prints the histograms:
It's likely because the font being used doesn't have the Unicode block element glyphs. You could set a font with CSS, though there may be a more direct route.
Thanks for the tip! My CSS skills are more or less non-existent. Could you please show me how to do it? PS I guess that if this is a font issue, it wouldn't make sense to open an issue in the skimr repository, right?
Preface: Fonts may not be the issue; I can't reproduce it, as the block elements display fine on my computer. Not all OSes handle Unicode equally well, though.
How CSS is interpreted seems to vary depending on how it's inserted (as a separate stylesheet file, as a <style> tag, or as a style attribute to a <div> tag), though it'd take more digging to figure out why. Regardless, I managed to change the font by inserting
The big caveat to any such approach is that anyone viewing the resulting HTML must have one of the listed fonts available, and I'm not sure if there's one that everyone has that has block elements. I'm sure consistent display is possible via WOFF, but it may take non-negligible work to do so.
I am having the same issue as Andrea. The documentation for skimr talks about being able to load the "DejaVu Sans" font using the extrafont package, but this doesn't seem to work for me either.
---
title: "Untitled"
mainfont: DejaVu Sans
output:
  html_document: default
  pdf_document:
    latex_engine: xelatex
  word_document: default
font-family: Times New Roman
---
I reproduced this document without issue on my Mac (10.13.2). In the output HTML, the histogram glyphs fell all the way back to monospace inside the code block (which, in Chrome for my Mac, is Courier). I also tested Courier New using the developer tools and it still looks fine.
I also checked out the histograms in the final test (outside a code block) and they printed in Helvetica Neue. Helvetica, Arial, Times New Roman and sans-serif (which falls back to Helvetica Neue on my Mac) all looked good.
Given @Andrea is getting good histograms in console output but is getting the problematic Windows Unicode output in the rendered code block, it seems like something is going wrong with the Unicode output in the chunk.
@Andrea or @ejlatour, are you able to host your rendered HTML file (based on the initial post) somewhere so we can take a peek at it with the Developer Tools? Let's verify whether the right glyphs are ending up in the output document, so that we can definitively cross the font issue off.
@rensa thanks for the answer and apologies for the delay, but I've been very busy these days. Here is a Google Drive link with the HTML file, let me know if you can access it:
Okay, yeah: going by your HTML output, it looks like the Unicode characters are being improperly printed by R, and then this is being put straight in the HTML file and the browser is interpreting the characters as HTML tags because they're surrounded by angled brackets:
Since you can output the histograms in the console without a problem, it seems like something's going wrong in the RMarkdown build process. Ista Zahn has documented the problems with Unicode and R on Windows, saying that:
On Windows there is a bug in print.data.frame that causes data.frame's with UTF-8 encoded columns to be displayed incorrectly in non UTF-8 locales.
My thinking is that maybe your locale is set correctly in your regular R sessions but the RMarkdown build session is using a different locale, causing your document to be built with this printing bug.
Adding a df_print option to the document front matter at the top, so that instead of the (buggy) print.data.frame, we use something else. I don't know if skimr works with kable or the other options, though, and this could have other side effects.
Adding a chunk at the top of the document (after the front matter) that changes the locale with Sys.setlocale(). I can't tell you what to set it to because I don't have access to a Windows machine, but you can call Sys.getlocale() to see a list of available ones. Are there any UTF-8 ones? (Ideally, it'd be something like "en_US.UTF-8")
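To test the locale theory, something like the following could go in a chunk near the top of the .Rmd and also be run in the console, so the two outputs can be compared. This is only a sketch using base R; note that Sys.getlocale() reports the locale currently in use rather than a list of available ones, and the variable names here are just for illustration.

```r
# Current character-type locale for this R session
# (this is the locale in use, not a list of available locales).
ctype <- Sys.getlocale("LC_CTYPE")

# l10n_info() reports, among other things, whether R considers
# the session to be running in a UTF-8 locale.
info <- l10n_info()

cat("LC_CTYPE:", ctype, "\n")
cat("UTF-8 session:", info[["UTF-8"]], "\n")
```

If the console and the knitted document report different values, that would support the idea that the build session is running under a different locale.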
That did get something to happen... I am working off a document similar to @Andrea's right now, hoping to just get one version of this markdown working with skimr.
The skimr documentation at some point recommended changing the locale with Sys.setlocale("LC_CTYPE", "Chinese"). When I made this change along with the df_print option, I saw histograms in one of @Andrea's tests but weird characters in the other. Luckily, it doesn't change the body text to Chinese or anything.
A SO discussion about Sys.setlocale() recommended using Sys.setlocale(category = "LC_CTYPE", locale = "English_United States.1252"). I tried this and it is the same as before.
Yeah, 1252 is an ASCII-like Windows encoding, so I don't think that's gonna help with Unicode characters. I saw the documentation recommending switching to Chinese. I'm not sure that's a fantastic idea, since it might have other unexpected effects (like how dates or currencies get formatted).
Still, I'm encouraged that we're getting somewhere! What other locales are listed when you run Sys.getlocale(), @ejlatour? Whoops, that just gets the current one.
The set of available languages, country/region codes, and code pages includes all those supported by the Win32 NLS API except code pages that require more than two bytes per character, such as UTF-7 and UTF-8. If you provide a code page like UTF-7 or UTF-8, setlocale will fail, returning NULL.
How do you go if you leave the locale as is and just add df_print: kable or tibble?
Bum. The only other suggestion I have is from the skimr README, which mentions this issue precisely as a limitation: instead of using df_print: kable in the header, try wrapping output statements in knitr::kable(). But if that doesn't work, I'm stumped. If you get really stuck, you might have to use RStudio Cloud or a cheap/free cloud compute instance to render your document.
Hi @rensa @ejlatour, thanks for the help. I got a little lost with the list of suggestions: if I understand correctly, everything suggested so far didn't work and the only remaining possibility is the WSL for Windows 10, right? I don't think I can do that (it seems I can't install Ubuntu on my work laptop) but I'll try.
PS If one of the suggestions was to use kable(skim(mtcars)) without changing the locale, that's exactly what I did in my initial post: doesn't work.
Footnote to this whole discussion: Those Unicode characters in the tiny histograms cause problems in other contexts, too. A co-worker using skimr inside RMarkdown tried to produce a PDF file. The software tools (e.g., LaTeX) choked and crashed on the Unicode, saying "Invalid character". The only solution, of course, was to remove the tiny histograms from the table. Then everything ran OK.
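For anyone who wants to drop the histograms rather than fight the encoding, here is a minimal sketch. It assumes skimr 2.0 or later, where skim_without_charts() is available; older versions used a different mechanism, so the version guard below is there on purpose. The name no_hist is just for illustration.

```r
# Assumes skimr >= 2.0.0 is installed; the guard keeps the sketch
# from erroring on machines without it or with an older version.
if (requireNamespace("skimr", quietly = TRUE) &&
    packageVersion("skimr") >= "2.0.0") {
  # Same summary as skim(), but without the inline histogram column,
  # so no Unicode block characters end up in the output.
  no_hist <- skimr::skim_without_charts(mtcars)
  print(no_hist)
}
```

The rest of the summary statistics are unaffected, so this should be safe for PDF output as far as the histogram glyphs are concerned.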
@rensa Thanks so much for your efforts digging into this!! I don't know that I will go to the extent that you mention with Linux. I've tried with this package and many of the updates that they have put out. And after this latest endeavor, here are my parting thoughts (for now)
1) Without the histogram, skimr does give a nice set of summary information (i.e. a big-picture overview) of a data frame. It's pipe-able and it gives you the summary stats in a tibble/data.frame. This is really useful.
2) I am able to run commands like skimr::skim(iris), then take the output and copy-paste it to a .txt file. This seems to be a fair compromise. The skimr output isn't something that I would use for a report, but it's still a powerful piece of documentation. So documenting it in a .txt file isn't too terribly off-putting. It's not an easily run .Rmd and requires a manual step, but I'm sort of okay with this for the time being.
3) The package has come a long way from the first iteration that I played with. And I am pleased that the developers have addressed things to get them to this point. So I am coming to terms with 2) above.
4) Bin width selection for histograms is a fraught topic. And so these histograms are not definitive and may mislead simply due to the way the data get binned. They're great for a first look, but by no means definitive and should be viewed cautiously.
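The manual copy-paste step described above could probably be scripted. The following is a rough sketch using base R's capture.output(); save_printed() and out_file are hypothetical names made up for this example, and I haven't verified that the histogram glyphs survive the trip on Windows, where the printed output may already be mangled before it reaches the file.

```r
# Hypothetical helper: write the printed form of any R object to a text file.
save_printed <- function(x, path) {
  lines <- capture.output(print(x))
  # Ask for UTF-8 on the way out; this is not guaranteed to rescue
  # characters that the session's locale could not represent.
  con <- file(path, open = "w", encoding = "UTF-8")
  on.exit(close(con))
  writeLines(lines, con)
  invisible(path)
}

out_file <- file.path(tempdir(), "skim_iris.txt")
if (requireNamespace("skimr", quietly = TRUE)) {
  save_printed(skimr::skim(iris), out_file)
} else {
  # Stand-in so the sketch still runs without skimr installed.
  save_printed(summary(iris), out_file)
}
```

Swapping tempdir() for a project folder would keep the .txt file next to the analysis script.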
It's a great package that does so much so well!! It's a bummer that there are the Windows issues. I really appreciate your efforts looking into this. And thanks to @Andrea for bringing up something that I've toiled with for a bit.
@rensa no problem, I appreciate your support! @pteetor didn't know about the PDF issues - please let us know if @rensa's fix works for you. @ejlatour I agree with your words of appreciation for skimr - I will keep using it, I think it's simpler and more tidyverse-friendly than most of the alternatives.
I will let you know if I find a solution. Bye for now,
Latest update: thanks to the support of elinw (I don't know her real name), I was able to make some progress but not to solve the issue. I performed the following steps:
I added some lines to the YAML header of my .Rmd file following the skeleton.Rmd file referenced in the Using fonts vignette of skimr