Knit to pdf with css/html or convert html file to pdf file.

So I created an md file, a css file, an Rmd file and an R file.
All saved in the same directory.

Scenario 1: use the R file
The R file uses markdownToHTML to convert the md file into an HTML file with the CSS from the css file embedded. The result has nice formatting, such as borders and colors in the table headers. I wanted to produce a PDF with the same CSS embedded, but I wasn't successful and don't know the code that would do that. Does anyone have any idea?
To convert from md to html I use markdownToHTML("md file", "html file", stylesheet = "cssfile")

Scenario 2: use file.Rmd
With file.Rmd open I can Knit to HTML, but the CSS is not embedded or applied (I'm not sure why, or how I can apply it). It does apply the HTML code in file.md, which gives the output some table structure. I can also Knit to PDF, but the table from the md file that works for Knit to HTML is not applied there: I get a PDF file, but it is not structured at all, and the CSS from the css file is not embedded either. I would like to be able to Knit to PDF with the CSS from the css file embedded and the HTML code applied, to give it better formatting.

The md file has some HTML structure that gives it a table-like layout, which is why Knit to HTML produces a table structure; it doesn't embed the css file, though, which has the better table formatting. Unfortunately, Knit to PDF doesn't apply any of the HTML (md file) or CSS (css file), so there is no table format at all.

Or, since the first scenario has nice formatting, can I convert the HTML output to PDF from a browser?

Can you advise on the two scenarios?
(In short, I want to use either scenario to convert or Knit to PDF while keeping the nice HTML formatting.)

You can format anything any way you like in LaTeX. It's just not how you want to spend the rest of the year.

Two suggestions. The new pagedown package converts HTML to pdf on a paged basis. I haven't used it and the illustrations show gang of four pages, but if it will spit out a sequence of single pages, that has the best prospect of getting you output near the HTML.

The other is the Official Swiss Army Knife™️ of document converters, pandoc, which undergirds knitr and rmarkdown, with an optional layover in tex, where you can labor to get it "just right."

CSS only applies to HTML output. The PDF results from Pandoc converting to .tex and then to PDF. You need to provide styling for LaTeX; CSS won't be taken into account.

With HTML, you can print your document using Chrome to get a PDF that matches the HTML rendering. This is what pagedown uses to get a PDF from the Paged.js rendering. You don't need pagedown if you want to do the same for classic rmarkdown output. The chrome_print function could help, though, if you want to print programmatically. Otherwise, just open the document in Chrome and print. You do need to take care of dynamic elements, which won't render correctly in a static format like PDF.
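
For the programmatic route, a minimal sketch with chrome_print might look like the following. It assumes the pagedown package and Chrome (or Chromium) are installed; file.html and the helper names pdf_name and html_to_pdf are placeholders of mine, not part of any package.

```r
# Derive the PDF file name from the HTML file name (helper name is made up).
pdf_name <- function(html_file) sub("\\.html?$", ".pdf", html_file)

# Print an existing HTML file to PDF through headless Chrome.
html_to_pdf <- function(html_file, pdf_file = pdf_name(html_file)) {
  if (!requireNamespace("pagedown", quietly = TRUE)) {
    stop("The pagedown package is needed: install.packages(\"pagedown\")")
  }
  pagedown::chrome_print(input = html_file, output = pdf_file)
}

# "file.html" is a placeholder; nothing happens if it does not exist.
if (file.exists("file.html")) html_to_pdf("file.html")
```

Because Chrome does the rendering, the CSS embedded in the HTML file should carry over to the PDF much as it appears on screen.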

Printing HTML directly causes one big headache -- page breaks. LaTeX lets you control page breaks, in a Faustian bargain for being tortured in keeping floats, like figures and tables, where they should be.

@technocrat & @cderv,

I thought I'd put my message in one response.
Thank you both for your reply and guidance.

I tried using Pandoc but wasn't successful with it.
This is my short code: system("pandoc -s file.html -o file.pdf")

Could you please provide an example that works with Pandoc? I just want to try it and see.
I can only view the created html file in IE, not in Chrome.

From reading your responses, it seems that LaTeX is the way to go for the best results in PDF formatting. However, I think it may be a lot more work than just creating HTML and CSS and then converting the HTML to PDF.

Could someone provide an example of LaTeX formatting for PDF, so I can see?

I think I would prefer the HTML-to-PDF methods, since HTML and CSS are widely known and this is easier than learning and using LaTeX, unless you disagree.

I should have made clear that I was referring to command line pandoc:

$  pandoc -f html -t latex -o example.pdf  MovieLens.html

does a reasonably good job, but you may have to take an intermediate stop in LaTeX to get things exactly as you want: write a .tex file first, edit it, then compile it to PDF.

$  pandoc -f html -t latex -s -o example.tex  MovieLens.html

This gist (https://gist.github.com/technocrat/8adeaa4b649617adaa4eafa5c8178458) shows you how ugly tex can get. I really can't recommend it without a compelling reason.

I used your code and made a few small changes:

pandoc -f html -t latex -o example.pdf file.html

I took out the $ symbol and changed MovieLens.html to file.html. The file is saved in the directory, and I changed the directory in the command prompt to where the file is saved. I ran the changed code and got this message:

pandoc: Error producing PDF from TeX source. ! LaTeX Error: File 'article.cls' not found.

Like R, LaTeX has packages, and, like R, you are always running across some you don't have installed on your system. (Or, in some cases, do have installed, but not where they are expected to be found or, more rarely, that conflict with other packages.)

https://www.latex-project.org is the best starting point for making sure that your installation is set up correctly.

Again, before embarking on this venture, check out the R pagedown package to see if it will give you the appearance you are looking for when converted to pdf.

If you want to use pandoc directly, know that rmarkdown has a wrapper around the executable.

rmarkdown::pandoc_convert() is the equivalent of the command line you want to use here.

Printing HTML directly causes one big headache -- page breaks. LaTeX lets you control page breaks, in a Faustian bargain for being tortured in keeping floats, like figures and tables, where they should be.

pagedown is the remedy to this big headache.
It is now easy to control page breaks in HTML.

I had a similar situation. I ended up knitting my Rmd files to html, and then using wkhtmltopdf to convert them to PDF format. Works pretty well.

@tgiordano,

Thanks for sharing wkhtmltopdf.
I took a look at it, and it looks like I need to install software to use it, or use it from the command prompt.
Unless there is an easier way, such as calling it from within the R program, it is not as efficient. I am also having a difficult time getting it to work.

@cderv,
I tried your suggestion, pandoc_convert, with the code below and got an error message.
Please advise!

Error: pandoc document conversion failed with error 9

rmarkdown::pandoc_convert("f.html", to = "a.pdf")
pandoc_convert("f.html", to = "a.pdf")

If it were up to me, I would prefer to convert the HTML file into a PDF file rather than use LaTeX, because
I like to code in HTML and CSS too. LaTeX is new to me; it's the first time I've heard of it.

If that's the case, then I would also suggest the pagedown package as mentioned above.

This is a wrapper around pandoc, and its arguments are close to what pandoc expects. to is the output format argument, as in the Pandoc manual. You can't pass the output file there; you need to use output for that. Read ?pandoc_convert

Try rmarkdown::pandoc_convert("f.html", output = "a.pdf") or rmarkdown::pandoc_convert("f.html", to = "pdf") (output will be f.pdf here)
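
To make the difference between to and output concrete, here is a small sketch. It assumes rmarkdown, Pandoc and a LaTeX installation are available (to = "latex" sends the conversion through LaTeX); html_to_pdf_via_latex is a hypothetical helper name, and f.html and a.pdf are the placeholder file names from above.

```r
# `to` names the format Pandoc writes; `output` names the file it creates.
html_to_pdf_via_latex <- function(input, output) {
  # Pandoc only builds a PDF when the output file ends in .pdf.
  stopifnot(grepl("\\.pdf$", output))
  rmarkdown::pandoc_convert(input, to = "latex", output = output)
}

# Placeholder file names; runs only if the input and Pandoc are present.
if (file.exists("f.html") &&
    requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_to_pdf_via_latex("f.html", "a.pdf")
}
```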

@john01

I will try to summarize the whole range of opportunities to convert from Rmd to pdf using HTML and CSS.
Disclaimer: this is an opinionated post.

First of all, keep in mind how R Markdown works:

  • the first step is to execute code chunks and transform the Rmd file to an md file: this is the job of knitr
  • the second step is to transform the md file to an HTML file: this is the job of Pandoc (Pandoc is external software)
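
The two steps can also be run by hand, which makes the division of labor visible. This is only a sketch: it assumes knitr, rmarkdown and Pandoc are installed, and file.Rmd, style.css and the helper pandoc_css_options are placeholder names. The --css option links the stylesheet from the HTML file rather than embedding it.

```r
# Pandoc options for a standalone HTML page that links a stylesheet.
pandoc_css_options <- function(css_file) c("--standalone", "--css", css_file)

rmd_file <- "file.Rmd"  # placeholder input

if (file.exists(rmd_file) &&
    requireNamespace("knitr", quietly = TRUE) &&
    requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # Step 1: knitr runs the code chunks and writes a plain md file.
  md_file <- knitr::knit(rmd_file, output = "file.md")
  # Step 2: Pandoc turns the md file into HTML.
  rmarkdown::pandoc_convert(md_file, to = "html", output = "file.html",
                            options = pandoc_css_options("style.css"))
}
```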

In order to produce a pdf, you need an extra step: convert HTML (with CSS) to pdf. There are many tools to achieve this task. IMO, they belong in two categories:

  • third-party software: HTML/CSS to PDF engines
  • in-browser solutions (i.e. JavaScript libraries)

HTML/CSS to PDF engines

Here's a short list of the software I know: html2pdf, wkhtmltopdf, PhantomJS, weasyprint, PrinceXML, PDFreactor, Antennahouse, Oxygen PDF Chemistry...

FMPOV, the main differences between these rendering tools are:

In-browser solutions

With your browser, you can already print to PDF.
In this category, you will find client-side JavaScript libraries: jsPDF, html2canvas, Print.js, Vivliostyle.js, Paged.js...

FMPOV, the main differences are:

  • tools building PDF (or images) in browser vs. tools enhancing the HTML content in order to control the PDF generated by the browser
  • tools supporting the CSS Paged Media standard

Integration with R Markdown

Third-party software

Pandoc has native support for wkhtmltopdf, weasyprint and PrinceXML (you need to install this software yourself). See https://pandoc.org/MANUAL#creating-a-pdf

As @cderv said, you can use them directly from R:

rmarkdown::pandoc_convert("file.html", output = "with_wkhtmltopdf.pdf", to = "html5")

rmarkdown::pandoc_convert("file.html", output = "with_weasyprint.pdf", to = "html5", options = c("--pdf-engine", "weasyprint"))

rmarkdown::pandoc_convert("file.html", output = "with_prince.pdf", to = "html5", options = c("--pdf-engine", "prince"))

I am opinionated about HTML/CSS to PDF converters and only consider tools that support CSS Paged Media standard (so, I don't use wkhtmltopdf). I developed the weasydoc package to ease my workflow with weasyprint and PrinceXML.

In-browser solutions

The main advantage of client-side libraries is that you can simply use your browser to create a PDF.

I will not develop the pros and cons of the different JavaScript libraries. In short, my recommendation is to use Paged.js (but feel free to test and use any other library).

Paged.js is used in the pagedown package and so, you have a native solution to create a PDF from R Markdown using HTML/CSS. I am much more comfortable with Paged.js than with any other tools or libraries, so I tend to recommend it.

Thanks for the informative message.
I used the code below and got this error message:
Error: pandoc document conversion failed with error 41:

rmarkdown::pandoc_convert("file.html", output = "with_wkhtmltopdf.pdf", to = "html5")

Very nice, beautiful!
Your first code works like a charm.
However, the table format in the PDF looks different from the one in the HTML.
Your second code gives an error:
Error: pandoc document conversion failed with error 9

This means that Pandoc did not find wkhtmltopdf. You need to install it, and wkhtmltopdf has to be in your PATH.
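
A quick way to check this from R before calling Pandoc is Sys.which, which returns an empty string for programs it cannot find on the PATH. A sketch, with on_path as a made-up helper name:

```r
# TRUE for each program that can be found on the PATH, FALSE otherwise.
on_path <- function(programs) {
  found <- Sys.which(programs)
  stats::setNames(nzchar(found), programs)
}

on_path(c("wkhtmltopdf", "weasyprint", "prince"))
```

If wkhtmltopdf comes back FALSE, install it or add its folder to the PATH, then restart R so the new PATH is picked up.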

This will transform the HTML file to a LaTeX file (the PDF is then built with LaTeX).
