Bookdown not exposing output files/output directory in post-processor

Hi,

I am wondering if anyone can help me work through the following two questions about bookdown. Let's use gitbook() as the example format, to make things clear.

  1. The post_processor() method for the gitbook format receives metadata, input, output, clean, and verbose as arguments. However, output is not actually the final output file(s) but rather the HTML file before splitting (i.e. _main.html, by default). I am wondering if bookdown exposes the actual final filenames anywhere in the rendering process so that these split HTML files can be post-processed.

  2. The second issue I'm facing is that even exposing _main.html as output in the post_processor() is of limited use, because this file actually lives in whatever directory is specified by bookdown::render_book(output_dir = ...), which is _book/ by default.

So even if I force the gitbook to be a single file by specifying gitbook(split_by = "none"), I can't safely post-process this file (output) without knowing the subdirectory it lives in. Does bookdown/render_book() expose any metadata that can be used in the post-processor to learn the output_dir and paste it together with output to form the proper relative path?

Thanks for your help!!

EDIT: Putting this code in the format$post_processor(): meta <- as.list(parent.frame()) does allow me to see a whole bunch of the rendering metadata, such as output_file and output_dir. However, output_dir is shown as ".", even though the HTML files are saved in _book/.

It would help to understand exactly what you want to do with bookdown by tweaking this internal behavior; that would make it easier to give relevant answers to your questions. What is currently exposed may not be what you want, but I think you can create a new output format if necessary.

The simple HTML book format could probably be a good start for providing a custom template for a new book style: see section 3.1 HTML of "bookdown: Authoring Books and Technical Documents with R Markdown".

Otherwise, bookdown::bs4_book() went deeper and is another example.

I'll try anyway to answer generally.

The post_processor step is run after the pandoc conversion, on the resulting file, as with any rmarkdown output format. In bookdown, the pandoc conversion is done on a single file, which is then split using the internal split_chapters() and a build function. The resulting file names are determined there, and the splitting happens there too, so they are only exposed inside those functions.
But it is not done in a way that you can post process those files during the rendering. But nothing prevents you from post-processing the HTML files after the rendering process, once those files exist in the output dir.

At the post processing step, the _main.html and other files have not yet been moved to the output dir. So I am not quite sure I understand the subdirectory issue.

Hope it helps


Thank you very much for the detailed response, @cderv.

I actually quite like the gitbook format. I think after reading over your response I can now simplify what I am hoping to achieve - I have a function that I want to run once per file for every file AFTER the splitting has taken place. This function would take an HTML file as input, and return an HTML file after some processing has taken place. I am just looking for a programmatic way to hook into and loop over the final set of files.

But it is not done in a way that you can post process those files during the rendering.

Is there any desire/interest for bookdown to support a feature to hook into the files after splitting, such as with a method like format$post_splitting() or something? This seems like it would further enhance a developer's ability to customize aspects of rendering.

But nothing prevents you from post-processing the HTML files after the rendering process, once those files exist in the output dir.

But this would have to be done manually by the user after clicking knit, right? I am trying to build the format in such a way that clicking knit does everything behind the scenes, which requires the processing of the split files as part of the rendering process.

At the post processing step, the _main.html and other files have not yet been moved to the output dir.

Hmm, I will have to double check this. I had set up the function to run in the post_processor() and it was unable to find the file, but maybe I did something wrong.

If you want to build upon gitbook() which is another output format, I think the right way would be like any rmarkdown output format: build a new one using the other one as base.
This can be done with the output_format() function. In there you can add a post processor step that would run after the gitbook one, I believe, so the splitting and moving would have taken place.
Did you try this way ?
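
For reference, a minimal sketch of what this could look like. The function name gitbook_with_post_split and the process_file argument are hypothetical. It also assumes that the output_file passed to the added post-processor points into the directory that holds the split HTML files, which is exactly what is uncertain in this thread, so treat it as something to try rather than a confirmed solution.

```r
# Hypothetical format built on top of bookdown::gitbook().
# Requires the rmarkdown and bookdown packages.
gitbook_with_post_split <- function(..., process_file = identity) {
  rmarkdown::output_format(
    knitr = NULL,
    pandoc = NULL,
    base_format = bookdown::gitbook(...),
    # rmarkdown runs the base format's post-processor first, then this one
    post_processor = function(metadata, input_file, output_file, clean, verbose) {
      # Assumption: the split files sit next to `output_file` at this point
      html_files <- list.files(
        dirname(output_file), pattern = "\\.html$", full.names = TRUE
      )
      for (f in html_files) process_file(f)
      output_file
    }
  )
}
```

If the assumption about output_file does not hold, the loop simply finds the wrong (or no) files, so it is worth printing output_file once to check where it points.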

Maybe some things are missing in this case, and indeed the files would already have been moved into the output_dir.
So maybe you have tried that already.

Post splitting process is quite specific. Between building on top of existing output formats and creating a new output format (that would do the splitting / building), I don't know which way is better.

However, know that if you want the Knit button to work in a bookdown project, it relies on render_site(). You could then create a new website format, different from bookdown::bookdown_site(), which could call render_book() and then do the post processing after that.
This would only require changing the site: YAML key to point to a custom site generator, I believe.

At the command line, users would have to use rmarkdown::render_site() and not bookdown::render_book().
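
A rough sketch of such a generator, written as a wrapper around an existing one. The names with_post_split and process_file are hypothetical, and the sketch assumes that the generator returns a list with render and output_dir elements (as rmarkdown site generators do) and that output_dir is relative to the input directory.

```r
# Hypothetical helper: wrap a site generator so that `process_file` is called
# on every HTML file in the site's output directory after rendering.
with_post_split <- function(base_generator, process_file) {
  function(input, ...) {
    site <- base_generator(input, ...)
    base_render <- site$render
    site$render <- function(...) {
      result <- base_render(...)
      # Assumption: output_dir is relative to the site's input directory
      out_dir <- file.path(input, site$output_dir)
      html_files <- list.files(out_dir, pattern = "\\.html$", full.names = TRUE)
      for (f in html_files) process_file(f)
      invisible(result)
    }
    site
  }
}

# Assumed usage inside a package, referenced in index.Rmd as
# `site: mypkg::my_book_site` (mypkg and my_html_fixer are placeholders):
# my_book_site <- with_post_split(bookdown::bookdown_site, my_html_fixer)
```

Wrapping the existing generator rather than rewriting it keeps the usual bookdown behavior and only adds the loop over the final files.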

Would that be something interesting to you?

I think the right way would be like any rmarkdown output format: build a new one using the other one as base.
This can be done with the output_format() function. In there you can add a post processor step that would run after the gitbook one, I believe, so the splitting and moving would have taken place.
Did you try this way ?

I haven't tried this yet but it seems to me that the names of the split files and the output dir are never exposed anywhere so there doesn't seem to be a safe way for knowing the names of the files and the relative paths to where they live for the post_processor in the new output_format(). Is this correct?

Post splitting process is quite specific.

With most other R Markdown formats that render just a single file, the post_processor basically lets you process the final file, because in most cases the pandoc conversion is the final step.

With bookdown, the splitting and building takes place after pandoc, so in some sense bookdown does not really offer the same ability to hook into the final files compared to other formats. Do you agree?

Would that be something interesting to you?

I hadn't considered creating a site generator. After looking over the bookdown_site() code it offers some good leads and may be helpful. But I guess I was still hoping split_chapters() or render_book() would somehow expose the files it creates after splitting.

I understand the problem you see here. I guess you can open a feature request so that we consider better exposing the split files in the future. Currently only one file path is returned by split_chapters().
I don't know how best to do this so that it complies with how the rmarkdown post processor works and with how the bookdown splitting happens.
I guess something could be done using an internal storage that could be retrieved during the post processing step.
I agree that currently it is not easy to do that. The only workaround would be to retrieve the output dir from the YAML and list the files in the folder.
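
A minimal sketch of that workaround, to be run after rendering. The helper name post_process_book and the process_file argument are hypothetical. It assumes the output directory is either set with output_dir in _bookdown.yml or left at the default _book; a directory passed only through render_book(output_dir = ...) would not be picked up this way.

```r
# Hypothetical helper: apply `process_file` to every HTML file of a rendered book.
post_process_book <- function(book_dir = ".", process_file) {
  output_dir <- "_book"  # bookdown's default output directory
  config <- file.path(book_dir, "_bookdown.yml")
  # Read output_dir from _bookdown.yml when the file and the yaml package exist
  if (file.exists(config) && requireNamespace("yaml", quietly = TRUE)) {
    custom_dir <- yaml::read_yaml(config)$output_dir
    if (!is.null(custom_dir)) output_dir <- custom_dir
  }
  html_files <- list.files(
    file.path(book_dir, output_dir), pattern = "\\.html$", full.names = TRUE
  )
  for (f in html_files) process_file(f)
  invisible(html_files)
}

# Assumed usage (my_html_fixer is a placeholder for your own function):
# bookdown::render_book("index.Rmd")
# post_process_book(".", my_html_fixer)
```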

bs4_book() solved this issue by running split_chapters() itself and registering the file names in the build function passed to split_chapters(). That is not the easiest way, but it works for a new format included in bookdown.

Having a new hook system to insert steps before splitting and after splitting could also be considered. I don't know if this is best or not.

Anyway, you should open an issue so it is tracked. Unfortunately, however, we don't plan to spend time adding new features to bookdown very soon, as we need to focus on other packages for the coming months.
So if you want to spend time on a PR and you come up with something that has low impact, this would be the fastest way to have such features included.

Thanks for the discussion, and this great idea!


Thanks for your help. I filed an issue and wanted to share it here for posterity: https://github.com/rstudio/bookdown/issues/1256.
