How can I read/parse the metadata from an rmarkdown file?

I need to read an RMarkdown file and parse its metadata as a list. I'd like something like this:

rmarkdown::read_metadata("my_file.Rmd")

I'm aware of the rmarkdown::metadata object, but it appears to be available only inside the R Markdown file itself, while it is being rendered.

Thank you,
-Shane

In the YAML header? Can you post a reprex?

I'm not sure if there's already a canned solution. The example function below addresses the most basic metadata structure. Before I go further, I just wanted to see if this is what you had in mind.

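read_metadata = function(file) {
  x = readr::read_lines(file)
  # Line numbers of the first two "---" delimiters that fence the YAML header
  # (taking only two avoids recycling warnings if the body contains more)
  rng = grep("^---$", x)[1:2]
  # Keep only the lines strictly between the delimiters
  rng = rng + c(1, -1)
  x = x[rng[1]:rng[2]]
  # Naive "key: value" split; nested YAML and quoting are not handled
  names(x) = gsub("(.*):.*", "\\1", x)
  x = gsub(".*: (.*)", "\\1", x)
  as.list(x)
}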

# Sample rmarkdown document is included at the end of this answer
read_metadata(file="test_metadata.rmd")
$title
[1] "\"My Title\""

$author
[1] "\"Author\""

$output
[1] "html_document"
---
title: "My Title"
author: "Author"
output: html_document
---

## R Markdown

This is an R Markdown document. 

```{r cars}
summary(cars)
```

## Including Plots

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```

What I want to do is extract the params metadata, which may be arbitrarily complex:

---
title: Leaflet Demo
output: html_document
params:
  tiles:
    label: "Tiles"
    value: "normal"
    input: "select"
    choices:
      - "normal"
      - "monochrome"
---

...
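A line-by-line regex like the read_metadata() sketch above cannot handle nesting like this. One option, sketched below, is to cut out the lines between the first two `---` markers and hand them to a real YAML parser. This assumes the yaml package is installed (rmarkdown itself depends on it); read_front_matter() is a hypothetical name, not an rmarkdown function.

```r
# Hypothetical helper (not part of rmarkdown): cut out the YAML front matter
# and let the yaml package parse it, so nested fields become nested lists.
read_front_matter <- function(file) {
  lines <- readLines(file, warn = FALSE)
  # Positions of the first two "---" delimiter lines
  delims <- grep("^---[[:space:]]*$", lines)
  if (length(delims) < 2) {
    stop("no YAML front matter found in ", file)
  }
  # Empty header: nothing between the two delimiters
  if (delims[2] == delims[1] + 1) {
    return(list())
  }
  yaml_lines <- lines[(delims[1] + 1):(delims[2] - 1)]
  yaml::yaml.load(paste(yaml_lines, collapse = "\n"))
}

# Usage with the params header from the question
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: Leaflet Demo",
  "output: html_document",
  "params:",
  "  tiles:",
  "    label: \"Tiles\"",
  "    value: \"normal\"",
  "    input: \"select\"",
  "    choices:",
  "      - \"normal\"",
  "      - \"monochrome\"",
  "---",
  "",
  "Body text"
), rmd_file)
meta <- read_front_matter(rmd_file)
str(meta$params)
unlink(rmd_file)
```

Delegating to a YAML parser means quoting, sequences and arbitrary nesting are handled for free, instead of being re-implemented with regular expressions.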

Clearly the rmarkdown package is already doing this parsing. I'm just trying to figure out how to get access to it.

Actually, pandoc. I’ll dig out an old post.

Thanks. Here's what I found four years ago:

Buried in help(rmarkdown) lies the secret sauce to pass to pandoc the command-line arguments that will sneak an invisible filter through, which will tickle its input to pummel the raw output of xtable into presentable \LaTeX and, thus, ultimately, into polished tables that don't require hand tweaking.

In the YAML front matter, add

output:
  rmarkdown::tufte_handout:
    pandoc_args: [
      "--filter", "/Users/rc/bin/style1"
         ]

where style1 is a program that reads from stdin and writes to stdout, parsing the output of pandoc and feeding it back into pandoc for further rendering.

While pandoc's API exposes a serialization of the abstract syntax tree (AST) that it uses to internally represent the source document for transformation into the target document, and allows free access to anything in the YAML front matter for use in a document template, the functionality to directly access the YAML fields for anything else is suppressed. Unless you want to fork the pandoc project and refactor the code (in Haskell) to permit this, keeping your programmatic edits in the front matter is not in the cards, at least to the dealer at this table.

I was able to tweak the yaml header with

    output:
      rmarkdown::tufte_handout:
        pandoc_args: [
          "--filter", "/Users/rc/bin/style1"
             ]

where style1 is a Haskell program that reads from stdin and writes to stdout, parsing the output of pandoc and feeding it back into pandoc for further rendering.

My requirement was for formatting. As I recall, my plan for drawing on arbitrary YAML was to run the *.Rmd file through a preprocessor that would generate R code chunks from the header.
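The preprocessing idea can be sketched in a few lines of base R. This is a hypothetical illustration, not code from the original post: header_to_chunk() assumes the header has already been parsed into a named list, and emits the text of an R chunk that recreates it as a variable via deparse().

```r
# Hypothetical preprocessor step: turn a parsed YAML header (a named list)
# into the source lines of an R chunk that recreates it as a variable.
header_to_chunk <- function(header, var = "front_matter") {
  # deparse() yields R source that evaluates back to `header`; it may span
  # several lines, so join them into a single string
  code <- paste(deparse(header), collapse = "\n")
  c("```{r front-matter, include=FALSE}",
    paste(var, "<-", code),
    "```")
}

# Example header, written by hand here instead of being parsed from a file
header <- list(
  title = "Leaflet Demo",
  params = list(tiles = list(value = "normal",
                             choices = c("normal", "monochrome")))
)
chunk <- header_to_chunk(header)
cat(chunk, sep = "\n")
```

A preprocessor would splice these lines into the .Rmd body right after the header, so later chunks can read `front_matter$params$tiles$choices` like any other R object.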

Things might have changed in the past four years. While I've kept current with both rmarkdown and pandoc, I haven't been monitoring this issue specifically. The latest help(rmarkdown) doesn't seem to suggest any way of grabbing arbitrary YAML. I'd have to re-read the current pandoc source code before retracting my earlier conclusion about the limitation I described, and I'm a bit rusty.

I think you are looking for the rmarkdown::yaml_front_matter() function.

Here is how it works with your example:

content <- c(
  "---",
  "title: Leaflet Demo",
  "output: html_document",
  "params:",
  "  tiles:",
  "    label: Tiles",
  "    value: normal",
  "    input: select",
  "    choices:",
  "      - normal",
  "      - monochrome",
  "---",
  "\n",
  "This is a test document")
rmd_file <- tempfile(fileext = ".Rmd")
xfun::write_utf8(content, rmd_file)
yml_metadata <- rmarkdown::yaml_front_matter(rmd_file)
params <- yml_metadata[["params"]]
str(params)
#> List of 1
#>  $ tiles:List of 4
#>   ..$ label  : chr "Tiles"
#>   ..$ value  : chr "normal"
#>   ..$ input  : chr "select"
#>   ..$ choices: chr [1:2] "normal" "monochrome"
unlink(rmd_file)

Created on 2019-12-22 by the reprex package (v0.3.0.9001)

Also, if you want to work with YAML headers, there is a new package, ymlthis, that may be useful.

Hope it helps.


That's a great find! It looks like it does exactly what's needed. To see, knit the following Rmd:

---
title: "ymlthis demo"
author: "Richard Careaga"
date: "12/20/2019"
output: html_document
params:
  a_string: "yadayadayada"
  an_int: 1
  a_dbl: 2.14
---

```{r setup}
library(ymlthis)
knitr::opts_chunk$set(echo = TRUE)
params$a_string
class(params$a_string)
params$an_int
class(params$an_int)
params$a_dbl
class(params$a_dbl)
```

Sorry, but I'm not sure I understand your example.
ymlthis seems useful only outside an R Markdown document, to help create YAML headers.
In your example, I don't see what you use ymlthis for. The `params$` you use is just how parameterized Rmd works. I am curious about what I missed... :thinking:


You're absolutely right! ymlthis had no role! I'll have to go back to rmarkdown and find out how I missed params. In 2015, I was looking for a way to bring arbitrary YAML header variables into chunks, without success.

I could be wrong again in this thread, but it now appears that params only works one level deep. Could you live with

params$tiles_label
params$tiles_value
params$tiles_input
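The flat names above can be derived mechanically from the nested header rather than typed by hand. A sketch with a hypothetical flatten_params() helper (not an rmarkdown function) that joins nested names with underscores:

```r
# Hypothetical helper: flatten a nested named list so that
# list(tiles = list(label = "Tiles")) becomes list(tiles_label = "Tiles").
flatten_params <- function(x, prefix = NULL) {
  out <- list()
  for (name in names(x)) {
    key <- if (is.null(prefix)) name else paste(prefix, name, sep = "_")
    value <- x[[name]]
    if (is.list(value) && !is.null(names(value))) {
      # Named sub-list: recurse and splice its flattened entries in
      out <- c(out, flatten_params(value, key))
    } else {
      # Leaf (scalar, vector, or unnamed list): keep as-is under the joined key
      out[[key]] <- value
    }
  }
  out
}

params <- list(tiles = list(label = "Tiles", value = "normal",
                            input = "select",
                            choices = c("normal", "monochrome")))
flat <- flatten_params(params)
str(flat)
```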

I'm not sure I see what you mean. Parameterized reports are explained here

To use a parameter, you only need params$name, which gives you its value. You don't need the other fields; they are used to create a Shiny widget interface for setting the parameter interactively.
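To illustrate that outside a knit, here is a hypothetical resolve_param_values() helper that approximates the behavior described above: a parameter declared with a `value` field resolves to that value, and anything else is kept as-is. It is a sketch for illustration, not rmarkdown's own implementation.

```r
# Hypothetical approximation of what params$name returns inside a knit:
# a parameter declared with a `value` field resolves to that value, while
# its other fields (label, input, choices) only drive the Shiny interface.
resolve_param_values <- function(params) {
  lapply(params, function(p) {
    if (is.list(p) && "value" %in% names(p)) p[["value"]] else p
  })
}

declared <- list(
  tiles = list(label = "Tiles", value = "normal", input = "select",
               choices = c("normal", "monochrome")),
  zoom = 10L
)
values <- resolve_param_values(declared)
values$tiles  # "normal"
```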

Let's not try to guess what @shanebdavis really wants to do, and see if rmarkdown::yaml_front_matter() is what he is looking for. :smile:


Sorry for the slow reply (holidays). What I want to do is read the YAML from -outside- the rmarkdown file, not from code -inside- the rmarkdown file. Specifically, I'm going to be serving rmarkdown files and I want to extract their metadata for use in the server processes.

rmarkdown::yaml_front_matter is -exactly- what I needed. Thank you!
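For the server-side use case, a sketch of how the metadata of a whole directory of .Rmd files might be collected with rmarkdown::yaml_front_matter(), without rendering anything. The helper name index_rmd_metadata() and the directory layout are hypothetical.

```r
# Hypothetical server-side helper: read the YAML front matter of every .Rmd
# file in a directory, returning a list named by file.
index_rmd_metadata <- function(rmd_dir) {
  files <- list.files(rmd_dir, pattern = "\\.[Rr]md$", full.names = TRUE)
  metadata <- lapply(files, rmarkdown::yaml_front_matter)
  names(metadata) <- basename(files)
  metadata
}

# Usage with a throwaway directory containing one document
rmd_dir <- tempfile("rmds")
dir.create(rmd_dir)
writeLines(c("---", "title: Leaflet Demo", "params:", "  tiles:",
             "    value: normal", "---", "", "Body"),
           file.path(rmd_dir, "demo.Rmd"))
index <- index_rmd_metadata(rmd_dir)
index[["demo.Rmd"]]$params$tiles$value
unlink(rmd_dir, recursive = TRUE)
```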


Great. Please mark the solution for the benefit of those to follow.
