There is a way, but it uses an unexported function from the tools package, so it's not guaranteed to continue working.
Instead of just dumping code with no explanation, I'll detail the process. First I looked in the R Internals manual for how help pages are handled for installed packages. It says they are stored in parsed form (hooray!) in a lazy-load database: a pair of .rdb and .rdx files, named after the package, in the package's help folder.
So the first function reads that file. This is where I used the unexported function. The parsed documentation is returned as a list.
#' Create a list of section content from every documentation page from a package
#' @param pkg String naming the package
#' @return A list whose names are documentation page names (the Rd file
#'   names without extension). Each element is itself a list of the details
#'   and sections for a page.
package_docs <- function(pkg) {
  help_dir <- system.file("help", package = pkg)
  # The database is <pkg>.rdb plus <pkg>.rdx; pass the shared path without extension
  db_path <- file.path(help_dir, pkg)
  tools:::fetchRdDB(db_path)
}
rdb <- package_docs("ggplot2")
names(rdb)[1:5]
# [1] "absoluteGrob" "add_theme" "aes" "aes_" "aes_all"
str(rdb[1], max.level = 1)
# List of 1
# $ absoluteGrob:List of 7
# ..- attr(*, "Rdfile")= chr "D:/temp/Rtmp6to88Q/R.INSTALL13f6c7f85557d/ggplot2/man/absoluteGrob.Rd"
# ..- attr(*, "class")= chr "Rd"
# ..- attr(*, "meta")=List of 1
# ..- attr(*, "srcref")= 'srcref' int [1:6] 0 0 17 1 0 1
# .. ..- attr(*, "srcfile")=Class 'srcfile' <environment: 0x0000000021529908>
# ..- attr(*, "prepared")= int 3
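If relying on an unexported function is a concern, `tools::Rd_db()` is an exported function that also returns the parsed Rd objects of an installed package. The sketch below is an alternative to `package_docs()`, not the approach used in the rest of this answer. `package_docs_exported` is a hypothetical name, and the name clean-up step is an assumption about how `Rd_db()` labels its elements (as Rd file names, possibly with a directory prefix).

```r
# Hypothetical alternative to package_docs() built on the exported tools::Rd_db().
package_docs_exported <- function(pkg) {
  db <- tools::Rd_db(pkg)
  # Assumption: names look like "topic.Rd", possibly with a directory prefix.
  # Reduce them to the bare page name either way.
  names(db) <- tools::file_path_sans_ext(basename(names(db)))
  db
}

# "stats" ships with R, so this runs without installing anything.
stats_docs <- package_docs_exported("stats")
head(names(stats_docs))
class(stats_docs[[1]])
```

The result should be usable wherever `rdb` is used below, although I have only treated it as a drop-in replacement by assumption.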
The elements of this list are themselves lists, one for each manual page. Each page's list has one element per section of the help page.
str(rdb[[1]][1:2], max.level = 1)
# List of 2
# $ :List of 1
# ..- attr(*, "Rd_tag")= chr "\\title"
# ..- attr(*, "srcref")= 'srcref' int [1:6] 5 1 5 21 1 21
# .. ..- attr(*, "srcfile")=Class 'srcfile' <environment: 0x0000000021529908>
# $ :List of 1
# ..- attr(*, "Rd_tag")= chr "\\name"
# ..- attr(*, "srcref")= 'srcref' int [1:6] 3 1 3 19 1 19
# .. ..- attr(*, "srcfile")=Class 'srcfile' <environment: 0x0000000021529908>
The sections are classified using the "Rd_tag" attribute, so extracting the ones we want is simple. For generality, the function will return any number of sections as a list.
extract_sections <- function(doc_page, sections) {
  # Read each section's tag and strip the leading backslash
  tags <- vapply(doc_page, attr, character(1), "Rd_tag")
  tags <- gsub("^\\\\", "", tags)
  doc_page[match(sections, tags)]
}
title_and_name <- extract_sections(rdb[["element"]], c("title", "name"))
title_and_name
# [[1]]
# [[1]][[1]]
# [1] "Theme elements"
# attr(,"Rd_tag")
# [1] "TEXT"
#
# attr(,"Rd_tag")
# [1] "\\title"
#
# [[2]]
# [[2]][[1]]
# [1] "margin"
# attr(,"Rd_tag")
# [1] "VERB"
#
# attr(,"Rd_tag")
# [1] "\\name"
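One thing to keep in mind: `match()` returns only the first hit per tag, so a section that can repeat (such as `\alias`) comes back once. As a minimal sketch, the hypothetical helper `extract_all_sections()` below keeps every match instead. The Rd text is a made-up toy page parsed with `tools::parse_Rd()` so the example runs without ggplot2.

```r
# Toy Rd page (made up for illustration), written to a temp file and parsed.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{demo}",
  "\\alias{demo}",
  "\\alias{demo2}",
  "\\title{Demo page}"
), rd_file)
page <- tools::parse_Rd(rd_file)

# Hypothetical variant of extract_sections() that keeps repeated sections.
extract_all_sections <- function(doc_page, sections) {
  tags <- vapply(doc_page, attr, character(1), "Rd_tag")
  tags <- gsub("^\\\\", "", tags)
  keep <- tags %in% sections
  # Name each result by its tag so repeats are easy to spot.
  stats::setNames(unclass(doc_page)[keep], tags[keep])
}

aliases <- extract_all_sections(page, "alias")
length(aliases)
unlist(aliases)
```

Unlike `extract_sections()`, this returns sections in page order rather than in the order requested, which is the trade-off for using `%in%`.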
So the sections are, again, lists. And the "arguments" section is no prettier.
margin_args <- extract_sections(rdb[["element"]], "arguments")[[1]]
margin_args[1:2]
# [[1]]
# [1] "\n"
# attr(,"Rd_tag")
# [1] "TEXT"
#
# [[2]]
# [[2]][[1]]
# [[2]][[1]][[1]]
# [1] "t, r, b, l"
# attr(,"Rd_tag")
# [1] "TEXT"
#
#
# [[2]][[2]]
# [[2]][[2]][[1]]
# [1] "Dimensions of each margin. (To remember order, think trouble)."
# attr(,"Rd_tag")
# [1] "TEXT"
#
#
# attr(,"Rd_tag")
# [1] "\\item"
What one can see from this:
- Blank lines are included as elements containing only "\n"
- Parameter lines have two sublists (again): one with the parameter name(s), and the other with the description. Multi-line text is split into one element for each line.
I think a matrix would be a simple output, so this next function returns one with a column for the names and a column for the descriptions.
tablify_arguments <- function(arg_section) {
  # Parameter entries are the elements with two parts: name(s) and description
  param_lines <- arg_section[lengths(arg_section) == 2]
  collapse_index <- function(x, index) {
    paste0(x[[index]], collapse = "")
  }
  params <- vapply(param_lines, collapse_index, character(1), index = 1)
  descriptions <- vapply(param_lines, collapse_index, character(1), index = 2)
  descriptions <- gsub("\n", " ", descriptions)
  cbind(params = params, descriptions = descriptions)
}
arg_table <- tablify_arguments(margin_args)
arg_table[1:3, ]
# params descriptions
# [1,] "t, r, b, l" "Dimensions of each margin. (To remember order, think trouble)."
# [2,] "unit" "Default units of dimensions. Defaults to \"pt\" so it can be most easily scaled with the text."
# [3,] "fill" "Fill colour."
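Selecting parameter lines by `lengths(arg_section) == 2` works because `\item` entries have two parts, but the "Rd_tag" attribute identifies them more directly. The following is a minimal sketch of that variant, assuming every parameter entry is tagged `\item`. `tablify_arguments_by_tag` is a hypothetical name, it returns a data frame rather than a matrix, and the toy Rd page is made up so the example runs without ggplot2.

```r
# Toy Rd page (made up for illustration) with a two-line description.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{demo}",
  "\\title{Demo page}",
  "\\arguments{",
  "\\item{x}{A number.}",
  "\\item{y, z}{More numbers,",
  "on two lines.}",
  "}"
), rd_file)
page <- tools::parse_Rd(rd_file)
tags <- vapply(page, attr, character(1), "Rd_tag")
arg_section <- page[[match("\\arguments", tags)]]

# Hypothetical variant of tablify_arguments() that filters on the tag.
tablify_arguments_by_tag <- function(arg_section) {
  is_item <- vapply(
    arg_section,
    function(x) identical(attr(x, "Rd_tag"), "\\item"),
    logical(1)
  )
  items <- arg_section[is_item]
  # unlist() flattens any nested markup down to its text pieces.
  collapse_part <- function(x, index) {
    paste0(unlist(x[[index]]), collapse = "")
  }
  descriptions <- vapply(items, collapse_part, character(1), index = 2)
  data.frame(
    params = vapply(items, collapse_part, character(1), index = 1),
    descriptions = gsub("\n", " ", descriptions),
    stringsAsFactors = FALSE
  )
}

arg_table <- tablify_arguments_by_tag(arg_section)
arg_table
```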
Note that my solution drops all \LaTeX-like markup from the content, but it's still present in the section's list as attributes. Anyone ambitious enough can figure out how to recover it.
margin_args[[26]][[2]][[2]]
# [[1]]
# [[1]][[1]]
# [1] "grid::arrow()"
# attr(,"Rd_tag")
# [1] "TEXT"
#
# attr(,"Rd_tag")
# [1] "\\link"
# attr(,"Rd_option")
# [1] "grid:arrow"
# attr(,"Rd_option")attr(,"Rd_tag")
# [1] "TEXT"
#
# attr(,"Rd_tag")
# [1] "\\code"
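For anyone who wants a starting point on recovering the markup, here is a minimal sketch of a recursive walk over one description. It is only one possible approach: `rd_to_text` is a hypothetical helper, rendering `\code` as backticks is an arbitrary choice of mine, and every other tag is simply reduced to its text. The toy Rd page is made up so the example runs without ggplot2.

```r
# Toy Rd page (made up for illustration) with \code and \link inside an argument.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{demo}",
  "\\title{Demo page}",
  "\\arguments{",
  "\\item{arrow}{See \\code{\\link[grid:arrow]{grid::arrow()}} for details.}",
  "}"
), rd_file)
page <- tools::parse_Rd(rd_file)
tags <- vapply(page, attr, character(1), "Rd_tag")
arg_section <- page[[match("\\arguments", tags)]]
is_item <- function(x) identical(attr(x, "Rd_tag"), "\\item")
item <- Filter(is_item, arg_section)[[1]]

# Hypothetical helper: walk a node recursively and rebuild its text,
# wrapping the content of \code in backticks.
rd_to_text <- function(node) {
  tag <- attr(node, "Rd_tag")
  # Leaves are character strings (TEXT, RCODE, VERB, ...).
  if (is.character(node)) {
    return(paste0(node, collapse = ""))
  }
  # \dots and \ldots are empty nodes that stand for an ellipsis.
  if (identical(tag, "\\dots") || identical(tag, "\\ldots")) {
    return("...")
  }
  inner <- paste0(vapply(node, rd_to_text, character(1)), collapse = "")
  if (identical(tag, "\\code")) paste0("`", inner, "`") else inner
}

description <- rd_to_text(item[[2]])
description
```

Handling more tags (for example turning `\link` into a cross-reference using its "Rd_option" attribute) would mean adding further branches on `tag`.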