Programmatically accessing @param field values of functions from an installed package


#1

Is there a way, using {tools} or {utils}, to assign the @param field value of a function in an installed package to a variable in the global environment?


#2

There is a way, but it uses an unexported function from the tools package, so it's not guaranteed to continue working.

Instead of just dumping code with no explanation, I'll detail the process. First I looked in the R Internals manual for how help pages are handled for installed packages. It says they are stored in parsed form (hooray!) in a lazy-load database, a pair of .rdb and .rdx files, in the package's help folder.
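To see this on disk, list the help folder of an installed package. This sketch uses stats, which ships with R, rather than ggplot2. The database is addressed by its base name (the path without .rdb/.rdx), which is what the function in this answer passes to tools:::fetchRdDB().

```r
# Locate the help folder of an installed package ("stats" ships with R)
help_dir <- system.file("help", package = "stats")

# The parsed help pages live in a lazy-load database: stats.rdb + stats.rdx
list.files(help_dir)

# The database is addressed by its base name, without the extension
db_base <- file.path(help_dir, "stats")
file.exists(paste0(db_base, c(".rdb", ".rdx")))
```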

So the first function reads that file. This is where I used the unexported function. The parsed documentation is returned as a list.

#' Create a list of section content from every documentation page from a package
#' @param pkg String naming the package
#' @return A list whose names are the documentation page (Rd file) names.
#'   Each element is itself a list of the details and sections for a page.
package_docs <- function(pkg) {
  help_dir <- system.file("help", package = pkg)
  db_path <- file.path(help_dir, pkg)
  # Unexported: reads the help database given its path without extension
  tools:::fetchRdDB(db_path)
}

rdb <- package_docs("ggplot2")
names(rdb)[1:5]
# [1] "absoluteGrob" "add_theme"    "aes"          "aes_"         "aes_all"
str(rdb[1], max.level = 1)
# List of 1
#  $ absoluteGrob:List of 7
#   ..- attr(*, "Rdfile")= chr "D:/temp/Rtmp6to88Q/R.INSTALL13f6c7f85557d/ggplot2/man/absoluteGrob.Rd"
#   ..- attr(*, "class")= chr "Rd"
#   ..- attr(*, "meta")=List of 1
#   ..- attr(*, "srcref")= 'srcref' int [1:6] 0 0 17 1 0 1
#   .. ..- attr(*, "srcfile")=Class 'srcfile' <environment: 0x0000000021529908> 
#   ..- attr(*, "prepared")= int 3
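fetchRdDB() also takes a key argument, so a single page can be loaded without reading the whole database. The sketch below assumes the key is the Rd file name (as the names above suggest) and uses lm from stats. For comparison, tools::Rd_db() is exported and builds a list of parsed Rd objects for an installed package, which may be a safer entry point than `:::`, although its element names are not necessarily the same.

```r
db_base <- file.path(system.file("help", package = "stats"), "stats")

# Unexported, so this may break in a future R version
lm_page <- tools:::fetchRdDB(db_base, key = "lm")
class(lm_page)

# Exported alternative: a list of parsed Rd objects, one per help page
stats_db <- tools::Rd_db("stats")
length(stats_db)
```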

The elements of this list are themselves lists, one for each manual page. Each page's list is broken out by the help page's sections.

str(rdb[[1]][1:2], max.level = 1)
# List of 2
#  $ :List of 1
#   ..- attr(*, "Rd_tag")= chr "\\title"
#   ..- attr(*, "srcref")= 'srcref' int [1:6] 5 1 5 21 1 21
#   .. ..- attr(*, "srcfile")=Class 'srcfile' <environment: 0x0000000021529908> 
#  $ :List of 1
#   ..- attr(*, "Rd_tag")= chr "\\name"
#   ..- attr(*, "srcref")= 'srcref' int [1:6] 3 1 3 19 1 19
#   .. ..- attr(*, "srcfile")=Class 'srcfile' <environment: 0x0000000021529908>

The sections are classified using the "Rd_tag" attribute, so extracting the ones we want is simple. For generality, the function will return any number of sections as a list.

extract_sections <- function(doc_page, sections) {
  tags <- vapply(doc_page, attr, character(1), "Rd_tag")
  tags <- gsub("^\\\\", "", tags)
  doc_page[match(sections, tags)]
}
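To experiment without ggplot2, tools::parse_Rd() can build a page from Rd text. The toy page below is made up for illustration. Output from parse_Rd() is not post-processed like the installed database (it may, for instance, keep whitespace TEXT elements between sections), but extract_sections() only looks at the Rd_tag attributes, so it works the same way.

```r
# Same helper as above, repeated so this block runs on its own
extract_sections <- function(doc_page, sections) {
  tags <- vapply(doc_page, attr, character(1), "Rd_tag")
  tags <- gsub("^\\\\", "", tags)
  doc_page[match(sections, tags)]
}

# A made-up help page, written in Rd markup
rd_lines <- c(
  "\\name{toy}",
  "\\alias{toy}",
  "\\title{A toy page}",
  "\\usage{toy(x, y = 1)}",
  "\\arguments{",
  "  \\item{x}{The input",
  "    vector.}",
  "  \\item{y}{A scalar offset.}",
  "}",
  "\\description{Adds y to x.}"
)
con <- textConnection(rd_lines)
toy_page <- tools::parse_Rd(con)
close(con)

# as.character() drops the Rd_tag attribute, leaving the plain text
title_and_name <- extract_sections(toy_page, c("title", "name"))
as.character(title_and_name[[1]][[1]])
as.character(title_and_name[[2]][[1]])
```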

title_and_name <- extract_sections(rdb[["element"]], c("title", "name"))
title_and_name
# [[1]]
# [[1]][[1]]
# [1] "Theme elements"
# attr(,"Rd_tag")
# [1] "TEXT"
# 
# attr(,"Rd_tag")
# [1] "\\title"
# 
# [[2]]
# [[2]][[1]]
# [1] "margin"
# attr(,"Rd_tag")
# [1] "VERB"
# 
# attr(,"Rd_tag")
# [1] "\\name"

So the sections are, again, lists. And the "arguments" section is no prettier.

margin_args <- extract_sections(rdb[["element"]], "arguments")[[1]]
margin_args[1:2]
# [[1]]
# [1] "\n"
# attr(,"Rd_tag")
# [1] "TEXT"
# 
# [[2]]
# [[2]][[1]]
# [[2]][[1]][[1]]
# [1] "t, r, b, l"
# attr(,"Rd_tag")
# [1] "TEXT"
# 
# 
# [[2]][[2]]
# [[2]][[2]][[1]]
# [1] "Dimensions of each margin. (To remember order, think trouble)."
# attr(,"Rd_tag")
# [1] "TEXT"
# 
# 
# attr(,"Rd_tag")
# [1] "\\item"

What one can see from this:

  • Blank lines are included as TEXT elements containing only "\n"
  • Parameter lines have two sublists (again): one with the parameter name(s), and the other with the description. Multi-line text is stored as a list with a separate TEXT element for each line.
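Both observations can be checked with lengths(), which is what the next function relies on: an \item has length 2, while the whitespace elements have length 1. Here is a sketch on a made-up page parsed with tools::parse_Rd(); parse_Rd() also keeps the indentation whitespace, so expect more filler elements than in the installed database.

```r
rd_lines <- c(
  "\\name{toy}", "\\alias{toy}", "\\title{A toy page}",
  "\\arguments{",
  "  \\item{x}{The input",
  "    vector.}",
  "  \\item{y}{A scalar offset.}",
  "}"
)
con <- textConnection(rd_lines)
toy_page <- tools::parse_Rd(con)
close(con)

# Pull out the \arguments section by its Rd_tag
tags <- vapply(toy_page, attr, character(1), "Rd_tag")
args <- toy_page[[which(tags == "\\arguments")]]

# Tag and length of every element inside the section
data.frame(
  tag = vapply(args, attr, character(1), "Rd_tag"),
  length = lengths(args)
)

# The description of x, one element per source line
x_item <- args[lengths(args) == 2][[1]]
vapply(x_item[[2]], as.character, character(1))
```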

I think a matrix would be a simple output, so this next function returns one with a column for the names and a column for the descriptions.

tablify_arguments <- function(arg_section) {
  param_lines <- arg_section[lengths(arg_section) == 2]
  collapse_index <- function(x, index) {
    paste0(x[[index]], collapse = "")
  }
  params       <- vapply(param_lines, collapse_index, character(1), index = 1)
  descriptions <- vapply(param_lines, collapse_index, character(1), index = 2)
  descriptions <- gsub("\n", " ", descriptions)
  cbind(params = params, descriptions = descriptions)
}

arg_table <- tablify_arguments(margin_args)
arg_table[1:3, ]
#      params       descriptions                                                                                    
# [1,] "t, r, b, l" "Dimensions of each margin. (To remember order, think trouble)."                                
# [2,] "unit"       "Default units of dimensions. Defaults to \"pt\" so it can be most easily scaled with the text."
# [3,] "fill"       "Fill colour."
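To get back to the original question, the matrix can be turned into a named vector so that one @param description can be assigned to a variable. param_lookup() is a hypothetical helper; it splits rows that document several parameters at once (like t, r, b, l) on commas. The stand-in matrix copies the rows printed above so the block runs without ggplot2.

```r
# Stand-in for the matrix returned by tablify_arguments()
arg_table <- cbind(
  params = c("t, r, b, l", "unit", "fill"),
  descriptions = c(
    "Dimensions of each margin. (To remember order, think trouble).",
    "Default units of dimensions. Defaults to \"pt\" so it can be most easily scaled with the text.",
    "Fill colour."
  )
)

# Hypothetical helper: one named entry per parameter, shared rows repeated
param_lookup <- function(arg_table) {
  names_list <- strsplit(arg_table[, "params"], ",\\s*")
  descriptions <- rep(arg_table[, "descriptions"], lengths(names_list))
  stats::setNames(descriptions, unlist(names_list))
}

params <- param_lookup(arg_table)

# Inside a function, assign() makes the global target explicit
assign("fill_doc", params[["fill"]], envir = globalenv())
fill_doc
```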

Note that my solution drops all LaTeX-like markup from the content, but the markup is still present in the section's nested lists, identified by their Rd_tag attributes. Anyone ambitious enough can figure out how to recover it.

margin_args[[26]][[2]][[2]]
# [[1]]
# [[1]][[1]]
# [1] "grid::arrow()"
# attr(,"Rd_tag")
# [1] "TEXT"
# 
# attr(,"Rd_tag")
# [1] "\\link"
# attr(,"Rd_option")
# [1] "grid:arrow"
# attr(,"Rd_option")attr(,"Rd_tag")
# [1] "TEXT"
# 
# attr(,"Rd_tag")
# [1] "\\code"
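One way to recover some of it is to walk the nested lists recursively and decide, per Rd_tag, what to emit. The sketch below is a hypothetical rd_flatten() that keeps plain text, wraps \code content in backticks, and ignores everything else (including the link target stored in Rd_option). The toy page is made up.

```r
# Hypothetical helper: flatten an Rd fragment to text, marking \code with backticks
rd_flatten <- function(x) {
  if (is.character(x)) {
    return(paste0(x, collapse = ""))
  }
  inner <- paste0(vapply(x, rd_flatten, character(1)), collapse = "")
  if (identical(attr(x, "Rd_tag"), "\\code")) paste0("`", inner, "`") else inner
}

rd_lines <- c(
  "\\name{toy}", "\\alias{toy}", "\\title{A toy page}",
  "\\arguments{",
  "  \\item{arrow}{Made by \\code{\\link[grid]{arrow}}.}",
  "}"
)
con <- textConnection(rd_lines)
toy_page <- tools::parse_Rd(con)
close(con)

# Find the \arguments section, then its first \item
tags <- vapply(toy_page, attr, character(1), "Rd_tag")
args <- toy_page[[which(tags == "\\arguments")]]
item <- args[lengths(args) == 2][[1]]

rd_flatten(item[[2]])
```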

#3

Wow, that is a great way! Very different from what I came up with.

Thanks for the answer, I learned new things about extraction.



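query_help_params <- function(f, ns, port = tools::startDynamicHelp(NA)) {

  # Make sure the package is loaded so the help server can find it
  if (!ns %in% loadedNamespaces())
    loadNamespace(ns)

  # Fetch the HTML help page for f from the local dynamic help server
  url <- sprintf('http://127.0.0.1:%s/library/%s/html/%s.html', port, ns, f)
  LINES <- readLines(url, warn = FALSE)

  if (length(LINES) == 0)
    return(NULL)

  HTML <- xml2::read_html(paste0(LINES, collapse = '\n'))

  # The first table is the page header; the second holds the arguments.
  # A page with only one table has no arguments table.
  TABLES <- rvest::html_table(HTML)
  if (length(TABLES) < 2)
    return(NULL)

  tibble::as_tibble(TABLES[[2]])
}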

query_help_params(f = 'pluck',ns = 'purrr')
#> starting httpd help server ... done
#> # A tibble: 4 x 2
#>   X1       X2                                                             
#>   <chr>    <chr>                                                          
#> 1 .x       A vector or environment                                        
#> 2 ...      "A list of accessors for indexing into the object. Can be\nan …
#> 3 .default Value to use if target is empty or absent.                     
#> 4 attr     An attribute name as string.

I tried to target the table more precisely by its id but couldn't get it to work well. This solution assumes that format is not set to NULL in roxygen2.

Created on 2018-09-21 by the reprex package (v0.2.0).


#4

@jimhester the first solution could be a great introspection case study for {roomba}.


#5

Your solution's pretty slick.

I thought about going through the HTML as well, but using tools::Rd2HTML() on what comes out of the .rdb file. Here's an XPath expression that should get you the arguments table:

TABLES <- xml2::xml_find_first(HTML, "//table[@summary = 'R argblock']")
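A sketch of that route, running tools::Rd2HTML() on a parsed page. It assumes the xml2 package is installed, and the toy page is made up. The XPath anchors on the Arguments heading instead of the summary attribute, on the assumption that the heading text is more stable across R versions' HTML output; that is a guess, not documented behaviour.

```r
rd_lines <- c(
  "\\name{toy}", "\\alias{toy}", "\\title{A toy page}",
  "\\usage{toy(x, y = 1)}",
  "\\arguments{",
  "  \\item{x}{The input vector.}",
  "  \\item{y}{A scalar offset.}",
  "}",
  "\\description{Adds y to x.}"
)
con <- textConnection(rd_lines)
toy_page <- tools::parse_Rd(con)
close(con)

# Render the parsed page to an HTML file
html_file <- tempfile(fileext = ".html")
tools::Rd2HTML(toy_page, out = html_file)

# First table after the "Arguments" heading, then its cells as text
html <- xml2::read_html(html_file)
arg_table <- xml2::xml_find_first(
  html, "//h3[normalize-space(.) = 'Arguments']/following::table[1]"
)
cells <- trimws(xml2::xml_text(xml2::xml_find_all(arg_table, ".//td")))
matrix(cells, ncol = 2, byrow = TRUE)
```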

#6

I got a 0-length result when I tried that. I'll try again, thanks.


#7

That works now, thanks:

  HTML <- xml2::read_html(LINE)
  
  XML_TABLE <- xml2::xml_find_first(HTML, "//table[@summary = 'R argblock']")
  
  if(is.na(XML_TABLE))
    return(NULL)
  
  rvest::html_table(XML_TABLE)


#8

@nwerth my solution causes problems in R CMD check and in testing (non-interactive sessions, where interactive() is FALSE). Yours may be more robust in that sense.