Is there a way to extract the names of all functions in an R script?

To start: this was fun for me. Thanks for this cool problem!

You can use the parse function to convert a script into an expression vector, with one element per top-level expression. Then you can go through those expressions, collect any functions called, flatten subexpressions, and repeat until nothing's left to flatten.
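A quick look at what parse hands back. This uses its text argument instead of a file so the snippet stands on its own; the two-line "script" is made up:

```r
# A made-up two-line "script", passed as text instead of a file path
exprs <- parse(text = "x <- mean(1:3)\nprint(x)")

class(exprs)                     # "expression"
length(exprs)                    # 2: one element per top-level expression
exprs[[1]]                       # the call  x <- mean(1:3)
sapply(as.list(exprs), is.call)  # TRUE TRUE: each element is a call
```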

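We can tell which parts can still be unpacked by their length. A call's length is the number of pieces it's made of: the function being called plus each of its arguments. A single token (the smallest unit of a language, such as a name or a constant) has length 1. For example, 1 + 2 has three tokens: 1, +, and 2, so as a call it has length 3. A call with no arguments, like n(), also has length 1, which is why the code checks is.call() on its own rather than relying on length alone.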
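To see those lengths for yourself (the names iris and n are just placeholders here; quote() never evaluates them):

```r
e <- quote(1 + 2)

length(e)   # 3: the function `+` plus its two arguments
as.list(e)  # list(`+`, 1, 2)
e[[1]]      # `+`, the function being called

# A call with no arguments is still a call, but has length 1
length(quote(n()))    # 1
is.call(quote(n()))   # TRUE

# A lone symbol is a single token: length 1, and not a call
length(quote(iris))   # 1
is.call(quote(iris))  # FALSE
```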

get_calls <- function(filepath) {
  code <- parse(filepath)
  tokens <- as.list(code)
  calls <- character()
  while (TRUE) {
    any_unpacked <- FALSE
    for (ii in seq_along(tokens)) {
      # Skip empty arguments (the blank in x[, 1]); evaluating one errors
      if (identical(tokens[[ii]], quote(expr = ))) next
      part <- tokens[[ii]]
      # Calls always have the function name as the first element
      if (is.call(part)) {
        fun_token <- part[[1]]
        calls <- c(calls, deparse(fun_token))
      }
      # Anything longer than one token can be unpacked further
      if (length(part) > 1) {
        tokens[[ii]] <- as.list(part)
        any_unpacked <- TRUE
      }
    }
    # Flatten one level of nesting before the next pass
    tokens <- unlist(tokens)
    if (!any_unpacked) break
  }
  unique(calls)
}
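If the while loop feels roundabout, the same idea can be sketched as a recursive walk. This is my own variant, not a drop-in replacement: collect_calls and get_calls_recursive are made-up names, it visits nodes depth-first (so names come out in a different order than get_calls gives), and I haven't tested it against every kind of syntax.

```r
# Hypothetical helper: returns the deparsed function part of every call
# inside one expression, visiting nodes depth-first.
collect_calls <- function(x) {
  # Symbols and constants are leaves: nothing is being called there
  if (!is.call(x)) return(character())
  parts <- as.list(x)
  # Recurse into every element, including the function position, so that
  # getFunction("message")("hi") also yields "getFunction"
  nested <- unlist(lapply(parts, collect_calls))
  c(deparse(parts[[1]]), nested)
}

# Hypothetical wrapper mirroring get_calls(): one file in, unique names out
get_calls_recursive <- function(filepath) {
  exprs <- parse(filepath)
  unique(unlist(lapply(exprs, collect_calls)))
}
```

The recursion replaces the manual flatten-and-repeat bookkeeping; the trade-off is that very deeply nested code could, in principle, hit R's recursion limit.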

Here it is run against an example script, ~/example.R:

# ~/example.R
library(dplyr)

iris_plot <- iris %>%
  mutate(id = sample(c(1:10, 99), n(), replace = TRUE)) %>%
  rename_all(tolower) %>%
  rename_all(gsub, pattern = ".", replacement = "_")

p <- print
p("Hello, world!")
getFunction("message")("Hello, again!")

The result:

get_calls("~/example.R")
#  [1] "library"                  "<-"                      
#  [3] "p"                        "getFunction(\"message\")"
#  [5] "%>%"                      "getFunction"             
#  [7] "rename_all"               "mutate"                  
#  [9] "sample"                   "c"                       
# [11] "n"                        ":"

Where the function fails:

  • Functions passed around as objects (it didn't pick up tolower or gsub)
  • Functions called through another name (it picked up p, but not print)
  • Functions retrieved dynamically (it didn't pick up message)
  • Probably a bunch of other edge cases
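For the first of those, one rough workaround is to collect every name in the code, not just the ones in call position, and keep those that match a function you can see. This is a sketch under a loud assumption: "is a function" here means "some function by that name is visible from the global environment right now". find_function_names is a made-up name.

```r
# Assumption: a name "is a function" if a function by that name is visible
# from the global environment right now (base R plus attached packages).
find_function_names <- function(filepath) {
  exprs <- parse(filepath)
  # all.names() returns every symbol, not just the ones in call position,
  # so functions passed as arguments (like tolower) are included
  nms <- unique(unlist(lapply(exprs, all.names)))
  keep <- vapply(
    nms,
    function(nm) exists(nm, envir = globalenv(), mode = "function"),
    logical(1)
  )
  nms[keep]
}
```

It still won't find message (that only appears as a string), it drops functions from packages that aren't attached yet, and it flags any variable that happens to share a function's name (a column called df matches stats::df, for instance).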

Running the script and listing the functions it leaves behind would only find functions defined in the script, not the ones used. But I like the idea of running the script to create the rat's nest of environments. Then maybe we could pair up parsed expressions with the environments they're run in.
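For contrast, here's a rough sketch of the run-it-and-look approach: source the script into a fresh environment and list whatever functions it leaves there. defined_functions is a made-up name, it only sees top-level objects the script assigns, and it really does execute the script.

```r
# Caution: this executes the whole script, side effects included.
defined_functions <- function(filepath) {
  # A fresh environment to hold whatever the script creates
  env <- new.env()
  sys.source(filepath, envir = env)
  nms <- ls(env)
  # Keep only objects that are functions (including aliases like p <- print)
  is_fun <- vapply(nms, function(nm) is.function(get(nm, envir = env)), logical(1))
  nms[is_fun]
}
```

An alias such as p <- print would show up, since it leaves a function behind, but functions the script only calls (print itself, mutate, and so on) would not.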

Definitely a lot of ways to approach this.
