Is there a way to extract the names of all variables in another r script?

Hello,

My question is similar to "Is there a way to extract the names of all functions in an r script?", but I only need the variables, not the functions, of another R script (not the currently running one or the current environment).

Is there a way to do that?

Thanks in advance.

This sounds similar to the examples used in Hadley's Advanced R. In Chapter 18, he does something like this as an exercise, so you should check it out and see if it helps you to do this.
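
As a rough sketch of that idea (my own illustration, not code taken from the book): parse the script with base R's `parse()` and walk the resulting expressions recursively, recording the targets of `<-`, `<<-` and `=` as well as the formal arguments of every `function` definition. The helper name `walk_expr` and the `"var"`/`"arg"` labels are made up for this example, and only plain-name targets are handled, so `x$y <- 1` or `assign("x", 1)` would be missed.

```r
# Walk one parsed expression and return a data frame of the names it defines.
# `walk_expr` is a hypothetical helper written for this sketch.
walk_expr <- function(e) {
  out <- data.frame(name = character(), type = character(),
                    stringsAsFactors = FALSE)
  if (!is.call(e)) return(out)

  head <- e[[1]]
  if (is.name(head) && as.character(head) %in% c("<-", "<<-", "=")) {
    target <- e[[2]]
    value  <- e[[3]]
    # Skip `f <- function(...)`: that defines a function, not a variable.
    defines_function <- is.call(value) &&
      identical(value[[1]], as.name("function"))
    if (is.name(target) && !defines_function) {
      out <- rbind(out, data.frame(name = as.character(target), type = "var",
                                   stringsAsFactors = FALSE))
    }
  } else if (identical(head, as.name("function"))) {
    # The second element of a `function` call holds the formal arguments.
    arg_names <- names(e[[2]])
    if (length(arg_names) > 0) {
      out <- rbind(out, data.frame(name = arg_names, type = "arg",
                                   stringsAsFactors = FALSE))
    }
  }

  # Recurse into sub-calls only; symbols, constants and formals are skipped.
  for (i in seq_along(e)) {
    if (is.call(e[[i]])) out <- rbind(out, walk_expr(e[[i]]))
  }
  out
}

# Inline text keeps the example self-contained; for a real script,
# parse(file = "path/to/script.R") should work the same way.
code <- "a = 1
b = function(x, y = 1){
  c = x + y + 1
  return(c)
}
d = b(1)"

exprs  <- parse(text = code, keep.source = FALSE)
result <- unique(do.call(rbind, lapply(exprs, walk_expr)))
print(result)
```

Because this works on the parsed code rather than on raw text, named arguments in calls such as `data.frame(name = ...)` are not mistaken for assignments, and strings and comments need no special handling.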

Hi,

This seems like a fun challenge, but I agree there might be something in the Advanced R book to help. I had some ideas using regex, but that's always more dangerous because you could forget certain situations :slight_smile:

One situation that immediately comes to mind: do you want variables assigned inside a function to be returned as well? They never end up in the R environment after the script has run, because they only live in the function's own environment.

a = 1
b = function(x, y = 1){
  c = x + y + 1
  return(c)
}
d = b(1)

So do you want to say that a, c and d are variables, or only a and d? And what about y? It is an argument in a function, but a variable within...
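
To illustrate the point, here is a small sketch (assuming it is acceptable to actually run the script, which may not be the case for scripts with side effects): evaluate the code above in a fresh environment and list what is left behind. The names `env`, `top_level` and `vars` are just for this example.

```r
code <- "a = 1
b = function(x, y = 1){
  c = x + y + 1
  return(c)
}
d = b(1)"

# Evaluate the script in its own environment instead of the global one.
env <- new.env()
eval(parse(text = code), envir = env)

# Everything the script left behind at the top level.
top_level <- ls(env)

# Drop the functions to keep only the variables.
is_fun <- vapply(top_level,
                 function(n) is.function(get(n, envir = env)),
                 logical(1))
vars <- unname(top_level[!is_fun])
print(vars)
```

Only `a` and `d` survive here; `c`, `x` and `y` exist solely while `b()` is running, so finding them requires looking at the source text or the parsed code rather than at the environment.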

PJ

My goal: I want to write a function that loads a file and checks whether all its variables are named correctly (with regard to an internal naming convention).

So I would like to have a, c, d, x and y.

(Functions will be checked in another, separate function).

Hi,

That complicates things a bit, of course: if you want to check arguments as well, the check should only cover user-defined functions, since other functions might have arguments that don't follow your internal standards.

Is this for checking students' code or something? Again, it's a cool challenge, but I'm curious about the application.

PJ

It's just a simple tool to improve our company's code style.

Hi,

Here is an attempt of mine using regex. I'm sure there will be glitches in this, especially with the tidyverse and the way it creates variables, but at least it's a start I guess. I had a lot of fun coding it :stuck_out_tongue:

scriptToExamine.R

library(stringr)
library(dplyr)

checkScriptVars = function(script){
  
  myScript = readLines(script)
  
  #Remove Quoted text with "" or ''
  myScript = str_remove_all(myScript, "[\\\\]+\\\"")
  myScript = str_remove_all(myScript, "\"[^\"]+\"")
  
  myScript = str_remove_all(myScript, "[\\\\]+\'")
  myScript = str_remove_all(myScript, "'[^']+'")  
  
  # Remove comment lines
  myScript = str_remove_all(myScript, "#.*")
  
  #Merge script into one long string
  myScript = paste(myScript, collapse = " ")
  
  #Get the variable names
  vars = str_extract_all(myScript, '.\\s*[\\w\\.]+\\s*(<-|=)\\s*[^\\s\\(]+') %>% unlist()
  vars = vars[!str_detect(vars, "^,")]
  vars = str_match(vars, "([^\\s\\(]+)\\s*(<-|=)\\s*([^\\(]+)") %>% as.data.frame()
  
  vars = vars %>% filter(V4 != "function") %>% pull(V2) %>% unique()
  
  #Get the argument names from custom functions
  args = str_match_all(myScript, "function\\(([^\\)]+)\\)")[[1]][,2]
  args = str_match_all(args, "(^\\s*|\\s*,\\s*)([^\\s,]+)")
  
  args = sapply(args, function( arg ) arg[,3]) %>% unlist() %>% as.character() %>% unique()
  
  return(data.frame(name = c(vars, args), type = c(rep("var", length(vars)), rep("arg", length(args)))))
}

script = "scriptToExamine.R"
checkScriptVars(script)

In this example, I go meta and examine the script itself for variables and custom function arguments. The output looks like this:

      name type
1 myScript  var
2     vars  var
3     args  var
4     name  var
5   script  var
6   script  arg
7      arg  arg

The type "arg" here covers only arguments of custom functions. I also made sure to remove all comments and quoted text, because they can contain text that looks like a variable assignment but should not count towards your coding standards.

Let me know what you think :slight_smile:
PJ

Thanks a lot :slight_smile:

I have to tweak the function a bit because some variables contain "$" as some scripts put variables into environments (cool.env$new.variable).

I had to change the parameter "V4" of the dplyr function "filter" to "vars$V4".
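
For anyone who works on the parsed code instead of regex, a target such as `cool.env$new.variable` shows up as a call to `$`, so the variable name can be picked out directly. A minimal sketch, where `assignment_target` is a hypothetical helper and other targets such as `x[1]` are deliberately ignored:

```r
# Return the name being assigned in a single assignment expression.
# Handles plain names and env$name targets; anything else yields NA.
assignment_target <- function(e) {
  target <- e[[2]]
  if (is.name(target)) {
    return(as.character(target))
  }
  if (is.call(target) && identical(target[[1]], as.name("$"))) {
    # For `env$name`, the third element is the name after the `$`.
    return(as.character(target[[3]]))
  }
  NA_character_
}

plain  <- assignment_target(quote(a <- 1))
in_env <- assignment_target(quote(cool.env$new.variable <- 1))
other  <- assignment_target(quote(x[1] <- 2))
```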

Hi,

Well, I'm sure there's more room for improvement, but glad I was able to help.

PJ

You should check out the lintr package. It's pretty much exactly what you are trying to do.

I use it as part of a custom autograder I built for the R programming class I TA.

lintr looks nice, thanks.

Sadly it seems not to distinguish between variables and functions.
