Build package namespace dynamically during `.onLoad()`

mmuurr · January 10, 2018, 11:12pm

[EDIT: The thread title and content have been changed a bit to address the more general question of dynamically building namespaces at package load time.]

I'm trying to make a (very) lightweight package that wraps some of my existing Python code using reticulate. This Python package contains a module (the_py_module, bound to the_module on the R side) that is itself an ever-growing container for functions f1(), f2(), etc.

Version 1: the_package::the_module$f1()

One straight-forward way to do this is:

Bundle the Python code with the R package (at inst/python/<the_py_package>).
Load the Python module within the R package with

the_module <- NULL
.onLoad <- function(libname, pkgname) {
  the_module <<- reticulate::import_from_path(
    "the_py_module",
    system.file("python", "the_py_package", package = pkgname, mustWork = TRUE)
  )
}

And include export(the_module) via the NAMESPACE file.
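The `the_module <- NULL` placeholder is what makes `<<-` land in the right place: `<<-` updates the first existing binding it finds in the enclosing environments and, if there is none, assigns into the global environment instead. A base-R sketch of that rule follows; a plain environment stands in for the package namespace (a simplification, since a real namespace sits under its imports environment and the base namespace, though the lookup rule is the same), and `ns`, `on_load` and `no_placeholder` are hypothetical names.

```r
# Stand-in for a package namespace (simplification: a real namespace's
# parents are its imports environment and the base namespace).
ns <- new.env(parent = globalenv())

local({
  the_module <- NULL  # placeholder binding, as in Version 1
  on_load <- function() {
    the_module <<- "module object"  # finds and updates the placeholder in `ns`
    no_placeholder <<- "oops"       # no binding up the chain: lands in globalenv()
  }
}, envir = ns)

ns$on_load()

ns$the_module                                           # "module object"
exists("no_placeholder", envir = ns, inherits = FALSE)  # FALSE
```

So without the placeholder, the hook would quietly create `the_module` in the user's workspace rather than in the package.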

This gives R access to the Python functions via:

the_package::the_module$f1()
the_package::the_module$f2()
## ... and so on

Version 2: Declare and export each function explicitly

As nice as Version 1 is, I'd like to have each Python function be a 'top-level' export within the package's namespace and accessible like so:

the_package::f1()

One way to achieve this is to explicitly 'declare' each function, then 'define' those functions in .onLoad:

## for NAMESPACE exporting
f1 <- NULL
f2 <- NULL

## actually _define_ the functions at package load-time
.onLoad <- function(libname, pkgname) {
  the_module <- reticulate::import_from_path(
    "the_py_module",
    system.file("python", "the_py_package", package = pkgname, mustWork = TRUE)
  )
  f1 <<- the_module$f1
  f2 <<- the_module$f2
}

... and then include export(f1), export(f2), etc. via the NAMESPACE file.

This works! But ... every time new functions are added to the Python module, these explicit declarations and definitions are required to expose those functions to R.
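One way to make that leg-work mechanical is to generate the boilerplate from a list of function names. The helper `make_stubs()` below is hypothetical (not part of any package); it only builds the text of the placeholder declarations, the `.onLoad` assignments and the NAMESPACE directives, which would still need to be written into the package.

```r
# Hypothetical helper: build Version 2's boilerplate for a set of Python
# function names. Returns one character vector per section of the package.
make_stubs <- function(fns) {
  list(
    declarations = sprintf("%s <- NULL", fns),
    on_load      = sprintf("  %s <<- the_module$%s", fns, fns),
    namespace    = sprintf("export(%s)", fns)
  )
}

stubs <- make_stubs(c("f1", "f2", "f3"))
cat(stubs$declarations, sep = "\n")
cat(stubs$namespace, sep = "\n")
```

This keeps the explicit style of Version 2 but still needs re-running whenever the module grows.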

Version 3: Build the package namespace during `.onLoad()`

To reduce this extra legwork, I've adopted this (working!) pattern, where the entire R package is 'defined' with only this .onLoad() code:

.onLoad <- function(libname, pkgname) {
  pkg_ns_env <- parent.env(environment())  # enclosing env of .onLoad = package namespace
  the_module <- reticulate::import_from_path(
    "the_py_module",
    system.file("python", "the_py_package", package = pkgname, mustWork = TRUE)
  )
  # bind every attribute of the module into the namespace
  for (name in names(the_module)) assign(name, the_module[[name]], envir = pkg_ns_env)
}

... along with only this in the NAMESPACE file:

exportPattern("^[^\\.]")

This also works! After installing the package, I can access any of the module's functions with:

the_package::f1()
the_package::f2()
## etc.
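Why this works appears to come down to ordering. As far as I can tell from base R's `loadNamespace()`, the package's R code is evaluated first, then `.onLoad()` runs, and only after that are the `exportPattern()` patterns matched against the namespace's contents, after which the namespace is sealed. The sketch below is a much-simplified base-R model of that sequence, not the real implementation; `simulate_load()` and `fake_module` (a named list standing in for the reticulate module) are hypothetical names.

```r
# Hypothetical stand-in for the reticulate module: a named list of functions.
fake_module <- list(f1 = function() "f one", f2 = function() "f two")

# Version 3's hook, with `fake_module` in place of import_from_path().
on_load <- function(libname, pkgname) {
  pkg_ns_env <- parent.env(environment())  # the enclosing "namespace"
  for (name in names(fake_module)) {
    assign(name, fake_module[[name]], envir = pkg_ns_env)
  }
}

# Much-simplified model of loadNamespace(): install the hook, run it,
# match the export patterns, then lock the environment.
simulate_load <- function(hook, export_patterns) {
  # Parent the fake namespace on wherever the hook was defined so the hook
  # can still find `fake_module` (a simplification of the real parent chain).
  ns <- new.env(parent = environment(hook))
  environment(hook) <- ns
  assign(".onLoad", hook, envir = ns)
  ns$.onLoad("libname", "pkgname")
  exports <- unlist(lapply(export_patterns, function(p)
    ls(ns, pattern = p, all.names = TRUE)))
  lockEnvironment(ns, bindings = TRUE)
  list(ns = ns, exports = exports)
}

loaded <- simulate_load(on_load, "^[^\\.]")
loaded$exports  # "f1" "f2" (.onLoad itself is filtered out by the pattern)
loaded$ns$f1()  # "f one"
```

The same model shows the limit of the approach: once the environment is locked, adding another binding (for example `assign("f3", ..., envir = loaded$ns)`) fails, so everything has to be created inside `.onLoad()` itself.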

Better yet, if I make a change to the Python module and add a function f3(), it becomes visible to R (at the next load-time) without any changes to the R package.
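The flip side is that every name the loop sees is assigned and, if it matches the export pattern, exported; note that `"^[^\\.]"` only excludes names starting with a dot, so underscore-prefixed Python names would get through. Filtering before the `assign()` loop is one option. `select_exportable()` is a hypothetical helper working on a plain character vector; exactly which names reticulate's `names()` returns for a module is worth checking in your own setup.

```r
# Hypothetical helper: drop names that are private by Python convention
# (leading underscore) or hidden by R convention (leading dot), plus any
# explicitly excluded names.
select_exportable <- function(nms, exclude = character()) {
  keep <- !startsWith(nms, "_") & !startsWith(nms, ".") & !(nms %in% exclude)
  nms[keep]
}

select_exportable(c("f1", "_helper", "__name__", "f2", "f3"), exclude = "f3")
# In .onLoad this could then drive the loop, e.g.
#   for (name in select_exportable(names(the_module))) assign(name, the_module[[name]], envir = pkg_ns_env)
```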

Question(s):

Adding functions to the package namespace during .onLoad() seems ... non-standard.

Is it risky?
Are there caveats that I'm overlooking here?
Alternatively, what other ways (if any) have folks adopted for dynamically exporting functions from wrapper packages?
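One concrete caveat with `exportPattern()` plus a dynamic list of names: if the Python module ever defines a function whose name matches an R function (say `sum` or `print`), it gets exported and will mask that function for anyone who attaches the package with `library()`. A quick check against base R is sketched below; `find_masking()` is hypothetical, and a fuller check would also look at other commonly attached packages (e.g. `filter` lives in stats, not base).

```r
# Hypothetical helper: which candidate export names already exist in `envir`?
find_masking <- function(nms, envir = baseenv()) {
  clashes <- vapply(nms, exists, logical(1), envir = envir, inherits = FALSE)
  nms[clashes]
}

find_masking(c("f1", "sum", "print", "f2"))
```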

mmuurr · January 11, 2018, 2:26am

[EDIT: the first post has been updated with working code]

nwerth · January 12, 2018, 6:59pm

From what I've read, you're doing the right thing. From the help page for .onLoad:

Anything needed for the functioning of the namespace should be handled at load/unload times by the .onLoad and .onUnload hooks.

From the "Using reticulate in an R Package" vignette:

If you write an R package that wraps one or more Python packages, it’s likely that you’ll be importing Python modules within the .onLoad method of your package so that you can have convenient access to them within the rest of the package source code.

You can automate the process a bit, if you'd like. This is working for me so far in a testing package:

inst/python/the_py_module.py

def f1():
  return "f one"


def f2():
  return "f two"

R/load-python.R

# Load the module and create dummy objects from it, all of which are NULL
the_py_module <- reticulate::import_from_path(
  "the_py_module",
  file.path("inst", "python")
)
for (obj in names(the_py_module)) {
  assign(obj, NULL)
}
# Clean up, including the loop variable (it would otherwise stay in the namespace)
rm(the_py_module, obj)

# Now all those names are in the namespace, and ready to be replaced on load
.onLoad <- function(libname, pkgname) {
  the_py_module <- reticulate::import_from_path(
    "the_py_module",
    system.file("python", package = packageName()),
    delay_load = TRUE
  )
  # assignInMyNamespace(...) is meant for namespace manipulation
  for (obj in names(the_py_module)) {
    assignInMyNamespace(obj, the_py_module[[obj]])
  }
}
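A side effect worth knowing about in top-level package code like `R/load-python.R`: an R `for` loop leaves its loop variable bound after it finishes, so a top-level loop over the module's names leaves an `obj` binding in the namespace unless it is removed explicitly (and a catch-all pattern such as `exportPattern("^[^\\.]")` would export it). The base-R sketch below shows this with a throwaway environment standing in for the namespace; `pkg_env`, `before` and `after` are illustrative names.

```r
# Throwaway environment standing in for the namespace while package code runs.
pkg_env <- new.env()

# A top-level-style loop, evaluated inside that environment.
local({
  for (obj in c("f1", "f2")) assign(obj, NULL)
}, envir = pkg_env)

before <- ls(pkg_env)  # includes "obj" as well as "f1" and "f2"

# Removing the loop variable explicitly keeps the namespace clean.
rm("obj", envir = pkg_env)
after <- ls(pkg_env)   # just "f1" and "f2"
```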

Of course, you'll probably want to document each of the exported functions. In that case, writing fn <- NULL for each isn't much additional work.

mmuurr · January 12, 2018, 8:06pm

@nwerth I never knew about assignInMyNamespace().
assign() has been working for me, but only after carefully managing which namespace is the package's, so this is definitely an improvement; thanks!

mmuurr · January 12, 2018, 8:11pm

@nwerth Also a note: when assigning from within .onLoad, I don't think you ever need to create the dummy variables ahead of time (i.e. at package installation). Even without those dummy variables, I've found that assign() exposes/exports the variables when the package is loaded.
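That holds for `assign()`, which creates new bindings in an environment that is not yet locked. `assignInMyNamespace()` appears to behave differently: as far as I can tell from the utils source, it first calls `bindingIsLocked(x, ns)`, and that call errors when no binding for `x` exists, which would explain why the placeholder objects matter in the `assignInMyNamespace()` version. The base-R sketch below shows the two behaviours on an ordinary unlocked environment; `env`, `f3`, `f4` and `res` are illustrative names.

```r
env <- new.env()

# assign() creates a binding that did not exist before.
assign("f3", function() "f three", envir = env)
exists("f3", envir = env, inherits = FALSE)  # TRUE

# bindingIsLocked() needs an existing binding and errors otherwise;
# capture the error message instead of stopping.
res <- tryCatch(bindingIsLocked("f4", env),
                error = function(e) conditionMessage(e))
res
```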
