Best practice for creating package data of a class defined in the package?

I'm writing a package for reading text files in an industry-standard format. Because the format (which is fixed-width) reserves space for custom fields, the package needs to allow users to define those fields. I decided to define a class named record_format for users to organize that data. Right now, the class is just a data.table with certain columns.

I'd like my package to eat its own dogfood and store the official field definitions as a record_format object. But this risks a loop when building the package: the data is created using a function in the package, which cannot be built until the data is created.

R's official documentation says data files can be .R scripts, but it then says:

Note that R code should be “self-sufficient” and not make use of extra functionality provided by the package

So what's a good way to do this?

In the end, I decided to create the objects without using the package's function and then modify their class.

I did this because it was the easiest way. Normally, that phrase would make me cringe, but it has its place. This solution wasn't too hard in my case, considering the class is just a data.table with predictable columns. As it is, the creation function would've just modified the class anyway.
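For what it's worth, here is a minimal sketch of that approach as a data-raw/ script. The script name, column names, and values are made up for illustration, and a plain data.frame stands in for the data.table so the example runs without extra packages. It writes to a temporary directory; in a real package the target would be the data/ subdirectory.

```r
# data-raw/official-fields.R (hypothetical script name)

# Made-up field definitions; the real columns depend on the file format
official_fields <- data.frame(
  name  = c("id", "date", "amount"),
  start = c(1L, 11L, 19L),
  width = c(10L, 8L, 12L),
  stringsAsFactors = FALSE
)

# Set the class by hand instead of calling the package's constructor
class(official_fields) <- c("record_format", class(official_fields))

# Save as .rda; a temp dir is used here so the sketch runs anywhere
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE)
save(official_fields, file = file.path(out_dir, "official_fields.rda"))
```

Because the class is set by hand, the script never calls anything from the package, so it doesn't depend on the package being built first.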

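If you'd rather not maintain the class-creation code in two or more places, you can instead create the objects in the scripts under R/. They become ordinary objects in the package namespace rather than lazy-loaded datasets, and the function has to be defined before the objects that call it. Example for a package named test, with filenames in the comments: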

# R/foo.R
as.foo <- function(x) {
  class(x) <- c("foo", class(x))
  x
}

internal <- as.foo(1)
public   <- as.foo(2)

# NAMESPACE
export(as.foo)
export(public)

# <Session>
library(test)

class(public)
# [1] "foo"     "numeric"
exists("internal")
# [1] FALSE
class(test:::internal)
# [1] "foo"     "numeric"

I also tried to create "normal" lazy-loaded datasets that use the function, but only got partway. I usually keep my data-creation scripts in the data-raw/ subdirectory and wanted to keep that habit. But the classed object is meant to be created at runtime, so it can't be stored in the data/ subdirectory. Instead, the raw data can go in inst/extdata/. The problem is that the data() function can't load it anymore.

# data-raw/create-lazy.R
lazy <- read.csv("data-raw/lazy.csv")
saveRDS(lazy, file = "inst/extdata/lazy.rds")

# R/foo.R
as.foo <- function(x) {
  class(x) <- c("foo", class(x))
  x
}

delayedAssign(
  "lazy",
  {
    message("-----\nNow loading the data\n-----")
    as.foo(readRDS(system.file("extdata/lazy.rds", package = "test")))
  }
)

# NAMESPACE
export(as.foo)
export(lazy)

# <Session>
library(test)

data("lazy", package = "test")
# Warning message:
# In data("lazy", package = "test") : data set ‘lazy’ not found

exists("lazy")
# [1] TRUE
class(lazy)
# -----
# Now loading the data
# -----
# [1] "foo"        "data.frame"
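
The lazy behavior itself can be checked outside a package. This is only a small sketch: the temporary file stands in for inst/extdata/lazy.rds, and the `loaded` flag is something I made up to record when the promise gets evaluated.

```r
as.foo <- function(x) {
  class(x) <- c("foo", class(x))
  x
}

# Stand-in for inst/extdata/lazy.rds
rds_path <- tempfile(fileext = ".rds")
saveRDS(data.frame(a = 1:3), rds_path)

# Hypothetical flag that records whether the promise has been evaluated
loaded <- FALSE

delayedAssign(
  "lazy",
  {
    loaded <- TRUE
    as.foo(readRDS(rds_path))
  }
)

before     <- loaded       # the file has not been read at this point
lazy_class <- class(lazy)  # first access forces the promise
after      <- loaded       # now the flag has been flipped
```

In the package version the same kind of promise sits in the namespace, which would explain why exists("lazy") returns TRUE before the message is printed.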