Creating an R package with function sourcing from several scripts

I am trying to create my first R package using RStudio.
I am learning from these useful tutorials:
https://www.rstudio.com/resources/videos/you-can-make-a-package-in-20-minutes/
and https://support.rstudio.com/hc/en-us/articles/200486488-Developing-Packages-with-RStudio
but I still have a few questions.

My package consists of a main function that relies on several other functions to run calculations and statistical analyses on a dataset. I wrote each function in its own script, 9 scripts in total, and the main function currently has to source them in order to work (they are all in the same R folder).
From the tutorials, I learnt that the functions of a package can all live in the same script. Is this compulsory? Should I put all the functions together, or can I keep the multiple scripts I already have?
Does this affect the final package, or how it behaves once it is on CRAN? Do I have to follow a different procedure to create the package?

@LucaS Hadley Wickham has written a guide to building R packages. It comes in really handy for getting all the concepts of creating a package right.
Your R folder can contain more than one .R file. My suggestion would be to build a source file (.tar.gz) or a .zip file and test it on your own machine before distributing. Putting it on CRAN requires robust error handling and a well-structured package, along with help files and vignettes.
While going through the guide, pay special attention to the NAMESPACE, S3 programming, and dependencies.
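
As a rough illustration of testing locally, here is a minimal sketch that creates a throwaway package with two scripts in its R folder and installs it from source into a temporary library. The package name (demopkg), the function names (main_analysis, helper_mean) and all DESCRIPTION values are made up for the example; only install.packages() and library() are real API here.

```r
# Hypothetical package name and paths, used only for this demonstration
pkg_dir <- file.path(tempdir(), "demopkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# Minimal DESCRIPTION; every value is a placeholder
writeLines(c(
  "Package: demopkg",
  "Version: 0.0.1",
  "Title: Demo Package With Several Scripts",
  "Description: A throwaway package used to try a local install.",
  "Author: A. Nonymous",
  "Maintainer: A. Nonymous <a.nonymous@example.com>",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

# Export only the main function; the helper stays internal
writeLines("export(main_analysis)", file.path(pkg_dir, "NAMESPACE"))

# Two separate scripts in R/, with no source() call anywhere
writeLines("main_analysis <- function(x) helper_mean(x) + 1",
           file.path(pkg_dir, "R", "main_analysis.R"))
writeLines("helper_mean <- function(x) mean(x)",
           file.path(pkg_dir, "R", "helper_mean.R"))

# Install from the source directory into a temporary library
demo_lib <- file.path(tempdir(), "demo_lib")
dir.create(demo_lib, showWarnings = FALSE)
install.packages(pkg_dir, repos = NULL, type = "source",
                 lib = demo_lib, quiet = TRUE)

# Load the installed package and call the exported function
library("demopkg", lib.loc = demo_lib, character.only = TRUE)
result <- main_analysis(c(1, 2, 3))
```

For a real package you would more likely run R CMD build and R CMD check from a terminal (or the Build pane in RStudio) to get the .tar.gz and the full set of checks; the sketch only shows that several scripts in the R folder install and work together.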

Let me know if this helps.

Thanks!
Heramb


When a package is built, every file in the R/ subdirectory is run, and the objects they create become part of the package, so your main function does not need to source() the other scripts. By default, you shouldn't assume the files are run in a specific order. Usually, that's fine. Let's say we have a package with two .R files:

# R/aardvark.R
aardvark <- function(x) {
  zebra(x)
}
# R/zebra.R
zebra <- function(x) {
  x + 1
}

Even if aardvark.R is run before zebra.R, the package will build just fine. By the time anything calls the aardvark() function, zebra() has already been defined.
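
To see this outside of a package, here is a small sketch that imitates the build step by running both files, in alphabetical order, into one shared environment. Using sys.source() and a temporary directory is my own stand-in for what the build does, not the actual mechanism.

```r
# Write the two example files to a temporary folder (hypothetical location)
demo_dir <- file.path(tempdir(), "order_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("aardvark <- function(x) {\n  zebra(x)\n}",
           file.path(demo_dir, "aardvark.R"))
writeLines("zebra <- function(x) {\n  x + 1\n}",
           file.path(demo_dir, "zebra.R"))

# Run every file into one environment, aardvark.R first
pkg_env <- new.env()
for (f in sort(list.files(demo_dir, pattern = "\\.R$", full.names = TRUE))) {
  sys.source(f, envir = pkg_env)
}

# zebra() is only looked up when aardvark() is called, so this works
result <- pkg_env$aardvark(1)
```

Defining aardvark() only stores its body; the name zebra is resolved at call time, by which point both files have been run.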

If you need the files to run in a certain order, you can use the Collate field in the DESCRIPTION file. The official Writing R Extensions manual describes how this works. That manual is the primary reference for package creation rules, and Hadley's guide (mentioned by @heramb) is an easier read that offers good suggestions.
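
Order starts to matter when a file calls a function at the top level instead of inside a function body. The sketch below imitates that situation with a hypothetical variant of aardvark.R; load_in_order() is a helper invented for this example, and the Collate line in the comment shows what the matching DESCRIPTION entry could look like.

```r
demo_dir <- file.path(tempdir(), "collate_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Hypothetical variant: aardvark.R calls zebra() at the top level
writeLines("default_value <- zebra(1)", file.path(demo_dir, "aardvark.R"))
writeLines("zebra <- function(x) x + 1", file.path(demo_dir, "zebra.R"))

# Invented helper: run the given files, in the given order, into a fresh
# environment that cannot see anything defined in the global workspace
load_in_order <- function(files) {
  env <- new.env(parent = baseenv())
  for (f in files) sys.source(file.path(demo_dir, f), envir = env)
  env
}

# Alphabetical order fails: zebra() does not exist yet when aardvark.R runs
alphabetical <- tryCatch(
  load_in_order(c("aardvark.R", "zebra.R")),
  error = function(e) conditionMessage(e)
)

# Explicit order, comparable to this DESCRIPTION entry:
#   Collate: 'zebra.R' 'aardvark.R'
collated <- load_in_order(c("zebra.R", "aardvark.R"))
collated$default_value
```

In the first attempt, alphabetical ends up holding an error message instead of an environment; with the explicit order, default_value is computed without trouble.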


As to whether you should put all the functions in a single script or spread them out: do whatever works best for you. Personally, I have two types of .R files in my package:

  • Categories (e.g., text-formatting.R). These contain a bunch of functions and other data that have a similar purpose.
  • Major objects (e.g., query_website.R). These files are named after important functions or datasets in the package and also contain all the "helper" functions used to define them.

Your choice could also be affected by which version control system you use, if/how you collaborate with others, whether you like scrolling or switching windows to go between code, etc. Choose whatever makes you most productive.

Good luck in creating the package! It's definitely overwhelming at first, but it's totally worth it and addictive.
