Function Naming Conventions and Best Practice

I'm interested in getting some more discussion going around what might be considered best practice in terms of naming conventions for the exported functions of R packages.

Of particular interest is whether and when the object_verb naming strategy is suitable. This type of prefixing is evident in packages like stringr (str_*), forcats (fct_*) and googlesheets (gs_* etc).

I think the two main advantages of this are that it:

  1. facilitates tab completion and the discovery of functions - especially for interactive use when it's common for packages to be loaded upfront with library()
  2. helps avoid the potential for namespace conflicts with other packages that use similar verbs (e.g. API packages)
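
To make the first point concrete, here is a minimal sketch in base R. The environment stands in for a package namespace, and all of the function names are hypothetical (only the prefix idea comes from the discussion). Filtering on the prefix is roughly what tab completion does once you have typed the first few letters:

```r
# Stand-in for a package namespace; every function name here is hypothetical.
pkg <- new.env()
pkg$ex_fetch    <- function() "fetching"
pkg$ex_submit   <- function() "submitting"
pkg$ex_status   <- function() "status"
pkg$set_api_key <- function(key) invisible(key)

# Typing the shared prefix narrows the candidates to one family of functions.
prefixed <- ls(pkg, pattern = "^ex_")
print(prefixed)
```

The unprefixed `set_api_key()` does not show up in that list, which is the discoverability gap being described.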

I'm in favour of this approach, and it is also encouraged in rOpenSci's packaging guide. However, it seems like it could also be slightly controversial (e.g. some feel this is essentially a namespacing issue and prefixing is a dirty workaround). Please see my blog post for more background on this:

In terms of framing the discussion, I'd like to point people to this article by Francois Chollet, the author of Keras (thanks to @jennybryan for sharing this!). Some parts of it apply more directly to naming conventions than others, but I think it is an excellent resource and I am all for encouraging a "design culture" among R developers.

My sense is that a "design culture" means that sometimes we may need to be a little flexible with certain development principles in favour of making things more useable for both ourselves and others. Purists may disagree with that, but one of the things which makes R a great language is the ongoing interaction between R users and (package) developers, and the fact that many of us wear both hats (almost) interchangeably. This is underpinned to a large extent by some of the design intent for R (and S) as highlighted by @jcheng in this interview:

R is not a DSL. It’s a language for writing DSLs, which is something that’s altogether more powerful.

By the way - that whole interview is well worth the read if you haven't seen it before and much of it is also relevant to the framing of this discussion. Especially the latter parts about comparing R to other languages, and the influence of computer science and software engineering making its way into R.

Of course, I already weighed in in the GitHub thread! But I really think the autocompletion argument is important. When you're in the thick of developing a package, it's hard to imagine what it feels like to be the occasional user who can't remember all the functions. It's a great help to start typing and have them appear in a list. Ditto for the self-documenting property within a script -- it's obvious which calls are to functions in your package. As for conflicts (literal or mental): several API wrappers I work on also have the equivalent of, for example, your set_api_key().

This may be related to your "design culture" argument. Is the object_verb part important? E.g. purrr uses verb_object syntax (e.g. map_* or flatten_*) and it works very well for me.

Great blog post and good on you for opening this up for community discussion.

I have a couple of thoughts on these points:

Re 1., RStudio tab completion will always work with or without the prefix. It does a very good job of partial matching, and it shows the package that contains each match. So the only added benefit of a prefix is discovery... But let's take a sec to ask ourselves: Why is it that the tab completion box in RStudio is an important mode of function discovery for users? Is this something we should be encouraging?

My theory is that a combination of functional programming style and dated conventions around R's help are together to blame for this. Unless you want to open a clunky PDF file, most of the time there's no way to get to a higher level context with an overview of the function landscape. I think we can do much better than the tab completion box, and I would love to see a community discussion take off around making help better for discovery.

Re 2., Yes, this is where I agree the prefix is quite important and will probably remain so. But what if you solved the namespace issue the old school way and went to an Object Oriented Programming style? Audible gasps are heard

I've become interested in this idea thanks to Winston Chang's useR! 2017 talk where he made a compelling case for using R6 and OOP to encapsulate the necessarily stateful operations required to use APIs. This is the exact case you mentioned as a flashpoint for namespace collisions. If I replace the first underscore in your function names with $ and now imagine I am looking at methods of an ex object, it still seems to hang together quite well. I don't know that this buys you that much in the end, but it may be a solution to try if you end up at another impasse.
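
To show what the `$` version might look like, here is a minimal sketch. It uses a plain closure in base R as a stand-in for R6 so that it runs without extra packages, and `new_client()` and `has_key()` are hypothetical names, not anything from the exercism package:

```r
# Hypothetical API client built from a closure; base R only, standing in for R6.
new_client <- function() {
  api_key <- NULL  # state lives inside the object, not in the global environment
  list(
    set_api_key = function(key) {
      api_key <<- key  # update the enclosed state
      invisible(TRUE)
    },
    has_key = function() !is.null(api_key)
  )
}

ex <- new_client()
before <- ex$has_key()
ex$set_api_key("abc123")
after <- ex$has_key()
```

Here `ex$` plays the role the `ex_` prefix would otherwise play, and the key is stored on the object rather than in a global option.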

The double-colon notation (foo::bar) already does this. It also shows the exact package a function is from. And, if you know there won't be a namespace problem, a package_verb() name just adds unnecessary text.

I agree with everything but "discovery." Discovery is done by reading the documentation and vignettes. Tab completion is nice because we cannot be expected to remember every function's name. But again, double-colon notation does the same thing. If I type foo::, RStudio shows me a list of objects in the foo package.
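
A small base R sketch of both halves of that argument, using the stats package because it ships with R. The masking function is deliberately silly and only there for illustration:

```r
# A user-defined function that masks stats::median on the search path.
median <- function(x) "masked"

masked   <- median(c(1, 3, 5))         # calls the masking definition
original <- stats::median(c(1, 3, 5))  # `::` reaches the package's own version

# The exported names of a package are what completion after `stats::` draws on.
exports    <- getNamespaceExports("stats")
has_median <- "median" %in% exports

rm(median)  # remove the mask again
```

So `::` resolves the conflict and gives a browsable list, at the cost of typing the package name each time.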

I don't think that's mutually exclusive with discovery in context. Think highway signs. It might be a bad example now that people have GPS, but there is a benefit to having reminders in-process.

I was kind of joking when I tossed @jonmcalder the Foucault quote, but there is an argument to be made for cuing using familiar syntactical patterns. (Though, counter-argument: we don't all speak the same language).

https://twitter.com/dataandme/status/937824065498353664

I agree with you, Dan, and I don't think that there's necessarily a single answer for all packages. As @jennybryan mentions, it's good to keep the occasional user in mind (especially when it comes to wrappers, IMHO). Jon can, of course, correct me if I'm wrong, but I don't think the envisioned use for the exercism package is necessarily comparable to purrr's.

I haven't done enough thinking about this yet to come to an abstract answer. :woman_shrugging:

Thanks @milesmcbain for this feedback and for sharing your thoughts - much appreciated!

I really like that you've raised this, since I had a similar sense that this sort of (meta) discussion could be sparked by the naming/tab completion issue and might prove valuable, either in terms of ideas for the RStudio IDE or how package authors and users utilize R help etc.

Haha nice! I will certainly check out Winston's talk (thanks for the recommendation). I agree that OOP has its place for certain use cases, and I appreciate the suggestion, but the target market for exercism is predominantly new R users learning to solve programming problems with R. I think bringing R6 into the picture complicates things unnecessarily from a new user's perspective (and even from a package development point of view). My goal with exercism is to try and assist users with the basic workflow steps (fetch an exercise, test a solution, submit etc) without getting in the way - they need to focus on solving the exercises in front of them. Fetching, testing and submitting the exercises should be simple!

Thanks @jennybryan! Really appreciate you following up here and I think we are firmly in agreement!

I think the "start typing" part of this is crucial. I get that autocomplete works well with partial matching and/or with foo:: etc and I make use of this all the time, but the key (and I assume this is what you're trying to emphasize) is that it's really valuable to make it easy for the occasional user to know what to start typing. If you don't have an inkling of which letters to start with (either part of a package namespace or part of a function name), then you can't derive any of the "discovery" benefits of autocomplete.

Function names with a common prefix greatly increase the chance of users having the required inkling of which letters to start typing.

@danklotz I agree with you and with @mara - there probably isn't a single answer for all packages, and the verb_object() syntax does have similar (design) value.

Previously I was thinking more generally in terms of (principled) common prefix conventions - the object_verb() syntax surfaced through looking into rOpenSci's function naming recommendations.

verb_object() could be seen as an extension of this common prefixing principle, but I think many would make a distinction between the two. I'd say the verb_object() syntax is a somewhat natural by-product of a functional programming approach combined with e.g. the tidyverse style guide's recommendation of verbs for function names + use of snake_case.

For example, although verbs like map or flatten are used repeatedly in purrr functions to group similar actions together (and that's good and useful design), there are loads of other verbs used in purrr too, so this isn't an overly memorable/defining characteristic. I doubt anyone would suggest that this is overly principled or redundant prefixing, but some do seem to think that about e.g. stringr/stringi, forcats etc.

One note is that with purrr's verb_object() syntax, the object suffix is the output type. For stringr, the str_ prefix is the input type. So, placement could potentially indicate input versus output. It's less clear for forcats, where the functions generally take a factor or level (or at least a vector that is coded with factors) and output the same.
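
A rough illustration of that input-versus-output reading. These are base R stand-ins that only mimic the naming pattern, not the real purrr or stringr implementations, and `str_len()` is a hypothetical name:

```r
# Base R stand-ins that mimic the naming pattern; not the real package code.
map_int <- function(.x, .f) vapply(.x, .f, integer(1), USE.NAMES = FALSE)
map_chr <- function(.x, .f) vapply(.x, .f, character(1), USE.NAMES = FALSE)
str_len <- function(string) nchar(string)  # hypothetical name

words <- c("a", "bb", "ccc")

lens  <- map_int(words, nchar)    # suffix promises an integer result
upper <- map_chr(words, toupper)  # suffix promises a character result
n     <- str_len(words)           # prefix says the input is a string
```

With the suffix style you can predict the return type from the name; with the prefix style you can predict what the first argument should be.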

My feeling is that prefixes aren't being used as substitutes for foo::. To me, the str_* prefix is primarily telling us that these are the functions that work on strings, not that the functions are from stringr. Similarly, for forcats, the fct_* functions operate on factors but there are other functions in the package without the prefix.

Another example is sparklyr where the machine learning functions are prefixed ml_*, while functions that operate on a Spark DataFrame are prefixed sdf_*. This seems much more useful than a single prefix for the whole package.

Having prefixes that signal the object a function works on may also be preferable in terms of resembling OOP methods.

I couldn't live without namespace prefixes, since a lot of functionality is similar across the API packages I work on. For example, when I've loaded the Google Analytics and Search Console APIs, which one would an auth() function apply to? Yes, you can prefix with the package name via blahblah::auth(), but on balance I think the prefix is much nicer, and it plays well with the package namespace and @importFrom too.
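
A minimal sketch of the auth() problem, using two environments as stand-ins for loaded packages. Everything named here (`pkg_ga`, `pkg_sc`, `ga_auth`, `sc_auth`) is hypothetical and only meant to mirror how the search path resolves names:

```r
# Two stand-in "packages"; pkg_sc sits in front, like a package loaded later.
pkg_ga <- new.env()
pkg_sc <- new.env(parent = pkg_ga)

pkg_ga$auth <- function() "Google Analytics auth"
pkg_sc$auth <- function() "Search Console auth"

# Lookup starts at pkg_sc and walks to its parent, so the later definition wins.
which_auth <- get("auth", envir = pkg_sc)()

# Prefixed names coexist with no ambiguity about which API is meant.
pkg_ga$ga_auth <- pkg_ga$auth
pkg_sc$sc_auth <- pkg_sc$auth
ga <- get("ga_auth", envir = pkg_sc)()
sc <- get("sc_auth", envir = pkg_sc)()
```

With the bare name, which auth() you get depends on load order; with prefixes, both stay reachable by name.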

IMO the "naming things is hard" necessity is about communicating exactly what each function does, not about having succinct names, which is more of a nice-to-have. If it's unambiguous what a function does without a prefix, then it's good to go. Otherwise, if working in any environment with more than one package, and for the autocompletion benefits, I prefer prefix_object_verb. (I'm resisting the urge to go back and refactor all my code to follow that schema, but hoping to stick to it going forward.)
