Dependencies for a spatial data package?

mfherman · December 10, 2018, 12:47am

I’m dipping my toe into package development and starting with a very simple data package that will contain various New York City geographic boundaries (e.g. boroughs, census tracts, census blocks, etc.). I do a fair amount of mapping work in New York and the idea is to save me (and others!) the time of importing and cleaning shapefiles for each analysis.

Each geography will be a separate sf object. In theory one could use the sf objects in my package without having sf attached, but you wouldn't really want to. Most other data packages I've seen don't have many dependencies, but in this case, does it make sense to add sf to Depends or Imports in my DESCRIPTION?

P.S. I just started working on the package today, so I haven't yet put it on GitHub, but I will!

brycemecum · December 10, 2018, 2:01am

Great question, and great idea for a package. It's a good idea to think about this stuff.

The distinction recommended in the DESCRIPTION chapter of R Packages (2e) is one I like, and one I think a lot of other folks use:

(Paraphrasing) Is it central to your package, i.e., does your package not do much without it?

If yes: Put it in Imports
If no: Put it in Suggests and do something like:

(Copied from the DESCRIPTION chapter of R Packages (2e))

# You need the suggested package for this function
my_fun <- function(a, b) {
  if (!requireNamespace("pkg", quietly = TRUE)) {
    stop("Package \"pkg\" needed for this function to work. Please install it.",
      call. = FALSE)
  }
  # ...the rest of the function can now safely call pkg:: functions
}

Also from http://r-pkgs.had.co.nz/description.html:

When releasing your package, using Suggests is a courtesy to your users. It frees them from downloading rarely needed packages, and lets them get started with your package as quickly as possible.

Since you are shipping sf objects with your package, it sounds like putting sf in Imports makes sense: your package wouldn't be very useful to a user who didn't have sf.

mfherman · December 10, 2018, 3:05am

Thanks! My instinct is to import sf for this reason. However, another spatial data package, spData, lists sf (and other spatial packages) only in Suggests and doesn't import or depend on any packages (see the DESCRIPTION file in the Nowosad/spData repository on GitHub).

Curious to get other perspectives on this too!

technocrat · December 10, 2018, 3:42am

Take a look at tidycensus, which lists sf in Imports:

Package: tidycensus
Type: Package
Title: Load US Census Boundary and Attribute Data as 'tidyverse' and
        'sf'-Ready Data Frames
Version: 0.8.1
Authors@R: c(
    person(given = "Kyle", family = "Walker", email="kyle.walker@tcu.edu", role=c("aut", "cre")), 
    person(given = "Kris", family = "Eberwein", email = "eberwein@knights.ucf.edu", role = "ctb"))
Date: 2018-08-27
URL: https://github.com/walkerke/tidycensus
BugReports: https://github.com/walkerke/tidycensus/issues
Description: An integrated R interface to the decennial US Census and American Community Survey APIs and
    the US Census Bureau's geographic boundary files.  Allows R users to return Census and ACS data as
    tidyverse-ready data frames, and optionally returns a list-column with feature geometry for many 
    geographies. 
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends: R (>= 3.3.0)
Imports: httr, sf, dplyr (>= 0.7.0), tigris, stringr, jsonlite, purrr,
        rvest, tidyr (>= 0.7.0), rappdirs, readr, xml2, units, utils
Suggests: ggplot2
RoxygenNote: 6.1.0
NeedsCompilation: no
Packaged: 2018-08-27 03:03:15 UTC; kylewalker
Author: Kyle Walker [aut, cre],
  Kris Eberwein [ctb]
Maintainer: Kyle Walker <kyle.walker@tcu.edu>
Repository: CRAN
Date/Publication: 2018-08-27 04:10:03 UTC
Built: R 3.5.0; ; 2018-08-27 19:38:07 UTC; unix

hoelk · December 10, 2018, 6:17am

If your package includes just data and no functions that rely on sf, it's a no-brainer: regardless of what other people here said, don't import sf. You would even get an R CMD check NOTE if you did ("Namespace in Imports field not imported from: 'sf'. All declared Imports should be used."). You also wouldn't get any benefit from importing it, as the user would still have to attach sf before using its functions on your data. I'd guess that is the reasoning behind spData. tidycensus is much more than a data package, so as I understand your case, that example doesn't apply.

The Hadley quote above is also a bit out of context for this case. What it means is that if you have a few functions in your package that require a given dependency, and those functions offer only a small benefit, put the dependency in Suggests. For example, I made a package that retrieves routing data from a web API and includes a function to preview the captured route with leaflet. Since the interactive route preview is just an "icing on the cake" kind of thing, leaflet goes into Suggests there.

Personal opinion:

sf is a pretty complicated dependency: it requires a few system libraries (GEOS, GDAL, and PROJ) that are not always installed by default. That has been an issue for me several times at work, where I don't have control over the environment.
So if it's not strictly required for your package, put it in Suggests. That way people can still work with the non-spatial aspects of your data. For example, I work with NUTS data, and sometimes I just need the area codes.
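To make the area-codes case concrete: an sf object is still a data frame underneath, so its attribute columns can be pulled out with base R even where sf can't be installed. A minimal sketch, assuming the geometry column is the one named in the object's `sf_column` attribute (which is where sf records it); `drop_geometry_base()` is a hypothetical helper, not part of sf. When sf is available, `sf::st_drop_geometry()` does the same job.

```r
# Hypothetical helper (not part of sf): return only the attribute columns
# of an sf object using base R, so it works even when sf is not installed.
drop_geometry_base <- function(x) {
  # sf stores the name of the active geometry column in this attribute
  geom_col <- attr(x, "sf_column")
  # Strip the sf-specific attributes and class so that base data.frame
  # methods are used from here on
  attr(x, "sf_column") <- NULL
  attr(x, "agr") <- NULL
  class(x) <- setdiff(class(x), "sf")
  # Keep every column except the geometry
  x[setdiff(names(x), geom_col)]
}
```

For instance, `drop_geometry_base(nyc_tracts)` (with `nyc_tracts` standing in for one of the package's datasets) would return a plain data frame of the tract attributes.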

If you think it's very unlikely anyone would ever work with the non-spatial part of your data (i.e. the normal columns), and you would benefit from importing some functions from sf, go wild and import it.

mfherman · December 10, 2018, 3:30pm

Thanks for the input! It's a good point that a user may just want the non-spatial elements of the data I'm including. And yes, sf is a complex package to build. So for now, I'll stick with Suggests and make clear in the documentation that a user should install sf to get the most out of the data.
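Alongside the documentation, one way to surface that advice is a hint when the package is attached. A minimal sketch, assuming the package defines R's standard `.onAttach()` hook; the message wording is made up for illustration.

```r
# Sketch of a load-time hint for a data package that lists sf in Suggests.
# .onAttach() runs when the package is attached with library().
.onAttach <- function(libname, pkgname) {
  # Only nudge users who don't have sf installed
  if (!requireNamespace("sf", quietly = TRUE)) {
    packageStartupMessage(
      "The boundaries in ", pkgname, " are stored as sf objects. ",
      "Install the sf package to work with their geometry."
    )
  }
}
```

Because it uses `packageStartupMessage()`, people who only want the non-spatial columns can silence it with `suppressPackageStartupMessages()`.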

I'm also thinking about adding some functions in the future that would help a user clip spatial features to a given area (say, getting all census tracts in a given school district), and those would likely use sf functions like st_intersects(). If I want to keep sf in Suggests when I add those functions, I could rely on the code @brycemecum posted above:

# You need the suggested package for this function
my_fun <- function(a, b) {
  if (!requireNamespace("pkg", quietly = TRUE)) {
    stop("Package \"pkg\" needed for this function to work. Please install it.",
      call. = FALSE)
  }
  # ...the rest of the function can now safely call pkg:: functions
}
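Applied to the clipping idea, a minimal sketch might look like the following. `features_in_area()` is a hypothetical name, and it assumes both inputs are sf objects in the same coordinate reference system. It selects whole features only; it does not cut geometries at the boundary (that would be `sf::st_intersection()`).

```r
# Hypothetical helper for a package that lists sf in Suggests: return the
# features of `x` (e.g. census tracts) that intersect any feature of `area`
# (e.g. a school district). Assumes both are sf objects in the same CRS.
features_in_area <- function(x, area) {
  # Fail early with a clear message if the suggested package is missing
  if (!requireNamespace("sf", quietly = TRUE)) {
    stop("Package \"sf\" needed for this function to work. Please install it.",
      call. = FALSE)
  }
  # st_intersects() returns, for each feature of x, the indices of the
  # features in `area` it intersects; lengths() > 0 flags any match
  hits <- lengths(sf::st_intersects(x, area)) > 0
  # requireNamespace() loaded sf, so its `[` method for sf objects is
  # registered and the geometry column is kept
  x[hits, ]
}
```

Note that st_intersects() also matches tracts that merely share an edge with the district; if only tracts lying entirely inside it should count, sf::st_within() is the stricter predicate.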

brycemecum · December 10, 2018, 3:44pm

Nicely said, @hoelk!