Files in inst/extdata are included in the package when it is installed. That folder is for package data in file formats that are not allowed in data/. For example, it can hold an HTML template to use with a custom R Markdown format.
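As a rough sketch of how such a file can be found after installation: the contents of inst/ are copied to the top level of the installed package, so `system.file()` can locate them. The helper name `extdata_path` and the names `template.html` and `mypackage` are placeholders for illustration, not part of any real package.

```r
# Hypothetical helper: locate a file shipped under inst/extdata/.
extdata_path <- function(file, package) {
  # inst/extdata/ in the source tree becomes extdata/ in the installed package.
  # system.file() returns "" when the file or the package cannot be found.
  path <- system.file("extdata", file, package = package)
  if (!nzchar(path)) {
    stop("File '", file, "' not found in extdata/ of package '", package, "'")
  }
  path
}

# Usage (placeholder names, so this call is left commented out):
# template <- extdata_path("template.html", "mypackage")
```

Passing `mustWork = TRUE` to `system.file()` gives a similar error without a wrapper; the helper just makes the message more specific.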
With packages, I've taken up the practice of dividing data files into four groups:
1. data/: Prepared datasets for the package.
2. inst/[something]: Non-standard data files. Basically inst/extdata, but I prefer using descriptive directory names like css or stata.
3. data-raw/: Text files of the data, which are used to create the files in #1. These can be edited in any way a developer chooses, even by hand. They are version-controlled, because changes here mean the package's contents are different.
4. external/[something]: "Unreliable" data files and the scripts for processing them. If I scraped data from an API or used a SAS program to reverse-engineer a file format, it goes here. These are the tools used to build the files in data-raw/, and they change mostly because other people change what they offer. Most of the time, the data files themselves are not version-controlled, and the scripts describe how to get them.
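The step from data-raw/ to data/ could look something like the sketch below. It assumes the curated files are CSVs and uses only base R; the function name `build_dataset` and the file name `codes.csv` are made up for illustration, and many developers use `usethis::use_data()` for the saving step instead.

```r
# Hypothetical build step: turn a curated text file into a prepared dataset.
# Meant to be run from the package root, e.g. from a script kept in data-raw/.
build_dataset <- function(raw_file, out_dir = "data") {
  # Name the dataset after the raw file, e.g. "codes.csv" -> "codes"
  name <- tools::file_path_sans_ext(basename(raw_file))
  dataset <- read.csv(raw_file, stringsAsFactors = FALSE)

  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out_file <- file.path(out_dir, paste0(name, ".rda"))

  # Save the object under its own name so that load() restores it as `name`
  env <- new.env()
  assign(name, dataset, envir = env)
  save(list = name, envir = env, file = out_file)
  invisible(out_file)
}

# Usage (placeholder path):
# build_dataset("data-raw/codes.csv")
```

Keeping this step in a script means the files in data/ can always be regenerated from the version-controlled text files.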
You can look at my naaccr package for an example. I use Excel, PDF, and other messy files from sources that can easily change in the future, but I like to keep the curated files in data-raw/. There are actually far more files in the external directory on my computer.
It works for me, especially because I think curated and language-agnostic data files are one of the most valuable things from any open source project.