I'm writing to ask about best practices in package development. I am developing a couple of packages that are intended to be used with the same type of (linguistic) data. I think it makes sense to split the functions into two packages because each contains a cohesive group of functions that perform a specific task. I've prepared some data to be used in vignettes and example code (as well as in other teaching demonstrations outside of these two packages). Since both packages may well be loaded in the same script, I don't want the two packages' datasets to clash.
So my question is this: is it better to duplicate the data across the two packages, or would it make more sense to create a third, data-only package that the other two would import for their vignettes? I'm inclined toward the data-only package, because I may create additional packages in the future. Does the data-only package need to be on CRAN for the others to import it in their DESCRIPTION files, or can it just be on GitHub?
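To make the second option concrete, here is a rough sketch of what I imagine the DESCRIPTION file of one of the two function packages might contain. The names `lingtools`, `lingdata`, and `myuser` are placeholders rather than real packages or accounts, and the `#` lines are only annotations for this post. I'm assuming `Suggests` is the appropriate field, since the data would only be needed for vignettes and examples. As far as I understand, `Remotes` is read by devtools/remotes to install a dependency from GitHub and is not one of the standard fields, which is part of why I'm asking whether GitHub alone is enough.

```
# Hypothetical fragment: all package and user names are placeholders
Package: lingtools
Suggests:
    knitr,
    rmarkdown,
    lingdata
# Assumption: points devtools/remotes at the GitHub repo for lingdata
Remotes:
    myuser/lingdata
VignetteBuilder: knitr
```

With this layout, the data would live in one place and each function package would only reference it, so any additional packages I write later could point to the same data-only package.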