Can't access data in a package that I created

I have a question that's similar to the ones asked in this SO post and this other SO post.

I created an R package that contains several datasets in the data/ directory. The datasets are .RData files. When I run devtools::load_all() and then load one of the datasets, called "nsw", with data(nsw), it loads into my environment without any problem. But when I install the package from GitHub using devtools::install_github() and run the same call, data(nsw), I get this message:

Warning message:
In data(nsw) : data set ‘nsw’ not found

It's the same with the other datasets in the package, not just nsw. And it's not just the data() function: I can't access those data objects at all, with any other functions (e.g. head, dim) either.

I went over to the Library directory and checked the folder for the installed package--the datasets show up, and there are no errors when I run devtools::check() before building the package. I have properly documented the datasets in a data.R script, and devtools::document() successfully generates an .Rd file for each dataset.
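As a complement to looking in the library folder by hand, here is a minimal sketch of how one might ask R which datasets it can actually see in an installed package. The helper name list_pkg_datasets is made up for illustration; it only wraps the base call data(package = ...). The built-in "datasets" package is used so the example runs anywhere; substitute your own package name.

```r
# Hypothetical helper: names of the datasets R's data() index reports
# for an installed package.
list_pkg_datasets <- function(pkg) {
  # data(package = ) returns a "packageIQR" object; its $results matrix
  # has an "Item" column holding the dataset names.
  unname(data(package = pkg)$results[, "Item"])
}

# Example with a package that ships with R:
"mtcars" %in% list_pkg_datasets("datasets")
```

If a dataset is missing from this list even though its file is present in the installed folder, that suggests R could not read the file as data at install time.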

In the SO posts I linked above, other people had a similar issue to this one, and they fixed it by changing the file extensions of the datasets. Some sources say '.rda' is preferred, others say that '.RData' is preferred. I've tried '.Rda', '.rda', '.Rdata', and '.RData'--none of those file extensions fixes my problem.

One thing to note is that some of these data files are quite large. I wonder if that could be part of the problem? And yet, I'm not having any trouble installing the package from GitHub; the files seem to download just fine.

Does anyone know what might be wrong here?

I tried installing and it worked just fine, so the only thing I can think of is...
Have you called library(ygdpAddSurvey) before trying to access your datasets?

Unless you already found a solution?

Hi @antdurrant,
Thanks for following up here. Yes, I was definitely calling library(ygdpAddSurvey) after installing the package.

I was able to find a solution--it had to do with Git LFS, which I had been using to track and manage some of my large data files. I didn't realize it was a problem because the data files looked fine--correct file sizes, they downloaded properly from GitHub into the package folder on my computer, etc.--but I guess under the hood they weren't functioning properly because of some problem with LFS. I've since read up on LFS, and I see that people have talked about problems with forking, pull requests, and other collaboration when using LFS with a repo. I wonder if the same thing goes for package development?
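For anyone debugging something similar, one possible check is whether a data file is a real .rda/.RData file or a Git LFS pointer. LFS stores a small text stub in the repository in place of each tracked file, and that stub begins with the line "version https://git-lfs.github.com/spec/v1". The sketch below tests for that prefix. The function name is_lfs_pointer is hypothetical, and this is not confirmed to be the exact failure mode described above, only one way to rule it in or out.

```r
# Hypothetical helper: TRUE if the file starts with the Git LFS pointer header.
is_lfs_pointer <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  # Read only the first bytes; real .rda files are binary (usually compressed).
  header <- readBin(con, what = "raw", n = 64L)
  prefix <- charToRaw("version https://git-lfs.github.com/spec/v1")
  length(header) >= length(prefix) &&
    identical(header[seq_along(prefix)], prefix)
}

# Usage sketch: check every file in a package's data/ directory, e.g.
# files <- list.files("data", full.names = TRUE)
# vapply(files, is_lfs_pointer, logical(1))
```

Running `git lfs ls-files` in the repository also lists which files are currently tracked by LFS.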

Anyway, I untracked the files from LFS, and I split up the large data files into smaller ones so that Git would accept them. That seems to be working fine now, and my package loads the data correctly.
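The splitting step could look something like the sketch below. This is not the code actually used for the package; it is a minimal illustration that assumes the dataset is a data frame and that splitting by rows is acceptable. The function name split_and_save and its arguments are made up for the example.

```r
# Hypothetical helper: split a data frame into n_parts row-wise chunks and
# save each chunk as its own .rda file, with the object named like the file.
split_and_save <- function(df, n_parts, out_dir, prefix) {
  n <- nrow(df)
  # Assign each row to a chunk 1..n_parts of roughly equal size.
  idx <- ceiling(seq_len(n) * n_parts / n)
  parts <- split(df, idx)
  paths <- character(length(parts))
  for (i in seq_along(parts)) {
    name <- paste0(prefix, "_", i)
    env <- new.env()
    assign(name, parts[[i]], envir = env)
    paths[i] <- file.path(out_dir, paste0(name, ".rda"))
    # xz compression tends to give smaller files than the default gzip.
    save(list = name, envir = env, file = paths[i], compress = "xz")
  }
  paths
}

# Usage sketch with a built-in dataset and a temporary directory:
out <- split_and_save(mtcars, 3, tempdir(), "mtcars_part")
basename(out)
```

Each chunk then needs its own documentation entry, and users can recombine the pieces with rbind() after loading them.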

Glad to hear you got it sorted! I only noticed that the post was fairly old after I had posted, so apologies for the basic suggestion two weeks on.
Your blog looks interesting - it hadn't occurred to me to link that sort of thing on profiles here.
