Accessing common reference files from multiple RStudio projects?

I’m working on converting my workflow to use RStudio projects and tools like the “here” package (following the great advice from @JennyBryan). I’m putting each project in a separate folder, and I’ve been able to eliminate most of my setwd() calls.

However, I’m not sure how to handle access to a set of about 12 common data files that I use in various combinations in almost all of my projects. These files contain reference data and are updated weekly. I don’t want to copy them into every project. I frequently rerun the projects with the latest reference data. I’m currently using setwd() to read the files from a common folder.

Is there a good “reproducible” way to read from a set of common reference files in multiple projects without using setwd()?

Hi,

It's indeed good practice to create an RStudio project with a separate folder for each project you have. Let's imagine you have a project set up in the folder C:/Documents/Projects/myProject/. This would be the working directory, and all files in that folder can be accessed just by typing their name without the rest of the path (e.g. myProjectFile1.csv).

Referring to files outside the project can be done in two ways:

  • Using the full path name of the file. Ex: C:/Documents/Projects/commonFiles/commonFile1.csv
  • Specifying the relative path of the file starting from the working directory of the project. Ex: ../commonFiles/commonFile1.csv

The two dots .. in the relative path name mean you go 'up a folder', then follow the path as specified by the rest of the folders in the path name. You can go up multiple folders by using the .. several times. Ex: ../../otherFolder/ would access a folder named 'otherFolder' in the Documents folder.
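
As a minimal sketch of the relative-path approach, the snippet below rebuilds the example layout inside a temporary directory so it can run anywhere. The folder and file names follow the example above; the file contents and the variable names are made up for illustration.

```r
# Recreate the example layout under a temporary directory (illustrative only).
root        <- file.path(tempdir(), "Projects")
project_dir <- file.path(root, "myProject")
common_dir  <- file.path(root, "commonFiles")
dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(common_dir,  recursive = TRUE, showWarnings = FALSE)

# A stand-in for one of the common reference files (made-up contents).
write.csv(data.frame(id = 1:3, value = c("a", "b", "c")),
          file.path(common_dir, "commonFile1.csv"),
          row.names = FALSE)

# Relative path as seen from the project folder: up one level, then into commonFiles.
rel_path <- file.path("..", "commonFiles", "commonFile1.csv")

# Resolve it against the project folder instead of calling setwd().
full_path <- normalizePath(file.path(project_dir, rel_path))
common1   <- read.csv(full_path)
print(common1)
```

Inside a real project, where the working directory is already the project folder, read.csv("../commonFiles/commonFile1.csv") would be enough.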

*** OPTION 2***
If you find you need all twelve files almost every time in all projects, it could also be useful to save them all in one .RData file. You load all the files into R in a script, give them the variable names you would use in every project, and then save them all together as one big .RData file.
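
A rough sketch of that idea, using two small made-up data frames in place of the real reference files and a temporary file in place of the common folder (all object and file names here are hypothetical):

```r
# Stand-ins for two of the reference tables (made-up names and contents).
regions <- data.frame(code = c("N", "S"), name = c("North", "South"))
rates   <- data.frame(year = 2020:2022, rate = c(0.10, 0.12, 0.11))

# Save both objects into one .RData file; tempdir() stands in for the common folder.
ref_file <- file.path(tempdir(), "reference.RData")
save(regions, rates, file = ref_file)

# Later, in another project: load() recreates the objects under their saved names.
# Loading into a separate environment here just makes it easy to see what came back.
ref_env  <- new.env()
restored <- load(ref_file, envir = ref_env)
print(restored)
```

Calling load(ref_file) without the envir argument would put the objects straight into the current workspace, which is the usual way to use it in a project script.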

In new projects, all you have to do is load that one .RData file, and it will 'restore' the twelve variables at once. See this link for more details:

Hope this helps,
PJ

PJ,

Thanks for the feedback and suggestions.

They both may be better than my current process of setwd() and reading files. I suppose there is no “magic” to have all your data in a project structure and still use external references.

I think I’ll try the RDS idea for most of my reference files. I have one file that’s very large and I don’t want to bring in each time, but having all of the others available as needed will be helpful. I could even copy the RDS file into the project data folder to be more reproducible and true to the spirit of projects.

Thanks again for taking time to reply and share ideas.
Herb

Hi,

Just so you know: RDS files are compressed and can achieve amazing reductions in file size.

Glad I could help,
PJ

Follow up...I was mixing up RDS and RData. Now I understand RDS can only be used to save one object, while RData can be used for one or more objects, or even all objects in a workspace with save.image(). I also found this RData reference.
Saving my reference files as RData gives me an RData file that is much smaller than all of the original XLSX and CSV files, and it also loads much faster than reading the individual files.
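
For anyone else who mixes the two up, here is a small sketch of the difference, using a made-up data frame and temporary files (the names are only for illustration):

```r
# A made-up reference table.
lookup <- data.frame(key = c("a", "b"), label = c("Alpha", "Beta"))

# RDS: exactly one object per file; readRDS() returns it, so the reader picks the name.
rds_file <- file.path(tempdir(), "lookup.rds")
saveRDS(lookup, rds_file)
my_lookup <- readRDS(rds_file)

# RData: one or more objects per file; load() recreates them under their original names
# and returns those names as a character vector.
rdata_file <- file.path(tempdir(), "lookup.RData")
save(lookup, file = rdata_file)
rm(lookup)                        # remove it to show that load() brings it back
loaded_names <- load(rdata_file)
print(loaded_names)
```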

Hi,

Yeah, it can be confusing in the beginning, but once you get the hang of it, it's handy :)

Glad I could help
PJ

How about putting your RData files in the data/ folder of a package? This way you get access just by using library() from any project, you can provide consistent documentation, and you have a single place to do any maintenance. Here is a good explanation and suggestion of best practices: http://r-pkgs.had.co.nz/data.html
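
A very rough sketch of what such a data-only package skeleton might look like, built with base R only. The package name "refdata", the dataset, and the DESCRIPTION fields are all hypothetical, and the skeleton is not installed or checked here; a real package would also want documentation for each dataset, as the linked chapter describes.

```r
# Hypothetical package name and location; tempdir() keeps the sketch self-contained.
pkg_dir <- file.path(tempdir(), "refdata")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Minimal DESCRIPTION; LazyData makes datasets available right after library(refdata).
writeLines(c(
  "Package: refdata",
  "Title: Shared Reference Data",
  "Version: 0.0.1",
  "Description: Reference tables shared across several projects.",
  "License: GPL-3",
  "LazyData: true"
), file.path(pkg_dir, "DESCRIPTION"))

# Data-only package: a NAMESPACE file with nothing exported.
writeLines("# data-only package, no exports", file.path(pkg_dir, "NAMESPACE"))

# One .rda file per dataset, holding an object with the same name as the file.
regions <- data.frame(code = c("N", "S"), name = c("North", "South"))
save(regions, file = file.path(pkg_dir, "data", "regions.rda"))

# Installing from source would then look something like this (not run here):
# install.packages(pkg_dir, repos = NULL, type = "source")
```

After the weekly update, only the .rda files need to be regenerated and the package reinstalled; every project then picks up the new data through library().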
