Instruct RStudio to ignore some files in a project

rstudio

#1

Hello,

I have a simple question that can be summarized as: can I tell RStudio which files should be part of a project and which should not?
Details: when I'm working on a project with a home directory (that is, where the .Rproj file is located), all the files in that directory and all its subdirectories are included in the project, although I only care about a very small subset of them. Besides the unnecessary clutter in the Files tab, my main issue is with tagging, the process where RStudio looks for the location where a function is defined. I can easily imagine a hierarchy with some old backup R file that holds an old variant of a function. I never source that file, unless RStudio opens it for me when I look up the function's definition. Of course, once I source it, the old code is what the function name refers to, and you can imagine where that can lead...
What I am asking is basically whether I can choose which files to include in the project, or alternatively have something like the "ignore" file in git that tells RStudio not to show or look at a subset of files. The former is better, at least for me, as I have far more data and other files that are related to the project but that I never want RStudio to see or use, e.g. output files generated by my programs/scripts.

Thanks


#2

I'm not sure I fully understand your use case, but from the sounds of it you should use projects + git a lot more. I'm not sure that RStudio has exactly what you want.

For example, you are saying that you might have two versions of the same function lying around: one newer, one older. If you use git, you can check in the old function for future reference and then replace it with the new one. This way you'll only have one function defined.

Another way to achieve this is to use packages. You define your functions in the R directory and then (that's what I do a lot) create a new folder at the top level called sketch or something similar. You can then add this folder to the .Rbuildignore file, and whenever you rebuild the package, this folder will be left out of the build, so nothing in it gets sourced. Your sketch folder can then hold whatever you want, including old versions of the package code (although I do recommend the approach outlined above). Moreover, when you rebuild the package, you'll only ever have one version of each function, since R itself will control that for you.
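To make the .Rbuildignore idea concrete, here is a minimal sketch in base R. The package name `mypkg` and the temporary location are made up for illustration. The matching at the end only emulates how build-ignore patterns behave (Perl-style regular expressions, matched case-insensitively against paths relative to the package root); it is not the code R itself runs.

```r
# Hypothetical package root in a temporary directory (illustration only).
pkg <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg, "sketch"), showWarnings = FALSE)

# One pattern per line; "^sketch$" matches the top-level sketch folder only.
writeLines("^sketch$", file.path(pkg, ".Rbuildignore"))

# Emulate the matching to see which top-level entries the patterns cover.
patterns <- readLines(file.path(pkg, ".Rbuildignore"))
entries <- c("R", "sketch")
ignored <- vapply(
  entries,
  function(path) {
    any(vapply(patterns, grepl, logical(1),
               x = path, perl = TRUE, ignore.case = TRUE))
  },
  logical(1)
)

print(ignored)
```

If you use the usethis package, `usethis::use_build_ignore("sketch")` should add an equivalent, properly escaped line for you.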


#3

Hi,
Thanks for the quick response. You're correct that a lot of my issues can be resolved by maintaining a cleaner workspace, no doubt. At the moment, the same directory that holds the source files also holds lots of output that the source code generates, so a list of only the files that I decide are part of a project is what I was looking for (and then also have RStudio tag only these files; I'm not sure tagging is the official word, that's what it's called in another editor that I use). Think about this as in git, where you have files that are part of the repository, many that the repository knows nothing about, and a .gitignore list, which is mostly used to avoid clutter when you have git list the files and when you use a GUI on top of git. Most GUIs for git, e.g. SourceTree, allow you to see only the files that git recognizes as part of the repository. So in a sense I guess this is what I was missing: defining a project over existing files but manually choosing which files are in that project. That's the usual behavior in the code editors I have worked with, e.g. SlickEdit, NetBeans, etc., where you have to manually choose which files are part of the project. So I was a bit surprised by the "all files are part of the project" approach, but every editor is different.

In my case I have a bigger problem that probably has more to do with how my projects are structured in directories, which is probably (most definitely) not the way they should be. Maybe you can give me some guidance here, from the perspective of what will work best with RStudio? I would really appreciate it.

I have a bunch of projects that are related to finance work: one takes care of my portfolio, another does some market analysis, another implements algotrading, etc. As you may guess, there is a lot of basic code that is used by all projects, so I am sourcing many common files in all of them. What I have right now is definitely not how things are usually done: I have a parent directory, Finance, which is a project itself and was the only project for many years, so it has a lot of code that is used by the other projects. I then started adding a subdirectory for each additional project and rooted an RStudio project in each. All common files are part of the first project, lying in the root Finance directory. I know, it's messy, and not the way I would usually maintain real projects, but it helps me avoid some issues with alternative approaches, as I mention below.
One other approach would be to have a parent Finance directory that is not an RStudio project but includes all common files; the projects would then be in subdirectories of it, each with soft links to the common files in the parent directory. I assume here that RStudio follows soft links, so it will find function definitions even if the file is not physically in the project directory. There is an issue with this approach, though, with git: I could have a git repo per project, which makes sense, but who is responsible for the common files? Should they be in a repo by themselves? I don't think that git allows having a repo that is rooted at a subdirectory of another repo, and it seems like a bad idea to me even if it does. I guess another option would be to keep the common files in a separate, non-parent directory and manage them in a separate git repo. That may work.
What I have right now is a single git repo positioned at Finance (the parent directory), where all the common files sit, and a branch for each project that resides in a subdirectory (subprojects, if you wish). That means updates to the common files have to be merged to the trunk and then to each of the branches, but these do not happen very often. All of that works, but having an RStudio project at the top level, Finance, in order to be able to see the common files is quite messy, as it now shows me all the files of all the other projects.

I think the bottom line is that I was surprised by not being able to choose which files are part of my project, but the main reason it bothers me is the poor structure of my different, highly related projects. If you have any advice on that, I'd appreciate it if you shared it; otherwise I'll make some attempts and see how they work and look with RStudio.

Thanks much.


#4

I would say you have a severe case of legacy :slight_smile:.

I'm not going to go over everything you wrote point by point, but with R/RStudio you are much better served by projects/packages, as I wrote at the very beginning. You can also nest projects, by the way, so "Finance" can still be the root project while all the others are subdirectories with their own project files.

However, the main point would be to take a closer look at the functions you use all the time and put them into a stand-alone package. You can then use this package in all of the projects, reducing dependencies between them and making it easier to create new projects without the fear of breaking something in the others.
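As a rough sketch of what such a stand-alone package could look like, the base R code below writes a tiny package and installs it into a throwaway library. Everything specific here is made up for illustration: the package name `fincommon`, the function `simple_return`, and the temporary paths. In practice you would keep the package in its own directory (and its own git repo) and would likely create it with helpers such as `usethis::create_package()`.

```r
# Hypothetical shared package "fincommon"; all names are illustrative.
pkg <- file.path(tempdir(), "fincommon")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)

# Minimal package metadata.
writeLines(c(
  "Package: fincommon",
  "Title: Shared Helpers for Finance Projects",
  "Version: 0.0.1",
  "Description: Functions used by several finance projects.",
  "License: GPL-3"
), file.path(pkg, "DESCRIPTION"))

# Export the one function this sketch defines.
writeLines("export(simple_return)", file.path(pkg, "NAMESPACE"))

# The common code lives in R/ and is defined exactly once.
writeLines(c(
  "# Simple period-over-period return of a price series.",
  "simple_return <- function(prices) {",
  "  prices[-1] / prices[-length(prices)] - 1",
  "}"
), file.path(pkg, "R", "returns.R"))

# Install into a temporary library so the user library is not touched.
lib <- file.path(tempdir(), "lib")
dir.create(lib, showWarnings = FALSE)
install.packages(pkg, repos = NULL, type = "source", lib = lib, quiet = TRUE)

# Any project can now load the package instead of sourcing shared files.
library(fincommon, lib.loc = lib)
returns <- simple_return(c(100, 110, 99))
print(returns)
```

Each project (portfolio, market analysis, algotrading) would then call `library(fincommon)` rather than sourcing files from the parent directory, so only one definition of each function is ever in play.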

P.S. I'm not entirely sure whether RStudio offers a way to use disparate files as one project the way other editors do, but I have certainly never come across this functionality, so even if it exists, best practice is not to use it.


#5

Thanks much, Misha, this is very helpful.

I definitely have an issue with legacy code but am trying to gradually fix it in the "right" way, one that will work best with both git and RStudio, which is the main editor (it used to be the only one) that I'm using for R. Overall, BTW, it is amazing.
One much easier use case that I noticed, and maybe you know how to deal with, is when you have the .git directory in the folder/directory which is the root of the project. The .git directory holds, among all its other metadata, a file with exactly the same name as the file you put in the repository. That means that whenever you look for the definition of a function you get at least two locations where it exists: the actual file, and the one under .git. Even that is a bit dangerous, as obviously you never want to edit the file under .git.
BTW, development in R is not my main job, but I do it a lot for finance tracking and analysis, so the slow pace of improvement and frequent quick patching to put out fires (fixing a bug that is part of a bigger problem) are the main reasons for the bad design and legacy code. I usually have to save the majority of my time for the work I am actually being paid for :slight_smile:


#6

This is a good idea, and we have a feature request tracking it. Upvotes/comments welcome:

