Hi, I'm starting my first independent research project, and I would like to use reproducible research methods as much as possible. Specifically:
- All code and data in a Docker container
- The paper itself will be written in R Markdown
- I don't exactly like the paper-as-a-package approach, but I may try it again.
I'm planning to compile my main dataset from several sources. However, this process will involve data containing private information that cannot be publicly released. The final product (with sensitive data removed) can be released.
My best idea is that I could use two separate Docker containers: one for compiling the dataset and removing sensitive data, and then one for the actual analysis.
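To make the first stage concrete, here is a rough sketch of the kind of step I imagine running inside the private container. It is written in Python purely for illustration (the same thing could be done in R), and the column names and the `strip_sensitive` helper are made up for this example, not taken from my actual data.

```python
import csv
import io

# Hypothetical column names; the real sensitive fields would differ.
SENSITIVE_COLUMNS = {"name", "email"}


def strip_sensitive(raw_csv_text, sensitive=SENSITIVE_COLUMNS):
    """Return CSV text with the sensitive columns dropped."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    # Keep every column that is not flagged as sensitive, in original order.
    kept = [c for c in (reader.fieldnames or []) if c not in sensitive]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=kept,
        extrasaction="ignore",  # silently skip the dropped columns
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    raw = "id,name,email,score\n1,Ann,a@x.org,3.5\n2,Bob,b@x.org,4.0\n"
    print(strip_sensitive(raw), end="")
```

Only the output of this step would be copied into the second (public) container. Dropping columns is just the simplest case; I assume the real cleaning step would need more than this.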
Does anyone have any best practices for a situation like this?
Thanks,
Ben