Integrating Python into RStudio Workflow

reticulate

#1

I'm running the v1.2 preview of RStudio and playing around with the increased support for reticulate and Python.

This is how I typically organize my projects:

| project/
| -- data/
| ---- data.csv
| -- docs/
| ---- notebook.Rmd
|
| -- project.Rproj

In order to open data.csv in my notebook, I would use readr::read_csv(here::here("data", "data.csv")).

I'm trying to figure out the best way to manage Python's working directory in this structure if I want to call a Python code chunk. My best effort so far is to include this as the initial code chunk in my notebook:

knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())

That yields the expected:

> import os
> print(os.getcwd())
/Users/chris/GitHub/project

instead of /Users/chris/GitHub/project/docs.

Are there any other suggestions out there for navigating this?


#2

Have you seen the R Markdown Python Engine article in the reticulate docs?

https://rstudio.github.io/reticulate/articles/r_markdown.html

What happens if you knit without the root.dir setup?


#3

Yep! They were super helpful for getting started.

I created a little test repo that shows the project setup that I use and the problem I'm running into. There are two notebooks:

  • docs/workingDirectory_fail.Rmd does not work since the Python working directory defaults to the directory where the notebook is - this fails when I get to the python chunk for reading data/mpg.csv into a pandas data frame
  • docs/workingDirectory_success.Rmd does work since I've used the root.dir setup - the plot is created successfully
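
To make the difference concrete, here is a small sketch of how the same relative path resolves from each working directory. The root path is borrowed from my first post and is only an assumption about the layout; resolve is a hypothetical helper, and nothing here touches the file system.

```python
from pathlib import PurePosixPath

# Assumed layout, borrowed from the first post: <root>/data/mpg.csv and <root>/docs/*.Rmd
ROOT = PurePosixPath("/Users/chris/GitHub/project")


def resolve(working_dir, relative_path):
    """Show which absolute path a relative path points to from a given working directory."""
    return working_dir / relative_path


# Knitting from docs/ (the default): the path points inside docs/, which has no data/ folder.
from_docs = resolve(ROOT / "docs", "data/mpg.csv")

# Knitting with root.dir set to the project root: the path points at the real file.
from_root = resolve(ROOT, "data/mpg.csv")

print(from_docs)  # /Users/chris/GitHub/project/docs/data/mpg.csv
print(from_root)  # /Users/chris/GitHub/project/data/mpg.csv
```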

I'm not a fan of the root.dir approach, but it does work. Just wondering if there is a better way within the confines of Python to manage this.


#4

Hey @chris.prener great question! It looks like the issue has less to do with Python and more to do with R Markdown and its default behavior of setting the working directory to the directory of the file being rendered. The behavior for R and Python looks identical on my end, and the difference is that you're using here::here in your R chunks and not in your Python chunks.

I think you still have a few options:

  1. Implement here::here in Python and use it like you're using it in all of your R code chunks. I think this is an answer to your "better way within the confines of Python to manage this" question. I couldn't find a Python package that does the exact same thing as here, but there are some helpful methods in the os module and a few packages on PyPI that do similar things to here.
  2. Make use of the knit_root_dir argument to rmarkdown::render() in a similar fashion to how you've used root.dir. I don't think the root dir approach is bad at all.
  3. Don't use here in your R code chunks and just use ../data/* in your paths. This is how I do this but that's personal preference.
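
For the first option, here is a minimal sketch of what a here-like helper could look like in Python. This is not an existing package: find_project_root and here are hypothetical names, and the sketch only looks for an .Rproj file, whereas the R package checks more criteria than that.

```python
from pathlib import Path


def find_project_root(start=None, marker="*.Rproj"):
    """Walk upward from `start` (default: the current working directory) until a
    directory containing a file that matches `marker` is found."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in [current, *current.parents]:
        if any(candidate.glob(marker)):
            return candidate
    raise FileNotFoundError(
        f"No {marker} file found in {current} or any parent directory"
    )


def here(*parts, start=None):
    """Build an absolute path from the project root, similar in spirit to here::here()."""
    return find_project_root(start).joinpath(*parts)
```

In a Python chunk this could then be used as pd.read_csv(here("data", "mpg.csv")), assuming pandas is imported as pd. Because the lookup starts from the working directory and walks upward, it should give the same answer whether the notebook is knit from docs/ or from the project root.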

#5

Thanks for getting back to me @brycemecum - I finally had the chance to sit down and test out your third solution just to see how it behaved. I think this is the best solution for me and how I set up things. It seems simple and effective, requires no overhead, and should work across operating systems.
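
For anyone following along, the Python side of that third solution might look something like the sketch below. It uses only the standard library so it runs without pandas; read_rows is a hypothetical helper, and the default path assumes the notebook is knit from docs/ with the data in a sibling data/ folder.

```python
import csv
from pathlib import Path


def read_rows(relative_path="../data/mpg.csv"):
    """Read a CSV given a path relative to the current working directory.

    While knitting, the working directory is the notebook's folder (docs/),
    so "../data/mpg.csv" steps up to the project root and into data/.
    """
    with open(Path(relative_path), newline="") as f:
        return list(csv.DictReader(f))


# Inside a Python chunk in docs/notebook.Rmd:
# rows = read_rows()
```

The same relative path string can be passed to pandas.read_csv instead, and forward slashes in the path are accepted by Python on Windows as well.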

Since here is such an ingrained habit, and also how I train my students, I don't like using root.dir because it breaks other .Rmd files that use here during a given session. I'm not sure my Python chops are quite at the point where I could implement here in Python right now.


#6

:+1: Sounds like a great student project!


#7

Hahaha I actually had that thought. I’ll have to find the right student, but if I can, that might be a good way to break myself into Python development...