My Anaconda don't want none of your virtualenv, son

I figure this is already on the roadmap, but it looks like reticulate is tightly coupled with Anaconda. More importantly, reticulate doesn't play nice with non-Anaconda virtual environments. I've been trying to cook up some RMarkdown examples that have both R and Python code in chunks; however, I'm finding it impossible to install the Pandas library for Python on RStudio Cloud. I've created a virtualenv and installed Pandas in that virtualenv, but I can't get R to run the Python chunks in that virtualenv.

So, in short, I'd love to see Anaconda installed on the RStudio Cloud!

-J

It's definitely not tightly coupled with Anaconda. It works both with Anaconda and with virtualenv (see these docs: https://rstudio.github.io/reticulate/articles/python_packages.html). In fact, when running on Linux and OSX the TensorFlow/Keras bindings use virtualenv by default.

There might be something RStudio Cloud specific that is preventing this from working. I'd encourage you to follow up with RStudio Cloud to sort this out.

All of that said, because Anaconda tends to work much, much better than virtualenv on Windows (because it provides binaries), we are going to be recommending that package developers build automation around Anaconda (to keep things simple and only have to tool a single target environment). See these recently updated docs: https://rstudio.github.io/reticulate/articles/package.html#installing-python-dependencies

So I would strongly recommend that RStudio Cloud add Anaconda to their base image, and also ensure that Anaconda environments (or at least the by-convention special "r-reticulate" and "r-tensorflow" environments) are replicated in forks the same way that R packages are.

That's really good input, thank you. I spent more time with the documents and I see that I misunderstood some of the documentation. Non-conda virtual environments are supported, and I was even able to get Pandas installed!

For future Reticulators that find themselves here, I'll leave some breadcrumbs that I found helpful:

Reticulate and RStudio Cloud put virtual envs in a special directory:

> virtualenv_root()
[1] "~/.virtualenvs"

If you have virtual envs installed in other places, commands like virtualenv_list() won't show them; they only look in that root directory. This confused me at first but makes sense in retrospect.
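
For anyone whose environment lives outside that root, here is a minimal sketch of the difference between looking an environment up by name and pointing at it by path. The path "~/projects/demo/venv" is a made-up location for illustration, not something from this thread; as far as I can tell from the reticulate docs, use_virtualenv() accepts either a name or a path.

```r
library(reticulate)

# Environments under the root directory are found by name.
virtualenv_root()   # "~/.virtualenvs" on RStudio Cloud, per the output above
virtualenv_list()   # lists only environments inside that root

# An environment created somewhere else will not appear in the list,
# but it can still be selected by passing its path.
# NOTE: "~/projects/demo/venv" is a hypothetical path used for illustration.
use_virtualenv("~/projects/demo/venv", required = TRUE)
```

Setting required = TRUE should make reticulate raise an error if that environment can't be used, rather than quietly falling back to another Python.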

So to install pandas in a new virtual env, the following works in RStudio Cloud:

virtualenv_install('my_new_env', 'pandas')

Now when I create Python code chunks, I start them with

use_virtualenv("my_new_env")

and things seem to work as I would expect them to.
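
Putting those breadcrumbs together, a rough end-to-end sketch might look like the following. It assumes the environment name "my_new_env" from above; the explicit virtualenv_create() call and the two checks at the end are additions for illustration, not steps taken from the original post.

```r
library(reticulate)

# Create the environment explicitly (an added step for clarity),
# then install pandas into it.
virtualenv_create("my_new_env")
virtualenv_install("my_new_env", "pandas")

# Bind this R session to the environment before any Python code runs.
use_virtualenv("my_new_env", required = TRUE)

# Check which Python was picked up and whether pandas is importable.
py_config()
py_module_available("pandas")
```

If py_config() reports a Python outside ~/.virtualenvs/my_new_env, Python was probably initialized before use_virtualenv() was called; restarting the R session and running the calls in this order is the first thing to try.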

That said, I'll happily switch to an Anaconda-based workflow when that's available in RStudio Cloud.

-J

Just putting the use_virtualenv() call in your .Rprofile or in the setup chunk of an Rmd will also suffice. It's a global, one-time side effect: Python is loaded only once per R process, so you can't reload with a different Python.
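
As a sketch of what that could look like, these lines could go in a project-level .Rprofile or inside the Rmd's setup chunk. The environment name "my_new_env" is the one from earlier in the thread; the py_available() check is only added here to illustrate the once-per-process behaviour.

```r
# Contents of .Rprofile, or the body of the Rmd setup chunk.
# Selecting the environment here happens before any Python chunk runs.
reticulate::use_virtualenv("my_new_env", required = TRUE)

# Python is not started by the call above; it is initialized lazily on
# first use. This check reports whether initialization has happened yet,
# without triggering it.
reticulate::py_available(initialize = FALSE)
```

Once Python has been initialized in a session, picking a different environment means restarting R first.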

Great tip. I'll do exactly that. Thank you!