learnr with python and chained code chunks

In this pull request, Barret mentions that learnr will soon (1) support Python code and (2) allow "chained" code chunks.

I would love to know more about how you can "chain" the output from multiple chunks if you are using a separate R process to run each chunk. How do you keep track of the variables created in each chunk and pass them in as an environment (?) to be used when evaluating the next chunk?

Excerpt from the PR:

"@nischalshrestha (and I) is currently working on a new approach (and also allowing for chained setup chunks; another big request!), learning from @zoews's branch. (We will be copying over a lot of the UI improvements already solved in this branch.)"

tl;dr: Use reticulate to pass variables between R and Python.


Chained setup chunks

Before rstudio/learnr#390, the most complicated arrangement an exercise could have was: `setup` R chunk > exercise setup `setupA` R chunk > exercise `myexercise` R chunk (containing the user's code).

Ex:

```{r setup, include = FALSE}
library(learnr)
d <- 3
```

```{r setupA}
a <- 5
```

```{r myexercise, exercise=TRUE, exercise.setup = "setupA"}
x <- a + d + 1
x # 9
```

After rstudio/learnr#390, exercise.setup can be followed recursively, and each Rmd chunk in the chain will be used. In addition, exercise.setup and exercise chunks can use other languages supported by Rmd / knitr, e.g. Python, MySQL, and Bash.

To expand on the prior example, you are now allowed more than one exercise.setup chunk. When the user submits the ex1 exercise, these chunks will be run in order: `setup` R chunk > exercise setup `setupA` R chunk > exercise setup `setupB` R chunk > exercise `ex1` R chunk (with the user's code).

(Modified from rstudio/learnr@5a061d03 setup-chunks.Rmd)

```{r setup, include = FALSE}
library(learnr)
d <- 3
``` 

```{r setupA}
a <- 5 
``` 

```{r setupB, exercise.setup = "setupA"}
b <- a + d
``` 

```{r ex1, exercise=TRUE, exercise.setup = "setupB"} 
x <- b + 1
x # 9
```

Mixing python and R in {learnr} exercises

@vnijs You are correct that the Python and R sessions behave as different processes with no automatic communication.

To get around this issue, authors should use reticulate, which creates a bridge from R to Python and from Python to R simply by calling library(reticulate) in a setup chunk.

Full document example:

---
title: "Chained setup chunks"
output: learnr::tutorial
runtime: shiny_prerendered
---

```{r setup, include = FALSE}
library(learnr)
library(reticulate)
```

```{r even_more_setup}
d <- 3
```

# learnr + reticulate demo

<!-- Create Python variable `a` which reads `d` from R: -->

```{python setupA, exercise.setup = "even_more_setup"}
a = r.d + 2 # 5
```

<!-- Read `a` from Python, and create `b`: -->

```{r setupB, exercise.setup = "setupA"}
b <- py$a + d # 8
```

An R exercise that uses `b` (via R and Python setup chunks):

```{r ex1, exercise = TRUE, exercise.setup = "setupB"}
b + 1 # 9
```

A Python exercise using `b` (via R and Python setup chunks):
```{python ex2, exercise = TRUE, exercise.setup = "setupB"}
r.b + 1 # 9
```

(Small disclosure: I found and fixed a setup chunk engine bug, rstudio/learnr#440, while making this demo.)


Very cool! So are all chunks re-run when you execute ex1, or is there caching? A (somewhat) related question: have you considered using a Docker container to run the R/Python code? That way you could set up the R/data/code environment once for all expressions, which could be pretty efficient if you have large datasets that each code chunk needs to use.

> So are all chunks re-run when you execute ex1?

Yes. All exercise setup chunks (except for setup) and the user's code are run at exercise submission time. (No other caching is performed; only the chunk named setup is cached.)
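Conceptually, you can think of this replay as threading one shared namespace through the chunk chain in order. Here's a minimal sketch in Python; note this is only an analogy, not learnr's actual implementation (learnr evaluates R chunks in an R environment):

```python
# Conceptual sketch only (NOT learnr's implementation): replaying a chain of
# setup chunks plus the user's submission by threading one shared namespace
# through sequential exec() calls.
chunks = [
    "d = 3",        # global `setup` chunk (cached by learnr)
    "a = 5",        # exercise.setup chunk `setupA`
    "b = a + d",    # exercise.setup chunk `setupB`
    "x = b + 1",    # the user's submitted exercise code
]

env = {}  # one shared namespace, standing in for the exercise environment
for chunk in chunks:
    exec(chunk, env)

print(env["x"])  # 9
```

Because the whole chain is replayed on each submission, later chunks always see the variables defined by earlier ones without any extra bookkeeping.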

> Have you considered using a Docker container to run the R/Python code?

Yes. Outsourcing the user-input computation is a wonderful approach for security purposes and for custom execution environments. The GitHub version of {learnr} currently has an undocumented ability to send a request to an external server to do the computation. We are actively working on this; it is highly experimental and the API may change.

> which could be pretty efficient if you have large datasets that each code chunk needs to use

Unfortunately, external exercise evaluators will not reduce the processing time of an expensive exercise setup chunk. To avoid paying that cost on every submission, put the expensive code in the setup chunk, which is cached. (Note: all objects created in the setup chunk are available to all exercises.)
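For example, a sketch of that pattern (the data file name is hypothetical):

```{r setup, include = FALSE}
library(learnr)
# Runs once when the tutorial starts; its objects are visible to every exercise.
big <- readRDS("big_data.rds")  # hypothetical expensive load
```

Exercises can then reference `big` without re-running the load at each submission.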

Theoretically, the external evaluator could ignore all setup chunks and have them pre-processed and ready to go. But that is a logistical nightmare that learnr will not address.

I'd rather solve this by having a directory that is copied over to the exercise execution environment at run time so that local files can be accessed. (I cannot find the corresponding issue.) Then you'd only be limited by the speed of reading data from disk, which is a lot better than downloading the data each time. Another way around it is to make an R package that ships with the data included. For example, mypkg::mydata1 and mypkg::mydata2 can be resolved by calling library(mypkg) in the setup chunk or in any exercise setup chunk.

Thanks for the detailed response, Barret! Looking forward to seeing these new features in action.


It's been fun to watch the development of learnr and how the Python integration is being handled. I'm currently teaching a "Python for R Users" course, so this is super useful.

I'm encountering an issue with configuring Python when I'm hosting the tutorial on shinyapps.io. The error message is:

`ImportError: No module named 'pandas' Detailed traceback: File "<string>", line 1, in <module>`

I've copied the relevant source below and you can run into the error by looking at the exercises under 5.2.0 here in the app: https://abray.shinyapps.io/pandas-1.

This runs fine locally, but something appears to be up with the way that pandas is getting installed into the virtual environment. Any tips on how to troubleshoot this?

Thanks,
Andrew


```{r ready-python, message = FALSE}
library(tidyverse)
library(reticulate)
cereal <- read_csv("cereal.csv")
virtualenv_create(envname = "python_environment", python = "python3")
virtualenv_install("python_environment", packages = c('pandas','numpy'))
use_virtualenv("python_environment", required = TRUE)
```

```{python prepare-pandas1-q4, message = FALSE}
import pandas as pd
cereal = pd.DataFrame(r.cereal)
cereal
```

#### Question 4

Enter the command that will return just the `name` column of this data set as a pandas Series.

```{python pandas1-q4, exercise = TRUE, exercise.setup = "prepare-pandas1-q4"}

```

```{python pandas1-q4-solution}
cereal["name"]
```

Hi Andrew,

One tip for troubleshooting this is to include an R exercise in your learnr tutorial, so you can inspect the Python configuration on shinyapps.io via reticulate::py_config(), which prints something like this:

```
python:         /home/shiny/.virtualenvs/py3-virtualenv/bin/python
libpython:      /usr/lib/python3.5/config-3.5m-x86_64-linux-gnu/libpython3.5.so
pythonhome:     //usr://usr
virtualenv:     /home/shiny/.virtualenvs/py3-virtualenv/bin/activate_this.py
version:        3.5.2 (default, Apr 16 2020, 17:47:17)  [GCC 5.4.0 20160609]
numpy:          /home/shiny/.virtualenvs/py3-virtualenv/lib/python3.5/site-packages/numpy
numpy_version:  1.18.5

NOTE: Python version was forced by RETICULATE_PYTHON
```

This should at least give you clues about how the Python configuration is set up.
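A complementary check from a Python chunk (this is generic Python, not a learnr feature) is to test whether a module is importable in whichever interpreter reticulate selected:

```python
import importlib.util

def module_available(name):
    """Return True if `name` can be imported in the current interpreter."""
    return importlib.util.find_spec(name) is not None

print(module_available("pandas"))
```

If this prints `False` while your local session prints `True`, the deployed app is resolving a different Python environment than the one pandas was installed into.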

I'm not exactly sure what the problem is here, but it might be related to issues creating and using the virtualenv. I suggest a solution I came across that helped me (the same project has other tips on Shiny + reticulate).

In a nutshell, you include an .Rprofile file like this in your project, which sets the RETICULATE_PYTHON environment variable that reticulate looks for when the setup chunk is run. Then you can pull the environment variables into your setup chunk and apply the rest of your code like so:


```{r setup, include=FALSE}
...
# grab system environment variables set by .Rprofile
virtualenv_dir = Sys.getenv('VIRTUALENV_NAME')
python_path = Sys.getenv('PYTHON_PATH')
# create a virtualenv, install dependencies within it, and finally use the virtualenv
reticulate::virtualenv_create(envname = virtualenv_dir, python = python_path)
reticulate::virtualenv_install(virtualenv_dir, packages = c('numpy', 'pandas'))
reticulate::use_virtualenv(virtualenv_dir, required = TRUE)
...
```
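For reference, the .Rprofile being described might look something like this. This is a sketch: the `VIRTUALENV_NAME` and `PYTHON_PATH` names match the chunk above, but the `user == "shiny"` check and the shinyapps.io paths are assumptions you should verify against your own deployment:

```r
# .Rprofile (sketch; paths and the "shiny" user check are assumptions)
Sys.setenv(VIRTUALENV_NAME = "python_environment")

if (Sys.info()[["user"]] == "shiny") {
  # Assumed shinyapps.io environment: use the server's python3 and point
  # reticulate at the virtualenv that the setup chunk creates.
  Sys.setenv(PYTHON_PATH = "/usr/bin/python3")
  Sys.setenv(RETICULATE_PYTHON = paste0(
    "/home/shiny/.virtualenvs/", Sys.getenv("VIRTUALENV_NAME"), "/bin/python"
  ))
} else {
  # Local development: rely on python3 from the PATH.
  Sys.setenv(PYTHON_PATH = "python3")
}
```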

Hope that helps!
