Converting Pandas DataFrames to R DataFrame using Reticulate not Working Consistently

I’m using R Markdown with the reticulate package and often need to print pandas DataFrame objects using R functions such as knitr::kable. Unfortunately, the conversion works only intermittently when knitting the document: sometimes it works, sometimes it doesn’t.
Here is a reproducible example.

---
title: "Test"
date: "2/13/2019"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{python}
import pandas as pd
df = pd.DataFrame({'a':4, 'b':5, 'c':9}, index=[0])
print(df)
```

```{r}
library(reticulate)
df2 <- reticulate::py$df
print(df2)
print(reticulate::py$df)
```

Actual Result:

import pandas as pd
df = pd.DataFrame({'a':4, 'b':5, 'c':9}, index=[0])
print(df)
##    a  b  c
## 0  4  5  9
library(reticulate)
df2 <- reticulate::py$df
print(df2)
##                                   a                                 b
## 1 <environment: 0x000000001dddb808> <environment: 0x000000001decdc58>
##                                   c
## 1 <environment: 0x000000001e000918>
print(reticulate::py$df)
##                                   a                                 b
## 1 <environment: 0x000000001e807f78> <environment: 0x000000001e8fd480>
##                                   c
## 1 <environment: 0x000000001e9ee608>

Expected Result:
The small dataframe should print (3 times)

Here is the session information:

## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] reticulate_1.10.0.9004
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0      lattice_0.20-38 digest_0.6.16   rprojroot_1.3-2
##  [5] grid_3.5.2      jsonlite_1.6    backports_1.1.2 magrittr_1.5   
##  [9] evaluate_0.11   stringi_1.1.7   Matrix_1.2-15   rmarkdown_1.10 
## [13] tools_3.5.2     stringr_1.3.1   yaml_2.2.0      compiler_3.5.2 
## [17] htmltools_0.3.6 knitr_1.20

Would appreciate any guidance!

I'm not able to reproduce your error; this works as expected for me. The only differences are that I load reticulate in the setup chunk and I don't use reticulate:: to call the py object.

---
title: "Example Reticulate"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(reticulate)
use_python("/usr/bin/python3.6")
```
```{python}
import pandas as pd
df = pd.DataFrame({'a':4, 'b':5, 'c':9}, index=[0])
print(df) 
```
```{r}
library(reticulate)
df2 <- py$df
print(df2)
print(py$df)
```

R output:

df2 <- py$df
print(df2)
#  a b c
#0 4 5 9
print(py$df)
#  a b c
#0 4 5 9

Thanks @andresrcs for jumping on to this so quickly. Yes, I've experimented with this to no avail.
---
title: "Test"
date: "2/13/2019"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(reticulate)
```

```{python}
import pandas as pd
df = pd.DataFrame({'a':4, 'b':5, 'c':9}, index=[0])
print(df)
```

```{r}
df2 <- py$df
print(df2)
print(py$df)
sessionInfo()
```

Output:

    import pandas as pd
    df = pd.DataFrame({'a':4, 'b':5, 'c':9}, index=[0])
    print(df)
    ##    a  b  c
    ## 0  4  5  9
    df2 <- py$df
    print(df2)
    ##                                   a                                 b
    ## 1 <environment: 0x000000001e0d6c08> <environment: 0x000000001e294b00>
    ##                                   c
    ## 1 <environment: 0x000000001e7b6e10>
    print(py$df)
    ##                                   a                                 b
    ## 1 <environment: 0x000000001eadcc60> <environment: 0x000000001ebd2788>
    ##                                   c
    ## 1 <environment: 0x000000001ecc1ac0>

Perhaps there is something wrong in my configuration. I've included the session info. Anything stick out?

I recently had some issues with reticulate on Windows, and installing the development version from GitHub did the trick for me. Maybe you can give it a try.

remotes::install_github("rstudio/reticulate")

Also, I'm using RStudio version 1.2.1278.

This question is also posted on SO

Please make sure to follow our cross posting policy.

Ah, thanks for the note on the cross-post. Will comply.

Uninstalled and reinstalled the latest build of RStudio, v1.2.1280.
Running R version 3.5.2
Reinstalled the dev version of reticulate_1.10.0.9004.

Same result, unfortunately.

Have you tried the development version of reticulate from GitHub?

reticulate 1.11 (development)

Yes, I've tried, but:
a) I get the same result, and
b) sessionInfo() now says reticulate_1.10.0.9004 is installed, not 1.11.

Wondering if an R downgrade would be appropriate. I'm running version 3.5.2.

I don't think that is the cause: I'm also using R 3.5.2 and I can't reproduce your issue. Have you tried specifying your Python engine?

use_python("/usr/bin/python3.6")
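
A quick way to confirm which interpreter a `{python}` chunk actually runs under is to ask Python itself. This is only a sketch: the helper name `interpreter_info` is made up for illustration, and on Windows the path will look nothing like the `/usr/bin/python3.6` example above. When Python is embedded in another process, `sys.executable` may not be the Python binary, so `sys.prefix` is reported as well.

```python
import platform
import sys


def interpreter_info():
    """Return basic facts about the Python process running this code."""
    return {
        "executable": sys.executable,           # path reported for the running interpreter
        "prefix": sys.prefix,                   # installation / environment root
        "version": platform.python_version(),   # version string such as "3.6.8"
    }


# Print one "key: value" line per entry so it is easy to paste into a post.
for key, value in interpreter_info().items():
    print(f"{key}: {value}")
```

If the location printed here is not the one passed to `use_python()`, that mismatch is probably worth ruling out before anything else.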

See also: https://github.com/rstudio/reticulate/issues/389

It would help to know what version of Pandas you have installed.

What is the output of the following for you (run as an R script)?

library(reticulate)
py_run_string("
import pandas as pd
print(pd.__version__)
df = pd.DataFrame({'a':4, 'b':5, 'c':9}, index=[0])
")

main <- import_main(convert = FALSE)
class(main$df)
class(main$df$a)
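
The version check in the script above can also be done from the Python side without importing the packages at all. The following is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library); on older interpreters `pd.__version__` does the same job. The helper names `installed_version` and `version_report` are hypothetical, and including numpy in the default list is my own guess at what else might be relevant.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(packages=("pandas", "numpy")):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


# Print one line per package, marking the ones that are missing.
for name, version in version_report().items():
    print(f"{name}: {version if version else 'not installed'}")
```

Posting this output alongside `sessionInfo()` gives both halves of the R/Python configuration in one place.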

Hi Kevin:

Thanks for jumping in here.

I'm currently running pandas 0.23.4

I just installed the latest .NET Core SDK because it was required by Rtools.
Ran the script and expected and actual output match.
No clue if the two are related. Actually doubt it.

Regarding your script, the output is as follows:

library(reticulate)
py_run_string("
import pandas as pd
print(pd.__version__)
df = pd.DataFrame({'a':4, 'b':5, 'c':9}, index=[0])
")

main <- import_main(convert = FALSE)
class(main$df)

## [1] "pandas.core.frame.DataFrame"       
## [2] "pandas.core.generic.NDFrame"       
## [3] "pandas.core.base.PandasObject"     
## [4] "pandas.core.base.StringMixin"      
## [5] "pandas.core.accessor.DirNamesMixin"
## [6] "pandas.core.base.SelectionMixin"   
## [7] "python.builtin.object"
class(main$df$a)
## [1] "pandas.core.series.Series"         
## [2] "pandas.core.base.IndexOpsMixin"    
## [3] "pandas.core.generic.NDFrame"       
## [4] "pandas.core.base.PandasObject"     
## [5] "pandas.core.base.StringMixin"      
## [6] "pandas.core.accessor.DirNamesMixin"
## [7] "pandas.core.base.SelectionMixin"   
## [8] "python.builtin.object"

It appears to be working...for now. I'd like to get a true sense of root cause as I'm not sure what the problem actually is.
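
This is not a root-cause explanation, but if the problem comes back, one possible workaround is to hand R plain Python lists instead of the DataFrame itself, so that the result no longer depends on how the DataFrame is converted. The sketch below assumes only that the object has a `to_dict(orient="list")` method, which pandas DataFrames do; the helper name `columns_as_lists` and the variable `df_cols` are hypothetical.

```python
def columns_as_lists(frame):
    """Turn a DataFrame-like object into {column name: list of values}.

    Relies only on the object's to_dict(orient="list") method, which
    pandas DataFrames provide.
    """
    return {str(name): list(values)
            for name, values in frame.to_dict(orient="list").items()}


# In a {python} chunk, with df defined as in the thread, one could write:
#   df_cols = columns_as_lists(df)
# and then, in a later {r} chunk, rebuild the table from the plain lists:
#   df2 <- as.data.frame(py$df_cols)
```

I would expect the dictionary to arrive in R as a named list, but I have not verified this on the configuration described in this thread.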
