reticulate::py_to_r: How to convert a pandas DataFrame to an R data.frame

reticulate

#1

I am using the reticulate package to integrate Python into an R package I'm building. One of the capabilities I need is to return R data.frames from a method in the R6-based object model I'm building. I use the Python pandas package to create a DataFrame in the reticulate Python environment, and my objective is to return it as an R data.frame. The issue I'm seeing is that reticulate::py_to_r(df) does not convert the object to R; instead it returns a Python DataFrame object. Below is a simple test I'm doing:

# platform       x86_64-pc-linux-gnu         
# arch           x86_64                      
# os             linux-gnu                   
# system         x86_64, linux-gnu           
# status                                     
# major          3                           
# minor          4.4                         
# year           2018                        
# month          03                          
# day            15                          
# svn rev        74408                       
# language       R                           
# version.string R version 3.4.4 (2018-03-15)
# nickname       Someone to Lean On          

 
library(reticulate)
pd <- import("pandas", as = "pd", convert = FALSE)
aa <- pd$read_csv("CDM_LOCATION.csv")
df <- reticulate::py_to_r(aa)
print(class(df))

df
 

[1] "pd.core.frame.DataFrame" "pd.core.generic.NDFrame" "pd.core.base.PandasObject"
[4] "pd.core.base.StringMixin" "pd.core.accessor.DirNamesMixin" "pd.core.base.SelectionMixin"
[7] "python.builtin.object"

Am I using the wrong method of transforming a DataFrame from Python to R? Feedback will be appreciated!

Thank you,
Brett


#2

Possibly related?


#3

I hit some snags doing object conversion with the CRAN version of reticulate. If you are running the CRAN version, try using the dev version:

devtools::install_github("rstudio/reticulate")

#4

Hi mara and jdlong,
Thank you both for the feedback. I had forked reticulate into my GitHub repository, so I am using the latest version. I also see that there are well-defined S3 methods for the py_to_r() generic that handle pandas DataFrame conversion (e.g. py_to_r.pandas.core.frame.DataFrame). I have identified the problem. The following test executes correctly in a new R session.

library(reticulate)
library(testthat)
 
pd <- import("pandas")
py_config()
## python:         /usr/local/bin/python3
## libpython:      /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/config-3.6m-darwin/libpython3.6.dylib
## pythonhome:     /Library/Frameworks/Python.framework/Versions/3.6:/Library/Frameworks/Python.framework/Versions/3.6
## version:        3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31)  [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
## numpy:          /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/numpy
## numpy_version:  1.14.5
## pandas:         /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas
## 
## NOTE: Python version was forced by RETICULATE_PYTHON
before <- iris
expect_is(before, class = "data.frame")
convert <- r_to_py(before)
expect_is(convert, class = "pandas.core.frame.DataFrame")

after <- py_to_r(convert)

expect_is(after, class = "data.frame")

The failure occurs when I call reticulate::import("pandas", as = "pd") with the as parameter. You can see below that the pandas DataFrame is not converted into an R data.frame. With as = "pd", the object's R classes are prefixed with "pd." instead of "pandas." (e.g. "pd.core.frame.DataFrame"), so the S3 method py_to_r.pandas.core.frame.DataFrame no longer matches the class name derived from the Python module.

pd <- import("pandas", as = "pd")
py_config()
## python:         /usr/local/bin/python3
## libpython:      /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/config-3.6m-darwin/libpython3.6.dylib
## pythonhome:     /Library/Frameworks/Python.framework/Versions/3.6:/Library/Frameworks/Python.framework/Versions/3.6
## version:        3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31)  [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
## numpy:          /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/numpy
## numpy_version:  1.14.5
## pandas:         /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas
## 
## NOTE: Python version was forced by RETICULATE_PYTHON
before <- iris
expect_is(before, class = "data.frame")

convert <- r_to_py(before)
expect_is(convert, class = "pd.core.frame.DataFrame")

after <- py_to_r(convert)
print("after class: ")
## [1] "after class: "
print(class(after))
## [1] "pd.core.frame.DataFrame"        "pd.core.generic.NDFrame"       
## [3] "pd.core.base.PandasObject"      "pd.core.base.StringMixin"      
## [5] "pd.core.accessor.DirNamesMixin" "pd.core.base.SelectionMixin"   
## [7] "python.builtin.object"
#expect_is(after,class = "data.frame")
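
The dispatch mismatch can be reproduced in plain R, without Python or reticulate. The sketch below is only an illustration: convert_demo() is a hypothetical stand-in for py_to_r(), and the objects are empty lists that carry the class names seen in the output above. It is not reticulate's actual implementation.

```r
# Hypothetical generic standing in for reticulate::py_to_r().
convert_demo <- function(x) UseMethod("convert_demo")

# Default method: no conversion is known, so the object is returned unchanged.
convert_demo.default <- function(x) x

# Method registered under the full module name, mirroring
# py_to_r.pandas.core.frame.DataFrame.
convert_demo.pandas.core.frame.DataFrame <- function(x) {
  data.frame(converted = TRUE)
}

# Fake objects carrying the class vectors seen with and without as = "pd".
obj_full <- structure(
  list(),
  class = c("pandas.core.frame.DataFrame", "python.builtin.object")
)
obj_alias <- structure(
  list(),
  class = c("pd.core.frame.DataFrame", "python.builtin.object")
)

res_full <- convert_demo(obj_full)    # dispatches to the DataFrame method
res_alias <- convert_demo(obj_alias)  # no matching method, falls back to default

print(class(res_full))
print(class(res_alias))
```

Because S3 dispatch matches on the literal class string, the "pd."-prefixed object never reaches the DataFrame method and comes back unconverted, which matches the behaviour shown above.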

I have tested this in two different Docker containers and on my MacBook Pro, and the same failure occurs in each. I think this should be addressed in the reticulate package.
Thanks,
Brett


#5

Hi Brett,

If there isn't an open issue in the reticulate repo, then I suggest you file one! You've done a great job of prepping the problem, so hopefully it can get resolved soon.

Mara


#6

Hi Mara,
I just created an issue in the reticulate GitHub repository. If I were one of the reticulate developers, I would start by documenting this area. I hope the RStudio community knows that reticulate gives R programmers a great way to use Python when necessary. The package I'm building right now is Neo4jDriveR, which will enable use of the Neo4j Python library, which is supported by Neo4j, and will provide proper access to the graph database. I wouldn't take this on without the reticulate package that RStudio's team has developed. Great work!

https://www.hitfuturenow.com/blog/2018/05/17/2018-05-14-leveraging-python-in-r-to-access-the-bolt-protocol-of-neo4j/


#7

Thanks, Brett. Looks like a really neat project!

Would you mind linking the issue back to this thread so others who run into the same problem can follow along?

Thanks again.

Mara


#8

The reticulate::py_to_r() issue is posted on GitHub at https://github.com/rstudio/reticulate/issues/319