Reticulate best practices if just using a little bit of Python in R

In order to get the chunk writing ability from sqlalchemy in R I have created a function that wraps around the python code and allows me to write to redshift using sqlalchemy and pandas. It works great. But I have some questions about "best practices". Right now I have a start_python function that I run before I want to export. That fires up reticulate and imports some libraries. Then I run my write function. Then in some cases I may be done with Python/Reticulate for a while. Should I shut down the Reticulate Python stack somehow? How do I even do that? Should I just not worry about it? Other thoughts?

start_python <- function(){
  library(reticulate)
  use_condaenv( "r-reticulate")
  pd <- import('pandas')
  sa <- import('sqlalchemy')
}

# write a table to redshift
write_to_redshift <- function(df, tablename, if_exists = 'append'){
  pd_df <- r_to_py(df)
  eng = sa$create_engine('postgres://user:pwd@redshift_name:5439/db_name')
  conn = eng$connect()
  write_result <- pd_df$to_sql( name=tablename, con=conn,  index = FALSE, if_exists = if_exists, schema='my_schema', chunksize=10000L)
  conn$close()
  return(write_result)
}
3 Likes

Hi,

It is not really about your question but out of curiosity why do you need to use python for connecting to redshift ? odbc client with dbplyr does not work for you ?

About your question: I would not put a library call inside a function. I would use reticulate:: each time you need a reticulate function, as when writing a :package: . Doing that you won't have to 'shut down' anything, reticulate won't have been loading, you'll just have used the fonctions you need fromthe package when needed.
The best practice could be to put your functions inside a package.

1 Like

That works fine for reading, but I've found that writing with the odbc client to be very slow as it does single row writes, which commit after each record. Details of that here: sql - inserting multiple records at once into Redshift with R - Stack Overflow

I get your point about calling reticulate functions with the package prefix. Seems reasonable. Even though it will leave a Python session running in the background until I shut down R. Probably not a big deal.

Good to know ! thanks!