Reading a pickle file from S3 into EC2

Hey all,

I have a pickle file on S3 (a pickled Python/pandas DataFrame), and I want to read it into R. I know from a previous question how to read in a CSV, and in Python I'd know how to read a pickle from S3, but I'm having difficulty combining the two in R with reticulate.

In Python, I run the following:

import pandas as pd
import pickle
import boto3
from io import BytesIO

bucket = 'my_bucket'
filename = 'my_filename.pkl'

s3 = boto3.resource('s3')
with BytesIO() as data:
    # Download the S3 object into the in-memory buffer
    s3.Bucket(bucket).download_fileobj(filename, data)
    data.seek(0)  # rewind to the start before unpickling
    df1 = pickle.load(data)

which works successfully.

So I tried to convert this into R, but it failed:

library(reticulate)

reticulate::use_condaenv("base2", required = TRUE)

boto3 <- reticulate::import("boto3")
pickle <- reticulate::import("pickle")
io <- reticulate::import("io")
data <- io$BytesIO()

s3 <- boto3$resource("s3")
bucket <- 'my_bucket'
filename  <- "my_filename.pkl"

s3$Bucket(bucket)$download_fileobj(filename, data)
data$seek(0)
#> Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: integer argument expected, got float
df1 <- pickle$load(data)
#> Error in py_call_impl(callable, dots$args, dots$keywords): UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 4: invalid start byte

Created on 2020-07-23 by the reprex package (v0.3.0)

Can anyone help with the Python <--> R conversion?
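
The first error above may come from how R number literals cross into Python: a bare 0 in R is a double, which reticulate passes along as a Python float, while 0L is an R integer and arrives as a Python int. Below is a small Python-side sketch of the same failure, assuming that is what happens at data$seek(0). The buffer contents are made up for illustration, and the exact error wording varies between Python versions.

```python
import io

# Illustrative buffer; in the thread this would hold the downloaded pickle bytes
buffer = io.BytesIO(b"example bytes")
buffer.read()  # move the position to the end of the stream

try:
    buffer.seek(0.0)  # a float offset, like R's 0 after conversion
    float_seek_failed = False
except TypeError as err:
    float_seek_failed = True
    print("float offset rejected:", err)

# An int offset, like R's 0L after conversion, rewinds the stream
position = buffer.seek(0)
print("position after int seek:", position)
```

If this is the cause, writing data$seek(0L) on the R side would be the corresponding change to try.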

I haven't used reticulate much before, but are you confident the same pickle version is used on both sides? I know the pickle format can differ across versions.
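
One way to check the version question is to read the protocol number from the pickle's own header. This is a minimal sketch using the standard library's pickletools; the helper name pickle_protocol is hypothetical, and the sample bytes stand in for the real file contents.

```python
import pickle
import pickletools
from typing import Optional


def pickle_protocol(raw: bytes) -> Optional[int]:
    """Return the protocol declared in a pickle's header, or None if absent."""
    opcode, arg, _pos = next(pickletools.genops(raw))
    # Protocol 2 and later start with a PROTO opcode whose argument is the version
    if opcode.name == "PROTO":
        return arg
    # Protocols 0 and 1 have no header, so the version cannot be read this way
    return None


# Stand-in for the bytes of the file stored on S3
sample = pickle.dumps({"a": 1}, protocol=4)
print("file protocol:", pickle_protocol(sample))
print("highest protocol this Python can read:", pickle.HIGHEST_PROTOCOL)
```

If the file's protocol is higher than HIGHEST_PROTOCOL in the Python that reticulate is bound to, that interpreter cannot load it.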

Perhaps one way to isolate the problem is to save the downloaded file in R and then load it in Python, and vice versa: download using Python and load in R. Hopefully then you can narrow it down to either pickle or S3.
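
A rough sketch of splitting the two steps on the Python side, so the download and the unpickling can be tested separately. The helper names are hypothetical, and an in-memory pickle stands in for the S3 download so the example runs without credentials.

```python
import io
import pickle
import tempfile
from pathlib import Path


def dump_buffer_to_file(buffer: io.BytesIO, path: Path) -> int:
    """Write the buffer's full contents to path and return the byte count."""
    raw = buffer.getvalue()  # getvalue() ignores the current stream position
    path.write_bytes(raw)
    return len(raw)


def load_pickle_file(path: Path):
    """Unpickle a local file, independent of how it was downloaded."""
    with path.open("rb") as fh:
        return pickle.load(fh)


# Stand-in for the download step; with S3 this would be
# s3.Bucket(bucket).download_fileobj(filename, buffer)
buffer = io.BytesIO()
pickle.dump({"rows": 3}, buffer)

with tempfile.TemporaryDirectory() as tmp:
    local_copy = Path(tmp) / "my_filename.pkl"
    n_bytes = dump_buffer_to_file(buffer, local_copy)
    obj = load_pickle_file(local_copy)

print(n_bytes, "bytes written; loaded object:", obj)
```

If the byte count matches the object size shown in S3 and the local file loads in plain Python, the remaining suspect is the call made through reticulate rather than pickle or S3.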
