Reading a file from S3 on connected EC2

AJF · August 27, 2019, 2:20pm

Hi all,

I am trying to read a CSV file from my S3 bucket into my (connected) EC2 instance. They seem to be connected via an instance profile (or instance role; I'm not sure whether they are the same thing).

When doing this in Python/Jupyter notebooks, it's now super simple. As long as I have the s3fs library installed, all I need to do is:

import pandas as pd
df1 = pd.read_csv("s3://bucket/path/to/file.csv")

and it works!

I would like to read the file on my R/RStudio server on the same EC2 machine. Is there a way to do so? I am trying to use the aws.s3 package, but I can't get it to connect seamlessly.

Thanks!

hugo-pa · September 12, 2019, 9:26pm

Hi AJF,

You can achieve this relatively seamlessly in R using the aws.s3 package in conjunction with the aws.ec2metadata package. The aws.s3 package uses the aws.signature package to sign AWS API requests; as stated in the readme:

Regardless of this initial configuration, all awspack packages allow the use of credentials specified in a number of ways, in the following priority order:

[...]

If R is running on an EC2 instance, the role profile credentials provided by aws.ec2metadata, if the aws.ec2metadata package is installed.

Thus, if you install the aws.ec2metadata package on your EC2 instance, you should be able to achieve the same functionality (assuming your EC2 instance's IAM role has the appropriate permissions) as in python/jupyter notebooks as follows:

df1 <- read.csv(text = rawToChar(aws.s3::get_object(object = "path/to/file.csv", bucket = "bucket")))

The reason I said "relatively seamlessly" above is that aws.s3::get_object() retrieves the object into memory as a raw vector, so it must first be converted into a character string using rawToChar() before being supplied to the text parameter of read.csv(). Of course, you could always create a wrapper function that replicates the python/jupyter behaviour if you like:

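s3.read_csv <- function(s3_path) {
  # Split "s3://bucket/key" into the bucket (up to the first "/") and the object key
  s3_pattern <- "^s3://([^/]+)/(.*)$"
  s3_bucket <- sub(s3_pattern, "\\1", s3_path)
  s3_object <- sub(s3_pattern, "\\2", s3_path)
  # get_object() returns a raw vector; convert it to text before parsing as CSV
  read.csv(text = rawToChar(aws.s3::get_object(object = s3_object, bucket = s3_bucket)))
}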

df1 <- s3.read_csv("s3://bucket/path/to/file.csv")
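One caveat with a wrapper like this: if the input does not look like s3://bucket/key, the pattern does not match and the substitution returns the string unchanged, so both the bucket and the object end up holding the full path. Here is a minimal sketch of a stricter parser in base R only. The name parse_s3_uri is my own and is not part of aws.s3.

```r
# Hypothetical helper (not part of aws.s3): split "s3://bucket/key" into its parts.
parse_s3_uri <- function(s3_path) {
  s3_pattern <- "^s3://([^/]+)/(.+)$"
  # regexec() gives the full match followed by one entry per capture group
  parts <- regmatches(s3_path, regexec(s3_pattern, s3_path))[[1]]
  if (length(parts) != 3) {
    stop("Not a valid S3 URI (expected s3://bucket/key): ", s3_path)
  }
  list(bucket = parts[2], object = parts[3])
}

uri <- parse_s3_uri("s3://bucket/path/to/file.csv")
uri$bucket
uri$object
```

The two pieces can then be passed on as aws.s3::get_object(object = uri$object, bucket = uri$bucket), and a malformed path fails early with a readable error instead of an S3 request for the wrong object.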

Hope this helps!

AJF · September 13, 2019, 1:32pm

Thanks! It also worked with readr::read_csv() and data.table::fread() (without needing to use the text argument in the read function).
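For anyone finding this later, here is a rough sketch of what that can look like. The raw vector is built locally with charToRaw() as a stand-in for what aws.s3::get_object() returns, so the snippet runs without S3 access. The sample data is made up, and it assumes readr and data.table are installed.

```r
# Stand-in for:
#   obj <- aws.s3::get_object(object = "path/to/file.csv", bucket = "bucket")
obj <- charToRaw("id,value\n1,a\n2,b\n")

# readr::read_csv() accepts a raw vector as literal data
df_readr <- readr::read_csv(obj)

# data.table::fread() treats a string containing a newline as the data itself
df_dt <- data.table::fread(rawToChar(obj))
```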

hugo-pa · September 13, 2019, 2:54pm

Excellent! Glad I could help.
