Reading a file from S3 on connected EC2

Hi AJF,

You can achieve this relatively seamlessly in R using the aws.s3 package in conjunction with the aws.ec2metadata package. The aws.s3 package signs its AWS API requests via the aws.signature package, whose readme states:

Regardless of this initial configuration, all awspack packages allow the use of credentials specified in a number of ways, in the following priority order:

[...]

  1. If R is running on an EC2 instance, the role profile credentials provided by aws.ec2metadata, if the aws.ec2metadata package is installed.

Thus, if you install the aws.ec2metadata package on your EC2 instance (and the instance's IAM role has the appropriate S3 permissions), you should be able to replicate the python/jupyter behaviour as follows:

df1 <- read.csv(text = rawToChar(aws.s3::get_object(object = "path/to/file.csv", bucket = "bucket")))
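
For completeness, here is a minimal one-time setup sketch, assuming a standard CRAN install; if I recall correctly, aws.ec2metadata::is_ec2() is a quick sanity check that R can reach the instance metadata service:

# One-time setup on the EC2 instance (standard CRAN install assumed)
install.packages(c("aws.s3", "aws.ec2metadata"))

# Sanity check: should return TRUE on an EC2 instance where the
# metadata service is reachable
aws.ec2metadata::is_ec2()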

The reason I said this is only relatively seamless is that aws.s3::get_object() retrieves the object into memory as a raw vector, so it must first be converted to a character vector with rawToChar() before being supplied to the text parameter of read.csv(). Of course, you could always write a wrapper function that replicates the python/jupyter behaviour if you like:

s3.read_csv <- function(s3_path) {
  # Split an "s3://bucket/key" URI into its bucket and object-key parts
  s3_pattern <- "^s3://(.+?)/(.*)$"
  s3_bucket <- gsub(s3_pattern, "\\1", s3_path)
  s3_object <- gsub(s3_pattern, "\\2", s3_path)
  # Fetch the object as a raw vector, decode it, and parse it as CSV
  read.csv(text = rawToChar(aws.s3::get_object(s3_object, s3_bucket)))
}

df1 <- s3.read_csv("s3://bucket/path/to/file.csv")
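
As an aside, if memory serves, aws.s3 also ships a generic helper, s3read_using(), that wraps this fetch-and-parse pattern for arbitrary reader functions, so the following sketch should be roughly equivalent:

# Equivalent sketch using aws.s3's generic reader helper
df1 <- aws.s3::s3read_using(FUN = read.csv, object = "s3://bucket/path/to/file.csv")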

Hope this helps!
