I'm having a hard time getting sparklyr (Spark 2.4.3) to connect to AWS S3 (`s3a://`) data sources when using instance roles (the EC2 metadata service). Even with known-working IAM credentials in the EC2 metadata service (verified via
cloudyr/aws.s3), I get error messages that start:
```
Error: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: REDACTED, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: REDACTED
```
My Spark initialization is pretty simple, following https://spark.rstudio.com/guides/aws-s3/, and looks like:

```r
conf <- spark_config()
conf$sparklyr.defaultPackages <- "org.apache.hadoop:hadoop-aws:2.7.7"
conf$fs.s3a.endpoint <- "s3.us-east-2.amazonaws.com"
Sys.setenv(AWS_ACCESS_KEY_ID = "")
Sys.setenv(AWS_SECRET_ACCESS_KEY = "")

sc <- spark_connect(master = "local", config = conf)
stream_read_text(sc, "s3a://REDACTED_BUT_KNOWN_WORKING_PATH")
```
I've tried both up- and down-leveling the version of the hadoop-aws package, and tried with and without setting those AWS environment variables to empty strings (the env-var approach came from https://stackoverflow.com/questions/45924785/how-to-access-s3-data-from-rstudio-on-ec2-using-iam-role-authentication).
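One variant I'm considering trying next is forcing the instance-profile credentials provider explicitly. This is just a sketch: I'm assuming that Hadoop options need the `spark.hadoop.` prefix to reach the Hadoop configuration when set through `spark_config()`, and that the `fs.s3a.aws.credentials.provider` option is honored by the hadoop-aws version in use (I believe it may require a newer hadoop-aws than 2.7.x):

```r
library(sparklyr)

conf <- spark_config()
conf$sparklyr.defaultPackages <- "org.apache.hadoop:hadoop-aws:2.7.7"
# Hadoop/S3A options passed via Spark config generally need the spark.hadoop. prefix
conf$spark.hadoop.fs.s3a.endpoint <- "s3.us-east-2.amazonaws.com"
# Explicitly select the EC2 instance-profile provider instead of the default
# credentials chain (assumption: supported by the hadoop-aws version in use)
conf$spark.hadoop.fs.s3a.aws.credentials.provider <-
  "com.amazonaws.auth.InstanceProfileCredentialsProvider"

sc <- spark_connect(master = "local", config = conf)
```

No idea yet whether that helps, but flagging it in case the issue is simply that my `fs.s3a.endpoint` setting (without the prefix) never reached Hadoop.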
Would be very grateful for any tips to get this working!