Connecting Sparklyr to s3a (AWS S3) via instance roles

I'm having a hard time getting sparklyr (Spark 2.4.3) to connect to AWS S3 (s3a://) data sources when using instance roles (EC2 metadata service). Even with known-working IAM credentials in the EC2 metadata service (tested via cloudyr/aws.ec2metadata and cloudyr/aws.s3), I get error messages that start:

Error: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: REDACTED, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: REDACTED

My spark initialization is pretty simple, following, and looks like:

conf <- spark_config()
conf$sparklyr.defaultPackages <- "org.apache.hadoop:hadoop-aws:2.7.7"
conf$fs.s3a.endpoint <- ""
sc <- spark_connect(master = "local", config = conf)
stream_read_text(sc, "s3a://REDACTED_BUT_KNOWN_WORKING_PATH")

I've tried both up and down leveling the version of the hadoop-aws package and tried both with and without setting those AWS environment variables to empty strings (the env var method came from

Would be very grateful for any tips to get this working!

This took a tremendous amount of work, but I finally cracked the code to get this working.

conf <- spark_config()
conf$sparklyr.defaultPackages <- "org.apache.hadoop:hadoop-aws:2.7.3"
conf$ <- ""
sc <- spark_connect(master = "local", config = conf, version = "2.4.4")
ctx <- spark_context(sc)
jsc <- invoke_static(sc, 
hconf <- jsc %>% invoke("hadoopConfiguration")  

# we always want the s3a file system with V4 signatures
hconf %>% invoke("set", "fs.s3a.impl","org.apache.hadoop.fs.s3a.S3AFileSystem")
hconf %>% invoke("set", "", "true")

# connect to us-east-2 endpoint
hconf %>% invoke("set", "fs.s3a.endpoint", "")

# ensure we always use bucket owner full control ACL in case of cross-account access
hconf %>% invoke("set", "fs.s3a.acl.default", "BucketOwnerFullControl")
# use EC2 metadata service to authenticate
hconf %>% invoke("set", "", 
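
To check that settings like these actually landed in the Hadoop configuration, the values can be read back. This is only a sketch: it assumes `sc` is an open connection from `spark_connect()`, and it reaches the Hadoop configuration through the Scala SparkContext, which may not be the same route used to build `jsc` above. An unset key is expected to come back as `NULL`, but that has not been confirmed in this thread.

```r
library(sparklyr)

# Assumes `sc` already exists (created with spark_connect()).
# SparkContext exposes hadoopConfiguration; this is one way to reach it.
hconf <- spark_context(sc) %>% invoke("hadoopConfiguration")

# Read back a few of the keys that were set with invoke("set", ...).
for (key in c("fs.s3a.impl", "fs.s3a.endpoint", "fs.s3a.acl.default")) {
  value <- hconf %>% invoke("get", key)
  cat(key, "=", if (is.null(value)) "<unset>" else value, "\n")
}
```

If a key prints as unset or empty, the corresponding `invoke("set", ...)` call did not take effect on the configuration Spark is using.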

I have to say, the documentation on this (particularly the distinction between spark_config() and the Hadoop configuration) is a bit... rough around the edges.
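
For anyone else tripped up by that distinction: Spark copies any property named `spark.hadoop.<key>` into the Hadoop configuration as `<key>`, so it may be possible to set some of these values directly in spark_config() instead of through invoke(). The following is an untested sketch of that alternative, not what was run above. It only shows the two keys whose values appear in full in this thread.

```r
library(sparklyr)

conf <- spark_config()
conf$sparklyr.defaultPackages <- "org.apache.hadoop:hadoop-aws:2.7.3"

# Assumption: sparklyr forwards spark.* entries to Spark, and Spark strips
# the "spark.hadoop." prefix when filling in the Hadoop configuration.
conf$spark.hadoop.fs.s3a.impl <- "org.apache.hadoop.fs.s3a.S3AFileSystem"
conf$spark.hadoop.fs.s3a.acl.default <- "BucketOwnerFullControl"

sc <- spark_connect(master = "local", config = conf, version = "2.4.4")
```

Plain entries such as `conf$fs.s3a.endpoint` (without the `spark.hadoop.` prefix) are not Spark properties, which may be why the original attempt had no effect.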
