I am trying to read a CSV file from S3 into my (connected) EC2 instance. They seem to be connected via an instance profile (or instance role; I'm not sure whether those are the same thing).
When doing this in python/jupyter notebooks, it's super simple. As long as I have the s3fs library installed, all I need to do is
import pandas as pd
df1 = pd.read_csv("s3://bucket/path/to/file.csv")
and it works!
I would like to read the file from my R/RStudio server on the same EC2 machine. Is there a way to do so? I am trying to use the aws.s3 package, but I can't get it to connect seamlessly.
You can achieve this relatively seamlessly in R using the aws.s3 package in conjunction with the aws.ec2metadata package. The aws.s3 package uses the aws.signature package to sign AWS API requests; as stated in the readme:
Regardless of this initial configuration, all awspack packages allow the use of credentials specified in a number of ways, in the following priority order:
[...]
If R is running on an EC2 instance, the role profile credentials provided by aws.ec2metadata, if the aws.ec2metadata package is installed.
Thus, if you install the aws.ec2metadata package on your EC2 instance (and the instance's IAM role has the appropriate S3 permissions), you should be able to achieve the same functionality as in python/jupyter notebooks.
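Something along these lines should work (a minimal sketch; the bucket and key are the placeholders from your question, and the install.packages() call only needs to run once on the instance):

# one-time setup on the EC2 instance
install.packages(c("aws.s3", "aws.ec2metadata"))

library(aws.s3)

# get_object() fetches the object into memory as a raw vector;
# rawToChar() converts it so read.csv() can parse it via its `text` argument
obj <- get_object("s3://bucket/path/to/file.csv")
df1 <- read.csv(text = rawToChar(obj))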
The reason I said this is only relatively seamless above is that aws.s3::get_object() retrieves the object into memory as a raw vector; the object must therefore be converted to a character string with rawToChar() before being passed to the text argument of read.csv(). Of course, you could always write a small wrapper function that replicates the python/jupyter behaviour if you like:
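One way to write such a wrapper (the name s3_read_csv is just illustrative; it assumes your version of aws.s3 accepts s3:// URIs for the object argument, which recent versions do):

library(aws.s3)

s3_read_csv <- function(uri, ...) {
  # get_object() returns a raw vector; rawToChar() turns it into a string
  # that read.csv() can parse via its `text` argument
  read.csv(text = rawToChar(get_object(uri)), ...)
}

df1 <- s3_read_csv("s3://bucket/path/to/file.csv")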