pins and S3 use case / best practice

glynnfoster · February 26, 2020, 10:57pm

Hi all,

I have a question about pins and a use case with AWS S3 boards, and I'm not sure what the recommended approach is. Here at Montoux, we develop a web-based SaaS platform for life insurance companies to model their portfolios from an actuarial analysis point of view. We're developing a bunch of data-science-based models in R, and we're figuring out the mechanics of productising these into our platform.

We've previously been using s3mpi (https://github.com/robertzk/s3mpi) to fetch data from S3 and cache it locally. Some of the data we consume is pretty large, so this has worked well for exploratory analysis. However, we feel like most of the community traction is around pins, so we're looking at how we can use it.

One area that's a little unknown to us is how we should approach data that hasn't yet been cached/pinned. For example, it may have been uploaded directly by a user, or produced as part of some other data processing, i.e. the data.txt metadata file hasn't been produced. One way to approach this would be to use aws.s3 to pull the file and then pin it, but this seems somewhat inefficient, and it would be nice if there was a way for pins to populate the cache from an existing S3 object. Is there anything I'm missing here, or is this a use case that is outside the scope of pins?

I'd really appreciate any experiences or recommendations anyone has.

Thanks!
Glynn

alexkgold · February 26, 2020, 11:10pm

Hi Glynn,

That's a great question! Currently the best way forward here would be to pull the file and pin it separately, but you should feel free to file an issue on the pins GitHub repo.

@javierluraschi, the package author, is really interested in hearing how people are using pins and how to make it better.
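
As a rough sketch of the pull-then-pin approach mentioned above, something along these lines might work. It assumes the aws.s3 package and the pins API as it stood in early 2020 (`pin()` / `pin_get()`; later pins releases changed these functions). The helper name `pin_from_s3`, the bucket, the object key, and the pin name are all hypothetical, and pinning to the default local board is just one choice that leaves the source bucket untouched.

```r
# Sketch only: download an existing S3 object, then pin the local copy.
# Assumes aws.s3 and the early-2020 pins API; all names are hypothetical.
pin_from_s3 <- function(object, bucket, name, board = "local") {
  # Download to a temp file that keeps the object's file extension.
  ext <- tools::file_ext(object)
  tmp <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
  aws.s3::save_object(object = object, bucket = bucket, file = tmp)

  # Pinning a file path stores the file on the board and writes its metadata.
  pins::pin(tmp, name = name, board = board)
}

# Hypothetical usage (needs AWS credentials in the environment):
# pin_from_s3("uploads/portfolio.csv", bucket = "my-bucket", name = "portfolio")
# path <- pins::pin_get("portfolio", board = "local")
```

This only reads from the source bucket; the pin and its metadata live on whichever board is passed in.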

glynnfoster · February 26, 2020, 11:37pm

Hey Alex,

Thanks for confirming that approach. It works OK for us, though we try to stick to a principle of keeping read-only access to original source data. I'll definitely put something into the issue tracker on this.

Cheers,
Glynn
