R code in production - options

spiritus87 · October 15, 2017, 1:29pm

Hello! I am new to the rstudio community (but not to rstudio, of course :)).

I am writing this post to get some perspectives from you people on the deployment of R machine learning/predictive models in a production environment. Up to now I have used R only to do exploratory data analysis, reporting, model selection and so forth, but all these activities are 'static' in the sense that they allow for no or limited automation in a production setting. If I wanted to deploy my R models in production, what would be my best option?

A more concrete case I am working on. Let's say I want to do forecasting on pageviews data. I have a development cloud server where I run rstudio, periodically import pageviews data from (say) google analytics as batches, train/retrain/score/explore various models using the forecast package, and select the best performing model - say it's model M. Good job.

Now I want to deploy this model M on a production server so as to predict on new data that is thrown at it - e.g., forecast pageviews data for the next days, for different geographical locations, etc. The ideal solution would be something that is:

(1) flexible: it can be easily adapted for the deployment of any other type of model I have trained and is language/IT agnostic, in the sense that predictions for my model can be used by the rest of the production environment where R is not used at all (but javascript, C#, python, etc are used)
(2) scalable: it should be fast and robust enough to handle many predictions without breaking
(3) easy: it should be as straightforward as possible for someone who has limited support from developers/devOps

It seems to me that this scenario that I am describing is extremely common, and is one of the most important - if not the single most important - challenge facing R (and, much less, for Python; see below). I have conducted an extensive search online for the options at my disposal, and these seem to be:

(1) Get the developers to translate the model to java/C/whatever. This is of course terrible, as it does not satisfy conditions (1) and (3) above. It is perhaps good for big corporations like Google who can afford to have a whole team of engineers to take the models from the data scientists and optimize everything in C++ and whatnot, but for most companies/scenarios this is impracticable; moreover, this approach is feasible for Python (which is a language that engineers know) but much less for R (which is not used in development).

(2) Make use of proprietary environments/platforms which make it easier to deploy R models, such as Microsoft ML server/SQL Server ML services or platforms like https://www.dominodatalab.com/ and similar. Some of these services require you to already have some type of infrastructure (e.g. SQL server) where all your data is stored, which makes it inflexible if your model takes heterogenous data from multiple sources. Platforms like Domino, it seems to me, make you pay for something that you can do yourself (see point 4 below), which might be good as they free you from the hassle - but then they do not constitute different methods to deploy R models

(3) deploy the model by saving the .rds (serialize it), move it to the production server, predict on new batch data that comes in say every day, and return the dataframe/json object containing the scoring/predictions for further processing. This approach works for quite some use cases, but it has many drawbacks, in particular that you can only score batches of data, which makes it quite inflexible.

(4) deploy the model as a micro-webservice/API on a cloud production server, which can take HTTP requests with input data and returns the predictions as, say, a JSON. This, it seems to me, is perhaps the best and most flexible approach, since developers can request predictions without understanding R and it can be easily adapted for any other predictive model, by writing a small API for each model. The issues here come from scalability, of course, which are made worse by the fact that R is single-threaded. It seems that there are the following packages to expose an R model as a service:

(1) Plumber. Seems under active development, but I am not sure how stable it is. Scalability could be dealt with by running many R processes in different docker containers using kubernetes (see here, and the posts below).
(2) OpenCPU: seems pretty solid and tested, see here, here, and here. Single-threadedness is dealt with by starting a new process for every request, keeping RAM and cpu usage in check.

For both of the options above my biggest worry is scalability. For instance, here it is said that openCPU worked quite well, but in the end they switched to Python because it's a "more proper programming language" (whatever that means), while here it is said that openCPU scales reasonably well but not for intensive websites.

(5) a final option would be to go for the above option (webservice/API) but switch partially or wholly to python in production. A partial switch would look like this: set up an API using flask + gunicorn or Django and run the R models using rpy2. A full switch would take the same route but just run the python equivalents of the R models. My main question here is whether flask + gunicorn (+rpy2 if needed) will scale better than openCPU/plumber.

I would like to get you guys' perspective on this issue, which, as I said, seems quite topical to me for the future of the R language. What's the best way to go about this problem?

Best,
Riccardo.

jclemens1 · October 18, 2017, 7:17pm

Hi Riccardo, thanks for the post. This is a great topic since, as you point out, many organizations see model deployment as a barrier in R. We have added Plumber support to RStudio Connect to address this issue and promote the use of R based models in production.

With regard to performance, the best way to know may be to deploy the model on Connect using our trial license (hence at no charge). You can get the trial from our website. If you are interested in more info about API access through Plumber in Connect, this post from Jeff Allen may help.

Good luck.

Jim

raybuhr · October 19, 2017, 5:13am

I love this question. I literally just wrote a blog post about this using plumber. It's also what I'm using at work to deploy predictive models as REST APIs returning JSON responses.

I've got a part two in development about storing your trained models in S3, using nginx for localhost proxy port forwarding, and load balancing using either Docker or AWS/GCP managed services.

I have friends using opencpu with good results, but requires a bit more setup and a bit less flexibility. With plumber, the biggest barrier to performance is your R code, not the library. In other words, try to make your model just return predictions and precompute all the data processing you can beforehand.

spiritus87 · October 22, 2017, 6:45pm

@jclemens1
Thanks a lot! I'll make sure to try rstudio connect and if it works hopefully I'll convince my bosses to buy it - we are just starting with R in the company so it's a new thing.

@raybuhr
Raymond, what a great blogpost!! I am looking forward to the next instalment!!

I will try plumber out at work now, and see if I can expose my models as API's for everyone to enjoy
So you think that if I only return prediction (i.e., no training/processing whatever, just predict(...)), plumber + docker running on AWS would be sufficient in terms of responsiveness? Perhaps I should just try it out and see how it goes. Do you have a lot of requests to your API's at work?

Since our data lives in a database on our servers, another thing I am considering is first starting with the simple approach of deploying the model by pulling data from the DB, predict and then return the predictions to the DB as a table (the batch processing approach I was mentioning), and then in time deploy the model as an API. In this respect, would you suggest to keep training and prediction in two different servers? I was thinking of predicting new scores every night when new data from the day comes in, and to retrain the model say every week to start with.

Ciao,
R.

raybuhr · October 23, 2017, 12:45am

Using data from the main application database and storing predictions as a table is a really good process for performance (speed from prediction request to prediction response) since the prediction doesn't need to do any extra computation or network transfer. If that's how your company's application is structured, it's a great way to serve predictions. The downsides of that batch process strategy is that you have to have precomputed predictions. That's not a problem if you only have a few types of predictions to make. But if you have dozens of database tables for different predictions, database performance and therefore application performance can start to suffer. Limiting to only off hours like nightly batch inserts/updates is the most common way to offset the performance impact.

My company's application has a microservices architecture, which means lots of smaller databases. I.e. users are in a different database than products, which are in a different database than orders, etc. Since our application has to send data back and forth between different databases, using plumber and serving predictions as a json api is a natural fit.

anhhoangduc · November 17, 2017, 9:39am

I have used AzureML package from Microsoft and it is a very good solution.
I followed the instruction here and have successfully create a simple web app. The only issue is to install other R packages like H2O in AzureML as my team relied on H2O for buidling predictive model.