How to best structure larger microservices in Plumber?

konradino · April 26, 2019, 3:11pm

Hello,

The adoption of APIs built in R with the plumber package and deployed to RStudio Connect (RSC) is becoming more widespread in our organisation. As a result, our services are becoming larger and more complicated, especially when they interact with one another.

This means they also need to be written and organised more efficiently. I feel I'm lacking some typical computer science knowledge on this topic, so I wanted to ask around about the best way to do it.

Let's say I have deployed four TF models that essentially do the same thing but are dedicated to different markets and their local specifics. On top of them, I had to build a preprocessing endpoint that accepts a request, prepares the data, and then calls the appropriate one of the four model endpoints. So within the preprocessing API I have code that prepares the data, calls the TF model, retrieves the response, and passes it on. I thought the best way to tackle this was the following:

At the beginning of my API code, I load all the data that is shared across those endpoints, to reduce the initial load time.
Each country's TF model call has its own dedicated endpoint (there will be four; currently there are two).
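The layout described above can be sketched as a single plumber file. This is a minimal sketch, not the original code: the shared data, the `preprocess_for()` helper, and its scaling logic are hypothetical placeholders. Top-level code in a plumber file runs once when the API is started, so anything created there is shared by every endpoint and request.

```r
# Top-level code runs once when plumber loads the file, so these objects are
# shared across all endpoints and requests.
# Hypothetical shared data; in practice this might come from readRDS() or a database.
shared_lookup <- data.frame(
  country = c("au", "nl"),
  scale = c(1.0, 0.5),
  stringsAsFactors = FALSE
)

# Hypothetical preprocessing step reused by every country endpoint
preprocess_for <- function(country, values) {
  scale <- shared_lookup$scale[shared_lookup$country == country]
  values * scale
}

#* Prediction for the AU market
#* @get /au/predict
function() {
  # Prepare the data, then call the AU TF model and return its response
  list(prepared = preprocess_for("au", c(2, 4)))
}

#* Prediction for the NL market
#* @get /nl/predict
function() {
  # Same preparation, then call the NL TF model
  list(prepared = preprocess_for("nl", c(2, 4)))
}
```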

Question: is this the right approach, or should each preprocessing endpoint for each TF model be a separate API? What's the best way to structure this?

Suppose that, in addition to calling those TF models, the preprocessing endpoint should also call another API (let's call it Y) so that Y can run some additional computations on the same data, and then integrate Y's response into the main preprocessing response. Y is a separate service because it can also be called independently of the main preprocessing endpoint.

Two questions on that design:

At the moment, API Y is always called when the main preprocessing endpoint is called. How should I structure the endpoint paths and the underlying code so that a client can choose whether or not to get API Y's response? I'm mainly thinking about not copying and maintaining the same code in several places. Would filters be the right choice here, given that the TF model (for one of the markets) will always be called and then, depending on the request, the call would be routed to an endpoint?
A more general one: is this kind of nesting of services a good idea? I have a feeling that I'm losing visibility into the timing of my services. Is what I'm doing common practice, or is there a better way?

Thank you!

Blair09M · April 26, 2019, 7:40pm

What a great question! API design is certainly subjective and, while there are some guidelines and best practices, it’s ultimately up to each organization to determine design practices that best suit their needs. With that said, here are my thoughts:

Your existing approach to pre-processing sounds good to me as long as you don’t anticipate needing the pre-processing step to be invoked on its own. If you anticipate needing pre-processing independent of running the actual model, you could separate that step into its own endpoint and then invoke it from the model endpoint(s).
Instead of building a separate endpoint for each country, you could use a single endpoint with a dynamic route that would essentially act the same way for the end user, but prevent you from maintaining duplicate logic in the API.

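#* Get country prediction
#* @param country Country code (e.g. "au" or "nl")
#* @get /<country>/predict
function(country) {
  # `country` is filled in from the URL path, e.g. "au" for /au/predict.
  # Branch on it here to pick the country-specific model and logic.
  list(country = country)
}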

In this implementation, API consumers still send GET requests to the URLs you’ve specified (/au/predict and /nl/predict), but all of the logic is handled in a single endpoint. Within that endpoint, you can determine the appropriate response based on the value of country.
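Inside that single endpoint, one way to avoid a growing chain of if/else branches is a lookup table from country code to that market's model endpoint. A minimal sketch; the URLs and the `model_url_for()` helper are hypothetical, and the actual HTTP call to the model is left as a comment:

```r
# Hypothetical mapping from country code to the URL of that market's TF model
# endpoint; replace these placeholders with your own deployments.
model_urls <- c(
  au = "https://connect.example.com/tf-model-au/predict",
  nl = "https://connect.example.com/tf-model-nl/predict"
)

# Return the model URL for a country code, or NULL if the market is unsupported
model_url_for <- function(country) {
  country <- tolower(country)
  if (!country %in% names(model_urls)) return(NULL)
  model_urls[[country]]
}

#* Get country prediction
#* @param country Country code
#* @get /<country>/predict
function(country, res) {
  url <- model_url_for(country)
  if (is.null(url)) {
    # Unknown market: answer with 404 instead of failing inside the model call
    res$status <- 404
    return(list(error = paste("Unsupported country:", country)))
  }
  # Preprocess the request data and send it to `url` here
  list(country = country, model_url = url)
}
```

With this design, supporting a new market means adding one entry to `model_urls` rather than a new endpoint.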

In response to your other questions, nesting services in this way is certainly a common design. This type of architecture allows you to separate each piece of logic into its own service, and then each service can either be invoked in isolation or orchestrated together as part of a more complex process. This type of architecture can also simplify the maintenance requirements for complicated systems, since each piece can be updated in isolation.
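Because nested calls make it harder to see where the time goes (the timing concern raised in the question), it can help to time each step of an orchestrated request explicitly and return or log the timings. A minimal base-R sketch; `timed_step()`, `preprocess()`, `call_model()`, and `orchestrate()` are hypothetical, and the two steps are stand-ins for the real HTTP calls to the TF model and other services:

```r
# Run one step of an orchestrated request and record how long it took,
# so each nested service call stays visible in the logs.
timed_step <- function(name, fn, ...) {
  start <- Sys.time()
  result <- fn(...)
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  message(sprintf("[timing] %s took %.3f s", name, elapsed))
  list(result = result, seconds = elapsed)
}

# Hypothetical stand-ins for the real steps (in practice, HTTP calls)
preprocess <- function(x) x * 2
call_model <- function(x) sum(x)

# Orchestrate the steps and return the per-step timings with the result
orchestrate <- function(x) {
  pre <- timed_step("preprocess", preprocess, x)
  pred <- timed_step("model", call_model, pre$result)
  list(
    prediction = pred$result,
    timings = c(preprocess = pre$seconds, model = pred$seconds)
  )
}
```

For example, `orchestrate(1:3)` returns a prediction of 12 together with a named vector of elapsed seconds for each step, which makes it easier to compare local runs with deployed ones.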

There are a couple of ways you can enable a user to decide whether or not they want the results of API Y to be included. The first option is to include a query parameter, something like include_y. You could check for this parameter before submitting a request to API Y.

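#* Preprocessing
#* @param include_y Include results from Y? (TRUE/FALSE)
#* @get /preproc
function(include_y = FALSE) {
  # Query parameters arrive as strings (e.g. "TRUE"), so coerce explicitly;
  # anything not recognisable as TRUE is treated as FALSE.
  include_y <- isTRUE(as.logical(include_y))
  if (include_y) {
    # Include Y
  } else {
    # Don't include Y
  }
}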

API users could then query this endpoint by submitting a GET request to api.path/preproc?include_y=TRUE to include API Y or api.path/preproc?include_y=FALSE to not include API Y.
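To avoid maintaining two copies of the preprocessing code, the endpoint can delegate to one shared function that always runs the common steps and only adds Y's result when asked. A minimal sketch, assuming hypothetical stand-ins (`prepare_data()`, `call_tf_model()`, `call_api_y()`) for the real preprocessing and HTTP calls, and a hypothetical comma-separated `values` parameter as the input:

```r
# Hypothetical stand-ins for the real work; replace with your own code/HTTP calls
prepare_data <- function(input) list(values = input)
call_tf_model <- function(prepared) list(score = mean(prepared$values))
call_api_y <- function(prepared) list(extra = max(prepared$values))

# Shared core: always preprocess and call the model, optionally add Y's result
build_response <- function(input, include_y = FALSE) {
  prepared <- prepare_data(input)
  response <- list(model = call_tf_model(prepared))
  if (isTRUE(include_y)) {
    response$y <- call_api_y(prepared)
  }
  response
}

#* Preprocessing
#* @param values Comma-separated numbers (hypothetical input format)
#* @param include_y Include results from Y?
#* @get /preproc
function(values = "1,2,3", include_y = FALSE) {
  input <- as.numeric(strsplit(values, ",", fixed = TRUE)[[1]])
  build_response(input, include_y = isTRUE(as.logical(include_y)))
}
```

The same `build_response()` could also be called from a header-based endpoint, so the choice between a query parameter and a header stays separate from the core logic.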

An alternative option is to include a header indicating the same thing.

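#* Preprocessing
#* @get /preproc
function(req, res) {
  # Headers appear on `req` as HTTP_<NAME>: upper-cased, with "-" turned into "_",
  # so an "Include-Y" header is read as req$HTTP_INCLUDE_Y. The value is a string
  # and is NULL when the header is absent, so coerce it defensively.
  include_y <- isTRUE(as.logical(req$HTTP_INCLUDE_Y))
  if (include_y) {
    # Include Y
  } else {
    # Don't include Y
  }
}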

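In this case, we're not checking a parameter; instead, we're checking a header attached to the req object. A request made to this endpoint would look something like this (a hyphenated header name is used because some reverse proxies, including nginx with its default settings, drop headers whose names contain underscores; the header still arrives as req$HTTP_INCLUDE_Y):

GET /preproc HTTP/1.1
Host: <API Host>
Include-Y: TRUE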
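Because the endpoint logic is just an R function, the header handling can be checked without starting a server by calling it with a stand-in request. A minimal sketch, assuming the logic is moved into a hypothetical named function `preproc_handler()` and that a plain list is an adequate stand-in for plumber's `req` object here (only `$` access is used):

```r
# Hypothetical named handler; the annotated plumber endpoint would call it
preproc_handler <- function(req, res) {
  include_y <- isTRUE(as.logical(req$HTTP_INCLUDE_Y))
  list(included_y = include_y)
}

# Stand-in requests: plain lists holding the field the server would set
with_header <- list(HTTP_INCLUDE_Y = "TRUE")
without_header <- list()

preproc_handler(with_header, list())$included_y     # TRUE
preproc_handler(without_header, list())$included_y  # FALSE
```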

Hope that's helpful! Don't hesitate to fire away with more questions.

konradino · April 29, 2019, 4:46am

Thank you very much, that's definitely very helpful and will enable us to move forward with building up that architecture!

Actually, I'll take this opportunity to ask a couple more questions to clear things up completely:

API timing: I posted another question about big, unexplained discrepancies when timing my API, titled "Plumber APIs timing - big discrepancies between R Studio Connect and local runs". Could you please take a look at that too?
Filters: I completely understand the role of a filter as, e.g., a check that an authorised request is being made, but beyond that I struggle to understand their usefulness and how they 'cooperate' with endpoints. Could you give a couple more practical examples (beyond those in the official documentation) of when they are useful?
Articles / blog posts / books: could you recommend any that discuss building APIs from individual microservices into larger architectures? Generally, I would like to get a better grip on this, and it doesn't necessarily have to be in R. I'm looking for any well-written, practical guides.
Plumber documentation: some very important sections of the official plumber documentation are still empty (testing, organising large applications, performance, etc.). Since plumber is now becoming a major part of the R production ecosystem, are you planning to complete them? I think they would be a vital source of information for many R users.

Thank you!

system · May 6, 2019, 4:46am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.