nginx load balancing Shiny Server ... is `ip_hash` really needed?

mmuurr · October 17, 2020, 6:37pm

In this RStudio article about load-balancing Shiny Server processes (by Amanda Gadrow, which is very helpful, BTW!), it's mentioned that the NGINX balancing algorithm should be set to ip_hash:

To load balance Shiny Server, you can put the addresses to your two Shiny Servers in the "upstream" property in the nginx configuration file, using the ip-hash load balancing mechanism. This mechanism is necessary to ensure "sticky" client sessions; otherwise, the applications will break in unexpected ways.

My understanding of websockets, and specifically of how NGINX handles them, is that once the socket handshake is complete, no new HTTP 'connections' are made unless the page is reloaded (e.g., after a broken socket, or with the reload itself breaking the socket). And if a new socket is being established, that'd be a new Shiny session object anyhow (i.e. a new session token). Here's NGINX's take on how it guarantees continued communication with the appropriate upstream server for a websocket:

Since version 1.3.13, nginx implements special mode of operation that allows setting up a tunnel between a client and proxied server if the proxied server returned a response with the code 101 (Switching Protocols), and the client asked for a protocol switch via the “Upgrade” header in a request.
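As a rough model of the rule quoted above (not NGINX's actual implementation), the tunnel decision comes down to two checks. `should_tunnel` is a hypothetical helper name, and the headers are passed as a plain dict purely for illustration:

```python
def should_tunnel(request_headers, response_status):
    """Return True when the client asked for a protocol switch via an
    "Upgrade" header and the upstream answered 101 (Switching Protocols)."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    names = {name.lower() for name in request_headers}
    return "upgrade" in names and response_status == 101


# A websocket handshake that the upstream accepted:
print(should_tunnel({"Upgrade": "websocket", "Connection": "Upgrade"}, 101))
# An ordinary request/response pair:
print(should_tunnel({"Accept": "text/html"}, 200))
```

Once that condition holds, every frame on the socket travels over the same client-to-upstream tunnel, so the balancing directive is only consulted for the initial handshake request.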

I've played around a bit with some NGINX-load-balanced Shiny Server-hosted apps using the random NGINX upstream directive (which is, in some ways, the complete opposite of ip_hash since it's not deterministic), and as far as I can tell, they work fine (though I've certainly not stress-tested every Shiny feature or code path out there).
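To make the contrast concrete, here's a minimal Python sketch of the two selection strategies. It is not NGINX's code: the backend addresses and the `pick_ip_hash` / `pick_random` helpers are hypothetical, and CRC32 is only a stand-in for whatever hash NGINX uses internally. The one detail taken from the NGINX docs is that `ip_hash` keys on the first three octets of an IPv4 address.

```python
import random
import zlib

BACKENDS = ["shiny-a:3838", "shiny-b:3838"]  # hypothetical upstream servers


def pick_ip_hash(client_ip, backends=BACKENDS):
    """Deterministic choice keyed on the first three IPv4 octets."""
    key = ".".join(client_ip.split(".")[:3])
    # CRC32 is a stand-in hash, chosen only because it is stable across runs.
    return backends[zlib.crc32(key.encode()) % len(backends)]


def pick_random(backends=BACKENDS, rng=random):
    """Non-deterministic choice, loosely modelling the `random` directive."""
    return rng.choice(backends)


# The same client always maps to the same backend under ip_hash...
print(pick_ip_hash("203.0.113.7"), pick_ip_hash("203.0.113.7"))
# ...while consecutive requests may land anywhere under random.
print(pick_random(), pick_random())
```

The point of the sketch is that the two strategies only differ for requests that are routed independently; anything travelling over an already-established websocket never goes through either function again.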

Does anyone have more specific information about how an application might "break in unexpected ways"? It seems that nearly all session identification is tied to the session's token, which is tied to the websocket, and thus any upstream balancing directive should work in theory?

mmuurr · October 17, 2020, 9:36pm

The only use-case for ip_hash stickiness I've been able to come up with so far is the case where some Shiny program writes an intermediate file to the local filesystem, then Shiny (via the websocket) instructs the client to download that resource via a standard HTTP request (i.e. circumventing the websocket). In that case, the request would need to be guaranteed to go to the same machine, or at minimum to one with access to that intermediate file via the same path/routing structure.
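Here's a toy Python simulation of that failure mode, under the assumption that each server has its own local filesystem. Everything in it is hypothetical (the `Backend` class, the `download_via` helper, the file path); it only models the routing, not Shiny or NGINX themselves.

```python
class Backend:
    """One Shiny Server instance with its own local filesystem."""

    def __init__(self, name):
        self.name = name
        self.files = {}  # stands in for files on local disk

    def write_temp(self, path, data):
        self.files[path] = data

    def http_get(self, path):
        # A plain HTTP request only succeeds if this backend has the file.
        if path in self.files:
            return 200, self.files[path]
        return 404, None


def download_via(selector, backends, ws_backend, path="/tmp/report.csv"):
    """The session (pinned to ws_backend by its websocket) writes a file,
    then the client fetches it with a separately routed HTTP request."""
    ws_backend.write_temp(path, b"a,b\n1,2\n")
    target = selector(backends)  # the balancer picks a backend for the GET
    status, _body = target.http_get(path)
    return status


a, b = Backend("shiny-a"), Backend("shiny-b")
# Sticky routing sends the GET back to the backend holding the file.
print(download_via(lambda bs: bs[0], [a, b], ws_backend=a))

a, b = Backend("shiny-a"), Backend("shiny-b")
# Non-sticky routing may pick the other backend, which never saw the file.
print(download_via(lambda bs: bs[1], [a, b], ws_backend=a))
```

With shared storage mounted at the same path on both backends, the second case would succeed too, which is the "at minimum" condition above.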

Perhaps this is the type of breakage mentioned in the article? (Though if the two Shiny Servers were, say, on the same physical machine and so shared a filesystem, this likely wouldn't be a concern or a breakage at all.)

More succinctly, at what points during a running Shiny application's life would the client need to reliably establish new HTTP connections to the same Shiny Server process?
