I have a Shiny Server installation that runs exactly one application. It uses the Simple Scheduler, so there should be at most one R/Shiny process spawned by Shiny Server. Periodically, the machine becomes unresponsive, and when I'm finally able to connect to a terminal (usually after other system-wide failures triggered by this issue), I see that the unresponsiveness is due to heavy memory swapping, which in turn is caused by a pile of R and SockJS processes, all spawned by Shiny Server.
In short, there are times when Shiny Server appears to fail to kill old R sessions after the application timeout has elapsed. Most of the obvious explanations are already ruled out: the codebase isn't changing, so there are no existing sessions lingering while new connections run newer R/Shiny application code; there aren't multiple Shiny Server processes; and there aren't multiple `server` or `location` directives in the configuration file.
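For anyone who wants to check for the same symptom on their own machine, here is a minimal Python sketch of how the lingering workers could be spotted. It is not something Shiny Server provides. The function names are made up for illustration, and it assumes a Linux host with `ps`, that R workers can be recognized by `SockJSAdapter.R` in their command line, and that the live Shiny Server process has `shiny-server/lib/main.js` in its own command line (the path that shows up in the stack trace further down). An R worker with no such ancestor is flagged as orphaned.

```python
import subprocess

# Assumptions (adjust for your install): how workers and the server are recognized.
WORKER_MARKER = "SockJSAdapter.R"
SERVER_MARKER = "shiny-server/lib/main.js"


def run_ps():
    """Return the raw text of `ps -eo pid,ppid,rss,args` (RSS is in KiB)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,rss,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_ps(text):
    """Parse the ps output into a list of dicts, skipping the header row."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, ppid, rss, args = parts
        rows.append({"pid": int(pid), "ppid": int(ppid),
                     "rss_kb": int(rss), "args": args})
    return rows


def has_server_ancestor(pid, by_pid):
    """Walk up the parent chain looking for a live Shiny Server process."""
    seen = set()
    while pid in by_pid and pid not in seen:
        seen.add(pid)
        parent = by_pid[pid]["ppid"]
        if parent in by_pid and SERVER_MARKER in by_pid[parent]["args"]:
            return True
        pid = parent
    return False


def find_r_workers(rows):
    """List R workers with their memory use and whether they look orphaned."""
    by_pid = {r["pid"]: r for r in rows}
    workers = []
    for r in rows:
        if WORKER_MARKER in r["args"]:
            workers.append({
                "pid": r["pid"],
                "rss_kb": r["rss_kb"],
                "orphaned": not has_server_ancestor(r["pid"], by_pid),
            })
    return workers
```

Calling `find_r_workers(parse_ps(run_ps()))` from cron or a monitoring agent would give a list to alert on. With the Simple Scheduler and a single application, more than one entry, or any entry with `orphaned` set, would match the situation described above.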
I've switched Shiny Server's logging to "trace" level and am beginning to dig in, but some light Googling beforehand didn't turn up much. Has anyone experienced anything remotely similar, where Shiny Server launches multiple instances of an application despite not being configured to do so? A search of the Shiny Server GitHub issues also didn't turn up much that was useful :-/, but I can't imagine I'm the only one who has experienced this. It doesn't happen often, but I run hundreds of Shiny apps continuously (each on its own system) and I'm now seeing this about once per month, so n > 1.
I'll likely have to start digging into Shiny Server code myself, but the logs indicate a possible chain-reaction cycle starting with something like this:
```
[2022-09-05T18:28:36.071] [ERROR] shiny-server - Uncaught exception: TypeError: Cannot read property 'url' of null
[2022-09-05T18:28:36.194] [ERROR] shiny-server - TypeError: Cannot read property 'url' of null
    at getInfo (/opt/shiny-server/lib/proxy/robust-sockjs.js:29:46)
    at RobustSockJSRegistry.robustify (/opt/shiny-server/lib/proxy/robust-sockjs.js:59:16)
    at Server.<anonymous> (/opt/shiny-server/lib/proxy/sockjs.js:50:29)
    at Server.emit (events.js:314:20)
    at App.emit (/opt/shiny-server/node_modules/sockjs/lib/sockjs.js:196:29)
    at /opt/shiny-server/node_modules/sockjs/lib/transport.js:111:25
    at processTicksAndRejections (internal/process/task_queues.js:79:11)
[2022-09-05T18:28:36.222] [INFO] shiny-server - Stopping listener on http://[::]:9001
[2022-09-05T18:28:36.222] [INFO] shiny-server - Shutting down worker processes (with notification)
/opt/shiny-server/lib/main.js:387
    throw err;
    ^
TypeError: Cannot read property 'url' of null
    at getInfo (/opt/shiny-server/lib/proxy/robust-sockjs.js:29:46)
    at RobustSockJSRegistry.robustify (/opt/shiny-server/lib/proxy/robust-sockjs.js:59:16)
    at Server.<anonymous> (/opt/shiny-server/lib/proxy/sockjs.js:50:29)
    at Server.emit (events.js:314:20)
    at App.emit (/opt/shiny-server/node_modules/sockjs/lib/sockjs.js:196:29)
    at /opt/shiny-server/node_modules/sockjs/lib/transport.js:111:25
    at processTicksAndRejections (internal/process/task_queues.js:79:11)
```
That is, Shiny Server itself wants to shut down, so it tries to kill its child R/Shiny processes, but that process-reaping step fails. A new Shiny Server process then fires up and spawns the same R/Shiny application, which consumes more memory. If the same `Cannot read property 'url' of null` error occurs again in Shiny Server, the whole cycle repeats and consumes still more memory, which eventually leads to this:
```
[2022-09-05T18:29:08.647] [INFO] shiny-server - Starting listener on http://[::]:9001
[2022-09-05T18:30:35.812] [INFO] shiny-server - Error getting worker: Error: The application took too long to respond.
[2022-09-05T18:30:35.917] [INFO] shiny-server - Error getting worker: Error: The application took too long to respond.
[2022-09-05T18:30:35.917] [INFO] shiny-server - Error getting worker: Error: The application took too long to respond.
```
... many more times
... because by now the entire OS is memory-starved.
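To see how often this crash-and-restart cycle repeats across a long log, a small parser along these lines might help. It is only a sketch: the function names are hypothetical, and it assumes every relevant line has the `[timestamp] [LEVEL] shiny-server - message` shape seen in the excerpts above, that a cycle starts with an `Uncaught exception` line, and that a restart shows up as a `Starting listener` line. Stack frames and other unprefixed lines are skipped.

```python
import re
from datetime import datetime

# Matches lines like:
# [2022-09-05T18:28:36.071] [ERROR] shiny-server - Uncaught exception: ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] shiny-server - (?P<msg>.*)$"
)


def summarize(lines):
    """Collect crash times, restart times, and a count of worker timeouts."""
    crashes, restarts, timeouts = [], [], 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # stack frames and other lines without the log prefix
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        msg = m["msg"]
        if msg.startswith("Uncaught exception"):
            crashes.append(ts)
        elif msg.startswith("Starting listener"):
            restarts.append(ts)
        elif "took too long to respond" in msg:
            timeouts += 1
    return {"crashes": crashes, "restarts": restarts, "timeouts": timeouts}


def restart_gaps(summary):
    """Seconds between each crash and the first listener start after it."""
    gaps = []
    for crash in summary["crashes"]:
        later = [r for r in summary["restarts"] if r > crash]
        if later:
            gaps.append((min(later) - crash).total_seconds())
    return gaps
```

Feeding it the lines of the server log (for example `summarize(open(path))`, with `path` pointing at wherever your install writes it) gives a quick count of how many times the server crashed and came back before the timeouts started piling up.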
Debugging this is pretty challenging as an end user because the logs don't dump whatever message/object is being referenced in the initial `'url' of null` error, and it seems to happen somewhat non-deterministically (though particularly in applications with very long-lived connections).
Anyone ever find themselves in this (or a similar) rabbit hole?