Dear all,
We have production and development servers (with the same configuration) that had been running a Shiny application (under Shiny Server Pro) for a few years without any issues; the problems started after upgrading from R 3.4 to R 3.5.0.
We are seeing segfault errors on both servers; e.g., in /var/log/messages we see lines such as:
Sep 14 13:41:33 server kernel: R[20882]: segfault at 1 ip 00007f462caa0120 sp 00007ffe15b00aa0 error 4 in libR.so[7f462c903000+407000]
Sometimes, on the production system (which is behind a firewall) but not on the development server, we see messages about the C stack being close to the limit; e.g., in the /var/log/shiny-server/*.log files we sometimes see lines like:
Loading required package: rcdk
Loading required package: rcdklibs
Loading required package: rJava
Loading required package: fingerprint
Loading required package: rcellminerData
Consider citing this package: Luna A, et al. rcellminer: exploring molecular profiles and drug response of the NCI-60 cell lines in R. PMID: 26635141; citation("rcellminer")
Error: C stack usage 186609093604 is too close to the limit
Execution halted
Error in exists(name, envir = ld, inherits = FALSE) :
not a BUILTIN function
Calls: <Anonymous> -> cleanup -> :: -> getExportedValue
Warning: stack imbalance in 'lazyLoadDBfetch', 12 then -8
Fatal error: error during cleanup
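For what it is worth, the reported stack usage (186609093604 bytes, about 187 GB) is far larger than any plausible stack size, which makes us suspect R's stack bookkeeping is being confused rather than that we have a genuinely deep recursion. As a first diagnostic we could add something like the following (base R only; see ?Cstack_info) to the app start-up, to compare the stack limit R sees under shiny-server with what it sees in an interactive session:

    ## Print the C stack limit and current usage as seen by this R session,
    ## to compare the shiny-server environment with an interactive one.
    info <- Cstack_info()
    print(info)  # named vector: size, current, direction, eval_depth
    if (!is.na(info[["size"]])) {
      cat(sprintf("C stack: %.0f of %.0f bytes used (%.1f%%)\n",
                  info[["current"]], info[["size"]],
                  100 * info[["current"]] / info[["size"]]))
    }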
When we get the above, our website becomes unreachable, and there is always a job running at about 200% of a CPU. Killing that job restores access to the website. It appears (and we may be wrong about this) that if we simply kill the job, it isn't long before the website becomes unreachable again and another job at 200% must be killed, whereas if we kill the job and then restart shiny-server, it seems to take much longer before the problem recurs. That said, restarting shiny-server may have no real effect on how soon the problem recurs; the timing may simply depend on what people are running at the time.
Both servers are running CentOS 6.10 (Linux 2.6.32-696.23.1.el6.x86_64). They both have the same shiny-server configuration; however, the production system uses the XHR-streaming protocol and the development one uses the WebSocket protocol.
It seems we may have some kind of memory leak; however, we do not have any unbounded recursive calls.
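To test the leak hypothesis, one thing we are considering is logging R's memory use from inside the app, so that a leak would show up as steady growth in the shiny-server log. A minimal sketch (the interval and message format are just illustrative) would be:

    library(shiny)

    server <- function(input, output, session) {
      ## Log R's heap use once a minute; message() writes to stderr,
      ## which shiny-server captures in its per-app log file.
      observe({
        invalidateLater(60 * 1000)  # re-run every 60 seconds
        mem <- gc()                 # matrix: Ncells/Vcells x used/trigger/max
        message(sprintf("heap used: %.1f Mb (Ncells) + %.1f Mb (Vcells)",
                        mem[1, 2], mem[2, 2]))
      })
      ## ... actual server logic ...
    }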
Note that after changing from R 3.4 to R 3.5.0 we did reinstall all of the R packages, including our internal packages, from source, so we are not still using packages that were installed under R 3.4. Our internal packages were developed on a macOS machine and installed on the servers from source with the options --preclean, --clean, and --resave-data. Our packages have R and data folders, with lazy loading requested for the data. The largest data object that could be loaded is about 39 MB.
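As a sanity check that no stale 3.4-era builds survived (we understand that packages built under an older R can cause exactly this kind of lazy-loading error on R 3.5.0), we could run something like this on both servers:

    ## List any installed package that was not rebuilt under R 3.5.x;
    ## installed.packages() reports the R version each package was built under.
    ip  <- installed.packages()
    old <- ip[package_version(ip[, "Built"]) < "3.5.0",
              c("Package", "Built"), drop = FALSE]
    print(old)  # should come back empty if everything was rebuilt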
Do you have any ideas or hints about this issue?
Thank you!