Continuing the discussion from How do I ask for help solving an Rstudio crash:
Following the advice given in my previous question, I have collected the log and backtrace data generated during my last crash.
The crash occurs during a series of simulations. The simulations all run fine independently, but the series always crashes after a certain period of time. When the crash occurs, RStudio exits.
Like those who helped with my previous question, I believe the problem may be related to a memory leak or something similar. However, I have tried calling gc() throughout the code and it didn't help.
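One way to check the memory-leak idea, rather than only calling gc() for its side effect, is to record how much memory R reports using after each step and see whether the figure keeps climbing. The following is a minimal sketch under that assumption; `mem_used_mb` is a hypothetical helper name, and the `rnorm()` call is only a placeholder for a real simulation step.

```r
# Megabytes currently used by R objects, as reported by gc().
# Column 2 of the gc() matrix is the "(Mb)" figure for the "used" column;
# summing it adds the Ncells and Vcells rows together.
mem_used_mb <- function() sum(gc()[, 2])

mem_log <- numeric(0)
for (i in 1:5) {
  x <- rnorm(1e5)              # placeholder for one simulation step
  rm(x)                        # drop the reference so gc() can reclaim it
  mem_log[i] <- mem_used_mb()  # record memory in use after this step
}

# A steadily increasing sequence would point towards a leak at the R level.
print(mem_log)
```

Note that gc() only reports memory managed by R itself, so a flat log here would not rule out growth in the RStudio process or in compiled code.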
I also ran the simulation in the terminal, and it did not crash.
I cannot leave my computer to run unattended, as it will crash and I then need to restart the calculations from the last completed point to finish each simulation set. This is obviously really annoying.
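For reference, the restart-from-the-last-point workflow can be sketched roughly as below. This is an illustration only, not the actual simulation code: `run_simulation`, `run_with_checkpoints`, the file naming scheme and the `results` directory are all hypothetical names, and it assumes each simulation can be identified by an integer index and its result saved on its own.

```r
# Hypothetical stand-in for one simulation; replace with the real work.
run_simulation <- function(i) {
  set.seed(i)
  mean(rnorm(1000))
}

# Run every simulation in `ids`, saving each result to its own file so that
# a restart after a crash skips whatever has already finished.
run_with_checkpoints <- function(ids, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (i in ids) {
    out_file <- file.path(out_dir, sprintf("sim_%04d.rds", i))
    if (file.exists(out_file)) next  # finished in an earlier run
    result <- run_simulation(i)
    # Write to a temporary name first, then rename, so that a crash
    # mid-write cannot leave a truncated file that looks finished.
    tmp_file <- paste0(out_file, ".tmp")
    saveRDS(result, tmp_file)
    file.rename(tmp_file, out_file)
  }
  invisible(list.files(out_dir, pattern = "\\.rds$", full.names = TRUE))
}

files <- run_with_checkpoints(1:5, file.path(tempdir(), "results"))
results <- lapply(files, readRDS)
```

Calling `run_with_checkpoints()` again with the same arguments after a crash only runs the simulations whose result files are missing.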
Is there any clue in my logs as to what the problem is and how to solve it?
Below are details about my setup and session. I was far over the character limit, so I have included only the log entry for the crash and the last several hundred lines of the backtrace. The backtrace goes on for ages, as the simulation runs for several hours without a problem. If more information is required, I will supply it.
RStudio version: 1.1.383
The log entry for the crash is:
18 Jun 2018 18:56:29 [rsession-jonno] ERROR Parent terminated; LOGGED FROM: void {anonymous}::detectParentTermination() /home/ubuntu/rstudio/src/cpp/session/SessionMain.cpp:1239
Backtrace crash report
#1067 0x00007f909c96f1d8 in Rf_eval () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1068 0x00007f909c97166e in ?? () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1069 0x00007f909c96f3a2 in Rf_eval () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1070 0x00007f909c998892 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1071 0x00007f909c998c91 in ?? () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1072 0x00007f909c998d48 in run_Rmainloop () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1073 0x0000000000e4098f in rstudio::r::session::runEmbeddedR(rstudio::core::FilePath const&, rstudio::core::FilePath const&, bool, bool, SA_TYPE, rstudio::r::session::Callbacks const&, rstudio::r::session::InternalCallbacks*) ()
No symbol table info available.
#1074 0x0000000000e1ef3f in rstudio::r::session::run(rstudio::r::session::ROptions const&, rstudio::r::session::RCallbacks const&) ()
No symbol table info available.
#1075 0x0000000000718a83 in main ()
No symbol table info available.
Thread 3 (Thread 0x7f90971a8700 (LWP 11264)):
#0 0x00007f909c447f85 in futex_abstimed_wait_cancelable (
private=<optimised out>, abstime=0x7f909717ade0, expected=0,
futex_word=0x7f9090000ba4)
at ../sysdeps/unix/sysv/linux/futex-internal.h:205
__ret = -516
oldtype = 0
err = <optimised out>
oldtype = <optimised out>
err = <optimised out>
__ret = <optimised out>
resultvar = <optimised out>
__arg6 = <optimised out>
__arg5 = <optimised out>
__arg4 = <optimised out>
__arg3 = <optimised out>
__arg2 = <optimised out>
__arg1 = <optimised out>
_a6 = <optimised out>
_a5 = <optimised out>
_a4 = <optimised out>
_a3 = <optimised out>
_a2 = <optimised out>
_a1 = <optimised out>
#1 __pthread_cond_wait_common (abstime=0x7f909717ade0, mutex=0x7f9090000b50,
cond=0x7f9090000b78) at pthread_cond_wait.c:539
spin = 0
buffer = {__routine = 0x7f909c447690 <__condvar_cleanup_waiting>,
__arg = 0x7f909717ad40, __canceltype = 0, __prev = 0x0}
cbuffer = {wseq = 102933, cond = 0x7f9090000b78,
mutex = 0x7f9090000b50, private = 0}
err = <optimised out>
g = 1
flags = <optimised out>
g1_start = <optimised out>
maxspin = 0
signals = <optimised out>
result = 0
wseq = <optimised out>
seq = 51466
private = <optimised out>
maxspin = <optimised out>
err = <optimised out>
result = <optimised out>
wseq = <optimised out>
g = <optimised out>
seq = <optimised out>
flags = <optimised out>
private = <optimised out>
signals = <optimised out>
g1_start = <optimised out>
spin = <optimised out>
buffer = <optimised out>
cbuffer = <optimised out>
rt = <optimised out>
s = <optimised out>
#2 __pthread_cond_timedwait (cond=0x7f9090000b78, mutex=0x7f9090000b50,
abstime=0x7f909717ade0) at pthread_cond_wait.c:667
No locals.
#3 0x0000000000d49ff9 in rstudio::core::thread::ThreadsafeQueue<rstudio::core::system::file_monitor::(anonymous namespace)::RegistrationCommand>::wait(rstudio_boost::posix_time::time_duration const&) [clone .isra.544] ()
No symbol table info available.
#4 0x0000000000d4b382 in rstudio::core::system::file_monitor::(anonymous namespace)::checkForInput() ()
No symbol table info available.
#5 0x0000000000d93303 in rstudio::core::system::file_monitor::detail::run(rstudio_boost::function<void ()> const&) ()
No symbol table info available.
#6 0x0000000000d4a55f in rstudio::core::system::file_monitor::(anonymous namespace)::fileMonitorThreadMain() ()
No symbol table info available.
#7 0x0000000000c1bba0 in rstudio_boost::detail::thread_data<rstudio_boost::function<void ()> >::run() ()
No symbol table info available.
#8 0x0000000000ed7aa5 in thread_proxy ()
No symbol table info available.
#9 0x00007f909c4416db in start_thread (arg=0x7f90971a8700)
at pthread_create.c:463
pd = 0x7f90971a8700
now = <optimised out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140258987116288,
2027827466652656333, 140258987114176, 0, 36619680,
140735947838944, -2088152476999179571, -2088136552615468339},
mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimised out>
#10 0x00007f909aff788f in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
Thread 2 (Thread 0x7f90961a6700 (LWP 11266)):
#0 0x00007f909aff7bb7 in epoll_wait (epfd=4, events=0x7f90961a5680,
maxevents=128, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
resultvar = 18446744073709551612
sc_cancel_oldtype = 0
sc_ret = <optimised out>
#1 0x000000000079a03a in rstudio_boost::asio::detail::epoll_reactor::run(bool, rstudio_boost::asio::detail::op_queue<rstudio_boost::asio::detail::task_io_service_operation>&) ()
No symbol table info available.
#2 0x0000000000bc190c in rstudio_boost::asio::io_service::run() ()
No symbol table info available.
#3 0x0000000000ed7aa5 in thread_proxy ()
No symbol table info available.
#4 0x00007f909c4416db in start_thread (arg=0x7f90961a6700)
at pthread_create.c:463
pd = 0x7f90961a6700
now = <optimised out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140258970330880,
2027827466652656333, 140258970328768, 0, 36615216,
140735947838576, -2088150276902182195, -2088136552615468339},
mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimised out>
#5 0x00007f909aff788f in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
Thread 1 (Thread 0x7f90969a7700 (LWP 11265)):
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
set = {__val = {18446744067266838239, 0, 0, 140259062069941, 0,
10836527976718403072, 140258263961904, 139, 139, 2, 22010728,
22010064, 140258263960640, 140258263960640, 140258263960640,
140258263960624}}
pid = <optimised out>
tid = <optimised out>
ret = <optimised out>
#1 0x00007f909af16801 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x5b00000000,
sa_sigaction = 0x5b00000000}, sa_mask = {__val = {140258978720472,
140259065106592, 140259065106480, 140259065106496,
10836527976718403072, 0, 140258978720912, 140258978720992,
10836527976718403072, 0, 10836527976718403072, 4294967295,
140258263960560, 4294967295, 10836527976718403072, 0}},
sa_flags = -1768264432, sa_restorer = 0x7f90969a6ce0}
sigs = {__val = {32, 0 <repeats 15 times>}}
__cnt = <optimised out>
__set = <optimised out>
__cnt = <optimised out>
__set = <optimised out>
#2 0x0000000000d6fa49 in rstudio::core::system::abort() ()
No symbol table info available.
#3 0x00000000007f952d in (anonymous namespace)::detectParentTermination() ()
No symbol table info available.
#4 0x0000000000c1bba0 in rstudio_boost::detail::thread_data<rstudio_boost::function<void ()> >::run() ()
No symbol table info available.
#5 0x0000000000ed7aa5 in thread_proxy ()
No symbol table info available.
#6 0x00007f909c4416db in start_thread (arg=0x7f90969a7700)
at pthread_create.c:463
pd = 0x7f90969a7700
now = <optimised out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140258978723584,
2027827466652656333, 140258978721472, 0, 36620512,
140735947839024, -2088149179001167155, -2088136552615468339},
mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimised out>
#7 0x00007f909aff788f in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
Session Info
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
EDIT:
I seem to have solved the problem, but I am not sure why the walk function causes the crash; see my comments below.