Rstudio crash during simulation

Continuing the discussion from How do I ask for help solving an Rstudio crash:

Following the advice given in my previous question, I have collected the log and backtrace data generated during my last crash.

The crash occurs during a series of simulations. The simulations all run fine independently but after a certain period of time always crash. When the crash occurs Rstudio exits.
Along with those who helped in my previous question I believe the problem may be related to memory leakage or something similar. However, I have tried sticking gc() all over the place and it didn't help.
I also ran the simulation in the terminal this did not crash.

I cannot leave my computer to run on its own as it will crash and I need to restart the calculations from last point to finish each simulation set. This is obviously really annoying.

Is there any clue from my logs what the problem is and how to solve it?

Below are details about my setup and session. I was hugely exceeding the character limit so have only included the log entry for the crash and the last (several hundred) lines of the backtrace. The backtrace goes on for ages as the simulation executes for several hours without a problem. If more info is required I will supply.

Rstudio version is : 1.1.383

The log entry for the crash is

18 Jun 2018 18:56:29 [rsession-jonno] ERROR Parent terminated; LOGGED FROM: void {anonymous}::detectParentTermination() /home/ubuntu/rstudio/src/cpp/session/SessionMain.cpp:1239
Backtrace crash report
#1067 0x00007f909c96f1d8 in Rf_eval () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1068 0x00007f909c97166e in ?? () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1069 0x00007f909c96f3a2 in Rf_eval () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1070 0x00007f909c998892 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1071 0x00007f909c998c91 in ?? () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1072 0x00007f909c998d48 in run_Rmainloop () from /usr/lib/R/lib/libR.so
No symbol table info available.
#1073 0x0000000000e4098f in rstudio::r::session::runEmbeddedR(rstudio::core::FilePath const&, rstudio::core::FilePath const&, bool, bool, SA_TYPE, rstudio::r::session::Callbacks const&, rstudio::r::session::InternalCallbacks*) ()
No symbol table info available.
#1074 0x0000000000e1ef3f in rstudio::r::session::run(rstudio::r::session::ROptions const&, rstudio::r::session::RCallbacks const&) ()
No symbol table info available.
#1075 0x0000000000718a83 in main ()
No symbol table info available.

Thread 3 (Thread 0x7f90971a8700 (LWP 11264)):
#0  0x00007f909c447f85 in futex_abstimed_wait_cancelable (
    private=<optimised out>, abstime=0x7f909717ade0, expected=0, 
    futex_word=0x7f9090000ba4)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
        __ret = -516
        oldtype = 0
        err = <optimised out>
        oldtype = <optimised out>
        err = <optimised out>
        __ret = <optimised out>
        resultvar = <optimised out>
        __arg6 = <optimised out>
        __arg5 = <optimised out>
        __arg4 = <optimised out>
        __arg3 = <optimised out>
        __arg2 = <optimised out>
        __arg1 = <optimised out>
        _a6 = <optimised out>
        _a5 = <optimised out>
        _a4 = <optimised out>
        _a3 = <optimised out>
        _a2 = <optimised out>
        _a1 = <optimised out>
#1  __pthread_cond_wait_common (abstime=0x7f909717ade0, mutex=0x7f9090000b50, 
    cond=0x7f9090000b78) at pthread_cond_wait.c:539
        spin = 0
        buffer = {__routine = 0x7f909c447690 <__condvar_cleanup_waiting>, 
          __arg = 0x7f909717ad40, __canceltype = 0, __prev = 0x0}
        cbuffer = {wseq = 102933, cond = 0x7f9090000b78, 
          mutex = 0x7f9090000b50, private = 0}
        err = <optimised out>
        g = 1
        flags = <optimised out>
        g1_start = <optimised out>
        maxspin = 0
        signals = <optimised out>
        result = 0
        wseq = <optimised out>
        seq = 51466
        private = <optimised out>
        maxspin = <optimised out>
        err = <optimised out>
        result = <optimised out>
        wseq = <optimised out>
        g = <optimised out>
        seq = <optimised out>
        flags = <optimised out>
        private = <optimised out>
        signals = <optimised out>
        g1_start = <optimised out>
        spin = <optimised out>
        buffer = <optimised out>
        cbuffer = <optimised out>
        rt = <optimised out>
        s = <optimised out>
#2  __pthread_cond_timedwait (cond=0x7f9090000b78, mutex=0x7f9090000b50, 
    abstime=0x7f909717ade0) at pthread_cond_wait.c:667
No locals.
#3  0x0000000000d49ff9 in rstudio::core::thread::ThreadsafeQueue<rstudio::core::system::file_monitor::(anonymous namespace)::RegistrationCommand>::wait(rstudio_boost::posix_time::time_duration const&) [clone .isra.544] ()
No symbol table info available.
#4  0x0000000000d4b382 in rstudio::core::system::file_monitor::(anonymous namespace)::checkForInput() ()
No symbol table info available.
#5  0x0000000000d93303 in rstudio::core::system::file_monitor::detail::run(rstudio_boost::function<void ()> const&) ()
No symbol table info available.
#6  0x0000000000d4a55f in rstudio::core::system::file_monitor::(anonymous namespace)::fileMonitorThreadMain() ()
No symbol table info available.
#7  0x0000000000c1bba0 in rstudio_boost::detail::thread_data<rstudio_boost::function<void ()> >::run() ()
No symbol table info available.
#8  0x0000000000ed7aa5 in thread_proxy ()
No symbol table info available.
#9  0x00007f909c4416db in start_thread (arg=0x7f90971a8700)
    at pthread_create.c:463
        pd = 0x7f90971a8700
        now = <optimised out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140258987116288, 
                2027827466652656333, 140258987114176, 0, 36619680, 
                140735947838944, -2088152476999179571, -2088136552615468339}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimised out>
#10 0x00007f909aff788f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 2 (Thread 0x7f90961a6700 (LWP 11266)):
#0  0x00007f909aff7bb7 in epoll_wait (epfd=4, events=0x7f90961a5680, 
    maxevents=128, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimised out>
#1  0x000000000079a03a in rstudio_boost::asio::detail::epoll_reactor::run(bool, rstudio_boost::asio::detail::op_queue<rstudio_boost::asio::detail::task_io_service_operation>&) ()
No symbol table info available.
#2  0x0000000000bc190c in rstudio_boost::asio::io_service::run() ()
No symbol table info available.
#3  0x0000000000ed7aa5 in thread_proxy ()
No symbol table info available.
#4  0x00007f909c4416db in start_thread (arg=0x7f90961a6700)
    at pthread_create.c:463
        pd = 0x7f90961a6700
        now = <optimised out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140258970330880, 
                2027827466652656333, 140258970328768, 0, 36615216, 
                140735947838576, -2088150276902182195, -2088136552615468339}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimised out>
#5  0x00007f909aff788f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 1 (Thread 0x7f90969a7700 (LWP 11265)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {18446744067266838239, 0, 0, 140259062069941, 0, 
            10836527976718403072, 140258263961904, 139, 139, 2, 22010728, 
            22010064, 140258263960640, 140258263960640, 140258263960640, 
            140258263960624}}
        pid = <optimised out>
        tid = <optimised out>
        ret = <optimised out>
#1  0x00007f909af16801 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x5b00000000, 
            sa_sigaction = 0x5b00000000}, sa_mask = {__val = {140258978720472, 
              140259065106592, 140259065106480, 140259065106496, 
              10836527976718403072, 0, 140258978720912, 140258978720992, 
              10836527976718403072, 0, 10836527976718403072, 4294967295, 
              140258263960560, 4294967295, 10836527976718403072, 0}}, 
          sa_flags = -1768264432, sa_restorer = 0x7f90969a6ce0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
        __cnt = <optimised out>
        __set = <optimised out>
        __cnt = <optimised out>
        __set = <optimised out>
#2  0x0000000000d6fa49 in rstudio::core::system::abort() ()
No symbol table info available.
#3  0x00000000007f952d in (anonymous namespace)::detectParentTermination() ()
No symbol table info available.
#4  0x0000000000c1bba0 in rstudio_boost::detail::thread_data<rstudio_boost::function<void ()> >::run() ()
No symbol table info available.
#5  0x0000000000ed7aa5 in thread_proxy ()
No symbol table info available.
#6  0x00007f909c4416db in start_thread (arg=0x7f90969a7700)
    at pthread_create.c:463
        pd = 0x7f90969a7700
        now = <optimised out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140258978723584, 
                2027827466652656333, 140258978721472, 0, 36620512, 
                140735947839024, -2088149179001167155, -2088136552615468339}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimised out>
#7  0x00007f909aff788f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
Session Info
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

EDIT:
I seem to have solved the problem but am not sure why the walk function causes the crash, see my comments below.

1 Like

there's not much in those logs, but having done a lot of simulations, I too suspect that you are either appending to an object so it gets really big, or you possibly are creating a bunch of little objects that hang around after each sim. Neither of these would be addressed by gc() as they both create legit objects.

One way to debug is to dump out a list (either to screen or file) of all objects and their size. Something like this:

.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.prettysize <- napply(names, function(x) {
                           format(utils::object.size(x), units = "auto") })
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.prettysize, obj.dim)
    names(out) <- c("Type", "Size", "PrettySize", "Length/Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    if (head)
        out <- head(out, n)
    out
}

# shorthand
lsos <- function(..., n=10) {
    .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}

lsos()

After each sim, call the lsos() and watch it for a few simulations. Are you getting more objects? Are your objects getting bigger after each sim?

Function above stolen from an old SO question about tracking object size: https://stackoverflow.com/a/4827843/37751

2 Likes

Hi JD
Thanks for the function, that is a much nicer way to look at object sizes. I ran the function inside of the loop but becuase nothing was being assigned outside of the function environment I couldn't see any object growth. Is there a way of getting it to show object size indside the environment?

I appear to have solved the problem by changing from using to walk to using a for loop.
As walk is part of the map family I wondered if it was storing information until the walk was completed and so running out of memory. I figured that by changing to a for loop this wouldn't happen so would avoid crashing. I ran a test example and it successfully completed 100 simulations without crashing. Previously 100 simulations would have crashed at least twice.

I don't know if the reason given above was why it crashed but here is some dummy code as an explanation of the function.

If the walk version appends all the BigObject untill the whole process is completed then by the time it got to about 30 simulations it would be holding an object of about 1.5Gb, in contrast the for loop only ever holds 50mb. Does this make sense or is something else happening?

WalkFunction <- function(Input1, Input2, Input3, folder){

  gc()
  1:nrow(Input1) %>% walk(~{

    #These are small objects, and take up very little memory
    SimNumber <- SomeFunction1(Input1) 
    Input5 <- SomeFunction2(Input2)
    Input6 <- SomeFunction3(Input13)
    
    #This takes up about 50mb
    BigObject <- MakesBigObject(Input3)

    saveRDS(BigObject, file = file.path(folder, paste0(SimNumber, ".rds")))
    rm(BigObject)
   
    gc()
    })
  }
ForLoopFunction <- function(Input1, Input2, Input3, folder){

  gc()
    for (i in 1:nrow(Input1)) ~{

    #These are small objects, and take up very little memory
    SimNumber <- SomeFunction1(Input1) 
    Input5 <- SomeFunction2(Input2)
    Input6 <- SomeFunction3(Input13)
    
    #This takes up about 50mb
    BigObject <- MakesBigObject(Input3)

    saveRDS(BigObject, file = file.path(folder, paste0(SimNumber, ".rds")))
    rm(BigObject)
   
    gc()
    }
  }

Really good question. I had never tried to run lsos() inside of a walk so I didn't realize (although I should have) that lsos() only returns the main environment and walk runs each step in a different environment.

If anyone else would like to experiment, here's a reprex of lsos() inside of a walk vs inside of a loop. But @JonnoB I was not able to reproduce the memory issue. When I run the code below the walk bit shows no memory creep. Can you alter the reprex below to make it creep like your code was? If we can reproduce it, we can probably understand it.


library(purrr)

.ls.objects <- function (pos = 1, pattern, order.by,
                         decreasing=FALSE, head=FALSE, n=5) {
  napply <- function(names, fn) sapply(names, function(x)
    fn(get(x, pos = pos)))
  names <- ls(pos = pos, pattern = pattern)
  obj.class <- napply(names, function(x) as.character(class(x))[1])
  obj.mode <- napply(names, mode)
  obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
  obj.prettysize <- napply(names, function(x) {
    format(utils::object.size(x), units = "auto") })
  obj.size <- napply(names, object.size)
  obj.dim <- t(napply(names, function(x)
    as.numeric(dim(x))[1:2]))
  vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
  obj.dim[vec, 1] <- napply(names, length)[vec]
  out <- data.frame(obj.type, obj.size, obj.prettysize, obj.dim)
  names(out) <- c("Type", "Size", "PrettySize", "Length/Rows", "Columns")
  if (!missing(order.by))
    out <- out[order(out[[order.by]], decreasing=decreasing), ]
  if (head)
    out <- head(out, n)
  out
}

# shorthand
lsos <- function(..., n=10) {
  .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}

MakesBigObject <- function(seed){
  set.seed(seed)
  n <- 5e6
  df <- data.frame(a=rnorm(n), b=rnorm(n), c=rnorm(n))
  return(df)
}

Input3 <- 3

1:3 %>% walk(~{

  BigObject <- MakesBigObject(Input3)
  saveRDS(BigObject, file = paste0(.x, ".rds"))

  print('done with making BigObject')
  print(lsos())

  rm(BigObject)
  print('deleted BigObject')

  print(lsos())

  gc()
})

for (i in 1:3) {

  BigObject <- MakesBigObject(Input3)
  saveRDS(BigObject, file = paste0(i, ".rds"))

  print('done with making BigObject')
  print(lsos())

  rm(BigObject)
  print('deleted BigObject')

  print(lsos())

  gc()
}


2 Likes

I couldn't get lsos to work inside of a walk. I also couldn't make your very nice reprex crash either. I have been looking at my original function but am just not sure how I can change the reprex you made as it seems to be such a good representation. I tried the following, but it didn't crash

MakesBigObject <- function(seed, n = 5e6){
  set.seed(seed)
  df <- data.frame(a=rnorm(n), b=rnorm(n), c=rnorm(n))
  return(df)
}

Input3 <- 3

1:100 %>% walk(~{

  
  BigObject <- MakesBigObject(Input3, n = 5e6)
  saveRDS(BigObject, file = paste0(.x, ".rds"))

  OtherObject <- MakesBigObject(Input3, n=5e6)
  
  
  print('done with making BigObject')
  print(lsos())

  rm(BigObject)
  print('deleted BigObject')

  print(lsos())

  gc()
})