Unexpected crash while clustering with RStudio on ec2 (AWS)

I am experiencing crashes with RStudio on EC2 while clustering on 32 cores using the doSNOW package. The problem keeps happening, and the logs in RStudio and awslogs show the following:

The previous R session was abnormally terminated due to an unexpected crash. You may have lost workspace data as a result of this crash

I have tried a workaround found on the RStudio community page like this:

rm -rf ~/.rstudio

I restarted and terminated RStudio many times, but it didn't help. I changed to a bigger instance (r4.8xlarge), but the calculation still could not be completed.

Apr 30 14:14:23 ip-172-31-46-102 rsession-rstudio[12984]: ERROR session hadabend; LOGGED FROM: rstudio::core::Error {anonymous}::rInit(const rstudio::r::session::RInitInfo&) /home/ubuntu/rstudio/src/cpp/session/SessionMain.cpp:563

This is the code that is running when RStudio crashes:

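# Clustering using Gower distance and hclust()
library(gower)  # provides gower_dist()
# Column i holds the Gower distances from row i to every row, giving an n x n matrix
d <- sapply(seq_len(nrow(data)), function(i) gower_dist(data[i, ], data))
d <- as.dist(d)
h <- hclust(d)  # RStudio crashes here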
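A rough way to judge whether the distance matrix itself is part of the problem is to estimate its size before running the code. The sketch below is only back-of-the-envelope arithmetic: the row count `n` is hypothetical (the post does not say how large `data` is), and it assumes the usual 8 bytes per double. It does not show that memory was the cause of the crash.

```r
# Hypothetical row count; the original post does not state the size of `data`.
n <- 100000

bytes_per_double <- 8
gib <- 1024^3

# sapply() returns a full n x n matrix of doubles
full_matrix_gib <- n^2 * bytes_per_double / gib

# as.dist() keeps only the lower triangle: n * (n - 1) / 2 values
dist_object_gib <- n * (n - 1) / 2 * bytes_per_double / gib

cat(sprintf("full matrix: %.1f GiB\n", full_matrix_gib))
cat(sprintf("dist object: %.1f GiB\n", dist_object_gib))
```

Because both objects exist at the same time until the full matrix is overwritten, peak usage during `as.dist()` is roughly the sum of the two figures.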

Looks like this was also posted to Stack Overflow a few days ago: amazon web services - Unexpected crash while clustering with RStudio on ec2 (AWS) - Stack Overflow

With the solution:

The problem is solved: hclust is not really suitable for big data. Replacing it with flashClust no longer crashes RStudio, and the calculation completed successfully.
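For anyone landing here later, the swap described above might look like the following minimal sketch. The toy data, the Euclidean `dist()` call, and the `complete` linkage are all assumptions made so the example runs on its own; the original post uses Gower distances on its own data and does not say which linkage was used. The fallback to `stats::hclust()` is only there so the snippet still runs if the flashClust package is not installed.

```r
# Toy numeric data standing in for the real data set (an assumption).
set.seed(1)
toy <- matrix(rnorm(200), nrow = 50)
d <- dist(toy)  # Euclidean here; the post uses Gower distances instead

# Use flashClust if the package is installed, otherwise fall back to stats::hclust.
if (requireNamespace("flashClust", quietly = TRUE)) {
  h <- flashClust::flashClust(d, method = "complete")
} else {
  h <- stats::hclust(d, method = "complete")
}

# The result is an "hclust" object, so the usual helpers still apply.
groups <- cutree(h, k = 3)
table(groups)
```

Since the returned object has the same class as the output of `hclust()`, downstream code such as `cutree()` or `plot()` should not need changes.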
