Error: no more error handlers available (recursive errors?); invoking 'abort' restart

mzoelck · February 1, 2023, 9:02pm

Good Afternoon,

I am trying to calculate the similarity between each row in a large data file (csv) with over 32000 rows. This is the code I am using:

mem.maxVSize(vsize = Inf)
mem.maxNSize(nsize = Inf)

Sys.setenv('R_MAX_VSIZE' = 40000000000000000)
Sys.setenv('R_MAX_NSIZE' = 20000000000000000)

Sys.setenv('R_MAX_MEM_SIZE' = Inf)

rm(list=ls())
gc()

library(dplyr)
row_cf <- function(x, y, df){
  sum(df[x,] == df[y,])/ncol(df)
}
data<-read.csv("C:/Users/melanie/Documents/Data/KNNDataCulling.csv", header = TRUE)
data$Particle.Type<-as.factor(data$Particle.Type)
Water<-subset(data, Particle.Type=="WATER")
Water$Particle.Type<-droplevels(Water$Particle.Type)
rm(data)
gc()
Water$Corrected.Diameter.Pixels<-as.numeric(Water$Corrected.Diameter.Pixels)
Water$Contour.Slopes.Focus<-as.numeric(Water$Contour.Slopes.Focus)
Water$Center.Slopes.Focus<-as.numeric(Water$Center.Slopes.Focus)
Water$Hollowness<-as.numeric(Water$Hollowness)
Water$Ellipse.Best.Fit<-as.numeric(Water$Ellipse.Best.Fit)
Water$Ellipse.Minor.Major<-as.numeric(Water$Ellipse.Minor.Major)
Water$Ellipse.Angle<-as.numeric(Water$Ellipse.Angle)
Water$Contour.Circularity<-as.numeric(Water$Contour.Circularity)
Water$Convex.Hull.Circularity<-as.numeric(Water$Convex.Hull.Circularity)
Water$Box.H.W.Ratio<-as.numeric(Water$Box.H.W.Ratio)
Water$Angled.Box.H.W.Ratio<-as.numeric(Water$Angled.Box.H.W.Ratio)
Water <- Water[,-1]
gc()
results <- expand.grid(1:nrow(Water), 1:nrow(Water)) %>%
  rename(row_1 = Var1, row_2 = Var2) %>%
  rowwise() %>%
  mutate(similarity = row_cf(row_1, row_2, Water))

This seems to work fine on a test data set with 100 rows or so, but when I run it on the larger data set, it runs fine for a while and then gives me the error: "no more error handlers available (recursive errors?); invoking 'abort' restart". How can I prevent this from happening?

technocrat · February 1, 2023, 9:51pm

I’m not certain what mem.maxVSize or the similar settings do. However, R holds its data in memory, and even if the process is running on some godlike memory array, the OS won’t necessarily let the process take more than some limit; in the Unix derivatives that is typically about 8 GB.

mzoelck · February 2, 2023, 9:06pm

Thank you for that. So I guess the question then becomes: how do you calculate the similarity between each pair of rows in a file with 32000 observations without having R crash or abort? One would think it has to be possible.

technocrat · February 2, 2023, 9:49pm

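Comparing each row against every other row to produce a single measure of similarity for each pair means 32000^2, or about 1.02e9, pairs. Holding two integer indices and one double per pair takes roughly 16 GB before R makes any intermediate copies, and, what’s worse, how could a result of that size be reviewed?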
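A quick way to sanity-check the scale is to count the pairs and multiply by the bytes each one needs. The sketch below assumes 4-byte integer indices and an 8-byte double for the similarity, and uses the 32000 figure from the question (the Water subset may be smaller). The estimate is a lower bound, since intermediate copies can add more.

```r
# Rough size of the full pairwise table; numbers are illustrative.
n <- 32000                       # row count quoted in the question
n_pairs <- n^2                   # rows produced by expand.grid(1:n, 1:n)
bytes_per_pair <- 4 + 4 + 8      # two integer indices plus one double
est_gb <- n_pairs * bytes_per_pair / 1e9
est_gb                           # roughly 16 GB, before any temporary copies

# Empirical check on a small case: the grid alone costs about 8 bytes per row.
small <- expand.grid(1:1000, 1:1000)
bytes_per_row <- as.numeric(object.size(small)) / nrow(small)
bytes_per_row
```

Measuring a small grid with object.size() and scaling by the square of the row count gives the same order of magnitude without ever allocating the full table.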

Let’s see if we can reformulate the question.

f(x) = y

Where x is the data at hand, y is some derivation from x and f is the function (possibly composite) to accomplish this.

Beginning with the csv source data, there is a data frame of dim 32000, 12(?). Instead of expand.grid, what different function could be applied to that x to get us closer to the goal of y, which lies beyond the apparent limitations imposed on expand.grid?
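One possible reformulation, offered as a sketch and not as the answer intended above: compare one row against all rows at a time and keep only a small summary per row, so the full n-by-n table never exists in memory. The helper similarity_to_all, the toy matrix m, and the choice to keep the k most similar rows are all hypothetical, and the sketch assumes the columns can be held in a single numeric matrix.

```r
# Hypothetical helper: fraction of matching columns between row i and every row.
# Same measure as row_cf in the question, computed for all rows at once.
similarity_to_all <- function(m, i) {
  # t(m) is ncol x nrow; m[i, ] is recycled down each column of t(m),
  # so column j of the comparison holds the field-by-field matches for row j.
  colMeans(t(m) == m[i, ])
}

# Toy stand-in for the real data: 200 rows, 5 columns of small integers.
set.seed(1)
m <- matrix(sample(1:3, 200 * 5, replace = TRUE), nrow = 200)

# Keep only the indices of the k most similar rows for each row.
k <- 3
top_k <- t(vapply(seq_len(nrow(m)), function(i) {
  s <- similarity_to_all(m, i)
  s[i] <- -Inf                             # exclude the row itself
  order(s, decreasing = TRUE)[seq_len(k)]  # indices of the k best matches
}, integer(k)))

dim(top_k)   # one row per observation, k columns
```

With this shape the result grows as n times k instead of n squared, and each step only allocates a matrix the size of the data itself. Whether a top-k summary is the right y depends on what the similarities are needed for.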
