Error: cannot allocate vector of size 65.2 Gb

Hello everyone,

I would like to run a hierarchical clustering, but I get this error:

cah_souscrip_4comp <- HCPC(acp_souscrip_4comp)
Error: cannot allocate vector of size 65.2 Gb

The size of my object is:

object.size(acp_souscrip_4comp)
65628440 bytes

So almost 66 MB. How could this need to allocate a vector bigger than 65.2 GB?

I'm using the 64-bit version of RStudio. I've tried various packages such as bigmemory (even though my data isn't that big), and changing the limit using memory.limit():

memory.limit()
[1] 1e+10

Still the same error. I can't find any solution on forums, other than the ones I've already tried without success.

Number of rows: 132,265 obs. I would like to keep all of them if possible; that doesn't feel huge to me.

Could you help me, please?

Emilie.

If you sample your data down to 50% before running HCPC, what happens?
Also, what package is HCPC from? It might be an implementation issue.

You're right:

test <- souscrip_analyse[1:50000, 6:11]
test_acp <- PCA(test)
test_cah <- HCPC(test_acp, iter.max = 10, ncp = 4)
Error: cannot allocate vector of size 18.6 Gb

It worked for 1,000 rows; I'm trying 20,000 now and I'm running into performance issues.

I haven't done statistics for a while, but I'm sad to have 130,000 obs and to be using maybe fewer than 10,000 of them. 130,000 obs doesn't feel like a huge amount of data to me. Can't R handle it? I would settle for 100,000, which I guess wouldn't change the results.
Or am I doing something wrong? I followed some tutorials for the implementation, but of course with example data there are no performance issues.

Thanks for the help!

Emilie

You could try to contact the package authors for advice?

How about trying to do a PCA with prcomp and then stats::hclust on that?
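
Roughly like this, maybe (just a sketch; I'm reusing the columns 6:11 and ncp = 4 from your earlier posts, and the subsample size of 10,000 is only a placeholder):

dat <- souscrip_analyse[, 6:11]
pc  <- prcomp(dat, center = TRUE, scale. = TRUE)         # PCA with base stats
scores <- pc$x[, 1:4]                                    # keep 4 components, like your ncp = 4
idx <- sample(nrow(scores), 10000)                       # hclust also needs all pairwise
hc  <- hclust(dist(scores[idx, ]), method = "ward.D2")   # distances, so subsample first
plot(hc)                                                 # inspect the dendrogram to choose k
cl  <- cutree(hc, k = 4)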

To me this implies trying to do a PCA decomposition and then clustering on the PCA scores.
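
That would also explain the number in the error: hierarchical clustering needs every pairwise distance, and a stats::dist object for n rows stores n*(n-1)/2 doubles. A quick back-of-the-envelope check (my arithmetic, not anything from FactoMineR):

n <- 132265
n * (n - 1) / 2 * 8 / 1024^3   # ~65.2 GiB: a dist object for all 132,265 rows
50000^2 * 8 / 1024^3           # ~18.6 GiB: a full 50,000 x 50,000 matrix of doubles

So the 65.2 Gb isn't your 66 MB of data; it's the pairwise-distance matrix, which grows quadratically with the number of rows.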

Also, I think that when there is a problem of too much data, one can often try a bootstrapping approach.
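
To make that concrete (my own rough sketch, not a FactoMineR feature): cluster a subsample that fits in memory, then assign every remaining row to its nearest cluster centroid. Continuing from the prcomp sketch above:

cent <- apply(scores[idx, ], 2, function(col) tapply(col, cl, mean))  # k x 4 centroid matrix
d2   <- apply(cent, 1, function(ct) rowSums(sweep(scores, 2, ct)^2))  # squared distance of every row to each centroid
all_cl <- max.col(-d2)                                                # nearest centroid per row
table(all_cl)                                                         # cluster sizes on the full data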

Finally, there is the possibility of paying for compute in the cloud to get temporary access to more memory than you normally have.

I will look at the prcomp method; with hclust I had the same problem, but I will try prcomp first.

The method I wanted to apply is to cluster the population according to the main variables. I studied this method a long time ago and I've seen it recently in books and online, but they never talk about size limits. I didn't think that 100,000 rows with around 10 variables would be too much data.

I'm gonna try a bootstrapping approach also.

Oh, I didn't answer about which package HCPC comes from: it's FactoMineR.

Thank you so much for the pointers!

Emilie
