Parallel Computing problem

Dear Experts,

First of all thanks for reading my question.

I have a series, say "x", comprising 1.8 million observations collected at 5-minute intervals over a period of 18 years. I would like to shuffle this series, say 100 times, so that the shuffled series x1 to x100 are generated into an array object abc:

abc <- replicate(100, sample(x, replace = TRUE))

Now I have a big data object abc with dimensions:

dim(abc)   # 1800000 1 100
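To make the layout explicit: the third margin of abc indexes the 100 shuffles, so each resampled series is one slice of the array (a small check, assuming the dimensions above):

series_1 <- as.numeric(abc[, 1, 1])   # first shuffled series as a plain numeric vector
length(series_1)                      # 1800000

Note that apply(abc, 3, ...) hands each slice to the function as an 1800000 x 1 matrix, so if DFA() complains about matrix input it may help to coerce with as.numeric() inside DFAfunction (this is an assumption about how fractal handles matrix input).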

Now I want to compute the Hurst exponent of each series using the DFA method, with the DFA() function from the fractal package. I wrap it in a function called DFAfunction as follows:

DFAfunction <- function(x){ DFA(x, detrend = "poly1", sum.order = 0, overlap = 0, scale.max = trunc(length(x)/1), scale.min = NULL, scale.ratio = 2, verbose = FALSE) }
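Before running this on the full-length series, a quick sanity check on a short simulated vector confirms that the arguments are accepted (just a sketch with an arbitrary random series):

library(fractal)
set.seed(1)
check <- DFAfunction(rnorm(2^12))   # a short series finishes quickly
check                               # print the result to inspect the estimated exponent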

Then I tried to calculate the DFA Hurst exponent for all 100 series as follows:

DFA_SS_FS <- apply(abc, 3, DFAfunction)

But this function is taking almost 10 days to compute results.
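One way to see where the 10 days come from is to time a single full-length series first; roughly 100 times that elapsed time is the sequential cost, and dividing by the number of worker cores gives a rough lower bound for a parallel run (a sketch, assuming abc is the array above):

one_series <- as.numeric(abc[, 1, 1])
system.time(fit1 <- DFAfunction(one_series))   # elapsed time for one series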

Then I read about the parallel package and tried the parApply function, like this:

library("parallel")
no_cores<-detectcore(logical=F)-1
c1<-makecluster(no_cores)
parallel::clusterEvalQ(c1, c("PerformanceAnalytics","fractal" ))
DFA_SS_FS<- parApply(c1, abc,FUN =  apply(abc,3,function(x){DFAfunction(x)})))

Again it is taking too much time; it has been running for the last 3 days and has still not finished.
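For what it is worth, an equivalent approach that is easier to test on a small subset first would be to turn the array into a plain list of series and use parLapply (a minimal sketch, assuming abc and DFAfunction are defined as above):

library(parallel)

cl <- makeCluster(detectCores(logical = FALSE) - 1)
clusterEvalQ(cl, library(fractal))   # every worker needs fractal for DFA()
clusterExport(cl, "DFAfunction")     # ship the wrapper function to the workers

# Split the 3-D array into a plain list of numeric series (one per shuffle).
series_list <- lapply(seq_len(dim(abc)[3]), function(i) as.numeric(abc[, 1, i]))

# Try a few series first to confirm it works and to estimate the total runtime,
test_fits <- parLapply(cl, series_list[1:4], DFAfunction)

# then run the full set and shut the cluster down.
DFA_SS_FS <- parLapply(cl, series_list, DFAfunction)
stopCluster(cl)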

Can anyone help me with the code, or point me in a direction to make it run faster?

Please bear in mind that I started using R only a couple of months ago, so I am a new user.

Please help,

Regards,

Raheel Asif
