Hi @nirgrahamuk,

In fact, I would like to find a way to run these loops in parallel, where possible of course.

I'd like to know how to set up the loops in parallel, if that's possible. I'm not an expert in R, ahah.

The most a parallelising improvement could give you would be a linear speed-up.

Rather than running on 1 core, running on 12 cores could make the code execute 12 times faster (under perfect conditions that are typically not achievable).

We should really profile your code on small example data to estimate the current full runtime.

If the runtime is expected to be 1 day on 1 core, then you could hope to approach some fraction of that, completing in a couple of hours on 12 (even though that would be somewhat idealistic).

If the runtime is expected to be 1 year on 1 core, then running on 12 might give you some justification to expect something on the order of a month or two, if you successfully parallelise.

I wouldn't waste energy parallelising without first trying to make the most efficient algorithm, and this would include heuristics for avoiding comparisons wherever possible. And I wouldn't begin parallelisation before I could reasonably estimate that my code would complete in a reasonable, finite time if I were successful at it.
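If you do get to that point, a minimal sketch of spreading a per-element computation across cores with the base `parallel` package might look like this. Note that `slow_weight` and `pairs` below are hypothetical stand-ins for your own weight function and pair indices, not anything from your code:

```r
library(parallel)

# Hypothetical stand-in for an expensive per-pair weight computation
slow_weight <- function(i) sqrt(i)

# Stand-in for the row indices of a pair table
pairs <- 1:100

# Leave one core free for the OS / RStudio
n_cores <- max(1L, detectCores() - 1L)

# On Linux/macOS, mclapply forks workers; on Windows use
# makeCluster() + parLapply() instead, since forking is unavailable
weights <- unlist(mclapply(pairs, slow_weight, mc.cores = n_cores))
```

The speed-up is bounded by the number of cores and by the per-task overhead, so it only pays off when each unit of work is substantial.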


Here is an example of someone profiling code to achieve a faster runtime. They had a similar challenge of combinatorial explosion, but given the speed improvements we made to the inner loop of the calculation, the problem became tractable.


@nirgrahamuk, I read the discussion, interesting.

Here is a little visualization of my data. Consider that rowx and rowy in the weight function select only the present ontologies, and then I compare all genes from my data frame per ontology. So the selections of ontologies in rowx and rowy are minimal and do not contain all 22784 ontologies.

Thanks for these answers!

At the same time, in V1 and V2 I have the right combinations, so it's impossible to have overlaps.

Without hands on any example of the data, it's all a bit too abstract.

Images only show a very limited view, and I cannot apply code to them.


I can't show it all; R can't display it, @nirgrahamuk

```
.....
`3471019` = 2292575L, `3471020` = 2292576L, `3471021` = 2292577L,
`3471022` = 2292578L, `3471023` = 2292579L, `3471024` = 2292580L,
`3471025` = 2292581L, `3471026` = 2292582L, `3471027` = 2292583L,
`3471028` = 2292584L, `3471029` = 2292585L, `3471030` = 2292586L,
`3471031` = 2292587L, `3471032` = 2292588L, `3471033` = 2292589L,
`3471034` = 2292590L, `3471035` = 2292591L, `3471036` = 2292592L,
`3471037` = 2292593L, `3471038` = 2292594L, `3471039` = 2292595L,
`3471040` = 2292596L, `3471041` = 2292597L, `3471042` = 2292598L,
`3471043` = 2292599L, `3471044` = 2292600L, `3471045` = 2292601L,
`3471046` = 2292602L, `3471047` = 2292603L, `3471048` = 2292604L,
`3471049` = 2292605L, `3471050` = 2292606L, `3471051` = 2292607L,
`3471052` = 2292608L, `3471053` = 2292609L, `3471054` = 2292610L,
`3471055` = 2292611L, `3471056` = 2292612L, `3471057` = 2292613L,
`3471058` = 2292614L, `3471059` = 2292615L, `3471060` = 2292616L,
`3471061` = 2292617L, `3471062` = 2292618L, `3471063` = 2292619L,
`3471064` = 2292620L, `3471065` = 2292621L, `3471066` = 2292622L,
`3471067` = 2292623L, `3471068` = 2292624L, `3471069` = 2292625L,
`3471070` = 2292626L, `3471071` = 2292627L, `3471072` = 2292628L,
`3471073` = 2292629L, `3471074` = 2292630L, `3471075` = 2292631L,
`3471076` = 2292632L, `3471077` = 2292633L, `3471078` = 2292634L,
`3471079` = 2292635L, `3471080` = 2292636L, `3471081` = 2292637L,
`3471082` = 2292638L, `3471083` = 2292639L, `3471084` = 2292640L,
`3471085` = 2292641L, `3471086` = 2292642L, `3471087` = 2292643L,
`3471088` = 2292644L, `3471089` = 2292645L, `3471090` = 2292646L,
`3471091` = 2292647L, `3471092` = 2292648L, `3471093` = 2292649L,
`3471094` = 2292650L, `3471095` = 2292651L, `3471096` = 2292652L,
`3471097` = 2292653L, `3471098` = 2292654L, `3471099` = 2292655L,
`3471100` = 2292656L, `3471101` = 2292657L, `3471102` = 2292658L,
`3471103` = 2292659L, `3471104` = 2292660L, `3471105` = 2292661L,
`3471106` = 2292662L, `3471107` = 2292663L), class = "omit"), row.names = c(1L,
77L, 347L, 384L, 425L, 619L, 817L, 924L, 1233L, 1620L, 2133L,
2660L, 2981L, 3152L, 3297L, 3419L, 3636L, 5194L, 5436L, 5741L,
5856L, 6120L, 6388L, 6763L, 7162L, 7452L, 7710L, 7887L, 7985L,
8313L, 8690L, 8834L, 9039L, 9334L, 9723L, 10119L, 10372L, 10578L,
10749L, 11087L, 11145L, 11228L, 11423L, 11708L, 11731L, 11950L,
12187L, 12698L, 12949L, 13247L), class = "data.frame")
```

I'm afraid your brackets don't match up here, so that code isn't runnable.

Try using head() with dput() to select a portion:

`dput(head(Id_GeneName3,n=50))`

I did that, @nirgrahamuk; I still got that huge output, even with one row...

I wrapped the code in system.time() and increased the speed, but I should preallocate rowx and rowy:

```
system.time({
  n <- nrow(Id_GeneNameTwoGenes)
  nms <- rownames(Id_GeneNameTwoGenes)
  # all unordered pairs of row names
  V1 <- rep(nms[1:(n - 1)], seq(from = n - 1, to = 1, by = -1))
  V2 <- unlist(lapply(1:(n - 1), function(i) nms[(i + 1):n]))
  similarity.matrix <- data.frame(source = V1, dest = V2)
  weight <- apply(similarity.matrix, 1, function(row) {
    # ontologies present (value 1) for each gene of the pair,
    # excluding the ENTREZID column
    rowx <- colnames(Id_GeneNameTwoGenes)[Id_GeneNameTwoGenes[row["source"], ] == 1 &
                                            colnames(Id_GeneNameTwoGenes) != "ENTREZID"]
    rowy <- colnames(Id_GeneNameTwoGenes)[Id_GeneNameTwoGenes[row["dest"], ] == 1 &
                                            colnames(Id_GeneNameTwoGenes) != "ENTREZID"]
    # pairwise distances between the two ontology sets; Inf becomes NA
    weight2 <- sapply(rowx, function(k)
      sapply(rowy, function(w) {
        sh <- distance.matrix[k, w]
        if (is.finite(sh)) sh else NA
      }))
    mean(weight2, na.rm = TRUE)
  })
  similarity.matrix$weight <- weight
  similarity.matrix$source <- Id_GeneNameTwoGenes[V1, 1]
  similarity.matrix$dest <- Id_GeneNameTwoGenes[V2, 1]
  q <- quantile(similarity.matrix$weight, probs = c(.25, .5, .75), na.rm = TRUE)
  filter.matrix.25 <- similarity.matrix[which(similarity.matrix$weight >= q[1]), ]
  filter.matrix.50 <- similarity.matrix[which(similarity.matrix$weight >= q[2]), ]
  filter.matrix.75 <- similarity.matrix[which(similarity.matrix$weight >= q[3]), ]
})
```
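One way to "preallocate" rowx and rowy in the sense above is to compute each gene's set of present ontologies once, before the pair loop, instead of recomputing it inside `apply()` for every pair. A sketch, assuming the data frame has 0/1 ontology columns plus an `ENTREZID` column; the toy `df` below is hypothetical:

```r
# Toy 0/1 data frame in the assumed shape of Id_GeneNameTwoGenes
df <- data.frame(ENTREZID = c(10, 20, 30),
                 GO1 = c(1, 0, 1),
                 GO2 = c(1, 1, 0),
                 GO3 = c(0, 1, 1),
                 row.names = c("g1", "g2", "g3"))

# Ontology columns, excluding the identifier column
onto_cols <- setdiff(colnames(df), "ENTREZID")

# Compute each gene's set of present ontologies once, up front
onto_sets <- lapply(rownames(df), function(g) onto_cols[df[g, onto_cols] == 1])
names(onto_sets) <- rownames(df)

# A pair lookup is then just list indexing, with no repeated which()/colnames()
rowx <- onto_sets[["g1"]]
rowy <- onto_sets[["g2"]]
```

This trades a little memory (one list of character vectors) for removing the two subsetting calls from the inner loop, which run once per pair otherwise.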

OK, sure, you had one row with however many thousands of columns. I'm sure you could figure out a way to reduce it if it were a priority.

Best of luck.

I changed the solution, but now I want to change the Inf values in the matrix to NA values. After one hour, RStudio crashed! Is there any fast solution to change the Inf cells to NA?

@nirgrahamuk I tried with this solution, but it crashed:

```
is.na(distance.matrix.setNA) <- sapply(distance.matrix.setNA, is.infinite)
```

distance.matrix.setNA is a matrix that has 22792 rows and as many columns..

EDIT: I'm using

```
distance.matrix.setNA[sapply(distance.matrix.setNA, is.infinite)] <- NA
```
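For a true numeric matrix there should be no need for `sapply()` at all: `is.infinite()` is vectorised over the whole matrix, so a single logical-indexing assignment does the replacement in one pass. A small sketch with toy data:

```r
# Small toy distance matrix with unreachable pairs marked Inf
m <- matrix(c(0,   1, Inf,
              1,   0,   2,
              Inf, 2,   0),
            nrow = 3, byrow = TRUE)

# Vectorised replacement: test the whole matrix at once,
# then assign NA to every infinite cell in one step
m[is.infinite(m)] <- NA
```

The `sapply()` version pays per-element dispatch overhead, which is what hurts at 22792 × 22792; the vectorised form touches the data once.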

You're simply trying to chew on too much data; the time to process that matrix is O(n²).

You need to decide to calculate on less at a time, or find a supercomputer.
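The "calculate on less at a time" idea can be sketched as processing the indices in fixed-size chunks, so only one chunk's worth of work is in flight at once. `process_chunk` here is a hypothetical stand-in for the real per-pair computation:

```r
# Stand-in for the full set of pair indices
pairs <- 1:25
chunk_size <- 10

# Hypothetical per-chunk worker; in practice this would compute weights
# for just the pairs in idx and could also write them to disk
process_chunk <- function(idx) idx * 2

# Walk the index range in steps of chunk_size
chunk_starts <- seq(1, length(pairs), by = chunk_size)
results <- unlist(lapply(chunk_starts, function(s) {
  idx <- pairs[s:min(s + chunk_size - 1, length(pairs))]
  process_chunk(idx)
}))
```

Writing each chunk's results out (e.g. with `saveRDS()`) before starting the next keeps peak memory roughly constant regardless of the total number of pairs.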

@nirgrahamuk

R seems to have hit a bug...... but the program is still running..

How can I interrupt it? I want to see my lines.....!

It's been this way for 3 days

Start small. Make a robust process. Profile its timing; estimate its scalability.

These steps are all important on your way to addressing a *potential* monster task, and they might encourage you to abandon your dreams, or to put a price on them. I.e. you might estimate how much cloud compute you would need to execute in reasonable time, and see how much that would cost on Azure.

If you supply a small reprex, then the community on the forum could begin to consider how they might address your challenge.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners


Hi,

I executed it in chunks of 10 million rows. Now I have the dataset!

Congratulations, well done!

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.