difference between apply and for loop...

Jeremy98-alt · April 4, 2021, 10:44am

Hi guys,
why do these two pieces of code give different results?

          ego.matrix <- t(apply(ego.matrix, 1, function(x){
            gene <- x["gene"]
            out <- coeff.clustering(nt_graph, gene)
            out2 <- coeff.clustering(nt2_graph, gene)
            out3 <- coeff.clustering(nt3_graph, gene)
            return(c(gene, out, out2, out3))
          }))

          colnames(ego.matrix)[2] <- "tcga.controls"
          colnames(ego.matrix)[3] <- "tcga.tumors"
          colnames(ego.matrix)[4] <- "gtex.brca"

The apply version is only meant to replace the NA values of the empty data frame with the clustering coefficients. I also wrote the same thing with a for loop, to see the difference in computational terms:

          for(idx in rownames(ego.matrix)) {
            ego.matrix[as.numeric(idx), "tcga.controls"] <- coeff.clustering(nt_graph, ego.matrix[as.numeric(idx), "gene"])
            ego.matrix[as.numeric(idx), "tcga.tumors"] <- coeff.clustering(nt2_graph, ego.matrix[as.numeric(idx), "gene"])
            ego.matrix[as.numeric(idx), "gtex.brca"] <- coeff.clustering(nt3_graph, ego.matrix[as.numeric(idx), "gene"])
          }

But in this case the result was quite different: a few rows changed their values. Why? (The for loop result is the correct one; the apply result is not.)

Another difference is the type of the variable: with apply, ego.matrix was a character; with the for loop, it was a list. WHY?

.. meanwhile, Happy Easter to everyone

Kind regards,
JS

Jeremy98-alt · April 5, 2021, 11:07am

Any answer? This looks anomalous to me. I'm not an expert in RStudio, so I want to understand why my two pieces of code behave this way.

nirgrahamuk · April 5, 2021, 11:12am

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

Jeremy98-alt · April 5, 2021, 11:25am

OK, I posted above the two pieces of code that show the strange behavior, mainly the apply one. I believe that my for loop code works well.

The data come from a graph created with igraph:

library(igraph)
print(nt_graph)
IGRAPH 9f2d13d DN-- 10 33 -- 
+ attr: name (v/c), correlation (e/n)
+ edges from 9f2d13d (vertex names):
 [1] A1BG  ->A1CF    A1BG  ->A2ML1   A1BG  ->AAAS    A1BG  ->AACS    A1BG  ->AADAC  
 [6] A1BG  ->AADACL2 A1CF  ->A2ML1   A1CF  ->A4GALT  A1CF  ->AAAS    A1CF  ->AACS   
[11] A1CF  ->AADAC   A1CF  ->AADACL2 A2M   ->A2ML1   A2M   ->A4GALT  A2M   ->A4GNT  
[16] A2M   ->AAAS    A2M   ->AADAC   A2M   ->AADACL2 A2ML1 ->AAAS    A2ML1 ->AADAC  
[21] A4GALT->A4GNT   A4GALT->AAAS    A4GALT->AACS    A4GALT->AADAC   A4GALT->AADACL2
[26] A4GNT ->AAAS    A4GNT ->AADAC   A4GNT ->AADACL2 AAAS  ->AACS    AAAS  ->AADAC  
[31] AAAS  ->AADACL2 AACS  ->AADACL2 AADAC ->AADACL2

The edge values between my vertices are the correlation values.

The data frame that I wanted to fill is:

    gene tcga.controls tcga.tumors gtex.brca abs.cx.cy abs.cx.cz abs.cy.cz
1   A1BG            NA          NA        NA        NA        NA        NA
2   A1CF            NA          NA        NA        NA        NA        NA
3    A2M            NA          NA        NA        NA        NA        NA
4  A2ML1            NA          NA        NA        NA        NA        NA
5 A4GALT            NA          NA        NA        NA        NA        NA
6  A4GNT            NA          NA        NA        NA        NA        NA

As you can see in the code above, I'm using a function named coeff.clustering:

coeff.clustering <- function(gr, gene) {
  return( tryCatch( 
    expr = {
      ego.g <- induced_subgraph(gr, unlist(ego(gr, order=1, nodes = gene, mode = "all", mindist = 0)))
      round(transitivity(ego.g, vids = gene, type="weighted"), digits=3)
    }, 
    error = function(e) { NA },
    warning = function(e) { NA })
  )
}
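
A side note on the function above: because tryCatch is given a warning handler, any warning raised inside expr abandons the computation and NA is returned, even if a value would otherwise have been produced. Whether that accounts for any of the NA values in this thread is not established. The sketch below only shows the base R behavior, using a hypothetical function noisy() that is not part of the original code.

```r
# Hypothetical function: emits a warning but still computes a value.
noisy <- function() {
  warning("something minor")
  0.8
}

# With a warning handler in tryCatch(), the warning aborts the expression
# and the handler's return value is used instead of 0.8.
caught <- tryCatch(noisy(), warning = function(w) NA)
print(caught)

# withCallingHandlers() can record the warning and let evaluation continue,
# so the computed value survives.
msgs <- character(0)
kept <- withCallingHandlers(
  noisy(),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(kept)
print(msgs)
```

Logging the warning message this way can help tell apart a real NA from a value hidden by a warning.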

The results with the for loop are:

      gene tcga.controls tcga.tumors gtex.brca abs.cx.cy abs.cx.cz abs.cy.cz
1     A1BG         0.800       0.833     0.600        NA        NA        NA
2     A1CF         0.762       1.000     0.833        NA        NA        NA
3      A2M         1.000       0.500     0.833        NA        NA        NA
4    A2ML1         0.800       1.000     1.000        NA        NA        NA
5   A4GALT         0.762          NA     0.833        NA        NA        NA
6    A4GNT         1.000       0.833        NA        NA        NA        NA
7     AAAS         0.667          NA     1.000        NA        NA        NA
8     AACS            NA          NA        NA        NA        NA        NA
9    AADAC         0.762          NA        NA        NA        NA        NA
10 AADACL2            NA          NA        NA        NA        NA        NA

In my opinion, the for loop filled in the values correctly. If I want to improve the speed of my code with the apply version from before, the results are:

      gene      tcga.controls tcga.tumors gtex.brca
 [1,] "A1BG"    "0.8"         "0.833"     "0.6"    
 [2,] "A1CF"    "0.762"       NA          "0.833"  
 [3,] "A2M"     "0.8"         "0.5"       "0.4"    
 [4,] "A2ML1"   "0.8"         "0.5"       "0.4"    
 [5,] "A4GALT"  "0.762"       "0.5"       "0.833"  
 [6,] "A4GNT"   "1"           "1"         "0.333"  
 [7,] "AAAS"    "0.667"       "0.5"       "0.4"    
 [8,] "AACS"    "0.9"         "NaN"       "0.5"    
 [9,] "AADAC"   "0.714"       "0.533"     "0.467"  
[10,] "AADACL2" "0.714"       "1"         "0.5"

That is strange: apply returns a character variable, whereas the for loop returns a list (which is correct). @nirgrahamuk

nirgrahamuk · April 5, 2021, 3:11pm

A matrix is different from a data frame in that all entries must be of the same type. As you mix character gene names with numeric values, the matrix has to settle on a common type, hence all values are cast to character. As an alternative, you could have a numeric matrix with character rownames.
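
The coercion described here can be reproduced in a few lines of base R. This is a minimal sketch on a hypothetical two-row data frame shaped like ego.matrix; the values are made up and the igraph calls are left out.

```r
# Hypothetical toy data frame mirroring the shape of ego.matrix.
df <- data.frame(
  gene = c("A1BG", "A1CF"),
  tcga.controls = c(NA_real_, NA_real_),
  stringsAsFactors = FALSE
)

# apply() first converts the data frame with as.matrix(); a single
# character column is enough to make every cell character.
m <- as.matrix(df)
print(typeof(m))

# c() coerces the same way, so each row returned by the function
# is a character vector once the gene name is included.
via_apply <- t(apply(df, 1, function(x) c(x["gene"], 0.8)))
print(is.matrix(via_apply))
print(typeof(via_apply))

# The alternative mentioned above: a numeric matrix that carries the
# gene names as row names instead of as a column.
num <- matrix(c(0.8, 0.762), ncol = 1,
              dimnames = list(df$gene, "tcga.controls"))
print(typeof(num))
print(num["A1CF", "tcga.controls"])
```

The for loop version keeps ego.matrix as a data frame, which is a list of columns, so each column can hold its own type.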

Jeremy98-alt · April 5, 2021, 4:53pm

So, what should I do to get the same result as with the for loop you can see above? @nirgrahamuk
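
One possible way to keep a data frame with numeric columns is to loop over the gene column with vapply() and assign each result vector to its own column. The sketch below is not taken from the thread: score_gene() and the lookup vectors controls and tumors are hypothetical stand-ins for coeff.clustering() and the igraph objects, used only so the example runs without igraph.

```r
# Hypothetical stand-in for coeff.clustering(graph, gene): one number per gene.
score_gene <- function(scores, gene) {
  unname(scores[gene])
}

# Hypothetical lookup vectors standing in for nt_graph and nt2_graph.
controls <- c(A1BG = 0.8, A1CF = 0.762)
tumors   <- c(A1BG = 0.833, A1CF = 1)

ego.df <- data.frame(
  gene = c("A1BG", "A1CF"),
  tcga.controls = NA_real_,
  tcga.tumors = NA_real_,
  stringsAsFactors = FALSE
)

# as.character() makes sure names, not factor codes, are passed on
# in case the gene column happens to be a factor.
genes <- as.character(ego.df$gene)

# vapply() checks that every call returns exactly one numeric value
# (a logical NA is promoted to a numeric NA).
ego.df$tcga.controls <- vapply(
  genes, function(g) score_gene(controls, g), numeric(1), USE.NAMES = FALSE
)
ego.df$tcga.tumors <- vapply(
  genes, function(g) score_gene(tumors, g), numeric(1), USE.NAMES = FALSE
)

str(ego.df)
```

Because only numeric vectors are written into the numeric columns, nothing is cast to character and the object stays a data frame.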

nirgrahamuk · April 5, 2021, 5:03pm

I don't see that you provided a reprex.
nt_graph ?

andyb · April 7, 2021, 9:56pm

@Jeremy98-alt, take a look at the reprex package to get an idea of what a reproducible example is. Here is a great vid from Jenny.

I have been through this same exchange with others before. It really is difficult for people to troubleshoot your problem if they can't reproduce it. After all they are helping you in their free time. The least we should be able to do is give an example they can run.

ps. it's a really cool package
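
As a rough sketch of what the data part of a reproducible example could look like here: the first few edges below are copied from the print(nt_graph) output earlier in the thread, but the correlation values are made up, since the real ones were never posted. The object names edges and genes are likewise only illustrative.

```r
# First three edges from the printed graph; correlation values are invented.
edges <- data.frame(
  from = c("A1BG", "A1BG", "A1CF"),
  to   = c("A1CF", "A2ML1", "A2ML1"),
  correlation = c(0.9, 0.7, 0.8),
  stringsAsFactors = FALSE
)

# A small version of the data frame that should be filled in.
genes <- data.frame(
  gene = c("A1BG", "A1CF", "A2ML1"),
  tcga.controls = NA_real_,
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates each object; pasting that output
# into a post lets others rebuild exactly the same data.
dput(edges)
dput(genes)

# With igraph installed, a helper could then rebuild a graph with
# igraph::graph_from_data_frame(edges, directed = TRUE); the extra
# column becomes an edge attribute.
```

Together with the code that shows the problem, a data block like this is enough for someone else to run the example on their own machine.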
