Hi guys,
I'm trying to speed up the computation of Spearman correlations between the 15,000 genes in my dataset, but I don't understand whether storing cor(brca_mutations, method="spearman") in a variable gives me the correct result. This is my dataset:
library(dplyr)   # select_if()
library(slider)  # slide_dfr()
# Spearman correlation for iris, Petal.Width vs Petal.Length only
cor(iris$Petal.Width, iris$Petal.Length, method = "spearman")
# find all pairwise combinations of the numeric variables
names_to_do <- names(select_if(iris, is.numeric))
(combinations_to_do <- t(combn(x = names_to_do, m = 2)))
# having found all combinations, calculate and aggregate their correlations
# (`.` is the current row of combinations_to_do)
slide_dfr(combinations_to_do,
          ~data.frame(var1 = .[1],
                      var2 = .[2],
                      cor_spear = cor(iris[[.[1]]], iris[[.[2]]], method = "spearman")))
I don't understand this instruction:
cor_spear = cor(iris[[.[1]]],iris[[.[2]]]
In my case I have brca_expressions, and I need to pick out each gene that appears in a row of combinations_to_do. For example:
if a row contains the genes A1B3 and X3423, I want to pull the corresponding rows from brca_expressions and compute the correlation between those two sets of values.
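For anyone with the same question, here is a minimal sketch of that lookup. It assumes (this is not confirmed anywhere in the thread) that brca_expressions is a numeric matrix with one gene per row and the gene symbols as row names. The toy matrix `expr`, the gene names G3 and G4, and the number of samples are made up for illustration. In the slide_dfr() call, `.` is the current row of combinations_to_do, so `.[1]` and `.[2]` are the two names in that pair; the sketch does the same thing with base R's apply().

```r
# Toy stand-in for brca_expressions (hypothetical): genes in rows, samples in columns
set.seed(1)
expr <- matrix(rnorm(4 * 10), nrow = 4,
               dimnames = list(c("A1B3", "X3423", "G3", "G4"), NULL))

# All unordered gene pairs, one pair per row
pairs <- t(combn(rownames(expr), 2))

# For each pair, pull the two gene rows by name and correlate them
res <- data.frame(
  var1 = pairs[, 1],
  var2 = pairs[, 2],
  cor_spear = apply(pairs, 1, function(p) {
    cor(expr[p[1], ], expr[p[2], ], method = "spearman")
  })
)
res
```

Indexing by row name (`expr["A1B3", ]`) is what ties a row of the combinations matrix back to the expression data, so it only works if the names in the pairs match the row names exactly.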
Yes @nirgrahamuk, sorry ahah, brca_expression has about 400 million combinations because that dataset has around 20,000 genes, whereas brca_mutation has around 15,000 genes ahah. So how should I change your code to fit my request?
I don't understand your request.
Can you provide some 'small' example data to demonstrate your issue?
You can take your actual data and use dplyr verbs like filter(), select(), slice() to reduce it in various ways and construct a transferable example dataset that you can then share with dput(), and then phrase your question in relation to that... please
My main problem is that, out of 400 million rows, I can process only 100 thousand rows per hour ahah, so you can see that is far too slow. I want to understand whether I can use your code to speed up the process!
For each row of the combinations matrix that I created with your code, I want to pick out the corresponding gene rows in brca_expression.
Seems unlikely. Maybe there's a little inefficiency here from using data.frames when matrices would do, etc., but I'm skeptical that we can produce R code that's 10 times faster. And 10 times faster would be what, about 16 days of runtime, compared to 166 (400 million rows at 100 thousand per hour is 4,000 hours)...
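One thing that might still be worth trying: cor() accepts a whole matrix and returns every pairwise column correlation in one call, which avoids calling cor() once per pair. Below is a small sketch on made-up data. The matrix `expr` is hypothetical, and genes are assumed to be in rows, hence the t(). It only checks that the matrix call gives the same numbers as the pair-by-pair loop. It has not been timed on the real data, and a 20,000 x 20,000 result of doubles needs about 3.2 GB of memory.

```r
# Toy data (hypothetical): 5 genes in rows, 12 samples in columns
set.seed(42)
expr <- matrix(rnorm(5 * 12), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))

# cor() correlates columns, so transpose to put genes in columns
cor_all <- cor(t(expr), method = "spearman")

# Same values computed one pair at a time
pairs <- t(combn(rownames(expr), 2))
cor_loop <- apply(pairs, 1, function(p) {
  cor(expr[p[1], ], expr[p[2], ], method = "spearman")
})

# Indexing with a two-column character matrix picks out each (var1, var2) cell
all.equal(cor_all[pairs], cor_loop)
```

If the two agree on a small slice of the real data, then storing cor(..., method = "spearman") in a variable and reading individual pairs out of it should give the same values as the loop, provided the genes are in the columns of whatever is passed to cor().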