I am trying to perform correlation on a large dataset of over 100,000 columns and subseting and parallelization is not working, is there any other method that i can try?
Are these columns all numeric variables? About how many rows? Are you looking to do pairwise cor(d[,1],d[,2])
\dots cor(d[,n-1],cor[,n]
?
How many paring combinations are there over 100,000 possible pair inputs...
choose(100000,2)
# 4999950000
this is 4,999,950,000 or 5billion.
Its asking to do a lot of work
1 Like
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.