Dealing with large datasets

I am trying to compute correlations on a large dataset of over 100,000 columns, and subsetting and parallelization are not working. Is there any other method that I can try?

Are these columns all numeric variables? About how many rows are there? Are you looking to compute every pairwise correlation, i.e. cor(d[,1], d[,2]), ..., cor(d[,n-1], d[,n])?

Consider how many pairwise combinations there are among 100,000 columns:

choose(100000,2)
# 4999950000

That is 4,999,950,000, or about 5 billion, pairs.
That is a lot of work to ask for.
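
To put that number in context, here is a rough sketch, not a definitive solution. The first part repeats the pair count and estimates the memory a full 100,000 x 100,000 correlation matrix of doubles would need (8 bytes per entry, about 80 GB). The second part shows one possible way to work in column blocks so that only a small piece of the result is in memory at a time. The matrix d, its size, block_size, and the choice to track only the largest absolute correlation are all made up for illustration; the original question does not say what the data looks like or what should be kept from the results.

```r
# Rough cost estimate for all pairwise correlations of p columns.
p <- 100000
n_pairs <- choose(p, 2)            # number of distinct column pairs
full_matrix_gb <- p^2 * 8 / 1e9    # p x p matrix of doubles (8 bytes each), in GB
format(n_pairs, big.mark = ",")
full_matrix_gb

# Block-wise sketch on a small, made-up matrix (assumption: all columns numeric).
# Each step correlates one block of columns against another, so only a
# block_size x block_size result exists in memory at any time.
set.seed(1)
d <- matrix(rnorm(200 * 50), nrow = 200)   # hypothetical data: 200 rows, 50 columns
block_size <- 10
starts <- seq(1, ncol(d), by = block_size)
max_abs <- 0                               # example summary: largest |correlation|
for (i in starts) {
  cols_i <- i:min(i + block_size - 1, ncol(d))
  for (j in starts[starts >= i]) {         # only blocks on or above the diagonal
    cols_j <- j:min(j + block_size - 1, ncol(d))
    r <- cor(d[, cols_i], d[, cols_j])     # ncol(cols_i) x ncol(cols_j) block
    if (i == j) r[lower.tri(r, diag = TRUE)] <- 0   # drop self and duplicate pairs
    max_abs <- max(max_abs, abs(r))
  }
}
max_abs
```

Working in blocks does not reduce the number of pairs; it only keeps memory bounded, so each block's result has to be summarized or written to disk before moving on.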
