Can anyone help me with correlation in R? I tried to use cor.test(x, y), which I found on Google. I need to compute the correlation of returns from the historical stock prices that I have. I don't really understand what I should change or add to make it work. Thank you in advance.
I used to grumble that the help pages in
R needed their own help page. I was embarrassed to discover
which is worthwhile reviewing.
The real effort, though, is in learning to think of R as school algebra: f(x) = y.
The three objects (in R everything is an object) are
x: what is at hand
y: what is desired
f: what converts x to y
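As a minimal sketch of that framing (the vector is made up purely for illustration, and mean() merely stands in for whatever f you happen to need):

```r
# x: what is at hand (an illustrative numeric vector, not real data)
x <- c(2, 4, 6, 8)

# f: the function that converts x to y; mean() is just a stand-in here
f <- mean

# y: what is desired, obtained by applying f to x
y <- f(x)
y
```

With cor.test the pattern is the same, except that f takes two inputs and y is a richer object than a single number.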
Keep that in mind while looking at help(cor.test), because it's key to understanding the arguments that f expects here, which may not match how your data is presently stored.
Here's the function signature:
cor.test(x, y,
         alternative = c("two.sided", "less", "greater"),
         method = c("pearson", "kendall", "spearman"),
         exact = NULL, conf.level = 0.95, continuity = FALSE, ...)
Every argument has a default except for x and y. (The mysterious ... at the end indicates that the function is open to receiving further arguments; you usually don't need to worry about those.)
x, y numeric vectors of data values. x and y must have the same length.
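A small sketch of what that means in practice. The two vectors below are made-up values, not real data. The first call supplies only x and y, the second overrides one default by name, and the third shows what happens when the lengths differ:

```r
# Made-up numeric vectors of equal length (illustrative values only)
x <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.7)
y <- c(2.0, 2.9, 3.5, 5.1, 4.9, 7.2)

# Only x and y are supplied; every other argument takes its default,
# so this is a two-sided Pearson test with a 95% confidence interval
res_default <- cor.test(x, y)

# Override a single default by name: Spearman's rank correlation
res_spearman <- cor.test(x, y, method = "spearman")

# Vectors of unequal length are rejected; capture the error message
res_bad <- tryCatch(
  cor.test(x, y[-1]),
  error = function(e) conditionMessage(e)
)
res_bad
```

If your data sit in a data frame, the usual way to get such vectors is to pull out columns with $.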
Let's say you have a table of stock prices for some basket at two given dates and a difference. (This neglects dividends, of course, and isn't a rate of return, but that's separate; the function doesn't care what the numbers mean.)
DF <- structure(list(
  open = c(
    21L, 63L, 39L, 57L, 34L, 33L, 52L, 26L, 22L, 46L, 92L, 16L, 56L,
    31L, 81L, 70L, 14L, 36L, 59L, 1L, 55L, 92L, 15L, 86L, 2L
  ),
  close = c(
    62L, 47L, 5L, 71L, 91L, 61L, 46L, 70L, 40L, 87L, 45L, 46L, 80L,
    22L, 68L, 25L, 95L, 24L, 23L, 29L, 4L, 45L, 98L, 72L, 82L
  ),
  return = c(
    41L, -16L, -34L, 14L, 57L, 28L, -6L, 44L, 18L, 41L, -47L, 30L, 24L,
    -9L, -13L, -45L, 81L, -12L, -36L, 28L, -51L, -47L, 83L, -14L, 80L
  )
), class = "data.frame", row.names = c(NA, -25L))

cor.test(DF$open, DF$return)
#>
#>  Pearson's product-moment correlation
#>
#> data:  DF$open and DF$return
#> t = -5.5544, df = 23, p-value = 1.192e-05
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.8868103 -0.5161355
#> sample estimates:
#>        cor
#> -0.7569028
Created on 2021-01-03 by the reprex package (v0.3.0.9001)
In this toy example, constructed from random integers under 100, we set x to the first column of DF and y to the last. There is a marked negative correlation (about -0.76).
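If what you actually have is a price history per stock rather than two snapshots, a rough sketch of getting from prices to returns and then to cor.test() might look like the following. The price vectors and the helper simple_return() are hypothetical, made up for illustration; substitute your own data and whichever definition of return you need.

```r
# Hypothetical daily closing prices for two stocks (made-up numbers)
price_a <- c(100, 102, 101, 105, 107, 106, 110, 108)
price_b <- c(50, 51, 50.5, 52, 53.5, 53, 54, 53.2)

# Hypothetical helper: simple return per period, (p_t - p_{t-1}) / p_{t-1}
simple_return <- function(p) {
  diff(p) / p[-length(p)]
}

ret_a <- simple_return(price_a)
ret_b <- simple_return(price_b)

# Each return vector is one element shorter than its price vector;
# cor.test() needs the two return vectors to have the same length
stopifnot(length(ret_a) == length(ret_b))

# x and y are now the two return series
cor.test(ret_a, ret_b)
```

The two series need to cover the same dates in the same order, otherwise the pairs being correlated don't correspond to each other.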
You should take a careful look at the Details section of help(cor.test), and particularly at what the return value of f (which is y in the problem setup) represents. Make sure you understand what association means in this context.
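To see what comes back, it can help to store the result and pull out its pieces. The sketch below re-enters the open and return columns from the toy example as plain vectors so that it runs on its own; the names open_price, price_change and res are just illustrative choices.

```r
# The open and return columns from the toy example, as plain vectors
open_price <- c(21, 63, 39, 57, 34, 33, 52, 26, 22, 46, 92, 16, 56,
                31, 81, 70, 14, 36, 59, 1, 55, 92, 15, 86, 2)
price_change <- c(41, -16, -34, 14, 57, 28, -6, 44, 18, 41, -47, 30, 24,
                  -9, -13, -45, 81, -12, -36, 28, -51, -47, 83, -14, 80)

# Store y instead of only printing it
res <- cor.test(open_price, price_change)

class(res)      # the result is a list of class "htest"
names(res)      # lists the components that can be extracted

res$estimate    # sample correlation coefficient
res$p.value     # p-value for the null hypothesis of zero correlation
res$conf.int    # confidence interval for the correlation
res$parameter   # degrees of freedom (n - 2 for the Pearson test)
```

The estimate should reproduce the -0.7569028 shown in the printed output, which is a quick check that you are passing the same x and y.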
Thank you for your help. I understand now.