Can anyone help me with correlation in R? I tried to use cor.test(x, y), which I found on Google. I need to compute the correlation of returns from the historical stock prices that I have. I don't really understand what I should change or add to make it work. Thank you in advance.
I used to grumble that the help pages in R needed their own help page. I was embarrassed to discover help(help), which is worth reviewing.
The real effort, though, is in learning to think of R as school algebra: f(x) = y.
The three objects (in R everything is an object) are
x: what is at hand
y: what is desired
f: a function that converts x to y
Keep that in mind while looking at help(cor.test), because it's key to understanding the arguments that f expects here, which may not match how your data is presently stored.
Here's the function signature:
cor.test(x, y,
alternative = c("two.sided", "less", "greater"),
method = c("pearson", "kendall", "spearman"),
exact = NULL, conf.level = 0.95, continuity = FALSE, ...)
Everything has a default except for x and y. (The mysterious ... at the end indicates that the function is open to receiving further arguments; you usually don't need to worry about those.)
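To see the defaults at work, here is a minimal sketch that calls cor.test() once with only x and y and once with some defaults overridden by name. It borrows two columns from the built-in mtcars data purely for illustration; the object names res_default and res_spearman are mine, not part of the function.

```r
# Built-in example data; any two numeric vectors of equal length would do
x <- mtcars$mpg
y <- mtcars$wt

# Only x and y supplied: every other argument takes its default
# (two-sided alternative, Pearson method, 95% confidence level)
res_default <- cor.test(x, y)

# Override selected defaults by naming them
res_spearman <- cor.test(x, y,
                         method = "spearman",
                         alternative = "less",
                         exact = FALSE)  # asymptotic p-value; the data contain ties

res_default$method       # which test was run with the defaults
res_spearman$method      # which test was run after overriding
res_spearman$alternative
```

Naming the arguments you change, rather than relying on their position, keeps the call readable and guards against mixing up the order.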
Under Arguments, the help page says:
x, y: numeric vectors of data values. x and y must have the same length.
Let's say you have a table of stock prices for some basket at two given dates and a difference. (This neglects dividends, of course, and isn't a rate of return, but that's separate; the function doesn't care what the numbers mean.)
DF <- structure(list(open = c(
21L, 63L, 39L, 57L, 34L, 33L, 52L, 26L,
22L, 46L, 92L, 16L, 56L, 31L, 81L, 70L, 14L, 36L, 59L, 1L, 55L,
92L, 15L, 86L, 2L
), close = c(
62L, 47L, 5L, 71L, 91L, 61L, 46L,
70L, 40L, 87L, 45L, 46L, 80L, 22L, 68L, 25L, 95L, 24L, 23L, 29L,
4L, 45L, 98L, 72L, 82L
), return = c(
41L, -16L, -34L, 14L, 57L,
28L, -6L, 44L, 18L, 41L, -47L, 30L, 24L, -9L, -13L, -45L, 81L,
-12L, -36L, 28L, -51L, -47L, 83L, -14L, 80L
)), class = "data.frame", row.names = c(
NA,
-25L
))
cor.test(DF$open, DF$return)
#>
#> Pearson's product-moment correlation
#>
#> data: DF$open and DF$return
#> t = -5.5544, df = 23, p-value = 1.192e-05
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#> -0.8868103 -0.5161355
#> sample estimates:
#> cor
#> -0.7569028
Created on 2021-01-03 by the reprex package (v0.3.0.9001)
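In this toy example, constructed from random integers under 100, we set x to the first column of DF and y to the last. There is a marked negative correlation, which is no surprise here: return is simply close minus open, so a high open tends to go with a low return.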
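The toy table uses plain price differences rather than rates of return. For the original question, a minimal sketch might look like the following, assuming you have two price histories of equal length. The prices are made up, and the names prices_a, prices_b, and simple_return are hypothetical; substitute your own columns.

```r
# Hypothetical closing prices for two stocks over the same eight dates
prices_a <- c(100, 102, 101, 105, 107, 106, 110, 108)
prices_b <- c(50, 51, 50.5, 52, 53.5, 53, 54, 53.2)

# Simple return between consecutive dates: (p_t - p_{t-1}) / p_{t-1}
# diff() gives the numerators; head(p, -1) drops the last price to give the denominators
simple_return <- function(p) diff(p) / head(p, -1)

ret_a <- simple_return(prices_a)
ret_b <- simple_return(prices_b)

# cor.test() requires x and y to have the same length
stopifnot(length(ret_a) == length(ret_b))

result <- cor.test(ret_a, ret_b)
result
```

Each return series is one element shorter than its price series, so both price vectors need to cover the same dates for the pairs to line up.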
You should take a careful look at the Details and Value sections of the help page, particularly at what the return value of f (which is y in the problem setup) represents. Make sure you understand what association means in this context.
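To make the return value concrete, here is a small sketch that rebuilds the same open and close columns as plain vectors and pulls individual pieces out of the result. The names open_px, close_px, ret, and res are mine; the components used ($estimate, $p.value, $conf.int, $parameter) are the ones listed under Value in help(cor.test).

```r
# Same numbers as the open and close columns of DF
open_px <- c(21, 63, 39, 57, 34, 33, 52, 26, 22, 46, 92, 16, 56,
             31, 81, 70, 14, 36, 59, 1, 55, 92, 15, 86, 2)
close_px <- c(62, 47, 5, 71, 91, 61, 46, 70, 40, 87, 45, 46, 80,
              22, 68, 25, 95, 24, 23, 29, 4, 45, 98, 72, 82)
ret <- close_px - open_px   # the "return" column: a price difference, not a rate

# The result is a list of class "htest", not a single number
res <- cor.test(open_px, ret)

res$estimate    # the sample correlation coefficient
res$p.value     # p-value for the null hypothesis of zero correlation
res$conf.int    # confidence interval for the true correlation
res$parameter   # degrees of freedom (number of pairs minus 2)
```

Storing the result in an object, rather than only printing it, lets you reuse the individual numbers in later calculations.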
Thank you for your help. I understand now.