# Need help with correlation coefficient

Hey everyone! I am new to RStudio and would like some help calculating the correlation coefficient for two given variables (weight and exercise_minutes). I have attached what I have done so far below.
The correlation coefficient I am getting is incorrect: it should be 0.3763526, but I am getting 464394961 (much higher than expected). I'm not sure why, so any help is greatly appreciated. The formula for the correlation coefficient is written as a comment in the code.

#Section 2 - 2.4

df\$new_variable1 <- (df\$weight - mean(df\$weight))

#Section 2 - 2.5

df\$new_variable2 <- (df\$excercise_minutes - mean(excercise_minutes))

#Section 2 - 2.6

df\$product_2_variables <- df\$new_variable1 * df\$new_variable2

#Section 2 - 2.7

# r = sum((xi - xbar) * (yi - ybar)) / ((N - 1) * Sx * Sy)

(sum(product_2_variables))/(20-1)*sd(weight)*sd(excercise_minutes)

Do you want the standard deviations in the numerator or denominator of the calculation?

In the denominator of the calculation. I have attached a picture of the formula.

```r
#Here is your formula
(sum(product_2_variables))/(20-1)*sd(weight)*sd(excercise_minutes)

#Here is a version with simple numbers 190/19*5*2
sum(c(100,90))/(20-1)*5*2
#> [1] 100

#Here is a version with simple numbers 190/(19*5*2) and parentheses
#forcing 5 and 2 into the denominator
sum(c(100,90))/((20-1)*5*2)
#> [1] 1
```

Created on 2022-11-14 with reprex v2.0.2

Where did 100 and 90 come from? Also, why did you write (20-1)*5*2?

I used numbers in my example that would facilitate mental calculations. I kept the (20-1) and sum() from your formula to make the comparisons between my formulas and yours simple.

Looking at my two formulas, which give very different answers, are you still convinced that sd(weight) and sd(excercise_minutes) are in the denominator of your formula?

Oh I see.

I changed my code to this:

#Section 2 - 2.7

# r = sum((xi - xbar) * (yi - ybar)) / ((N - 1) * Sx * Sy)

(sum(product_2_variables))/((20-1)*sd(weight)*sd(excercise_minutes))

So I have added the missing bracket to place the standard deviations in the denominator, but I'm not sure if this is correct. The coefficient I'm getting is 9.3864.

But when I use the cor(weight,excercise_minutes) function to determine the correlation, I get 0.3763. Can you explain why?

Is it possible you need another set of parentheses in the denominator:

(sum(product_2_variables))/( (20-1)*sd(weight)*sd(excercise_minutes) )

Also, is there a reason you don't want to use

cor(df\$weight, df\$excercise_minutes)

Yes, I added another set of parentheses, and the correlation coefficient I am getting is now 9.3864 instead of the original number, which was 464394961.

Also, I tried the code you wrote, cor(df\$weight,df\$exercise_minutes), and I still get 0.3763526. I am just concerned about why the coefficients don't match between the equation and the built-in R function cor().

I now see that you save results as columns of the data frame but later refer to those column names without the data frame. For example, in

```r
(sum(product_2_variables))/(20-1)*sd(weight)*sd(excercise_minutes)
```

you have not previously defined a variable `weight`. You have a column named `weight`, but to refer to it you need `df$weight`.

```r
#invent some data
set.seed(123)
df <- data.frame(weight = rnorm(2), excercise_minutes = rnorm(20))

#Your code, revised. Notice several additions of df$ in front of column names
df$new_variable1 <- (df$weight - mean(df$weight))

df$new_variable2 <- (df$excercise_minutes - mean(df$excercise_minutes))

df$product_2_variables <- df$new_variable1 * df$new_variable2

(sum(df$product_2_variables))/((20-1)*sd(df$weight)*sd(df$excercise_minutes))
#> [1] -0.1535665

cor(df$weight,df$excercise_minutes)
#> [1] -0.1535665
```

Created on 2022-11-14 with reprex v2.0.2
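As a side note, here is a minimal sketch of a way to avoid both the repeated `df$` prefixes and the hard-coded 20. It assumes invented data (not your real values) and uses `with()` so column names are looked up inside the data frame, and `nrow()` for N. The names `n`, `r_manual`, `r_cov` and `r_builtin` are just illustrative.

```r
# Invented data for illustration only (not the poster's real values)
set.seed(123)
df <- data.frame(weight = rnorm(20), excercise_minutes = rnorm(20))

# Take N from the data instead of hard-coding 20
n <- nrow(df)

# with() evaluates the expression using the columns of df,
# so bare column names are safe here
r_manual <- with(df,
  sum((weight - mean(weight)) * (excercise_minutes - mean(excercise_minutes))) /
    ((n - 1) * sd(weight) * sd(excercise_minutes))
)

# The numerator divided by (n - 1) is the sample covariance,
# so this is the same calculation written with cov()
r_cov <- with(df, cov(weight, excercise_minutes) /
                (sd(weight) * sd(excercise_minutes)))

r_builtin <- cor(df$weight, df$excercise_minutes)

# All three should agree up to floating-point tolerance
all.equal(r_manual, r_builtin)
all.equal(r_cov, r_builtin)
```

If either `all.equal()` call does not return `TRUE`, that usually points to a bracket or a column reference that is not doing what you expect.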

Thank you! It's working now. I am now getting 0.3763526 from both the formula and cor().

Question: how would I describe the direction and effect size of the correlation coefficient? Would it just be a positive relationship?
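One possible way to put this into words, sketched below, is to read the direction from the sign of r and the size from its absolute value. The cutoffs used here (0.1 small, 0.3 medium, 0.5 large) are Cohen's conventional benchmarks, and using them is an assumption: your course may use different labels or thresholds. `describe_r` is a hypothetical helper, not a built-in function.

```r
# Hypothetical helper: labels a correlation by sign and by
# Cohen's conventional cutoffs (an assumption; check your course notes)
describe_r <- function(r) {
  direction <- if (r > 0) {
    "positive"
  } else if (r < 0) {
    "negative"
  } else {
    "none"
  }

  size <- abs(r)
  strength <- if (size >= 0.5) {
    "large"
  } else if (size >= 0.3) {
    "medium"
  } else if (size >= 0.1) {
    "small"
  } else {
    "negligible"
  }

  paste(direction, strength)
}

describe_r(0.3763526)
#> [1] "positive medium"
```

Under those cutoffs, 0.3763526 would be described as a positive relationship of medium size.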

Also, how would you describe these histogram shapes? For weight, I said skewed left; for exercise_minutes, I wrote bell curve; and for height, I wrote bimodal. I'm not sure if that's correct. Any help is greatly appreciated.
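Without seeing the plots nobody can confirm those labels, but here is a rough sketch of one way to sanity-check a skew call. It compares the mean with the median: a mean below the median hints at a longer left tail. The values in `weight_example` are invented purely for illustration, `skew_hint` is a hypothetical helper, and this is only a heuristic. It says nothing about bimodality, which you would judge by eye from two separate peaks.

```r
# Hypothetical values, invented only to illustrate a left-skewed shape
weight_example <- c(50, 62, 70, 74, 76, 78, 79, 80, 81, 82)

# Bin counts only; drop plot = FALSE to draw the histogram itself
hist(weight_example, plot = FALSE)$counts

# Rough heuristic: compare mean and median to guess the tail direction
skew_hint <- function(x) {
  if (mean(x) < median(x)) {
    "left (negative) skew"
  } else if (mean(x) > median(x)) {
    "right (positive) skew"
  } else {
    "roughly symmetric"
  }
}

skew_hint(weight_example)
#> [1] "left (negative) skew"
```

You could run `skew_hint(df$weight)` on your own column and see whether it agrees with what the histogram looks like.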