# Loop with correlation

I have this matrix and following code:

``````# Remove NA-observations from dataset "lus" and removing row "totalsum"
lus2 <- na.omit(lus)

lus3 <- lus2[-c(10),]

# The problem now is that "laksepris" has months in the columns, while "lus" has months in the rows

laksepris2 <- laksepris %>%

test <- rbind(setDT(lus3), setDT(laksepris2), fill=TRUE)

test[10,1] <- "Pris pr.kilo"

test_round <- test %>%
  mutate_if(is.numeric, round, digits = 2)

#-------------------------------------------

rearranget_lus <- as.data.frame(t(test_round))

rearranget_lus

# Removing first row, and renaming the columns:

lus_1 <- rearranget_lus[-c(1),]

# Assign all ten column names at once; repeated `names(lus_1) <- "..."`
# calls would each overwrite the previous assignment
names(lus_1) <- c("Finmark", "Troms", "Nordland", "Nord-Trondelag",
                  "Sor-Trondelag", "More og Romsdal", "Sogn og Fjordane",
                  "Hordaland", "Rogaland og Agder", "Pris pr.kilo")
``````

I just started using R, so I am wondering how I can run a correlation between the values in the column "Pris pr.kilo" and the values in the column "Finmark". After that, I would like to put this in a loop, so that the correlation between "Pris pr.kilo" and each of the other columns is computed as well.
Does anyone have a suggestion for how to do this?

``````library(tidyverse)

# tibble() replaces the deprecated data_frame()
df <- tibble(A = 1:12,
             B = rev(A),
             C = 2 * A,
             D = 2 * B,
             Y = 1:12)
``````
``````> df
# A tibble: 12 x 5
       A     B     C     D     Y
   <int> <int> <dbl> <dbl> <int>
 1     1    12     2    24     1
 2     2    11     4    22     2
 3     3    10     6    20     3
 4     4     9     8    18     4
 5     5     8    10    16     5
 6     6     7    12    14     6
 7     7     6    14    12     7
 8     8     5    16    10     8
 9     9     4    18     8     9
10    10     3    20     6    10
11    11     2    22     4    11
12    12     1    24     2    12
``````

Every column here is an exact linear function of `Y`, so `cor(A, Y)` and `cor(C, Y)` should be 1, while `cor(B, Y)` and `cor(D, Y)` should be -1.
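Those expected values can be checked one pair at a time with base R's `cor()` before building any loop. This is only a minimal sketch on plain vectors; the names `a`, `y` and `checks` are illustrative and not part of the original data.

```r
# Plain vectors mirroring the toy columns: a plays the role of A, y of Y
a <- 1:12
y <- 1:12

# Pearson correlation is unchanged by positive scaling,
# and reversing the order makes an exactly decreasing linear relation
checks <- c(
  same     = cor(a, y),           # like cor(A, Y)
  reversed = cor(rev(a), y),      # like cor(B, Y)
  doubled  = cor(2 * a, y),       # like cor(C, Y)
  both     = cor(2 * rev(a), y)   # like cor(D, Y)
)
print(checks)
```

If these four values come out as 1, -1, 1, -1, any looped version should reproduce them.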

I would break the dataframe into two pieces:

• the portion you want to loop over (X)
• and the portion that should stay constant (Y)
``````X <- select(df, -Y)
Y <- select(df, Y)
``````

Now I can use `map_df` from the `purrr` package to feed each column of `X` to `cor` while setting the `y` parameter to `Y`. The output will be a dataframe.

``````library(purrr)

result <- map_df(X, cor, y = Y)
``````
``````> result
# A tibble: 1 x 4
      A     B     C     D
  <dbl> <dbl> <dbl> <dbl>
1     1    -1     1    -1
``````
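If a named numeric vector is more convenient than a one-row tibble, the same split-and-apply idea can be sketched in base R with `vapply()`. This is a variant, not the code above: it assumes the constant column is pulled out as a plain vector rather than a one-column tibble, and `demo_df`, `x_cols` and `y_vec` are illustrative names.

```r
# Same toy data as above, built as a base data.frame
demo_df <- data.frame(A = 1:12, B = 12:1, C = 2 * (1:12), D = 2 * (12:1), Y = 1:12)

x_cols <- demo_df[setdiff(names(demo_df), "Y")]  # columns to loop over
y_vec  <- demo_df[["Y"]]                         # constant column as a plain vector

# Apply cor() to each column of x_cols against y_vec;
# numeric(1) makes vapply() insist on one number per column
result_vec <- vapply(x_cols, cor, numeric(1), y = y_vec)
print(result_vec)
```

The names of `result_vec` are the column names, so a single value can be picked out with, for example, `result_vec[["B"]]`.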

You can also do this with the `corrr` package. Using the dataset given by @Galangjs it would look like this:

``````library(tidyverse)
library(corrr)

# tibble() replaces the deprecated data_frame()
df <- tibble(A = 1:12,
             B = rev(A),
             C = 2 * A,
             D = 2 * B,
             Y = 1:12)

#default output of correlate function
corrr_result <- correlate(df)
#>
#> Correlation method: 'pearson'
#> Missing treated using: 'pairwise.complete.obs'

corrr_result
#> # A tibble: 5 x 6
#>   rowname     A     B     C     D     Y
#>   <chr>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A          NA    -1     1    -1     1
#> 2 B          -1    NA    -1     1    -1
#> 3 C           1    -1    NA    -1     1
#> 4 D          -1     1    -1    NA    -1
#> 5 Y           1    -1     1    -1    NA

# look only at the desired comparison
corrr_result %>%
  filter(rowname == "Y") %>%
  select(-Y)
#> # A tibble: 1 x 5
#>   rowname     A     B     C     D
#>   <chr>   <dbl> <dbl> <dbl> <dbl>
#> 1 Y           1    -1     1    -1
``````

Created on 2018-08-29 by the reprex package (v0.2.0).
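To connect either approach back to the data in the question: transposing a data frame that contains a text column with `t()` usually produces character columns, which `cor()` will not accept. The sketch below assumes that is what happened to `lus_1`. The values are made-up placeholders purely so the code runs; only the column names come from the question, and `lus_demo`, `lus_num`, `price`, `regions` and `price_cors` are illustrative names.

```r
# Made-up placeholder values stored as text, as they would be after t()
lus_demo <- data.frame(
  "Finmark"      = c("0.10", "0.20", "0.15", "0.30"),
  "Troms"        = c("0.40", "0.35", "0.50", "0.20"),
  "Pris pr.kilo" = c("60.1", "62.3", "61.0", "65.4"),
  check.names = FALSE,        # keep the space in "Pris pr.kilo"
  stringsAsFactors = FALSE
)

# Convert every column to numeric while keeping the column names
lus_num <- lus_demo
lus_num[] <- lapply(lus_demo, as.numeric)

# Split into the constant column and the columns to loop over
price   <- lus_num[["Pris pr.kilo"]]
regions <- lus_num[setdiff(names(lus_num), "Pris pr.kilo")]

# One correlation per region column against the price
price_cors <- vapply(regions, cor, numeric(1), y = price)
print(price_cors)
```

With the real data, the same conversion step would go right after the columns are renamed, and any missing values would need handling (for example via the `use` argument of `cor()`).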
