How to reshape a large wide data set into long format?

Hi!
After a correlation analysis, I obtained wide data with multiple sets of columns. However, I need to convert this data set to long format for the next analysis. I think "pivot_longer" is the way to handle this, but either I haven't found the right example or I haven't understood the examples.

Since my data is really large, I made a simple example set.

library(tidyverse)
library(readxl)
library(reprex)
# The real data is read from Excel; that file is not shared here:
# corr <- read_excel("raw_data/corr.xlsx")

# Small example of the wide correlation table
corr <- tibble::tribble(
~Name, ~a, ~b, ~c, ~d, ~e, ~f, ~g,
"a", 1, 0.2, 0.4, 0.7, 0.6, 0.2, 0.6,
"b", 0.2, 1, 0.6, 0.9, 0.7, 0.99, 0.7,
"c", 0.4, 0.6, 1, 0.8, 0.6, 0.95, 0.8,
"d", 0.7, 0.9, 0.8, 1, 0.6, 0.5, 0.4,
"e", 0.6, 0.7, 0.6, 0.6, 1, 0.6, 0.5,
"f", 0.2, 0.99, 0.95, 0.5, 0.6, 1, 0.1,
"g", 0.6, 0.7, 0.8, 0.4, 0.5, 0.1, 1
)

# Desired long-format result (in the real data, read from "raw_data/alignment.xlsx")
alignment <- tibble::tribble(
~feature1, ~feature2, ~r,
"a", "b", 0.2,
"a", "c", 0.4,
"a", "d", 0.7,
"a", "e", 0.6,
"a", "f", 0.2,
"a", "g", 0.6,
"b", "c", 0.6,
"b", "d", 0.9,
"b", "e", 0.7,
"b", "f", 0.99,
"b", "g", 0.7,
"c", "d", 0.8,
"c", "e", 0.6,
"c", "f", 0.95,
"c", "g", 0.8,
"d", "e", 0.6,
"d", "f", 0.5,
"d", "g", 0.4,
"e", "f", 0.6,
"e", "g", 0.5,
"f", "g", 0.1
)

How can I reshape this data set into three columns: "feature1", "feature2", and "r"?

If you spot any mistakes here, please let me know.

Is this what you're after?

library(tidyverse)

df <- tibble::tribble(
  ~Name, ~a, ~b, ~c, ~d, ~e, ~f, ~g,
  "a", 1, 0.2, 0.4, 0.7, 0.6, 0.2, 0.6,
  "b", 0.2, 1, 0.6, 0.9, 0.7, 0.99, 0.7,
  "c", 0.4, 0.6, 1, 0.8, 0.6, 0.95, 0.8,
  "d", 0.7, 0.9, 0.8, 1, 0.6, 0.5, 0.4,
  "e", 0.6, 0.7, 0.6, 0.6, 1, 0.6, 0.5,
  "f", 0.2, 0.99, 0.95, 0.5, 0.6, 1, 0.1,
  "g", 0.6, 0.7, 0.8, 0.4, 0.5, 0.1, 1
)


df %>% 
  pivot_longer(-Name, names_to = "feature2", values_to = "r") %>% 
  rename(feature1 = Name)

# A tibble: 49 × 3
   feature1 feature2     r
   <chr>    <chr>    <dbl>
 1 a        a          1  
 2 a        b          0.2
 3 a        c          0.4
 4 a        d          0.7
 5 a        e          0.6
 6 a        f          0.2
 7 a        g          0.6
 8 b        a          0.2
 9 b        b          1  
10 b        c          0.6
# … with 39 more rows
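
Note that this result has 49 rows, because it keeps the diagonal (a-a, b-b, ...) and both orderings of every pair, while the desired table in the question lists only the 21 unique pairs. A minimal sketch of one way to drop the extras, assuming the rows of the wide table are in the same order as its feature columns (the names `feature_order` and `long` are only illustrative):

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tibble::tribble(
  ~Name, ~a, ~b, ~c, ~d, ~e, ~f, ~g,
  "a", 1, 0.2, 0.4, 0.7, 0.6, 0.2, 0.6,
  "b", 0.2, 1, 0.6, 0.9, 0.7, 0.99, 0.7,
  "c", 0.4, 0.6, 1, 0.8, 0.6, 0.95, 0.8,
  "d", 0.7, 0.9, 0.8, 1, 0.6, 0.5, 0.4,
  "e", 0.6, 0.7, 0.6, 0.6, 1, 0.6, 0.5,
  "f", 0.2, 0.99, 0.95, 0.5, 0.6, 1, 0.1,
  "g", 0.6, 0.7, 0.8, 0.4, 0.5, 0.1, 1
)

# Assumption: row order of Name matches the order of the feature columns
feature_order <- df$Name

long <- df %>%
  pivot_longer(-Name, names_to = "feature2", values_to = "r") %>%
  rename(feature1 = Name) %>%
  # Keep only pairs above the diagonal, so each pair appears once
  filter(match(feature1, feature_order) < match(feature2, feature_order))

long
```

Comparing positions with `match()` rather than comparing the names directly avoids relying on alphabetical order, which may not hold for real feature names.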

Exactly!
Thank you for your help, Williaml!
