wide to long data frame when more then one x-variable

Hello,

I am relatively new to R and I am struggling with the following problem:
I have a data frame that consists of two sets of x and y variables i.e. x1, y1 and x2, y2.
I would like to use gather or any other function that could help to create a new data frame with
merged x1 and x2 variables into a single x variable. I have tried to use gather but can see only examples applied to data frames with a single x variable.

Bellow is my reprex with the desired output as well.

Your help is really appreciated. Thank you!

library(tidyr)

(df <- data.frame(x1=c(1,2,3),y1=c(10,14,12), x2=c(2,3,4),y2=c(12,13,11)))
#>   x1 y1 x2 y2
#> 1  1 10  2 12
#> 2  2 14  3 13
#> 3  3 12  4 11

(df_tall <- gather(df,"z", "y", y1:y2))
#>   x1  z  y
#> 1  1 y1 10
#> 2  2 y1 14
#> 3  3 y1 12
#> 4  1 x2  2
#> 5  2 x2  3
#> 6  3 x2  4
#> 7  1 y2 12
#> 8  2 y2 13
#> 9  3 y2 11

# desired output
print('Desired output')
#> [1] "Desired output"
(df_mod <- data.frame(x=c(1,2,2,3,3,4),y=c(10,14,12,12,13,11), z=c("y1", "y1", "y2", "y1", "y2", "y2")))
#>   x  y  z
#> 1 1 10 y1
#> 2 2 14 y1
#> 3 2 12 y2
#> 4 3 12 y1
#> 5 3 13 y2
#> 6 4 11 y2

Created on 2020-04-03 by the reprex package (v0.2.1)

There is probably some elegant way to do this but I would do it like this.

library(dplyr)

(df <- data.frame(x1=c(1,2,3),y1=c(10,14,12), x2=c(2,3,4),y2=c(12,13,11)))
#>   x1 y1 x2 y2
#> 1  1 10  2 12
#> 2  2 14  3 13
#> 3  3 12  4 11
df1 <- df %>% select(x = x1, y = y1) %>% mutate(z = "y1")
df2 <- df %>% select(x = x2, y = y2) %>% mutate(z = "y2")
rbind(df1, df2)
#>   x  y  z
#> 1 1 10 y1
#> 2 2 14 y1
#> 3 3 12 y1
#> 4 2 12 y2
#> 5 3 13 y2
#> 6 4 11 y2

Created on 2020-04-05 by the reprex package (v0.3.0)

1 Like

Hi @pre1ryp: I was just wondering whether you just wanted the points, as you show in your desired output, or whether you might also want to know which points were paired in your original table.

Hi @pre1ryp, welcome to RStudio Community.

It looks like you are trying to combine each pair of x and y variables. I don't think gather() is the right tool here since there is no key column to pivot around. Instead, you could do something like this:

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3
library(tidyr)

(df <- data.frame(x1 = c(1, 2, 3), y1 = c(10, 14, 12),
                  x2 = c(2, 3, 4), y2 = c(12, 13, 11)))
#>   x1 y1 x2 y2
#> 1  1 10  2 12
#> 2  2 14  3 13
#> 3  3 12  4 11

df %>% 
  unite("x", contains("x")) %>% 
  unite("y", contains("y")) %>% 
  separate_rows(everything(), sep = "_")
#>   x  y
#> 1 1 10
#> 2 2 12
#> 3 2 14
#> 4 3 13
#> 5 3 12
#> 6 4 11

Created on 2020-04-05 by the reprex package (v0.3.0)

Note: The solution above doesn't create the z column. I'm not sure if you need it in your output or if it was for reference only. If you do, then the solution given by FJCC above is probably the way to go.

To exemplify this part and to add a reference of how inelegant and inefficient could be this solution

library(tidyverse)

df <- data.frame(x1=c(1,2,3),y1=c(10,14,12), x2=c(2,3,4),y2=c(12,13,11))

df %>% 
    gather(xy, value) %>% 
    mutate(xy2 = substr(xy, 1, 1)) %>% 
    group_by(xy2) %>% 
    mutate(key = row_number()) %>% 
    pivot_wider(id_cols = key, names_from = xy2, values_from = c("value", "xy")) %>% 
    select(x = value_x, y = value_y, z = xy_y) %>% 
    arrange(x)
#> # A tibble: 6 x 3
#>       x     y z    
#>   <dbl> <dbl> <chr>
#> 1     1    10 y1   
#> 2     2    14 y1   
#> 3     2    12 y2   
#> 4     3    12 y1   
#> 5     3    13 y2   
#> 6     4    11 y2
1 Like

Hi @dromano, I just need the z-variable to indicate which pair comes from which set of x-y variables so that I can then use the new data frame in a ggplot and overlay the different pairs. Hope this helps.

@FJCC - Thank you, very simple but works great! Just for info another method that I have been shown to work for the same task:

library(data.table)
melt(setDT(df), measure = patterns('^x', '^y'),
        variable.name = 'z', value.name = c('x', 'y'))

here is yet another approach

df %>%
  pivot_longer(
    col = c(y1, y2),
    names_to = "z",
    values_to = "y"
  ) %>%
  mutate(x = case_when(z == "y1" ~ x1, TRUE ~ x2)) %>%
  select(x, y, z)
1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.