Merge two data frames into one based on a lookup relation

dkraner · September 15, 2018, 1:35pm

I have two data frames. The first one holds the returns:

```r
year <- c(2013:2017)
a <- c(0.0125, 0.0045, -0.0102, 0.0023, -0.0011)
b <- c(0.0267, 0.0005, 0.0922, -0.0510, 0.0389)
c <- c(-0.0045, 0.0023, 0.0426, 0.0201, -0.0291)
d <- c(0.0119, -0.0131, 0.0189, 0.0192, 0.0111)
e <- c(-0.0279, 0.0255, 0.0059, 0.0059, 0.0223)
f <- c(0.0479, 0.0345, -0.0145, -0.0001, 0.0000)
g <- c(0.0352, 0.0165, 0.0004, 0.0055, -0.0050)

return.df <- data.frame(year,a,b,c,d,e,f,g)
```

The second data frame looks like this. For each year it holds letters ("characters") that refer to the columns of the first data frame:

```r
year <- c(2013:2017)
data1 <- c("c","a","g","d","e")
data2 <- c("b","f","a","e","d")
data3 <- c("f","b","c","d","e")
data4 <- c("e","a","d","b","c")
data5 <- c("d","e","c","b","a")

character.df <- data.frame(year,data1,data2,data3,data4,data5)
```

Now I want a new data frame that shows, in place of each letter, the return of the matching column of the first data frame for the same year.

For example, in the first row (year 2013) it should show, instead of the letter "c", the return of column "c" in 2013, in this case -0.0045, and so on. The new data frame should be:

```
2013 -0.0045 0.0267 0.0479 -0.0279 0.0119
2014 0.0045 0.0345 0.0005 0.0045 0.0255
2015 0.0004 -0.0102... and so on
```
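The rule is a two-key lookup: the year selects the row of return.df and the stored letter selects the column. A minimal base R sketch of that lookup for the first cell (2013, data1) follows; it rebuilds the same data with `stringsAsFactors = FALSE` (an addition, not in the original code) so the letters are plain strings rather than factors:

```r
# The question's data, with the letters kept as character rather than factors
return.df <- data.frame(
  year = 2013:2017,
  a = c(0.0125, 0.0045, -0.0102, 0.0023, -0.0011),
  b = c(0.0267, 0.0005, 0.0922, -0.0510, 0.0389),
  c = c(-0.0045, 0.0023, 0.0426, 0.0201, -0.0291),
  d = c(0.0119, -0.0131, 0.0189, 0.0192, 0.0111),
  e = c(-0.0279, 0.0255, 0.0059, 0.0059, 0.0223),
  f = c(0.0479, 0.0345, -0.0145, -0.0001, 0.0000),
  g = c(0.0352, 0.0165, 0.0004, 0.0055, -0.0050)
)
character.df <- data.frame(
  year = 2013:2017,
  data1 = c("c", "a", "g", "d", "e"),
  data2 = c("b", "f", "a", "e", "d"),
  data3 = c("f", "b", "c", "d", "e"),
  data4 = c("e", "a", "d", "b", "c"),
  data5 = c("d", "e", "c", "b", "a"),
  stringsAsFactors = FALSE
)

# Step 1: the letter stored in column data1 for 2013
letter <- character.df$data1[character.df$year == 2013]
# Step 2: use that letter as a column name of return.df, same year
value <- return.df[return.df$year == 2013, letter]

letter  # "c"
value   # -0.0045
```

Repeating this for every year and every dataN column gives the table above; the question is how to do it for all cells at once.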

I hope it's clear what I mean. I tried it with merge() but could not find a solution.

Thank you very much!

mfherman · September 15, 2018, 2:10pm

Joins like this are easier if the data frames you are working with are in tidy or long form rather than in wide form.

Here is an approach that reshapes each of your data frames into long form with gather(), uses left_join() to match the year/letter pairs and attach the corresponding returns, drops the letters, and then uses spread() to turn the result back into wide form.

```r
library(tidyverse)

# Returns: one column per letter, a to g
year <- c(2013:2017)
a <- c(0.0125, 0.0045, -0.0102, 0.0023, -0.0011)
b <- c(0.0267, 0.0005, 0.0922, -0.0510, 0.0389)
c <- c(-0.0045, 0.0023, 0.0426, 0.0201, -0.0291)
d <- c(0.0119, -0.0131, 0.0189, 0.0192, 0.0111)
e <- c(-0.0279, 0.0255, 0.0059, 0.0059, 0.0223)
f <- c(0.0479, 0.0345, -0.0145, -0.0001, 0.0000)
g <- c(0.0352, 0.0165, 0.0004, 0.0055, -0.0050)

return.df <- data.frame(year,a,b,c,d,e,f,g)

# Letters: which return each cell should pick up
year <- c(2013:2017)
data1 <- c("c","a","g","d","e")
data2 <- c("b","f","a","e","d")
data3 <- c("f","b","c","d","e")
data4 <- c("e","a","d","b","c")
data5 <- c("d","e","c","b","a")

character.df <- data.frame(year,data1,data2,data3,data4,data5)

# Long form: one row per (year, letter) holding that letter's return
return_tidy <- return.df %>%
  gather("letter", "value", -year)

# Long form: one row per (year, column) holding the letter stored there.
# The warning appears because data.frame() turned the letter columns into
# factors with different levels (its default before R 4.0); the letters
# simply become character, so it is harmless here.
character_tidy <- character.df %>%
  gather("col", "letter", -year)
#> Warning: attributes are not identical across measure variables;
#> they will be dropped

# Match on year and letter, drop the letter, and spread back to wide form
left_join(character_tidy, return_tidy, by = c("letter", "year")) %>%
  select(-letter) %>% 
  spread(col, value)
#>   year   data1   data2  data3   data4   data5
#> 1 2013 -0.0045  0.0267 0.0479 -0.0279  0.0119
#> 2 2014  0.0045  0.0345 0.0005  0.0045  0.0255
#> 3 2015  0.0004 -0.0102 0.0426  0.0189  0.0426
#> 4 2016  0.0192  0.0059 0.0192 -0.0510 -0.0510
#> 5 2017  0.0223  0.0111 0.0223 -0.0291 -0.0011
```

Created on 2018-09-15 by the reprex package (v0.2.0).
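In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below does the same reshape, join and reshape back with the newer functions; it assumes tidyr 1.0.0 or later plus dplyr are installed, and it builds the letter columns with `stringsAsFactors = FALSE` (an assumption, not part of the answer above) so they stay plain character vectors:

```r
library(dplyr)
library(tidyr)

# The question's data, with the letters kept as character rather than factors
return.df <- data.frame(
  year = 2013:2017,
  a = c(0.0125, 0.0045, -0.0102, 0.0023, -0.0011),
  b = c(0.0267, 0.0005, 0.0922, -0.0510, 0.0389),
  c = c(-0.0045, 0.0023, 0.0426, 0.0201, -0.0291),
  d = c(0.0119, -0.0131, 0.0189, 0.0192, 0.0111),
  e = c(-0.0279, 0.0255, 0.0059, 0.0059, 0.0223),
  f = c(0.0479, 0.0345, -0.0145, -0.0001, 0.0000),
  g = c(0.0352, 0.0165, 0.0004, 0.0055, -0.0050)
)
character.df <- data.frame(
  year = 2013:2017,
  data1 = c("c", "a", "g", "d", "e"),
  data2 = c("b", "f", "a", "e", "d"),
  data3 = c("f", "b", "c", "d", "e"),
  data4 = c("e", "a", "d", "b", "c"),
  data5 = c("d", "e", "c", "b", "a"),
  stringsAsFactors = FALSE
)

# Long form: one row per (year, letter) holding that letter's return
return_long <- return.df %>%
  pivot_longer(-year, names_to = "letter", values_to = "value")

# Long form: one row per (year, column) holding the letter stored there
character_long <- character.df %>%
  pivot_longer(-year, names_to = "col", values_to = "letter")

# Attach each letter's return for the same year, drop the letter,
# and spread the data1..data5 columns back out
result <- character_long %>%
  left_join(return_long, by = c("year", "letter")) %>%
  select(-letter) %>%
  pivot_wider(names_from = col, values_from = value)

result
```

The join keys are the same as in the gather()/spread() version; only the reshaping calls change, and the result is a tibble rather than a plain data frame.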

dkraner · September 16, 2018, 5:19am

Great, that works perfectly. Thank you very much for the reply!