How to replace column values from another dataframe by common ID

Ellenz · September 27, 2019, 5:04am

I have two dataframes, I want to replace the values in df1 using values in df2, by the common id. Is this the correct way? The actual dataset is big, so when I did it like this and checked the result, the values in df1 are not the same as in df2. So I'm wondering how this works. Thanks for your help.

library(reprex)

df1= data.frame(id=c('1_1','1_2','1_3','2_1'),
max=c(3,5,5,6),
min=c(1,2,3,.1),
value=rnorm(4))
df2= data.frame(id=c('1_1','1_2','1_3','1_4','2_1','2_2'),
value=rnorm(6)*10-3)

df2_sub = filter(df2, id %in% df1$id)
df1$value[match(df1$id,df2_sub$id)] <- df2_sub$value

Ellenz · September 27, 2019, 5:13am

In my file, the row orders are different in df1 and df2, so the resulting value column in df1 is not the same as the value column in df2. How to do it correctly?

FJCC · September 27, 2019, 6:21pm

If df2 will always have rows with all the ids of df1, you can do it as in the following code. If df1 will sometimes have ids that are not in df2, then you will have to do an extra step to keep the df1 values in those rows. I included extra steps for printing out intermediate results.

set.seed(9334)
df1= data.frame(id=c('1_1','1_2','1_3','2_1'),
                max=c(3,5,5,6),
                min=c(1,2,3,.1),
                value=rnorm(4))
df2= data.frame(id=c('1_1','1_2','1_3','1_4','2_1','2_2'),
                value=rnorm(6)*10-3)
df1
#>    id max min      value
#> 1 1_1   3 1.0 -0.5818039
#> 2 1_2   5 2.0 -0.3141435
#> 3 1_3   5 3.0  0.7452423
#> 4 2_1   6 0.1 -1.1327702

df2
#>    id       value
#> 1 1_1 -11.7366967
#> 2 1_2   0.3124703
#> 3 1_3  -6.2261151
#> 4 1_4  -2.5779802
#> 5 2_1  -1.4631567
#> 6 2_2  -4.2419756
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df1 <- left_join(df1, df2, by = "id")
#> Warning: Column `id` joining factors with different levels, coercing to
#> character vector
df1
#>    id max min    value.x     value.y
#> 1 1_1   3 1.0 -0.5818039 -11.7366967
#> 2 1_2   5 2.0 -0.3141435   0.3124703
#> 3 1_3   5 3.0  0.7452423  -6.2261151
#> 4 2_1   6 0.1 -1.1327702  -1.4631567
df1 <- select(df1, id, max, min, value = value.y)
df1
#>    id max min       value
#> 1 1_1   3 1.0 -11.7366967
#> 2 1_2   5 2.0   0.3124703
#> 3 1_3   5 3.0  -6.2261151
#> 4 2_1   6 0.1  -1.4631567

^{Created on 2019-09-27 by the reprex package (v0.2.1)}

leungi · September 27, 2019, 7:20pm

In case you're used to Excel (i.e., vlookup()), this an alternative.

This assumes the id columns are unique - each exists once in df1 and df2.

# https://www.daeconomist.com/post/2018-04-08-an-r-vlookup-not-so-silly-idea/
VLookup <- function(this, data, key, value) {
  m <- match(this, data[[key]])
  data[[value]][m]
}


df1 <- data.frame(
  id = c("1_1", "1_2", "1_3", "2_1"),
  max = c(3, 5, 5, 6),
  min = c(1, 2, 3, .1),
  value = rnorm(4)
)
df2 <- data.frame(
  id = c("1_1", "1_2", "1_3", "1_4", "2_1", "2_2"),
  value = rnorm(6) * 10 - 3
)

df1
#>    id max min      value
#> 1 1_1   3 1.0 -0.3543082
#> 2 1_2   5 2.0 -1.0183646
#> 3 1_3   5 3.0  0.2087501
#> 4 2_1   6 0.1  0.6011677

df2
#>    id      value
#> 1 1_1  1.7397978
#> 2 1_2  2.3561047
#> 3 1_3 -0.0430340
#> 4 1_4  8.5344310
#> 5 2_1  0.1335944
#> 6 2_2 -8.2167051

df1$value_from_2 <- VLookup(df1$id, df2, "id", "value")

df1
#>    id max min      value value_from_2
#> 1 1_1   3 1.0 -0.3543082    1.7397978
#> 2 1_2   5 2.0 -1.0183646    2.3561047
#> 3 1_3   5 3.0  0.2087501   -0.0430340
#> 4 2_1   6 0.1  0.6011677    0.1335944

^{Created on 2019-09-27 by the reprex package (v0.2.1)}

system · October 18, 2019, 7:20pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.