Add a column from another dataset under multiple conditions

Hi, I would like to create a new column in my dataset X and fill this column with values from dataset Y. Two conditions have to be met (same region and same date). I try to do this by forloop:
for (i in length(X[,1])) { X$Newcolumn <- sapply(1:nrow(X), function(i) Y$Value[Y$region==X$region[i] & Y$date==X$date[i]]) }

But I got this error message: Error in Ops.factor(Y$date, X$date[i]) : level sets of factors are different date in both datasets has the same format ymd.

What does it mean? I don`t understand why should factors be the same? How can I fix it please? Thanks a lot!

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

Thats complicated because on small datasets no problem occurs. I would like to know, what the error message means?

Of course, I can provide you reprex, but on such small dataset no problem occurs:


Y <- data.frame(
  date = c("2005-01-01", "2005-01-01", "2005-01-01", "2005-01-02", "2005-01-02"),
  region = c(0, 1, 2, 0, 1),
  Value = c(12, 12, 11, 5, -8) 
)



X <- data.frame(
  date = c("2005-01-01", "2005-01-01", "2005-01-01", "2005-01-01", "2005-01-01", "2005-01-02", "2005-01-02"),
  region = c(0, 0, 1, 2, 3, 0, 1)
 
  )
  



for (i in length(X[,1])) { X$Newcolumn <- sapply(1:nrow(X), function(i) Y$Value[Y$region==X$region[i] & Y$date==X$date[i]]) }


Well, you could programmatically generate a large dataset, but nevertheless, I don't see the need to reinvent the wheel, instead of a for-loop you could simply use dplyr::left_join() or even base::merge(), see this example.

library(dplyr)

Y <- data.frame(
    date = c("2005-01-01", "2005-01-01", "2005-01-01", "2005-01-02", "2005-01-02"),
    region = c(0, 1, 2, 0, 1),
    Value = c(12, 12, 11, 5, -8) 
)



X <- data.frame(
    date = c("2005-01-01", "2005-01-01", "2005-01-01", "2005-01-01", "2005-01-01", "2005-01-02", "2005-01-02"),
    region = c(0, 0, 1, 2, 3, 0, 1)
    
)


X %>% 
    left_join(Y, by = c("date", "region"))
#>         date region Value
#> 1 2005-01-01      0    12
#> 2 2005-01-01      0    12
#> 3 2005-01-01      1    12
#> 4 2005-01-01      2    11
#> 5 2005-01-01      3    NA
#> 6 2005-01-02      0     5
#> 7 2005-01-02      1    -8

Created on 2020-01-18 by the reprex package (v0.3.0.9000)

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.