Sorry if my question is silly but I get ridiculous results adding info from the "info" df to my "source" df.
All I need is equivalent of vlookup so finding and adding OrgCode and OrgName based on OrgID.
source <- data.frame(
stringsAsFactors = FALSE,
URN = c("aaa","bbb","ccc",
"ddd","eee","fff","ggg"),
OrgID = c(32047,32047,32047,
32430,32430,336954,12454),
QF16 = c("Yes","No","No","Yes",
"Yes","No","No"),
QF4 = c(100, 100, 100, 100, 100, 0, 50)
)
info <- data.frame(
stringsAsFactors = FALSE,
OrgID = c(32047,
32322,31818,32430),
OrgCode = c("001702E",
"001192Q",
"000985Q","001673B"),
OrgName = c("Name1",
"Name2","Name3",
"Name4")
)
result <- left_join(source, info, by="OrgID")
Everything looks fine on the example above but when I apply that to my real two databases I have following counts
source = 98309 obs. of 67 variables
info = 49104 obs. of 16 variables
result = 5878665 obs. of 82 variables
What am I doing wrong? When do you think this may happen?