vlookup not working? Weird counts?

Sorry if my question is silly but I get ridiculous results adding info from the "info" df to my "source" df.
All I need is equivalent of vlookup so finding and adding OrgCode and OrgName based on OrgID.

source <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("aaa","bbb","ccc",
          "ddd","eee","fff","ggg"),
  OrgID = c(32047,32047,32047,
            32430,32430,336954,12454),
  QF16 = c("Yes","No","No","Yes",
           "Yes","No","No"),
  QF4 = c(100, 100, 100, 100, 100, 0, 50)
)

info <- data.frame(
  stringsAsFactors = FALSE,
  OrgID = c(32047,
            32322,31818,32430),
  OrgCode = c("001702E",
              "001192Q",
              "000985Q","001673B"),
  OrgName = c("Name1",
              "Name2","Name3",
              "Name4")
)

result <- left_join(source, info, by="OrgID")

Everything looks fine on the example above but when I apply that to my real two databases I have following counts
source = 98309 obs. of 67 variables
info = 49104 obs. of 16 variables
result = 5878665 obs. of 82 variables

What am I doing wrong? When do you think this may happen?

Hi @Slavek , it seems that the likely issue is that there are duplicate OrgID in your info table. To confirm you could run: dplry::count(info, OrgID, sort = TRUE). If the first rows are all 1's, then I'm wrong, but if you have any higher than that, then it confirms duplicates, which hence produces the additional rows in your merge.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.