Join 2 dataframes but shows NA

I have 2 datasets, one is city, another one is county.
I join this 2 dataframes but the county dataframe only shows on 1st row on each of county. Others are all NA.


I tried natural_join, it didn't work.
1st data:

structure(c("0101", "0101", "0101", "0101", "0101", "0101", "0101", 
"0101", "0101", "0101", "0101", "0101", "0101", "0101", "0101", 
"0101", "0101", "0101", "0101", "0101", "0101", "0101", "0101", 
"0101", "0101", "0101", "0101", "0101", "0101", "0101", "0101", 
"0101", "0101", "0101", "0101", "0101", "0101", "0101", "0101", 
"0101", "0101", "0101", "0101", "0101", "0101", "0101", "0101", 
"0101", "0101", "0101", "0101", "0101", "0102", "0102", "0102", 
"0102", "0102", "0102", "0102", "0102", "0102", "0102", "0102", 
"0102", "0102", "0102", "0102", "0102", "0102", "0102", "0102", 
"0102", "0102", "0102", "0102", "0102", "0102", "0102", "0102", 
"0102", "0102", "0102", "0102", "0102", "0102", "0102", "0102", 
"0102", "0102", "0102", "0102", "0102", "0102", "0102", "0102", 
"0102", "0102", "0102", "0102", "0102", "HATTON", "HATTON", "HATTON", 
"HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", 
"HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", 
"HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", 
"HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", 
"HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", 
"HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", 
"HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", "HATTON", 
"LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", 
"LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", 
"LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", 
"LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", 
"LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", 
"LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND"
), .Dim = c(100L, 2L))

2nd Data

structure(c("Adams", "Adams", "Adams", "Adams", "Adams", "Asotin", 
"Asotin", "Benton", "Benton", "Benton", "Benton", "Benton", "Chelan", 
"Chelan", "Chelan", "Chelan", "Chelan", "Clallam", "Clallam", 
"Clallam", "Clark", "Clark", "Clark", "Clark", "Clark", "Clark", 
"Clark", "Clark", "Columbia", "Columbia", "Cowlitz", "Cowlitz", 
"Cowlitz", "Cowlitz", "Cowlitz", "Douglas", "Douglas", "Douglas", 
"Douglas", "Douglas", "Douglas", "Ferry", "Franklin", "Franklin", 
"Franklin", "Franklin", "Garfield", "Grant", "Grant", "Grant", 
"Grant", "Grant", "Grant", "Grant", "Grant", "Grant", "Grant", 
"Grant", "Grant", "Grant", "Grant", "Grant", "Grays Harbor", 
"Grays Harbor", "Grays Harbor", "Grays Harbor", "Grays Harbor", 
"Grays Harbor", "Grays Harbor", "Grays Harbor", "Grays Harbor", 
"Island", "Island", "Island", "Jefferson", "King", "King", "King", 
"King", "King", "King", "King", "King", "King", "King", "King", 
"King", "King", "King", "King", "King", "King", "King", "King", 
"King", "King", "King", "King", "King", "King", "001", "001", 
"001", "001", "001", "003", "003", "005", "005", "005", "005", 
"005", "007", "007", "007", "007", "007", "009", "009", "009", 
"011", "011", "011", "011", "011", "011", "011", "011", "013", 
"013", "015", "015", "015", "015", "015", "017", "017", "017", 
"017", "017", "017", "019", "021", "021", "021", "021", "023", 
"025", "025", "025", "025", "025", "025", "025", "025", "025", 
"025", "025", "025", "025", "025", "025", "027", "027", "027", 
"027", "027", "027", "027", "027", "027", "029", "029", "029", 
"031", "033", "033", "033", "033", "033", "033", "033", "033", 
"033", "033", "033", "033", "033", "033", "033", "033", "033", 
"033", "033", "033", "033", "033", "033", "033", "033", "Hatton", 
"Lind", "Othello", "Ritzville", "Washtucna", "Asotin", "Clarkston", 
"Benton City", "Kennewick", "Prosser", "Richland", "West Richland", 
"Cashmere", "Chelan", "Entiat", "Leavenworth", "Wenatchee", "Forks", 
"Port Angeles", "Sequim", "Battle Ground", "Camas", "La Center", 
"Ridgefield", "Vancouver", "Washougal", "Woodland", "Yacolt", 
"Dayton", "Starbuck", "Castle Rock", "Kalama", "Kelso", "Longview", 
"Woodland", "Bridgeport", "Coulee Dam", "East Wenatchee", "Mansfield", 
"Rock Island", "Waterville", "Republic", "Connell", "Kahlotus", 
"Mesa", "Pasco", "Pomeroy", "Coulee City", "Coulee Dam", "Electric City", 
"Ephrata", "George", "Grand Coulee", "Hartline", "Krupp", "Mattawa", 
"Moses Lake", "Quincy", "Royal City", "Soap Lake", "Warden", 
"Wilson Creek", "Aberdeen", "Cosmopolis", "Elma", "Hoquiam", 
"McCleary", "Montesano", "Oakville", "Ocean Shores", "Westport", 
"Coupeville", "Langley", "Oak Harbor", "Port Townsend", "Algona", 
"Auburn", "Beaux Arts Village", "Bellevue", "Black Diamond", 
"Bothell", "Burien", "Carnation", "Clyde Hill", "Covington", 
"Des Moines", "Duvall", "Enumclaw", "Federal Way", "Hunts Point", 
"Issaquah", "Kenmore", "Kent", "Kirkland", "Lake Forest Park", 
"Maple Valley", "Medina", "Mercer Island", "Milton", "Newcastle"

My code is as below. I used city name to join, and my goal is to allow first dataset have county assign to them

test <- merge(x = CityInfo, y = CountynCity, by = "Location City", all = TRUE)
knitr::kable(test)

What are you joining on?

Can you provide some data and some code?

2 Likes

A reproducible example, called a reprex FAQ: What's a reproducible example (`reprex`) and how do I do one? is pretty much required to give an accurate answer, @williaml says.

The hardest part can be the data, but there's an approach that helps. First, not all the data is needed. Second, you can run

dput(dataset1)
# cut and paste
dput(dataset2)
# cut and paste

Screenshots are almost never helpful. The most I can see here is that there may not be a key field to join by.

I'm not sure I can provide the data on public forum. Sorry

It only needs to be a sample that is enough to reproduce the issue. I mean you have already put a screenshot of the data in your first post.

It doesn't even need to be real data.

Otherwise, have a read:

https://dplyr.tidyverse.org/reference/join.html

2 Likes

I tried my best to post the data. Please teach me, thanks

There is a little bit missing from your second dataset (at the end). The first one is fine.

Which columns are you joining?

Can you post the code that you used?

1 Like

sorry for the incomplete dataset. I tried to join 2 datasets by city name. So my first dataset can have county assign to them.

I just posted my simple code.

Thanks for teaching.

Please check on your sample data, the code of the second dataframe is incomplete and produces an error of runed.

The first dataset ends in this, which is correct:

"LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND", "LIND" ), .Dim = c(100L, 2L))

The second dataset is missing the dimensions and maybe some other bits.

"Maple Valley", "Medina", "Mercer Island", "Milton", "Newcastle"

I don't know how to incorporate 16000 data
So here is my dataset from OneDrive:
https://1drv.ms/u/s!AkCCgVuWwF1JhPJUHD0KdK6FCnA2Zw?e=aAbvmS

Not sure whether this is what you need, but you should make sure that both datasets have a shared variable with shared values (!watch out for differences in casing)

x <- structure(c("0101", "0102", "HATTON","LIND")
              ,.Dim = c(2L, 2L)
              ,dimnames=list(NULL,c("V1","name"))
              )
y <- structure(c("Adams","Adams","Asotin","001","001","003","Hatton", "Lind", "Asotin")
              ,.Dim = c(3L, 3L)
              ,dimnames=list(NULL,c("V2","V3","name"))
              )
y[,"name"] <- toupper(y[,"name"])
merge(x,y,by="name",all.y=TRUE)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.