Find duplicates and remove them

Hi,

I'm trying to detect duplicates in the Position column and delete them. I already tried a lot of different approaches, but nothing worked for me.

This is what I tried last:

```r
dfZ1[!duplicated(dfZ1[c("Position")]),]
```

Thank you

This is my data frame:

   Position     Einheit
1  01.0401  Z   Stk
2  02.0714A  Z  m
3  02.0715D  Z  PA
4  02.0715E  Z  PA
5  10.1805T  Z  m
6  10.1805V  Z  Stk
7  10.1808I  Z  Stk
8  35.9008A  Z  Stk
9  98.0408A  Z  h
10 01.0401 Z    stk
11 02.0715D Z   m²
12 02.0714A Z   VE
13 99.0408A Z   VE
14 35.9008A Z   Stk

This is how it should look:

   Position     Einheit
1  01.0401  Z   Stk
2  02.0714A  Z  m
3  02.0715D  Z  PA
4  02.0715E  Z  PA
5  10.1805T  Z  m
6  10.1805V  Z  Stk
7  10.1808I  Z  Stk
8  35.9008A  Z  Stk
9  98.0408A  Z  h
10 99.0408A Z   VE

The data as `dput()` output:

```r
structure(list(Position = c("01.0401  Z", "02.0714A  Z", "02.0715D  Z",
"02.0715E  Z", "10.1805T  Z", "10.1805V  Z", "10.1808I  Z", "35.9008A  Z", 
"98.0408A  Z", "01.0401 Z", "02.0715D Z", "02.0714A Z", "99.0408A Z", 
"35.9008A Z"), Stichwort = c("Grundlegende Charakt. gem. DVO", 
"Bestandspläne Kanal LP und LS", "Aufp. für koord. Aufnahme und Einb. in GIS", 
"Bestandspläne Stauraumkanal", "GF-UP-Kanalrohr, PN1, DN 1800, SN 10, gew. lief.u.verl.", 
"GF-UP Abschlusskappe, PN1, DN1800, SN10  lief.u.verl.", "Aufz.angef.GF-UP Schacht.DN 1000 auf DN 1800", 
"Abflussregler 10-30 l/s", "Schrämmhammer mit Schlauch", "DVO reg", 
"GIS", "LS", "Neu", "Regler"), Einheit = c("Stk", "m", "PA", 
"PA", "m", "Stk", "Stk", "Stk", "h", "stk", "m²", "VE", "VE", 
"Stk")), row.names = c(NA, 14L), class = "data.frame")
```

Hi @zetti: Could you say a little more? It looks like the rows you removed had duplicates in the first column, but their values in the second column weren't the same: How would you choose which duplicate to keep?

Hi, I always want to keep the first occurrence (the upper one) in the first column. The values in the first column are always the same, but the values in the second column can differ.
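A minimal sketch of the base R behaviour this relies on, using a made-up vector (values borrowed from the question): `duplicated()` flags every occurrence after the first, so negating it keeps the upper row. Passing a one-column data frame such as `dfZ1[c("Position")]` should give the same flags as passing the column itself.

```r
# Made-up vector with repeated position numbers.
pos <- c("01.0401", "02.0714A", "01.0401", "35.9008A", "02.0714A")

# TRUE marks every occurrence after the first one.
dups <- duplicated(pos)
print(dups)        # FALSE FALSE TRUE FALSE TRUE

# Negating the flags keeps the first (upper) occurrence of each value.
first_kept <- pos[!dups]
print(first_kept)  # "01.0401" "02.0714A" "35.9008A"

# fromLast = TRUE scans from the bottom, so the last (lower) occurrence is kept instead.
last_kept <- pos[!duplicated(pos, fromLast = TRUE)]
print(last_kept)   # "01.0401" "35.9008A" "02.0714A"

# A one-column data frame gives the same flags as the plain vector.
df <- data.frame(Position = pos, stringsAsFactors = FALSE)
print(identical(duplicated(df["Position"]), dups))  # TRUE
```

So the original call already keeps the first row of each group; the problem has to be in the values themselves.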

Could the space before the Z be the reason my code doesn't work?

Yes, since the strings are different. Are there other values besides 'Z' that you anticipate having in the position column?
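A quick way to confirm this in R, using two values from the `dput()` output: in rows 1-9 there are two spaces before the `Z`, in rows 10-14 only one, so the strings are not equal and `duplicated()` keeps both.

```r
a <- "01.0401  Z"  # row 1 of the question's data: two spaces before Z
b <- "01.0401 Z"   # row 10: one space before Z

same_raw <- a == b
print(same_raw)               # FALSE
print(c(nchar(a), nchar(b)))  # 10 9

# Collapsing every run of whitespace to a single space makes them match.
same_clean <- gsub("\\s+", " ", a) == b
print(same_clean)             # TRUE
```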

Yes, most of them (~27,000 cells) have no Z at the end, but every month I add a few positions ending in Z. That's why I want to check for duplicates. At first it is no problem to do it manually, but after a few months it is too much effort.

So if you have one with a Z and one without, would you want to keep both? Or just the earliest one, whether it has a Z or not?
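If the answer is "just the earliest one", one option is to compare positions with the trailing Z stripped. This is a minimal sketch on a hypothetical data frame, assuming a trailing Z (with or without spaces before it) is only ever this marker and never part of a real position number.

```r
# Hypothetical data: positions without Z (like the ~27,000 existing rows)
# and later copies ending in " Z" or "  Z".
df <- data.frame(
  Position = c("01.0401", "02.0714A", "01.0401 Z", "01.0401  Z", "99.0408A Z"),
  Einheit  = c("Stk", "m", "stk", "VE", "VE"),
  stringsAsFactors = FALSE
)

# Matching key: drop optional whitespace plus a final "Z".
# Assumption: a trailing Z is always the marker, never part of the number.
key <- sub("\\s*Z$", "", df$Position)

# Keep the earliest row for each key, whether or not it has the Z.
df_unique <- df[!duplicated(key), ]
print(df_unique$Position)  # "01.0401" "02.0714A" "99.0408A Z"
```

The `Position` column itself is left untouched; only the comparison uses the stripped key.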

Thanks, I found a way: I removed the spaces with the `gsub()` function, and then it worked.
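The exact `gsub()` call isn't shown in the thread, so the following is only a sketch of that approach, applied to the `Position` and `Einheit` columns from the question's `dput()` output (the `Stichwort` column is left out). It removes all whitespace from `Position` and then keeps the first occurrence of each value, which should give the 10-row result shown in the question.

```r
# Position and Einheit columns copied from the question's dput() output.
dfZ1 <- data.frame(
  Position = c("01.0401  Z", "02.0714A  Z", "02.0715D  Z", "02.0715E  Z",
               "10.1805T  Z", "10.1805V  Z", "10.1808I  Z", "35.9008A  Z",
               "98.0408A  Z", "01.0401 Z", "02.0715D Z", "02.0714A Z",
               "99.0408A Z", "35.9008A Z"),
  Einheit  = c("Stk", "m", "PA", "PA", "m", "Stk", "Stk", "Stk", "h",
               "stk", "m²", "VE", "VE", "Stk"),
  stringsAsFactors = FALSE
)

# Remove all whitespace, so "01.0401  Z" and "01.0401 Z" both become "01.0401Z".
dfZ1$Position <- gsub("\\s+", "", dfZ1$Position)

# duplicated() flags every repeat after the first, so the upper row is kept.
dfZ1 <- dfZ1[!duplicated(dfZ1$Position), ]
print(nrow(dfZ1))  # 10
```

If you would rather keep the original spacing in `Position`, store the cleaned strings in a separate vector (for example `key <- gsub("\\s+", "", dfZ1$Position)`) and subset with `dfZ1[!duplicated(key), ]` instead of overwriting the column.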
