Hi,
A few things first:
- Swapping rows will not change the number of unique values, because that count is computed per column and every column keeps the same set of values. I think what you meant is swapping values within a row, so that they end up in a different column.
- You think the 'error' comes from "PROVINCE = 11, DISTRICT = 15, SUB_DISTRI = 10, VILLAGE = 37", but when I run it, I get "PROVINCE = 11, DISTRICT = 5, SUB_DISTRI = 150, VILLAGE = 12". There are probably many groups where the per-column unique counts do not match.
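To illustrate the first point, here is a small sketch on made-up data (the data frame `toy` and its columns `a` and `b` are hypothetical, not from your dataset). Reordering rows leaves the per-column unique counts unchanged, whereas swapping values between columns within a row can change them.

```r
# Hypothetical toy data, not taken from the original dataset
toy = data.frame(a = c(1, 1, 2), b = c(3, 4, 4))

# Number of unique values in each column (same idea as length(unique()) in the answer)
countUnique = function(df) sapply(df, function(x) length(unique(x)))

# Reordering the rows: each column keeps the same set of values
shuffled = toy[c(3, 1, 2), ]

# Swapping the values of columns a and b in the first row only
swapped = toy
swapped$a[1] = toy$b[1]
swapped$b[1] = toy$a[1]

countUnique(toy)       # a = 2, b = 2
countUnique(shuffled)  # a = 2, b = 2 (unchanged)
countUnique(swapped)   # a = 3, b = 2 (column a now contains 3, 1, 2)
```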
Looking at your data in more detail: are you actually trying to get rid of duplicates? A lot of the rows in your data are exact duplicates, and you can easily remove them like so:
myData = myData %>% distinct()
This reduces the number of rows from 447195 to 106528.
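A minimal sketch of what `distinct()` does, on made-up data (the column names are borrowed from your question, but the values are invented): it keeps one copy of each row that is identical across all columns, while rows that differ in any column are kept.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 3 are exact duplicates
toy = data.frame(
  PROVINCE = c(11, 11, 11),
  VILLAGE = c(12, 35, 12),
  pop1_NUMBER = c(100, 200, 100)
)

nrow(toy)              # 3
deduped = toy %>% distinct()
nrow(deduped)          # 2

# A row that differs in just one column is not a duplicate and is kept
toy2 = rbind(toy, data.frame(PROVINCE = 11, VILLAGE = 12, pop1_NUMBER = 101))
nrow(distinct(toy2))   # 3
```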
If you now want to see how many unique values there are in the last three columns for each group, and where those counts disagree, do this:
library(dplyr)
# Count the number of unique values per group
# (stored in a new object so the original myData is not overwritten)
groupCounts = myData %>%
  distinct() %>%
  group_by(PROVINCE, DISTRICT, SUB_DISTRI, VILLAGE) %>%
  summarise(number_1_uniqueN = n_distinct(pop1_NUMBER),
            number_2_uniqueN = n_distinct(pop2_NUMBER),
            number_3_uniqueN = n_distinct(pop3_NUMBER),
            .groups = "drop")
# Keep only the groups where the three counts are not identical
notCorrect = groupCounts %>%
  rowwise() %>%
  mutate(
    sameUniqueN = all(
      number_1_uniqueN == c(number_2_uniqueN, number_3_uniqueN)
    )
  ) %>%
  filter(!sameUniqueN)
> head(notCorrect)
# A tibble: 6 x 8
# Rowwise:
  PROVINCE DISTRICT SUB_DISTRI VILLAGE number_1_uniqueN number_2_uniqueN number_3_uniqueN sameUniqueN
     <dbl>    <dbl>      <dbl>   <dbl>            <int>            <int>            <int> <lgl>
1       11        5        150      12                3                4                3 FALSE
2       11        5        150      35               16               17               16 FALSE
3       11        9        150       6                8                9                8 FALSE
4       11        9        150      12               12               13               12 FALSE
5       11       10         70       7                6                7                6 FALSE
6       11       10         70      20               18               19               18 FALSE
This gives me a list of 33 groups where the counts do not match.
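As a side note, `rowwise()` is not strictly needed for this check: comparing the count columns directly is vectorized and avoids the per-row overhead. A sketch on made-up counts (the data frame `counts` and its values are hypothetical; only the `number_*_uniqueN` column names come from the code above):

```r
library(dplyr)

# Hypothetical per-group counts using the same column names as in the answer
counts = data.frame(
  VILLAGE = c(12, 35, 40),
  number_1_uniqueN = c(3L, 16L, 5L),
  number_2_uniqueN = c(4L, 17L, 5L),
  number_3_uniqueN = c(3L, 16L, 5L)
)

# Vectorized comparison: keep groups where any count differs from number_1_uniqueN
notCorrect2 = counts %>%
  filter(number_1_uniqueN != number_2_uniqueN |
         number_1_uniqueN != number_3_uniqueN)

notCorrect2$VILLAGE  # 12 35
```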
I still don't see the purpose of this whole exercise, so please describe in more detail what you want to learn from this analysis.
Hope this helps,
PJ