Duplicate Value Check

Hello,
I am new to R and I am trying to pick out duplicate lake water level values from the data below, but running duplicated() returns FALSE for every row. This cannot be right, because I can see duplicates in the data frame.

         Date A01 A02 A03 A032 A01_CD A02_CD A03_CD A032_CD
1  1966/05/07 4.9  NA  NA   NA    4.9     NA     NA      NA
2  1966/05/08 4.9  NA  NA   NA     NA     NA     NA      NA
3  1966/05/09 4.9  NA  NA   NA    4.9     NA     NA      NA
4  1966/05/10 4.9  NA  NA   NA     NA     NA     NA      NA
5  1966/05/11 4.8  NA  NA   NA     NA     NA     NA      NA
6  1966/05/12 4.9  NA  NA   NA     NA     NA     NA      NA
7  1966/05/13 4.8  NA  NA   NA     NA     NA     NA      NA
8  1966/05/15 4.8  NA  NA   NA     NA     NA     NA      NA
9  1966/05/16 4.8  NA  NA   NA    4.8     NA     NA      NA
10 1966/05/17 4.8  NA  NA   NA     NA     NA     NA      NA
11 1966/05/19 4.8  NA  NA   NA     NA     NA     NA      NA
12 1966/05/20 4.8  NA  NA   NA     NA     NA     NA      NA
13 1966/05/21 4.8  NA  NA   NA     NA     NA     NA      NA
14 1966/05/23 4.8  NA  NA   NA    4.8     NA     NA      NA

Next I tried to remove the duplicates using: No_Duplicates = distinct(dataset9, A01, A02, A03, A032, A01_CD, A02_CD, A03_CD, A032_CD, .keep_all = TRUE), but this removes entire rows, and I only want to remove the duplicate values within the columns. Please help me.
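One likely reason duplicated() returns all FALSE here: given a whole data frame, it compares complete rows, and every row has a different Date. A minimal sketch of the difference between checking rows and checking a single column, using a made-up three-row stand-in (`lake`) for the data above:

```r
# Hypothetical stand-in for the first rows of the lake data
lake <- data.frame(
  Date   = c("1966/05/07", "1966/05/08", "1966/05/09"),
  A01    = c(4.9, 4.9, 4.9),
  A01_CD = c(4.9, NA, 4.9)
)

# Row-wise check: no complete row repeats, because Date is unique
row_dups <- duplicated(lake)

# Column-wise check: flags each repeat after the first occurrence
col_dups <- duplicated(lake$A01_CD)

print(row_dups)  # FALSE FALSE FALSE
print(col_dups)  # FALSE FALSE TRUE
```

So the check probably needs to be run on one column at a time rather than on the data frame as a whole.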

I'm unclear what you mean when you say you only want to remove the duplicate values within the columns.

After you remove a duplicate value, what do you want to be there? NA? 0? What?

You typically cannot just remove a value, because that would leave columns of different lengths. The data would no longer be rectangular, so it could not be a valid data.frame object.
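A small sketch of that constraint, using a made-up five-row data frame: putting a shortened column back fails, while replacing the repeats with NA keeps the shape intact.

```r
# Made-up example data; the values are for illustration only
d <- data.frame(A = c(1, 2, 4, 2, 3), B = c(2, 2, 5, 2, 7))

# Dropping repeats gives a shorter vector: 3 values for 5 rows
b_short <- unique(d$B)

# Assigning it back fails because the lengths no longer match
res <- tryCatch({ d$B <- b_short; "ok" }, error = function(e) "error")
print(res)  # "error"

# Replacing repeats with NA keeps all 5 rows
d$B <- replace(d$B, duplicated(d$B), NA)
print(d$B)  # 2 NA 5 NA 7
```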

So, if you had a data frame that looked like,

   A  B  C
1  1  2  3
2  2  2  4
3  4  5  6
4  2  2  4

What do you imagine the output will be after the process you want to enact?

   A  B  C
1  1  2  3
2  2 NA  4
3  4  5  6
4  2 NA  4

I would want to replace one of the duplicates with NA, especially in the CD columns.

         Date A01 A02 A03 A032 A01_CD A02_CD A03_CD A032_CD
1  1966/05/07 4.9  NA  NA   NA     NA     NA     NA      NA
2  1966/05/08 4.9  NA  NA   NA     NA     NA     NA      NA
3  1966/05/09 4.9  NA  NA   NA     NA     NA     NA      NA
4  1966/05/10 4.9  NA  NA   NA     NA     NA     NA      NA
5  1966/05/11 4.8  NA  NA   NA     NA     NA     NA      NA
6  1966/05/12 4.9  NA  NA   NA     NA     NA     NA      NA
7  1966/05/13 4.8  NA  NA   NA     NA     NA     NA      NA
8  1966/05/15 4.8  NA  NA   NA     NA     NA     NA      NA
9  1966/05/16 4.8  NA  NA   NA     NA     NA     NA      NA
10 1966/05/17 4.8  NA  NA   NA     NA     NA     NA      NA
11 1966/05/19 4.8  NA  NA   NA     NA     NA     NA      NA
12 1966/05/20 4.8  NA  NA   NA     NA     NA     NA      NA
13 1966/05/21 4.8  NA  NA   NA     NA     NA     NA      NA
14 1966/05/23 4.8  NA  NA   NA     NA     NA     NA      NA
This is the outcome I would want from the code I run.

It seems you want to treat each column independently and keep only the first occurrence of each value.

library(tidyverse)
set.seed(42)

(d<- data.frame(a=sample.int(5,10,replace=TRUE),
           b=sample.int(5,10,replace=TRUE),
           c=sample.int(5,10,replace=TRUE),
           d=sample.int(5,10,replace=TRUE)))
#    a b c d
# 1  1 1 5 3
# 2  5 5 5 2
# 3  1 4 5 4
# 4  1 2 4 4
# 5  2 2 2 2
# 6  4 3 4 5
# 7  2 1 3 4
# 8  2 1 2 5
# 9  1 3 1 4
# 10 4 4 2 2

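# Keep the first occurrence of each value in vec; replace later repeats with NA
cleanvecofdups <- function(vec){
 df <- enframe(vec,name=NULL,value="v") %>% 
   group_by(v) %>% 
   mutate(rn = row_number()) %>% 
   ungroup()

 # rn is 1 the first time a value appears, so only that entry is kept
 ifelse(df$rn==1,df$v,NA)
}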


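# Apply the helper to columns c and d only; each column is cleaned independently
altered_cd <- purrr::map_dfc(d %>% select(c,d),
           cleanvecofdups)

# Recombine the untouched columns a and b with the cleaned c and d
(d2 <- bind_cols(d %>% select(a,b), altered_cd)) 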
#    a b  c  d
# 1  1 1  5  3
# 2  5 5 NA  2
# 3  1 4 NA  4
# 4  1 2  4 NA
# 5  2 2  2 NA
# 6  4 3 NA  5
# 7  2 1  3 NA
# 8  2 1 NA NA
# 9  1 3  1 NA
# 10 4 4 NA NA
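
The same keep-the-first-occurrence rule can probably also be written in base R by applying duplicated() to one column at a time. This is only a sketch of an alternative to cleanvecofdups() above: the helper name `first_only` and the small `lake` data frame are made up for illustration, and selecting the columns whose names end in "_CD" is an assumption based on the question.

```r
# Hypothetical helper: keep the first occurrence of each value, set repeats to NA
first_only <- function(x) replace(x, duplicated(x), NA)

# Hypothetical stand-in for a few rows of the lake data
lake <- data.frame(
  Date   = c("1966/05/07", "1966/05/09", "1966/05/16", "1966/05/23"),
  A01    = c(4.9, 4.9, 4.8, 4.8),
  A01_CD = c(4.9, 4.9, 4.8, 4.8)
)

# Apply the helper only to the columns whose names end in "_CD"
cd_cols <- grep("_CD$", names(lake), value = TRUE)
lake[cd_cols] <- lapply(lake[cd_cols], first_only)

print(lake$A01_CD)  # 4.9 NA 4.8 NA
```

Note that under this rule the first occurrence of each value stays in place, so a column is never blanked out entirely, and columns outside the selection (A01 here) are left unchanged.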