Duplicate value removal

Hi Everyone,

I would like to remove duplicate values from a dataset. I do not want to delete entire rows, though; I would like to remove only the duplicated value and leave that cell blank.

Below is an example of the data frame I am using.
   A01 A02 A03 A032 A01_CD A02_CD A03_CD A032_CD
1  4.9  NA  NA   NA    4.9     NA     NA      NA
2  4.9  NA  NA   NA     NA     NA     NA      NA
5  4.8  NA  NA   NA     NA     NA     NA      NA
9  4.8  NA  NA   NA    4.8     NA     NA      NA
16 4.7  NA  NA   NA    4.8     NA     NA      NA
18 4.7  NA  NA   NA     NA     NA     NA      NA
31 5.0  NA  NA   NA     NA     NA     NA      NA

This is the outcome I would like to achieve.
   A01 A02 A03 A032 A01_CD A02_CD A03_CD A032_CD
1  4.9  NA  NA   NA     NA     NA     NA      NA
2  4.9  NA  NA   NA     NA     NA     NA      NA
5  4.8  NA  NA   NA     NA     NA     NA      NA
9  4.8  NA  NA   NA     NA     NA     NA      NA
16 4.7  NA  NA   NA    4.8     NA     NA      NA
18 4.7  NA  NA   NA     NA     NA     NA      NA
31 5.0  NA  NA   NA     NA     NA     NA      NA

I have been using:
No_Duplicates = dataset11 %>% distinct(A01, A02, A03, A032, A01_CD, A02_CD, A03_CD, A032_CD, .keep_all = TRUE)
But this deletes entire rows and I do not want that.

Is this different from your previous post?

My newest question/topic is much clearer about what I want to do.

Why is the 4.9 in the first row eliminated from your desired output? As the first row of the data for that variable, it has not been encountered before. Also, you are deleting the first 4.8 and keeping the second one.

There were two 4.9 values, so I removed one of them as it was a duplicate.

Oh, thanks for the tip :smiley:
My original solution works; it is just a case of transposing before applying it and then transposing back afterwards.
Full reprex:



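library(tidyverse)

# Parse a space-delimited text block into a tibble. The header has one field
# fewer than the data rows, so read.delim() uses the first column as row names.
example_df <- function(intext) {
  tf <- tempfile()
  writeLines(intext, con = tf)
  as_tibble(read.delim(tf, sep = " "))
}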
(df1 <- example_df("A01 A02 A03 A032 A01_CD A02_CD A03_CD A032_CD
1 4.9 NA NA NA 4.9 NA NA NA
2 4.9 NA NA NA NA NA NA NA
5 4.8 NA NA NA NA NA NA NA
9 4.8 NA NA NA 4.8 NA NA NA
16 4.7 NA NA NA 4.8 NA NA NA
18 4.7 NA NA NA NA NA NA NA
31 5.0 NA NA NA NA NA NA NA"))

(desired <- example_df("A01 A02 A03 A032 A01_CD A02_CD A03_CD A032_CD
1 4.9 NA NA NA NA NA NA NA
2 4.9 NA NA NA NA NA NA NA
5 4.8 NA NA NA NA NA NA NA
9 4.8 NA NA NA NA NA NA NA
16 4.7 NA NA NA 4.8 NA NA NA
18 4.7 NA NA NA NA NA NA NA
31 5.0 NA NA NA NA NA NA NA"))


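# Keep the first occurrence of each value in a vector; replace repeats with NA.
cleanvecofdups <- function(vec) {
  df <- enframe(vec, name = NULL, value = "v") %>%
    group_by(v) %>%
    mutate(rn = row_number())

  # rn == 1 marks the first time a value appears within the vector
  ifelse(df$rn == 1, df$v, NA)
}

# Transpose so each original row becomes a column, clean each column,
# then transpose back and restore the column names.
altered <- purrr::map_dfc(as_tibble(t(df1)), cleanvecofdups) %>%
  t() %>%
  as_tibble() %>%
  setNames(names(df1))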
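
As a cross-check on the approach above, here is a minimal base-R sketch of the same idea. It assumes that "duplicate" means a value that repeats within the same row (which is what the transposing implies), and it uses a small toy data frame with only three of the columns. The helper name `blank_row_dups` is made up for illustration and is not part of any package.

```r
# Toy data mirroring the A01, A02 and A01_CD columns from the example
df <- data.frame(
  A01    = c(4.9, 4.9, 4.8, 4.8, 4.7, 4.7, 5.0),
  A02    = NA_real_,
  A01_CD = c(4.9, NA, NA, 4.8, 4.8, NA, NA)
)

# Hypothetical helper: blank every value that already appeared
# earlier in the same row, without dropping any rows
blank_row_dups <- function(data) {
  m <- as.matrix(data)
  # duplicated() flags repeats within one row; apply() stacks the
  # results column-wise, so transpose back to the original shape
  dup <- t(apply(m, 1, duplicated))
  m[dup] <- NA
  as.data.frame(m)
}

result <- blank_row_dups(df)
result
```

In this sketch only the A01_CD values in the first and fourth rows are blanked, because they repeat the A01 value of their own row; the 4.8 in the fifth row stays because A01 is 4.7 there.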

I am new to R and not familiar with a lot of it. Could you please explain the above code for me?

Thank you
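
A rough walkthrough of what the cleaning function in the reprex does, applied to just the first row of the example data, might look like the sketch below. It assumes the tibble and dplyr packages are installed; the names `row1`, `steps` and `cleaned` are only illustrative.

```r
library(tibble)
library(dplyr)

# First row of the example data, as a plain vector
row1 <- c(4.9, NA, NA, NA, 4.9, NA, NA, NA)

# Step 1: enframe() turns the vector into a one-column tibble (column "v")
# Step 2: group_by(v) puts identical values into the same group
# Step 3: row_number() counts occurrences within each group, in original order
steps <- enframe(row1, name = NULL, value = "v") %>%
  group_by(v) %>%
  mutate(rn = row_number()) %>%
  ungroup()

# Step 4: keep a value only the first time it appears (rn == 1)
cleaned <- ifelse(steps$rn == 1, steps$v, NA)
cleaned
```

The second 4.9 gets rn equal to 2, so it is replaced with NA. The reprex applies this same logic to every row by transposing the data so that each row becomes a column, cleaning each column, and transposing back.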
