Find the number of occurrences in a data frame

I am trying to check for duplicates in a data frame, but duplicated() only flags the second, third, and later occurrences of a value. It should flag every record that is duplicated.

Is there another solution, or am I missing something?

df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","DEV2698","HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"))

df4 <- df4 %>% mutate(`emp_unq` = ifelse(duplicated(emp_id, incomparables=c("", NA)),"duplicate emp",""))


The output should look like this:

emp_id emp_unq
DEV2698 duplicate emp
DEV2698 duplicate emp
DEV2698 duplicate emp
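
For context, duplicated() leaves the first occurrence of each value unflagged by design, which is why only the later records are marked. A minimal base R sketch of one common workaround, assuming that every member of a repeated group should be flagged and that "" and NA should never count as duplicates, is to combine a forward and a backward scan. The helper names skip and is_dup are made up for this illustration.

```r
# Same emp_id values as in the question
df4 <- data.frame(
  emp_id = c("DEV-2962", "KTN_2252", "ANA2719", "ITI_2624", "DEV2698",
             "HRT2921", "", "KTN2624", "DEV2698", "ITI2535", "DEV2698",
             "HRT2837", "ERV2951", "KTN2542", "ANA2813", "ITI2210"),
  stringsAsFactors = FALSE
)

# Values that should never be reported as duplicates (assumption)
skip <- c("", NA)

# The forward scan misses the first occurrence, the backward scan misses
# the last one; OR-ing them marks every member of a repeated group
is_dup <- duplicated(df4$emp_id, incomparables = skip) |
  duplicated(df4$emp_id, incomparables = skip, fromLast = TRUE)

# All rows are kept; only the new column changes
df4$emp_unq <- ifelse(is_dup, "duplicate emp", "")

# Show just the flagged rows
df4[is_dup, ]
```

With this data the three DEV2698 rows are the only ones flagged, and the data frame still has all 16 rows.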

Hi @shoaibali ,
I found a solution here:
https://stackoverflow.com/questions/54688736/mutate-using-distinct-and-ifelse-dplyr

suppressPackageStartupMessages(library(tidyverse))
df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","DEV2698",
                            "HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698",
                            "HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"))

new <- df4 %>% 
  group_by(emp_id) %>% 
  mutate(num = n(),
         emp_unq = ifelse(num > 1, "duplicated emp", "")) %>% 
  filter(emp_unq == "duplicated emp")

new
#> # A tibble: 3 x 3
#> # Groups:   emp_id [1]
#>   emp_id    num emp_unq       
#>   <chr>   <int> <chr>         
#> 1 DEV2698     3 duplicated emp
#> 2 DEV2698     3 duplicated emp
#> 3 DEV2698     3 duplicated emp

Created on 2021-06-18 by the reprex package (v2.0.0)

Actually, it should show all the data, not just the 3 filtered duplicate records. I just want to mutate the data frame with a new column that marks the duplicate values.

The required output above was just an example.

One more thing: what if I want to apply this to 2-3 more columns?
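
One way to keep every row, sketched here under the assumption that the group_by() approach from the earlier reply is otherwise what is wanted, is to drop the filter() step and ungroup at the end. The extra check for "" and NA is an addition for this illustration, and flagged is a made-up name.

```r
library(dplyr)

df4 <- data.frame(
  emp_id = c("DEV-2962", "KTN_2252", "ANA2719", "ITI_2624", "DEV2698",
             "HRT2921", "", "KTN2624", "DEV2698", "ITI2535", "DEV2698",
             "HRT2837", "ERV2951", "KTN2542", "ANA2813", "ITI2210"),
  stringsAsFactors = FALSE
)

flagged <- df4 %>%
  group_by(emp_id) %>%
  mutate(
    num = n(),  # how many times this emp_id occurs
    # flag every member of a repeated group, but never "" or NA
    emp_unq = ifelse(num > 1 & !is.na(emp_id) & emp_id != "",
                     "duplicated emp", "")
  ) %>%
  ungroup()     # no filter(), so all rows are returned

flagged
```

The num column also gives the number of occurrences per emp_id, which is what the topic title asks about.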

suppressPackageStartupMessages(library(tidyverse))
df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","DEV2698",
                            "HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698",
                            "HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"))
set.seed(42)
df4$other_id <- sample(df4$emp_id,size=nrow(df4),replace = TRUE)

mutate(df4,
       across(cols=everything(),
              .fns = duplicated,
              .names = "isdup_{col}"))

I am getting this error:

Error: Problem with mutate() input ..1.
x everything() must be used within a selecting function.
i See https://tidyselect.r-lib.org/reference/faq-selection-context.html.
i Input ..1 is (function (.cols = everything(), .fns = NULL, ..., .names = NULL) ....

If you get that error when you run exactly my reprex then check your dplyr version, you might be lagging. Otherwise, please make a reprex so that your error is reproducible for me.

Oh I may have just forgotten the dot of .cols.
If that's it I apologise.
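
For reference, this is a sketch of the same call with the dot restored, since the argument of across() is named .cols. It uses {.col} in .names, which is the placeholder documented in current dplyr; as far as I know the {col} spelling is still accepted as an older alias. The name res is made up for this illustration.

```r
library(dplyr)

df4 <- data.frame(
  emp_id = c("DEV-2962", "KTN_2252", "ANA2719", "ITI_2624", "DEV2698",
             "HRT2921", "", "KTN2624", "DEV2698", "ITI2535", "DEV2698",
             "HRT2837", "ERV2951", "KTN2542", "ANA2813", "ITI2210"),
  stringsAsFactors = FALSE
)
set.seed(42)
df4$other_id <- sample(df4$emp_id, size = nrow(df4), replace = TRUE)

# .cols (with the leading dot) is matched as the column selection;
# a bare cols = falls through to ... and triggers the error above
res <- mutate(df4,
              across(.cols = everything(),
                     .fns = duplicated,
                     .names = "isdup_{.col}"))

names(res)
```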

Will this ignore "" and NA in the columns?
I mean, what should I do if I want to ignore these values?
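
Plain duplicated() treats repeated "" or NA like any other repeated value. A minimal sketch of one way to skip them, assuming the incomparables argument from the original question is what is wanted, is to wrap the call in a lambda inside across(). The small data frame below is made up purely to show the effect.

```r
library(dplyr)

# Hypothetical data with repeated blanks and NAs in both columns
df <- data.frame(
  emp_id   = c("A1", "", "A1", "", NA, NA, "B2"),
  other_id = c("X", "X", "", "", "Y", NA, NA),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(across(.cols = everything(),
                # the lambda passes incomparables on to duplicated()
                .fns = ~ duplicated(.x, incomparables = c("", NA)),
                .names = "isdup_{.col}"))

res
```

Here only the second "A1" and the second "X" should come out as TRUE; the repeated "" and NA entries stay FALSE.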

This is working fine. How can I change the TRUE and FALSE values to "Duplicate" across variables?
I tried it like this, but it changes the data frame:
colss = c("emp_id")
df4 <- df4%>% mutate(
across(.cols= colss,
.fns = duplicated,
.names = "{c(1)}. unique {col}")) %>%
mutate("1. unique emp_id" == "TRUE","Duplicate","")
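
As far as I can tell, the second mutate() in that attempt compares two string literals rather than the column, and passes "Duplicate" and "" as extra unnamed expressions, so it adds columns instead of recoding one. A hedged sketch of an alternative is to do the recoding inside the across() lambda, so no second step is needed. The column name "1. unique emp_id" follows the attempt above, all_of() is used because colss is a character vector, and res is a made-up name.

```r
library(dplyr)

df4 <- data.frame(
  emp_id = c("DEV-2962", "KTN_2252", "ANA2719", "ITI_2624", "DEV2698",
             "HRT2921", "", "KTN2624", "DEV2698", "ITI2535", "DEV2698",
             "HRT2837", "ERV2951", "KTN2542", "ANA2813", "ITI2210"),
  stringsAsFactors = FALSE
)

colss <- c("emp_id")

res <- df4 %>%
  mutate(across(.cols = all_of(colss),
                # turn the logical result into a label in one step
                .fns = ~ ifelse(duplicated(.x, incomparables = c("", NA)),
                                "Duplicate", ""),
                .names = "1. unique {.col}"))

# The new name is not syntactic, so use [[ ]] or backticks to access it
res[["1. unique emp_id"]]
```

Because this relies on duplicated() alone, only the second and third DEV2698 rows are labelled; combining it with a fromLast = TRUE scan would label the first one too.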
