If then with duplicates

yoyong · April 20, 2021, 2:55am

Hi everyone.

Need help.

I have a large data with 3 variables. Some observations have duplicates.

Example:

Name Code old_status new
Jose 11 1 1
Maria 22 0 0
Ben 11 1 1
Ben 44 0 1
Joey 11 1 1
Alex 11 1 1
Alex 33 0 1
Alex 44 0 1
Jhon 22 0 0
Jhon 33 0 0

I want to add a column called new where the results are binary, 0 and 1.

If the person has no duplicate, then the person retains its old_status. If the person has duplicates, the person's new status is 1, provided the code is 33 or 44 and that one of the old_status of the person is 1 (e.g. Alex), otherwise the status is 0 (e.g. Jhon). (See data)

Thanks.

mhakanda · April 20, 2021, 3:40am

Why following rows have different new observations, 1 and 0? They have same Code and old_status.
Alex 33 0 1 (7 th row)
Jhon 33 0 0 (last row)

yoyong · April 20, 2021, 4:20am

Hi.

Because one of Alex's old status is equal to 1. But for Jhon, it is both 0.
Maybe my instruction was not clear. Will make the changes.

Thanks.

yoyong

DavoWW · April 20, 2021, 6:31am

Hi @yoyong ,
I think this does what you need:

library(tidyverse)

in_text <- "
Name Code old_status
Jose  11 1
Maria 22 0
Ben   11 1
Ben   44 0
Joey  11 1
Alex  11 1
Alex  33 0
Alex  44 0
Jhon  22 0
Jhon  33 0"

df <- read.table(text=in_text, header=TRUE)
df
#>     Name Code old_status
#> 1   Jose   11          1
#> 2  Maria   22          0
#> 3    Ben   11          1
#> 4    Ben   44          0
#> 5   Joey   11          1
#> 6   Alex   11          1
#> 7   Alex   33          0
#> 8   Alex   44          0
#> 9   Jhon   22          0
#> 10  Jhon   33          0

df %>%
  group_by(Name) %>%
    mutate(dups = case_when(
    n() > 1 ~ "yes",
    TRUE ~ "no"
    )) %>%
  mutate(new = case_when(
                  dups == "no" ~ old_status,
                  dups == "yes" & (Code == 33 | Code == 44) & old_status == 1 ~ 1L,
                  dups == "yes" & max(old_status) == 1 ~ 1L,
                  TRUE ~ 0L
  ))
#> # A tibble: 10 x 5
#> # Groups:   Name [6]
#>    Name   Code old_status dups    new
#>    <chr> <int>      <int> <chr> <int>
#>  1 Jose     11          1 no        1
#>  2 Maria    22          0 no        0
#>  3 Ben      11          1 yes       1
#>  4 Ben      44          0 yes       1
#>  5 Joey     11          1 no        1
#>  6 Alex     11          1 yes       1
#>  7 Alex     33          0 yes       1
#>  8 Alex     44          0 yes       1
#>  9 Jhon     22          0 yes       0
#> 10 Jhon     33          0 yes       0

^{Created on 2021-04-20 by the reprex package (v2.0.0)}

system · May 11, 2021, 6:32am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.