Cross observational "if clause"

Hey all,

I have three IDs (Personal ID, Partner ID and Household ID), which were generated uniquely for every person. Now I want to assign the same Partner ID to both partners and the same Household ID to every person living in the same household. This should be based on specific variables.

Now I would really appreciate your help for the following:

The participants are asked whether their partner participates in the survey (yes/no). If "yes" is chosen, participants are asked to fill in their partner's first and last name (first_name_p, last_name_p). Can I match the IDs with an if clause in this case? I want to say: "If first_name_p = first_name (of any other participant) AND last_name_p = last_name (of that participant), then assign the same ID."

So basically I would like to know how to refer to all other observations in an if-clause.

I hope this is understandable. I appreciate any help!! Thanks!

Reverse engineering a question is hard. Can you provide a reprex? See the FAQ.

Hey Technocrat,

Thanks, yes. See my example data frame below:

data.frame(
stringsAsFactors = FALSE,
did = c(100, 101, 102, 103, 104, 105),
pid = c("3333", "4444", "5555", "6666", "7777", "8888"),
partner = c("yes", "no", "no", "yes", "no", "no"),
prename = c("Susi", "Peter", "Christian", "Hans", "Maja", "Robin"),
prename_p = c("Hans", NA, NA, "Susi", NA, NA),
surname = c("Bauer", "Müller","Schneider", "Maurer","Bäcker","Maler"),
surname_p = c("Maurer", NA, NA, "Bauer", NA, NA))

In this data frame, the ID (did) and Partner ID (pid) are all unique. What I would like is for partners to have matching Partner IDs (e.g. Susi and Hans should have the same pid).

I tried the following syntax, but where it says "partnerID", it should refer to that specific partner's ID (e.g. Hans should have the same pid as Susi). I am not sure how to refer to a value of another variable in another observation.

Testfile1$pid[Testfile1$surname_p %in% Testfile1$surname & Testfile1$prename_p %in% Testfile1$prename] <-"partnerID"

Hope this is clearer now. Thanks for your help!

Here's an approach, but its usefulness is limited to the example data because it does not account for:

  1. More than one pair of partners
  2. Partners of a pair that are not separated by one or partners of a different pair

but let's start here.

d <- data.frame(stringsAsFactors = FALSE,
did = c(100, 101, 102, 103, 104, 105),
pid = c("3333", "4444", "5555", "6666", "7777", "8888"),
partner = c("yes", "no", "no", "yes", "no", "no"),
prename = c("Susi", "Peter", "Christian", "Hans", "Maja", "Robin"),
prename_p = c("Hans", NA, NA, "Susi", NA, NA),
surname = c("Bauer", "Müller","Schneider", "Maurer","Bäcker","Maler"),
surname_p = c("Maurer", NA, NA, "Bauer", NA, NA))

d
#>   did  pid partner   prename prename_p   surname surname_p
#> 1 100 3333     yes      Susi      Hans     Bauer    Maurer
#> 2 101 4444      no     Peter      <NA>    Müller      <NA>
#> 3 102 5555      no Christian      <NA> Schneider      <NA>
#> 4 103 6666     yes      Hans      Susi    Maurer     Bauer
#> 5 104 7777      no      Maja      <NA>    Bäcker      <NA>
#> 6 105 8888      no     Robin      <NA>     Maler      <NA>
# rows 1 & 4 are partners
paired <- which(d$partner == "yes")
paired
#> [1] 1 4
d[paired,'pid'] = d[paired[1],'pid']
d
#>   did  pid partner   prename prename_p   surname surname_p
#> 1 100 3333     yes      Susi      Hans     Bauer    Maurer
#> 2 101 4444      no     Peter      <NA>    Müller      <NA>
#> 3 102 5555      no Christian      <NA> Schneider      <NA>
#> 4 103 3333     yes      Hans      Susi    Maurer     Bauer
#> 5 104 7777      no      Maja      <NA>    Bäcker      <NA>
#> 6 105 8888      no     Robin      <NA>     Maler      <NA>

This approach relies on the subset operator [ ]. It first identifies the rows of the partner pair (rows 1 and 4) and then sets the pid of both rows to the pid of the row that occurs first.
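
On the original question of how to refer to a value in another observation: one possible way is match(), which returns row positions rather than the TRUE/FALSE that %in% gives. The sketch below is only an illustration on a cut-down, hypothetical three-row version of the example data. It assumes that the combination of first and last name identifies a person uniquely; own_key, partner_key and partner_row are names made up for this example.

```r
# Hypothetical three-row subset of the example data
d <- data.frame(
  stringsAsFactors = FALSE,
  pid       = c("3333", "5555", "6666"),
  prename   = c("Susi", "Christian", "Hans"),
  prename_p = c("Hans", NA, "Susi"),
  surname   = c("Bauer", "Schneider", "Maurer"),
  surname_p = c("Maurer", NA, "Bauer")
)

# One key per person, and one key per named partner (NA if no partner given)
own_key     <- paste(d$prename, d$surname)
partner_key <- ifelse(is.na(d$prename_p), NA, paste(d$prename_p, d$surname_p))

# For every row, the row number of the partner (NA if there is none)
partner_row <- match(partner_key, own_key)
partner_row
#> [1]  3 NA  1

# Indexing with those row numbers pulls the partner's pid into each row
d$pid[partner_row]
#> [1] "6666" NA     "3333"
```

With the partner's row number available, any column of the partner's observation can be looked up the same way.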


Thanks! That is a helpful first step. But would you have any idea how to approach it if there are multiple pairs of partners?

Need more data. Since this one is marked solved, open a new thread. The data should illustrate multiple pairs of partners in no particular order, and it should be safe to assume that all individuals are uniquely identified, so there is no possibility of misidentifying who belongs together.
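
For illustration only, data of that kind and one way it might be handled could look like the sketch below. The names and pids are invented. The code assumes that full names are unique (it stops if they are not) and it does not check that both partners actually named each other. Following the convention of giving a pair the pid of whichever partner occurs first, it is a rough sketch and not a tested solution for real survey data.

```r
# Hypothetical data: two couples and one single person, in arbitrary row order
d <- data.frame(
  stringsAsFactors = FALSE,
  pid       = c("3333", "4444", "5555", "6666", "7777"),
  prename   = c("Susi", "Anna", "Peter", "Hans", "Jonas"),
  surname   = c("Bauer", "Maler", "Schneider", "Maurer", "Koch"),
  prename_p = c("Hans", "Jonas", NA, "Susi", "Anna"),
  surname_p = c("Maurer", "Koch", NA, "Bauer", "Maler")
)

own_key     <- paste(d$prename, d$surname)
partner_key <- ifelse(is.na(d$prename_p), NA, paste(d$prename_p, d$surname_p))

# The lookup is only safe if no two participants share the same full name
stopifnot(anyDuplicated(own_key) == 0)

# Row number of each person's partner (NA when no partner was named or found)
partner_row <- match(partner_key, own_key)

# For each person, the earlier of their own row and their partner's row;
# na.rm = TRUE makes people without a partner keep their own row
first_row <- pmin(seq_len(nrow(d)), partner_row, na.rm = TRUE)

# Everyone takes the pid found in that earlier row
d$pid <- d$pid[first_row]
d$pid
#> [1] "3333" "4444" "5555" "3333" "4444"
```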
