From Family Relationship to Family Hierarchy

Here I have a data frame that represents family relationships:

df <- data.frame(
  pid = c(101, 102, 103, 104, 105, 106, 107),
  pid_f = c(-8, -8, -8, -8, 102, -8, 106),
  pid_m = c(-8, 101, 101, -8, 104, -8, 103),
  pid_s = c(-8, 104, 106, 102, -8, 103, -8),
  pid_c1 = c(103, 105, 107, 105, -8, 107, -8),
  pid_c2 = c(102, -8, -8, -8, -8, -8, -8)
)

df
#>   pid pid_f pid_m pid_s pid_c1 pid_c2
#> 1 101    -8    -8    -8    103    102
#> 2 102    -8   101   104    105     -8
#> 3 103    -8   101   106    107     -8
#> 4 104    -8    -8   102    105     -8
#> 5 105   102   104    -8     -8     -8
#> 6 106    -8    -8   103    107     -8
#> 7 107   106   103    -8     -8     -8
colname  label
pid      personal id
pid_f    father id
pid_m    mother id
pid_s    spouse id
pid_c1   child 1 id
pid_c2   child 2 id

The value -8 means that the relative doesn't exist (or is not recorded).
  • The first row reads as: person 101 has two children, 103 and 102.
  • The second row reads as: person 102 has mother 101, spouse 104, and one child, 105.
  • ... and so on.
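
As a small illustration of how a row can be read programmatically, the sketch below recodes the -8 sentinel as NA and lists the relatives recorded for one person. The names df_na and relations_of are hypothetical helpers introduced only for this example, not something from the thread.

```r
# Same data as in the question.
df <- data.frame(
  pid    = c(101, 102, 103, 104, 105, 106, 107),
  pid_f  = c(-8, -8, -8, -8, 102, -8, 106),
  pid_m  = c(-8, 101, 101, -8, 104, -8, 103),
  pid_s  = c(-8, 104, 106, 102, -8, 103, -8),
  pid_c1 = c(103, 105, 107, 105, -8, 107, -8),
  pid_c2 = c(102, -8, -8, -8, -8, -8, -8)
)

# Work on a copy so the original -8 coding stays available.
df_na <- df
df_na[df_na == -8] <- NA

# Hypothetical helper: the non-missing relations recorded for one person.
relations_of <- function(data, id) {
  row <- unlist(data[data$pid == id, setdiff(names(data), "pid")])
  row[!is.na(row)]
}

relations_of(df_na, 102)
#> pid_m  pid_s pid_c1
#>   101    104    105
```

Whether to keep -8 or switch to NA is a matter of taste; the recursive approaches in this thread rely on the -8 coding, so the original df is left unchanged here.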

So, from the family relationship table above, we can draw a family tree (to make it easier to understand),

and finally assign each person a corresponding hierarchy level (generation), like this:

pid pid_f pid_m pid_s pid_c1 pid_c2 Hierarchy
101    -8    -8    -8    103    102         1
102    -8   101   104    105     -8         2
103    -8   101   106    107     -8         2
104    -8    -8   102    105     -8         2
105   102   104    -8     -8     -8         3
106    -8    -8   103    107     -8         2
107   106   103    -8     -8     -8         3

My question is how to compute the Hierarchy variable from df with some function, for example inside mutate():

df %>% mutate(Hierarchy = function(...)
       )

Could you please give me some help and advice?

I think that if you stick with the data frame object, recursion is your best bet. Something of the form:

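getHierarchy <- function(x) {
   # Base case: -8 means "no such person", which contributes depth 0
   if(x == -8) return(0)
   # Look up this person's row in the (global) df
   x <- df[df$pid == x, ]
   # One level deeper than the deeper of the two parents
   max(c(
      getHierarchy(x$pid_m) + 1, 
      getHierarchy(x$pid_f) + 1
      )) 
}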

This should give you the longest path to the root node of this tree. I haven't tried recursion in dplyr myself so I couldn't tell you how to implement this in that context.
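
For completeness, here is a minimal base R sketch of how the function could be applied to every pid without dplyr. It assumes df is the data frame from the question and that every parent id that is not -8 also appears in the pid column; vapply() is used only as one possible way to get a plain numeric vector.

```r
df <- data.frame(
  pid    = c(101, 102, 103, 104, 105, 106, 107),
  pid_f  = c(-8, -8, -8, -8, 102, -8, 106),
  pid_m  = c(-8, 101, 101, -8, 104, -8, 103),
  pid_s  = c(-8, 104, 106, 102, -8, 103, -8),
  pid_c1 = c(103, 105, 107, 105, -8, 107, -8),
  pid_c2 = c(102, -8, -8, -8, -8, -8, -8)
)

# Same recursive function as above; it reads df from the calling environment.
getHierarchy <- function(x) {
  if (x == -8) return(0)
  x <- df[df$pid == x, ]
  max(c(
    getHierarchy(x$pid_m) + 1,
    getHierarchy(x$pid_f) + 1
  ))
}

# Apply the function to each person and store the result as a numeric column.
df$Hierarchy <- vapply(df$pid, getHierarchy, numeric(1))
df$Hierarchy
#> [1] 1 2 2 1 3 1 3
```

Note that anyone without a recorded father or mother comes out as 1 with this approach, because only the parent columns are followed.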


If we already knew the generation gap of this family (the number of generations), for example gap_fam = 3,

would that simplify the problem?

df <- data.frame(
         pid = c(101, 102, 103, 104, 105, 106, 107),
       pid_f = c(-8, -8, -8, -8, 102, -8, 106),
       pid_m = c(-8, 101, 101, -8, 104, -8, 103),
       pid_s = c(-8, 104, 106, 102, -8, 103, -8),
      pid_c1 = c(103, 105, 107, 105, -8, 107, -8),
      pid_c2 = c(102, -8, -8, -8, -8, -8, -8),
     gap_fam = c(3, 3, 3, 3, 3, 3, 3)
)
df
#>   pid pid_f pid_m pid_s pid_c1 pid_c2 gap_fam
#> 1 101    -8    -8    -8    103    102       3
#> 2 102    -8   101   104    105     -8       3
#> 3 103    -8   101   106    107     -8       3
#> 4 104    -8    -8   102    105     -8       3
#> 5 105   102   104    -8     -8     -8       3
#> 6 106    -8    -8   103    107     -8       3
#> 7 107   106   103    -8     -8     -8       3
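
One possible way to use gap_fam, offered only as a sketch: count generations downwards through the child columns and subtract that height from gap_fam. The helper getHeight is a hypothetical name, and the idea rests on two assumptions that hold for this example but not in general: every child id that is not -8 appears in pid, and each person's longest line of descendants reaches the last generation (a childless person in a middle generation would be misplaced).

```r
df <- data.frame(
  pid     = c(101, 102, 103, 104, 105, 106, 107),
  pid_f   = c(-8, -8, -8, -8, 102, -8, 106),
  pid_m   = c(-8, 101, 101, -8, 104, -8, 103),
  pid_s   = c(-8, 104, 106, 102, -8, 103, -8),
  pid_c1  = c(103, 105, 107, 105, -8, 107, -8),
  pid_c2  = c(102, -8, -8, -8, -8, -8, -8),
  gap_fam = c(3, 3, 3, 3, 3, 3, 3)
)

# Hypothetical helper: number of generations below a person
# (0 for someone without children; -1 is the base case for "no child").
getHeight <- function(id) {
  if (id == -8) return(-1)
  row <- df[df$pid == id, ]
  max(getHeight(row$pid_c1), getHeight(row$pid_c2)) + 1
}

# Level counted from the top: last generation minus the height below the person.
df$Hierarchy <- df$gap_fam - vapply(df$pid, getHeight, numeric(1))
df$Hierarchy
#> [1] 1 2 2 2 3 2 3
```

Because spouses share the same children, they land on the same level here even when their own parents are not recorded.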

Thanks, @EliMiller.

Around the same time, a similar question was asked, so I combined the two ideas:

library(tidyverse)

df <- data.frame(
  pid = c(101, 102, 103, 104, 105, 106, 107),
  pid_f = c(-8, -8, -8, -8, 102, -8, 106),
  pid_m = c(-8, 101, 101, -8, 104, -8, 103),
  pid_s = c(-8, 104, 106, 102, -8, 103, -8),
  pid_c1 = c(103, 105, 107, 105, -8, 107, -8),
  pid_c2 = c(102, -8, -8, -8, -8, -8, -8)
)

df
#>   pid pid_f pid_m pid_s pid_c1 pid_c2
#> 1 101    -8    -8    -8    103    102
#> 2 102    -8   101   104    105     -8
#> 3 103    -8   101   106    107     -8
#> 4 104    -8    -8   102    105     -8
#> 5 105   102   104    -8     -8     -8
#> 6 106    -8    -8   103    107     -8
#> 7 107   106   103    -8     -8     -8


getHierarchy <- function(x) {
  if (x == -8) return(0)
  x <- df[df$pid == x, ]
  max(c(
    getHierarchy(x$pid_m) + 1,
    getHierarchy(x$pid_f) + 1
  ))
}


df %>%
  mutate(
    # map_dbl() returns a numeric vector; map() would create a list-column
    Hierarchy = map_dbl(pid, getHierarchy)
  )
#>   pid pid_f pid_m pid_s pid_c1 pid_c2 Hierarchy
#> 1 101    -8    -8    -8    103    102         1
#> 2 102    -8   101   104    105     -8         2
#> 3 103    -8   101   106    107     -8         2
#> 4 104    -8    -8   102    105     -8         1
#> 5 105   102   104    -8     -8     -8         3
#> 6 106    -8    -8   103    107     -8         1
#> 7 107   106   103    -8     -8     -8         3

Created on 2019-03-21 by the reprex package (v0.2.1)
However, for pid = 104 and pid = 106 the result is 1, which is not what I want. They are expected to be 2.
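
Persons 104 and 106 have no recorded parents, so the parent-only recursion treats them as the top generation. One possible adjustment, sketched here under the assumption that spouses always belong to the same generation, is to take the larger of a person's own depth and their spouse's depth. The name getHierarchyWithSpouse is a hypothetical helper introduced for this example.

```r
df <- data.frame(
  pid    = c(101, 102, 103, 104, 105, 106, 107),
  pid_f  = c(-8, -8, -8, -8, 102, -8, 106),
  pid_m  = c(-8, 101, 101, -8, 104, -8, 103),
  pid_s  = c(-8, 104, 106, 102, -8, 103, -8),
  pid_c1 = c(103, 105, 107, 105, -8, 107, -8),
  pid_c2 = c(102, -8, -8, -8, -8, -8, -8)
)

# Parent-only depth, unchanged from the reprex above.
getHierarchy <- function(x) {
  if (x == -8) return(0)
  x <- df[df$pid == x, ]
  max(c(
    getHierarchy(x$pid_m) + 1,
    getHierarchy(x$pid_f) + 1
  ))
}

# Hypothetical helper: assumes spouses share a generation, so a person
# without recorded parents inherits the level of their spouse.
getHierarchyWithSpouse <- function(id) {
  row <- df[df$pid == id, ]
  # getHierarchy(-8) is 0, so a missing spouse never raises the level
  max(getHierarchy(id), getHierarchy(row$pid_s))
}

df$Hierarchy <- vapply(df$pid, getHierarchyWithSpouse, numeric(1))
df$Hierarchy
#> [1] 1 2 2 2 3 2 3
```

Inside a dplyr pipeline the same helper could be used as mutate(Hierarchy = map_dbl(pid, getHierarchyWithSpouse)). The sketch only looks one step sideways, so it would not cover a person whose parents married into the family without recorded parents of their own.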
