Create a long name chain using R

Is there a way in RStudio to build the longest "name chain" within a data set? For example, suppose the data has two columns, a First Name column and a Last Name column. If one row has John in the First Name column and Chandler in the Last Name column, is there a way to find another row where Chandler is in the First Name column, take its last name, and keep going to build a long name chain?

Example:
John Chandler
Chandler Brian
Brian Michael
Michael Parker
Parker Tyler
Tyler Jones
STOP (Because the data does not have someone with a first name as Jones)

Welcome to RStudio Community!

There are definitely ways to build something like you describe, but what R code have you tried so far? That will be a good starting point for others to help you make progress.

I have attempted an intersection, which finds the names that appear in both columns, and I have also tried concatenating. Neither has gotten me anywhere.

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

Here is a test data set for finding the longest name chain, given a column of first names and a column of last names:

First.Name<-c("John", "James", "Jack", "Jill", "Corey", "Callie", "Sofie", "Josie", "Lilly", "Luke", "Jane", "Sara", "Chandler","Flora","Parker","Preston")
Last.Name<-c("Smith","Flora","Chandler","Jack","John","Reeves","Preston","Parker","James","Brooks","Johnson","Smith","Krause","Casey","Corey","Lilly")

I tried running intersect(First.Name, Last.Name), which gave me the names that are in both columns, but I do not know where to go from there.

First.Name<-c("John", "James", "Jack", "Jill", "Corey", "Callie", "Sofie", "Josie", "Lilly", "Luke", "Jane", "Sara", "Chandler","Flora","Parker","Preston")
Last.Name<-c("Smith","Flora","Chandler","Jack","John","Reeves","Preston","Parker","James","Brooks","Johnson","Smith","Krause","Casey","Corey","Lilly")
library(tidyverse)
library(sqldf)

first_df <- data.frame(
  firstname = First.Name,
  lastname = Last.Name
)

# A function to perform repeated left joins.
# Step i joins the previous result (first_df when i is 1) back onto first_df,
# matching the most recent name in each chain against firstname, and stores
# the output in the global environment as result_<i>.
do_next <- function(i) {
  lefttable <- if (i == 1) "first_df" else paste0("result_", i - 1)
  join <- if (i == 1) "a.lastname" else paste0("nextname", i - 1)

  assign(
    x = paste0("result_", i),
    value = sqldf::sqldf(paste0(
      "select a.*, b.lastname as nextname", i, "
       from ", lefttable, " a left join
       first_df b
       on ", join, " = b.firstname"
    )) %>% as_tibble(),
    envir = globalenv()
  )
}


i <- 0
repeat {
  i <- i + 1
  do_next(i)
  # If every value in the newest column is NA, no chain can be extended, so stop.
  if (all(is.na(get(paste0("result_", i)) %>% pull(paste0("nextname", i)))))
    break
  # Safety limit in case the names form a never-ending cycle.
  if (i > 1000)
    break
}

i
get(paste0("result_",i))
   firstname lastname nextname1 nextname2 nextname3 nextname4 nextname5
   <fct>     <fct>    <chr>     <chr>     <chr>     <chr>     <chr>    
 1 John      Smith    NA        NA        NA        NA        NA       
 2 James     Flora    Casey     NA        NA        NA        NA       
 3 Jack      Chandler Krause    NA        NA        NA        NA       
 4 Jill      Jack     Chandler  Krause    NA        NA        NA       
 5 Corey     John     Smith     NA        NA        NA        NA       
 6 Callie    Reeves   NA        NA        NA        NA        NA       
 7 Sofie     Preston  Lilly     James     Flora     Casey     NA       
 8 Josie     Parker   Corey     John      Smith     NA        NA       
 9 Lilly     James    Flora     Casey     NA        NA        NA       
10 Luke      Brooks   NA        NA        NA        NA        NA       
11 Jane      Johnson  NA        NA        NA        NA        NA       
12 Sara      Smith    NA        NA        NA        NA        NA       
13 Chandler  Krause   NA        NA        NA        NA        NA       
14 Flora     Casey    NA        NA        NA        NA        NA       
15 Parker    Corey    John      Smith     NA        NA        NA       
16 Preston   Lilly    James     Flora     Casey     NA        NA
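
For comparison, here is a minimal base R sketch of the same idea that follows each chain directly instead of using repeated SQL joins. It assumes every first name appears only once in the data, so each name has at most one "next" name. The names next_name and follow_chain are made up for this illustration and do not come from any package.

```r
# Test data from the thread
First.Name <- c("John", "James", "Jack", "Jill", "Corey", "Callie", "Sofie", "Josie",
                "Lilly", "Luke", "Jane", "Sara", "Chandler", "Flora", "Parker", "Preston")
Last.Name  <- c("Smith", "Flora", "Chandler", "Jack", "John", "Reeves", "Preston", "Parker",
                "James", "Brooks", "Johnson", "Smith", "Krause", "Casey", "Corey", "Lilly")

# Assumption: each first name occurs once, so the lookup below is unambiguous
stopifnot(!anyDuplicated(First.Name))

# Named vector used as a lookup table: first name -> last name
next_name <- setNames(Last.Name, First.Name)

# Hypothetical helper: start from one first name and keep hopping
# from last name to matching first name until the chain ends
follow_chain <- function(start, lookup) {
  chain <- start
  current <- start
  while (current %in% names(lookup)) {
    current <- lookup[[current]]
    # Stop if a name repeats, so a cycle cannot loop forever
    if (current %in% chain) break
    chain <- c(chain, current)
  }
  chain
}

# Build one chain per starting first name and keep the longest
chains <- lapply(First.Name, follow_chain, lookup = next_name)
longest <- chains[[which.max(lengths(chains))]]
print(longest)
```

On the test data this picks the chain starting at Sofie (Sofie, Preston, Lilly, James, Flora, Casey), which matches row 7 of the table above. If several chains tie for the longest, which.max() keeps only the first one.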

Thank you for your help! For some reason I am getting an error:

Error in get(paste0("result_", i)) : object 'result_1' not found

@feelingjuicy, can I confirm, you ran the entire script that I shared, and the above is the first error?
