Check if information is in another data frame

Hello everyone,

I'm a beginner to R and had a query. I have 2 data frames with information about students in a classroom.

Dataset 1 has each student pair - John-Greg, John-Linda, etc generated from

df1 <- expand.grid(respondent= c("John", "Linda", "Greg", "Jay", "Erin"), classmate= c("John", "Linda", "Greg", "Jay", "Erin"))

Dataset 2 has responses from each student which captures who are their friends in the classroom in 4 variables (friend1, friend2, so on).

I want to add a variable to df1 which captures whether the student (respondent) mentioned that classmate as a friend in any of the 4 variables of df2.

I'm confused with how to do this, any help would be greatly appreciated!

You typically want to do a left_join to bring information from one data frame to another. Therefore the trick is to manipulate the data frames to make the join possible. In other words, manipulate them so they have common columns to use as the by argument.

In this particular case, data frame 2 is in wide format and needs to be pivoted to make them "compatible."

library(tidyverse)

df1 <- expand.grid(
  respondent = c("John", "Linda", "Greg", "Jay", "Erin"), 
  classmate = c("John", "Linda", "Greg", "Jay", "Erin")
)

df2 <- tibble(
  respondent = c("John", "Linda", "Greg", "Jay", "Erin"),
  friend1 = c("Linda", "Greg", "Jay", "Erin", "John"),
  friend2 = c("Greg", "Jay", "Erin", "John", "Linda"),
)

df2p <- pivot_longer(df2, -respondent, values_to = "classmate") %>%
  mutate(is_friend = TRUE) %>%
  select(-name)
df2p
#> # A tibble: 10 x 3
#>    respondent classmate is_friend
#>    <chr>      <chr>     <lgl>    
#>  1 John       Linda     TRUE     
#>  2 John       Greg      TRUE     
#>  3 Linda      Greg      TRUE     
#>  4 Linda      Jay       TRUE     
#>  5 Greg       Jay       TRUE     
#>  6 Greg       Erin      TRUE     
#>  7 Jay        Erin      TRUE     
#>  8 Jay        John      TRUE     
#>  9 Erin       John      TRUE     
#> 10 Erin       Linda     TRUE

left_join(df1, df2p, by = c("respondent", "classmate")) %>%
  mutate(is_friend = replace_na(is_friend, FALSE))
#>    respondent classmate is_friend
#> 1        John      John     FALSE
#> 2       Linda      John     FALSE
#> 3        Greg      John     FALSE
#> 4         Jay      John      TRUE
#> 5        Erin      John      TRUE
#> 6        John     Linda      TRUE
#> 7       Linda     Linda     FALSE
#> 8        Greg     Linda     FALSE
#> 9         Jay     Linda     FALSE
#> 10       Erin     Linda      TRUE
#> 11       John      Greg      TRUE
#> 12      Linda      Greg      TRUE
#> 13       Greg      Greg     FALSE
#> 14        Jay      Greg     FALSE
#> 15       Erin      Greg     FALSE
#> 16       John       Jay     FALSE
#> 17      Linda       Jay      TRUE
#> 18       Greg       Jay      TRUE
#> 19        Jay       Jay     FALSE
#> 20       Erin       Jay     FALSE
#> 21       John      Erin     FALSE
#> 22      Linda      Erin     FALSE
#> 23       Greg      Erin      TRUE
#> 24        Jay      Erin      TRUE
#> 25       Erin      Erin     FALSE

Created on 2021-06-15 by the reprex package (v1.0.0)

1 Like

Thank you so much! I'm getting the following error - any ideas?

Error: Failed to create output due to bad names.

  • Choose another strategy with names_repair

Hm. One possibility is pivot_longer by default creates a column called "name" so if you have a "name" column already then there will be a conflict. Either rename this column or use the names_to argument so pivot_longer creates a column with a different name.

Thank you! I used this and it worked PERFECTLY!

 df2p%<>% rename(respondent_name=name)
  df2p<- pivot_longer(df2p, -respondent_name, values_to = "classmate_name") %>%
                 mutate(is_friend = TRUE) %>% drop_na(classmate_name) %>% select (1, 3, 4)   
  df1 <- left_join(df1, df2p, by = c("respondent_name" , "classmate_name")) %>%
         mutate(is_friend = replace_na(is_friend, FALSE))

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.