Using Dataset A to Fill in Dataset B where Values Match

Hello. I am trying to use a dataset (df_key) to fill in the missing values in another dataset (df1).

When the value in "Course.Name" in both datasets match, use the value in df_key's "Course.Code" to fill in the missing value in df1's "Course.Code"

Hope this is clear!

Here are the sample datasets:

df_key <- data.frame (Course.Name  = c("Philosophy", "English", "Physics", "Geography"),
                      Course.Code = c("PHIL", "ENG", "PHYS", "GEO")

df1 <- data.frame(Course.Name = c("Physics", "English", "Geography"),
                  Course.Code = c("", "ENG", ""))

Is this what you mean?

library(tidyverse)

df_key <- data.frame(Course.Name  = c("Philosophy", "English", "Physics", "Geography"),
                     Course.Code = c("PHIL", "ENG", "PHYS", "GEO"))

df1 <- data.frame(Course.Name = c("Physics", "English", "Geography"),
                  Course.Code = c("", "ENG", ""))

df1 %>% 
    rows_update(df_key %>%
                    filter(Course.Name %in% df1$Course.Name),
                by = "Course.Name")
#>   Course.Name Course.Code
#> 1     Physics        PHYS
#> 2     English         ENG
#> 3   Geography         GEO

Created on 2022-01-31 by the reprex package (v2.0.1)

Yes the output looks correct, but I can't use the rows_update function even with the tidyverse library turned on. Is there a way to recreate this solution in base R or using another library?

I appreciate your help and the time you take to answer questions :slight_smile:

Here is one method.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df_key <- data.frame (Course.Name  = c("Philosophy", "English", "Physics", "Geography"),
                      Course.Code = c("PHIL", "ENG", "PHYS", "GEO"))

df1 <- data.frame(Course.Name = c("Physics", "English", "Geography"),
                  Course.Code = c("", "ENG", ""))
df1 <- df1 |> inner_join(df_key,by="Course.Name") |> 
  mutate(Course.Code.x=ifelse(Course.Code.x=="",Course.Code.y,Course.Code.x)) |> 
  select(Course.Name,Course.Code=Course.Code.x)

df1
#>   Course.Name Course.Code
#> 1     Physics        PHYS
#> 2     English         ENG
#> 3   Geography         GEO

Created on 2022-01-31 by the reprex package (v2.0.1)

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.