Create a new variable based on another dataframe; different number of rows.

Dear all,

I'll go straight to the example, since it's somewhat hard to put into words.

I have two dataframes.

library(tidyverse)

projects <- tibble(topic = c("a", "b", "c", "d"),
                   division = c("1", "2", "3", "4"))

topics <- tibble(topic = c("a", "b", "c"),
                 division.new = c("1.1", "2.2", "3.3"))

I would like to create a new variable called 'division.new' in the 'projects' dataframe by matching rows on the 'topic' variable. I tried something like this:

projects %>%
  mutate(division.new = if_else(topic %in% topics$topic, topics$division.new, "NA"))

But I get the following error:

Error in `mutate()`:
! Problem while computing `division.new = if_else(topic %in% topics$topic, topics$division.new, "NA")`.
Caused by error in `if_else()`:
! `true` must be length 4 (length of `condition`) or one, not 3.

I also tried == instead of %in%, but that does not give the result I want either. What I would like to have is this:

projects <- tibble(topic = c("a", "b", "c", "d"),
                   division = c("1", "2", "3", "4"),
                   division.new = c("1.1", "2.2", "3.3", NA))

Again, as I understand it, I'm creating a new variable in one dataframe based on values stored in another dataframe. I imagine one could do it with joins, but that's not what I want or need. There must be a way to do it without joining the dataframes; at least I hope there is!

I would appreciate any ideas. Thank you!

Here are two solutions. The first uses left_join(). I know you said you want to avoid a join, but I'm including it for others who find this thread. The second avoids a join by building a named vector from topics and indexing it with projects$topic.

library(dplyr)

projects <- tibble(topic = c("a", "b", "c", "d"),
                   division = c("1", "2", "3", "4"))

topics <- tibble(topic = c("a", "b", "c"),
                 division.new = c("1.1", "2.2", "3.3"))

projects <- left_join(projects, topics, by = "topic")
projects
#> # A tibble: 4 × 3
#>   topic division division.new
#>   <chr> <chr>    <chr>       
#> 1 a     1        1.1         
#> 2 b     2        2.2         
#> 3 c     3        3.3         
#> 4 d     4        <NA>

# Not using a join: look up values in a named vector
projects <- tibble(topic = c("a", "b", "c", "d"),
                   division = c("1", "2", "3", "4"))

topics <- tibble(topic = c("a", "b", "c"),
                 division.new = c("1.1", "2.2", "3.3"))

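# Build a lookup vector: values are division.new, names are topic
New <- topics$division.new
names(New) <- topics$topic
# Index by topic name; topics missing from the lookup ("d") give NA
projects$division.new <- New[projects$topic]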

projects
#> # A tibble: 4 × 3
#>   topic division division.new
#>   <chr> <chr>    <chr>       
#> 1 a     1        1.1         
#> 2 b     2        2.2         
#> 3 c     3        3.3         
#> 4 d     4        <NA>

Created on 2022-04-22 by the reprex package (v0.2.1)
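
As for why the original attempt failed: if_else() needs `true` and `false` to be either length one or the same length as the condition. Here topics$division.new has 3 elements while projects has 4 rows, which is exactly what the error message reports. If you would rather stay inside mutate(), one option (a minimal sketch of my own, assuming each topic appears at most once in topics) is base R's match(), which returns the position of each projects topic in topics$topic and NA where there is no match:

```r
library(dplyr)

projects <- tibble(topic = c("a", "b", "c", "d"),
                   division = c("1", "2", "3", "4"))

topics <- tibble(topic = c("a", "b", "c"),
                 division.new = c("1.1", "2.2", "3.3"))

# match() gives, for each topic in projects, its row position in topics
# (NA when the topic is absent); indexing a vector with NA yields NA.
projects <- projects %>%
  mutate(division.new = topics$division.new[match(topic, topics$topic)])

projects$division.new
#> [1] "1.1" "2.2" "3.3" NA
```

Note that match() only returns the first matching position, so if a topic were repeated in topics, later entries would be silently ignored.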
