Hi all, I'm trying to compare one column in one dataframe against another two columns in another dataframe. So in the below example, I want to check whether for example 5 in tbl1 is between the start and end of tbl2; if it is, assign tbl2_id to the relevant row in tbl1. In reality, tbl2 has many columns that I want to eventually combine into tbl1 based off tbl2_id in tbl2, which I can do with left_join.

I hope that makes sense! I've included desired output at the bottom to hopefully clarify things

library(tidyverse)

tbl1 <- read_table2(
"tbl1_id number
1 5
2 2
3 2.25
4 2.6
5 3.2
6 10
6, 3
7, 5"
)

tbl2 <- read_table2(
"tbl2_id  start end
1  4  6
2  2.5  3.3
3  7  8"
)

tbl1 %>%
  mutate(tbl2_id = ifelse((number > tbl2$start & number < tbl2$end), tbl2$tbl2_id, NA))
#> Warning in number > tbl2$start: longer object length is not a multiple of
#> shorter object length
#> Warning in number < tbl2$end: longer object length is not a multiple of
#> shorter object length
#> # A tibble: 8 x 3
#>   tbl1_id number tbl2_id
#>     <dbl>  <dbl>   <dbl>
#> 1       1   5          1
#> 2       2   2         NA
#> 3       3   2.25      NA
#> 4       4   2.6       NA
#> 5       5   3.2        2
#> 6       6  10         NA
#> 7       6   3         NA
#> 8       7   5         NA


# Desired outcome
desired_output <- read_table2(
"tbl1_id number tbl2_id
1 5 1
2 2 NA
3 2.25  NA
4 2.6 2
5 3.2 2
6 10  NA
6 3 2
7 5 1"
)
desired_output
#> # A tibble: 8 x 3
#>   tbl1_id number tbl2_id
#>     <dbl>  <dbl>   <dbl>
#> 1       1   5          1
#> 2       2   2         NA
#> 3       3   2.25      NA
#> 4       4   2.6        2
#> 5       5   3.2        2
#> 6       6  10         NA
#> 7       6   3          2
#> 8       7   5          1

Created on 2019-02-11 by the reprex package (v0.2.1)

You can use fuzzyjoins

library(dplyr)
library(fuzzyjoin)

tbl1 <- tibble::tribble(
    ~tbl1_id, ~number,
    1,       5,
    2,       2,
    3,    2.25,
    4,     2.6,
    5,     3.2,
    6,      10,
    6,       3,
    7,       5
)

tbl2 <- tibble::tribble(
    ~tbl2_id, ~start, ~end,
    1,      4,    6,
    2,    2.5,  3.3,
    3,      7,    8
)

tbl1 %>% 
    fuzzy_left_join(tbl2,
                    by = c("number" = "start",
                           "number" = "end"),
                    match_fun = list(`>=`, `<=`)
    ) %>% 
    select(tbl1_id, number, tbl2_id)
#> # A tibble: 8 x 3
#>   tbl1_id number tbl2_id
#>     <dbl>  <dbl>   <dbl>
#> 1       1   5          1
#> 2       2   2         NA
#> 3       3   2.25      NA
#> 4       4   2.6        2
#> 5       5   3.2        2
#> 6       6  10         NA
#> 7       6   3          2
#> 8       7   5          1

Created on 2019-02-11 by the reprex package (v0.2.1)

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.