Comparing one column is within range of two columns in another data frame

Hi all, I'm trying to compare one column in one dataframe against another two columns in another dataframe. So in the below example, I want to check whether for example 5 in tbl1 is between the start and end of tbl2; if it is, assign tbl2_id to the relevant row in tbl1. In reality, tbl2 has many columns that I want to eventually combine into tbl1 based off tbl2_id in tbl2, which I can do with left_join.

I hope that makes sense! I've included desired output at the bottom to hopefully clarify things

library(tidyverse)

tbl1 <- read_table2(
"tbl1_id number
1 5
2 2
3 2.25
4 2.6
5 3.2
6 10
6, 3
7, 5"
)

tbl2 <- read_table2(
"tbl2_id  start end
1  4  6
2  2.5  3.3
3  7  8"
)

tbl1 %>%
  mutate(tbl2_id = ifelse((number > tbl2$start & number < tbl2$end), tbl2$tbl2_id, NA))
#> Warning in number > tbl2$start: longer object length is not a multiple of
#> shorter object length
#> Warning in number < tbl2$end: longer object length is not a multiple of
#> shorter object length
#> # A tibble: 8 x 3
#>   tbl1_id number tbl2_id
#>     <dbl>  <dbl>   <dbl>
#> 1       1   5          1
#> 2       2   2         NA
#> 3       3   2.25      NA
#> 4       4   2.6       NA
#> 5       5   3.2        2
#> 6       6  10         NA
#> 7       6   3         NA
#> 8       7   5         NA


# Desired outcome
desired_output <- read_table2(
"tbl1_id number tbl2_id
1 5 1
2 2 NA
3 2.25  NA
4 2.6 2
5 3.2 2
6 10  NA
6 3 2
7 5 1"
)
desired_output
#> # A tibble: 8 x 3
#>   tbl1_id number tbl2_id
#>     <dbl>  <dbl>   <dbl>
#> 1       1   5          1
#> 2       2   2         NA
#> 3       3   2.25      NA
#> 4       4   2.6        2
#> 5       5   3.2        2
#> 6       6  10         NA
#> 7       6   3          2
#> 8       7   5          1

Created on 2019-02-11 by the reprex package (v0.2.1)

You can use fuzzyjoins

library(dplyr)
library(fuzzyjoin)

tbl1 <- tibble::tribble(
    ~tbl1_id, ~number,
    1,       5,
    2,       2,
    3,    2.25,
    4,     2.6,
    5,     3.2,
    6,      10,
    6,       3,
    7,       5
)

tbl2 <- tibble::tribble(
    ~tbl2_id, ~start, ~end,
    1,      4,    6,
    2,    2.5,  3.3,
    3,      7,    8
)

tbl1 %>% 
    fuzzy_left_join(tbl2,
                    by = c("number" = "start",
                           "number" = "end"),
                    match_fun = list(`>=`, `<=`)
    ) %>% 
    select(tbl1_id, number, tbl2_id)
#> # A tibble: 8 x 3
#>   tbl1_id number tbl2_id
#>     <dbl>  <dbl>   <dbl>
#> 1       1   5          1
#> 2       2   2         NA
#> 3       3   2.25      NA
#> 4       4   2.6        2
#> 5       5   3.2        2
#> 6       6  10         NA
#> 7       6   3          2
#> 8       7   5          1

Created on 2019-02-11 by the reprex package (v0.2.1)

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.