How to automatically interpolate values for one data frame based on another lookup table/data frame?

I have one data frame and one look up table. What I want is to compare df_dat$value with df_lookup$threshold . If the value falls into threshold range, then create a new column transfer in df_dat so that its values are linearly interpolated from the transfer column in df_lookup


library(dplyr)

df_lookup <- tribble(
  ~threshold, ~transfer,
  0,   0,
  100,   15,
  200,   35
)
df_lookup
#> # A tibble: 3 x 2
#>   threshold transfer
#>       <dbl>    <dbl>
#> 1         0        0
#> 2       100       15
#> 3       200       35

df_dat <- tribble(
  ~date, ~value,
  "2009-01-01", 0,
  "2009-01-02", 30,
  "2009-01-06", 105,
  "2009-01-09", 150
)
df_dat
#> # A tibble: 4 x 2
#>   date       value
#>   <chr>      <dbl>
#> 1 2009-01-01     0
#> 2 2009-01-02    30
#> 3 2009-01-06   105
#> 4 2009-01-09   150

I can manually do it like this but wondering if there is an automatic way based on the values from the df_lookup table? Thank you.

df_dat %>% 
  mutate(transfer = case_when(value > 0 & value < 100 ~ 0 + (value - 0)*(15 - 0)/(100 - 0),
                              value >= 100 & value < 200 ~ 15 + (value - 100)*(35 - 15)/(200 - 100),
                              TRUE ~ 0)
  )
#> # A tibble: 4 x 3
#>   date       value transfer
#>   <chr>      <dbl>    <dbl>
#> 1 2009-01-01     0      0  
#> 2 2009-01-02    30      4.5
#> 3 2009-01-06   105     16  
#> 4 2009-01-09   150     25

You can write a function to do this using findInterval. Below is an example. Why the code's long (beyond it being for explanation): df_lookup is not in a "tidy" format for your purpose. You want to use it to define spans, where the lower and upper bounds are important. Each span is an object, so each span should get its own row.

Still, I left df_lookup in it's original form, because I don't know how your data comes in.

library(dplyr)
library(tibble)

interpolate_transfer <- function(x) {
  # A row id helps with lookup joining later
  df_lookup <- tibble(
    row_id = 1:3,
    threshold = c(0, 100, 200),
    transfer = c(0, 15, 35)
  )
  # Find the span each x belongs to
  tibble(lower = findInterval(x, df_lookup[["threshold"]])) %>%
    # Then get the lower and upper thresholds and transfer values
    # Beware x values not inside any spans
    mutate(upper = lower + 1) %>%
    mutate(is_out = lower < 1 | upper > nrow(df_lookup)) %>%
    mutate(
      lower = ifelse(is_out, NA, lower),
      upper = ifelse(is_out, NA, upper)
    ) %>%
    left_join(df_lookup, by = c(lower = "row_id")) %>%
    rename(
      threshold_lower = threshold,
      transfer_lower = transfer
    ) %>%
    left_join(df_lookup, by = c(upper = "row_id")) %>%
    rename(
      threshold_upper = threshold,
      transfer_upper = transfer
    ) %>%
    # Find how far each x is along its threshold span (0 to 1)
    # Then go that far along its transfer span
    mutate(
      unit_ratio = (x - threshold_lower) / (threshold_upper - threshold_lower)
    ) %>%
    mutate(
      value = transfer_lower + unit_ratio * (transfer_upper - transfer_lower)
    ) %>%
    pull(value)
}


interpolate_transfer(c(0, 30, 105, 150))
# [1]  0.0  4.5 16.0 25.0
3 Likes

That works. Thank you @nwerth!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.