Custom function with mutate gives Input error

I would like to look up a value in df$var to see whether it corresponds, via str_detect(), to a value in another data frame.

example_df <- data.frame(
  url = c('blog/blah', 'blog/?utm_medium=foo', 'UK/something')
)

lookup_df <- data.frame(
  lookup_string = c('blog/blah', 'subscription', 'UK'),
  group = c('blog', 'subs', 'UK')
)

I want to check each value of example_df$url against lookup_df$lookup_string. For example, for the first value I want to see whether the string 'blog/blah' appears in lookup_df$lookup_string and, if so, return the corresponding value in lookup_df$group, which in this case would be 'blog'. I can do this manually, one value at a time:

str = 'blog'
lut = lookup_df
lut %>% filter(str_detect(lookup_string, str)) %>% head(1) %>% pull(group)
[1] "blog"

Good, that is what I want. However, if I try to make this a function to use within a dplyr chain, in this case with mutate:

lookup_func <- function(str, lut) {
  lut %>% filter(str_detect(lookup_string, str)) %>% head(1) %>% pull(group)
}

example_df %>% mutate(blah = lookup_func(url, lookup_df))
Error: Problem with `mutate()` input `blah`.
x Input `blah` can't be recycled to size 3.
ℹ Input `blah` is `lookup_func(url, lookup_df)`.
ℹ Input `blah` must be size 3 or 1, not 0.

How can I use lookup_func() within mutate() in this way?

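I think the following does most of what you want. Notice that I changed the values of url. str_detect(string, pattern) returns TRUE only when the pattern matches somewhere within the string; it will not match when just part of the pattern appears in the string. In lookup_func(), lookup_string is the string and the url is the pattern.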

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)
example_df <- data.frame(
  url = c('blog/blah', 'UK', 'subscription', 'blog') # original values included 'blog/?utm_medium=foo', 'UK/something'
)

lookup_df <- data.frame(
  lookup_string = c('blog/blah', 'subscription', 'UK'),
  group = c('blog', 'subs', 'UK')
)
lookup_func <- function(str, lut) {
  lut %>% filter(str_detect(lookup_string, str)) %>% head(1) %>% pull(group)
}

example_df %>% rowwise() %>% mutate(blah = lookup_func(url, lookup_df))
#> # A tibble: 4 x 2
#> # Rowwise: 
#>   url          blah 
#>   <chr>        <chr>
#> 1 blog/blah    blog 
#> 2 UK           UK   
#> 3 subscription subs 
#> 4 blog         blog

Created on 2021-03-03 by the reprex package (v0.3.0)
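
To make the point about argument order concrete, here is a small sketch of how str_detect() treats its two arguments. The literal strings are just illustrations taken from the data above, and the use of fixed() is my own addition, not something the original function does.

```r
library(stringr)

# str_detect(string, pattern): TRUE when `pattern` matches somewhere in `string`.
str_detect("blog/blah", "blog")   # the pattern is a substring of the string
#> [1] TRUE
str_detect("blog", "blog/blah")   # the pattern is longer than the string
#> [1] FALSE

# Both arguments are vectorised, so equal-length vectors are compared pairwise.
str_detect(c("blog/blah", "subscription", "UK"), c("blog", "UK", "UK"))
#> [1]  TRUE FALSE  TRUE

# Patterns are regular expressions by default; fixed() matches them literally,
# so the `?` below is an ordinary character rather than a quantifier.
str_detect("blog/?utm_medium=foo", fixed("blog/?"))
#> [1] TRUE
```

So with the original urls, a pattern such as 'UK/something' cannot be found inside the lookup string 'UK', and the filter() in lookup_func() keeps no rows for it.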

Thanks a lot. So the issue here was with my input data, as opposed to the function itself?

I did add a call to rowwise() before the mutate(). This causes each row of the data frame to be treated as an independent group, so lookup_func() receives one url at a time rather than the whole column.
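
If you want to keep the original urls, one option is to reverse the direction of the match and return NA when nothing is found, since a zero-length result is what mutate() refuses to recycle. The following is only a sketch under those assumptions: lookup_one() is a hypothetical helper name, and matching the lookup strings literally with fixed() is my choice rather than something from your code.

```r
library(dplyr)
library(stringr)

example_df <- data.frame(
  url = c('blog/blah', 'blog/?utm_medium=foo', 'UK/something'),
  stringsAsFactors = FALSE
)

lookup_df <- data.frame(
  lookup_string = c('blog/blah', 'subscription', 'UK'),
  group = c('blog', 'subs', 'UK'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for ONE url, return the group of the first
# lookup_string found inside it, or NA when no lookup_string matches.
lookup_one <- function(url, lut) {
  hit <- str_detect(url, fixed(lut$lookup_string))
  if (any(hit)) lut$group[which(hit)[1]] else NA_character_
}

result <- example_df %>%
  rowwise() %>%                                  # one url per call
  mutate(blah = lookup_one(url, lookup_df)) %>%
  ungroup()

result$blah
#> [1] "blog" NA     "UK"
```

Here 'blog/?utm_medium=foo' gets NA because none of the three lookup strings occurs in it; adding a shorter lookup_string such as 'blog' would be needed to catch it.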
