What you are currently doing in mutate() to obtain good_name is equivalent to this:
df <- tibble::tribble(
~bad_name, ~expected,
"newyork", "New York",
"alabama", "Alabama"
)
agrep(df$bad_name, state.name, max.distance = 3, value = TRUE)
#> Warning in agrep(df$bad_name, state.name, max.distance = 3, value = TRUE):
#> argument 'pattern' has length > 1 and only the first element will be used
#> [1] "New York"
As you can see, I get the same problem. This is because you are passing a character vector to the pattern argument of agrep(), which expects a single string, so only the first element is used (see ?agrep). agrep() is not vectorized over pattern: you need to either vectorize it or apply it element by element.
Here is an example using a vectorized version:
df <- tibble::tribble(
~bad_name, ~expected,
"newyork", "New York",
"alabama", "Alabama"
)
vagrep <- Vectorize(agrep, "pattern")
vagrep(df$bad_name, state.name, max.distance = 3, value = TRUE)
#> $newyork
#> [1] "New York"
#>
#> $alabama
#> [1] "Alabama" "Oklahoma"
dplyr::mutate(
df,
good_name = vagrep(bad_name, state.name, max.distance = 3, value = TRUE)
)
#> # A tibble: 2 x 3
#> bad_name expected good_name
#> <chr> <chr> <named list>
#> 1 newyork New York <chr [1]>
#> 2 alabama Alabama <chr [2]>
You now get the correct result, sometimes with several candidates from the fuzzy matching. You can use it directly in mutate(), but the result is a list column that you then need to post-process (for example by selecting the first match, as you did).
Without Vectorize(), you need to apply agrep() to each row yourself. With dplyr, here is one way using purrr iteration:
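If you only want one match per row, a minimal sketch is to apply agrep() element by element with base R vapply() and keep the first candidate. This assumes that "first match in state.name order" is an acceptable tie-break, which is a choice of this sketch and not something agrep() decides for you.

```r
library(dplyr)

df <- tibble::tribble(
  ~bad_name, ~expected,
  "newyork", "New York",
  "alabama", "Alabama"
)

res <- df %>%
  mutate(
    # apply agrep() to one pattern at a time; [1] keeps the first candidate
    # and yields NA_character_ when there is no match (character(0)[1])
    good_name = vapply(
      bad_name,
      function(p) agrep(p, state.name, max.distance = 3, value = TRUE)[1],
      character(1),
      USE.NAMES = FALSE
    )
  )

res
```

Because vapply() is told to return character(1) per element, good_name is a plain character column instead of a list column.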
df <- tibble::tribble(
~bad_name, ~expected,
"newyork", "New York",
"alabama", "Alabama"
)
library(dplyr)
df %>%
mutate(
good_name = purrr::map(bad_name,
~ agrep(.x, state.name, max.distance = 3, value = TRUE)
)
)
#> # A tibble: 2 x 3
#> bad_name expected good_name
#> <chr> <chr> <list>
#> 1 newyork New York <chr [1]>
#> 2 alabama Alabama <chr [2]>
With the new dplyr 1.0.0, I think this will be easier thanks to the improved rowwise() operations.
For now, you need to install the development version to try it.
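A minimal sketch of what the rowwise() approach could look like, assuming dplyr >= 1.0.0 is installed. Inside a rowwise data frame, bad_name is a single string for each row, so agrep() receives a length-1 pattern. Wrapping the result in list() keeps all candidates, while [1] keeps only the first one.

```r
library(dplyr)  # assumes dplyr >= 1.0.0

df <- tibble::tribble(
  ~bad_name, ~expected,
  "newyork", "New York",
  "alabama", "Alabama"
)

res <- df %>%
  rowwise() %>%
  mutate(
    # all fuzzy-match candidates for this row, stored as a list column
    candidates = list(agrep(bad_name, state.name, max.distance = 3, value = TRUE)),
    # only the first candidate; NA when nothing matches
    good_name = agrep(bad_name, state.name, max.distance = 3, value = TRUE)[1]
  ) %>%
  ungroup()

res
```

ungroup() drops the rowwise grouping so that later verbs operate on the whole data frame again.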
Hope it helps