Need help troubleshooting the step_textfeature function in the textrecipes package

Hi all - just want to say a big thanks first before I ask this question. I also want to acknowledge the incredible work by Emil and Julia, who have taught me a lot about ML. Speaking of which, I am following along with the still-in-press SMLTAR ebook. I am working to engineer some features from my corpus and put them into the tidy recipes workflow. However, I am a bit stuck with the step_textfeature function that is part of the textrecipes package. Hopefully my ignorance will be easy to remedy!

So in my feature engineering step I am making a handful of simple functions that should help improve the model's feature space. For example, here is a simple function that counts the number of characters in the given text record:

#simply count the length of the text
response_length <- function(x) {
  str_count(x)
}

And here is another function I wrote to analyze the total sentiment value of all the words in the text:

afinn <- get_sentiments("afinn")

derive_sentiment <- function(response) {
  df <- tibble(response) %>%
    unnest_tokens(word, response) %>%
    inner_join(afinn, by = "word")

  summed <- sum(df$value)
  return (summed)
}

When I call both of these functions outside of the recipe flow, they both return an integer (number) as expected. But, when I put them into the recipe using the step_textfeature function from the textrecipes package, I get some errors at model training. Here is the remainder of the code showing the appropriate inclusion of the two above functions in the workflow:
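One check that may be worth running before the recipe: call each function on a vector of several texts rather than a single string, since a recipe step works on a whole column at once. The sketch below uses base R only. The names `check_same_length`, `count_chars`, and `total_chars` are hypothetical, made up for illustration, and are not part of textrecipes.

```r
# Hypothetical helper (not part of textrecipes): does f() return one
# numeric value per input string?
check_same_length <- function(f, x) {
  out <- f(x)
  is.numeric(out) && length(out) == length(x)
}

# Toy inputs standing in for the response column
texts <- c("good day", "bad", "fine weather today")

# Vectorized: one character count per element
count_chars <- function(x) nchar(x)

# Collapsing: sums over every element, so it returns a single number
total_chars <- function(x) sum(nchar(x))

check_same_length(count_chars, texts)     # TRUE
check_same_length(total_chars, texts)     # FALSE
check_same_length(total_chars, texts[1])  # TRUE, which hides the problem
```

The last call shows why testing with a single string can be misleading: a function that collapses its input to one number looks correct whenever the input has length one.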

#define a list of these custom functions to be put into the recipe later
custom_functions <- list(
  response_length = response_length,
  derive_sentiment = derive_sentiment
)

#make a 'recipe' that pre-processes the text 
#here we also add in our custom functions that help build the feature space
preprocessing_recipe <-
  recipe(label ~ response,
         data = train
  ) %>%
  step_mutate(response_copy = response) %>%
  step_textfeature(response_copy, extract_functions = custom_functions) %>%
  step_tokenize(response) %>%
  step_stopwords(response) %>%
  step_tokenfilter(response, max_tokens = 500, min_times = 50) %>%
  step_tfidf(response) %>%
  step_downsample(label)


#cross-validation object
folds <- vfold_cv(train)

#declare a SVM classification model
svm_spec <- svm_rbf() %>%
  set_mode("classification") %>%
  set_engine("kernlab")
svm_spec

#build a SVM 'workflow' by passing the model and the recipe
svm_wf <- workflow() %>%
  add_recipe(preprocessing_recipe) %>%
  add_model(svm_spec)
svm_wf


#fit the models
#warning - takes a long time!
svm_rs <- fit_resamples(
  svm_wf,
  folds,
  metrics = metric_set(recall, precision, sensitivity, specificity, accuracy),
  control = control_resamples(save_pred = TRUE)
)
svm_rs

So the errors in question are simply that: the step_textfeature function throws:

Or at least I am assuming this comes from step_textfeature because in the docs it states:

All the functions passed to extract_functions must take a character vector as input and return a numeric vector of the same length, otherwise an error will be thrown.
(https://www.rdocumentation.org/packages/textrecipes/versions/0.3.0/topics/step_textfeature)
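One way to read that requirement is that the function receives the whole column and has to hand back one score per row. Below is a minimal sketch of a sentiment function written that way, assuming the lexicon has `word` and `value` columns like the AFINN one. `derive_sentiment_vec` and `toy_lexicon` are hypothetical names, and the lexicon values are made up so the example runs without downloading anything.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Toy lexicon standing in for get_sentiments("afinn"); values are made up
toy_lexicon <- tibble(
  word  = c("good", "bad", "awesome"),
  value = c(3, -3, 4)
)

derive_sentiment_vec <- function(response, lexicon = toy_lexicon) {
  # Tag each text with its position so scores can be summed per text
  scores <- tibble(id = seq_along(response), response = response) %>%
    unnest_tokens(word, response) %>%
    inner_join(lexicon, by = "word") %>%
    group_by(id) %>%
    summarise(value = sum(value), .groups = "drop")

  # Start from 0 so texts with no lexicon words still get a score
  out <- numeric(length(response))
  out[scores$id] <- scores$value
  out
}

derive_sentiment_vec(c("good and awesome", "nothing here", "bad bad"))
# 7 0 -6
```

If this is the cause, something like `function(x) derive_sentiment_vec(x, afinn)` could go into the `custom_functions` list in place of the original. I have not confirmed that against the recipe itself, so treat it as a guess.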

So, as I mentioned, I am under the impression that both of my functions are passed a single character vector and return a single real number. In fact, the first function (response_length) makes it through the recipe just fine. The error only shows up when I include the derive_sentiment function. I think I'm missing something super basic, but if anyone can shed some light on this, that would be awesome!

Cheers.
