I am currently working my way through the book Text Mining with R and am at the tokenizing portion. My question may appear a bit simplistic, but bear with me.
In the example below, we take a text column and tokenize it into bigrams (n-grams with n = 2). If I wished to use these as features for a classification model, I would need to convert the tokens into a matrix of 1s and 0s, with one row per entry of my original column and one column per bigram: 1 where the entry contains that bigram, 0 where it does not. Does anyone know how to accomplish this?
library(janeaustenr)
library(tidyverse)
library(tidytext)

d <- tibble(txt = prideprejudice)

d %>%
  unnest_tokens(bigram, txt, token = "ngrams", n = 2)
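
To make the target output concrete, here is a minimal sketch of one possible approach on a toy input. It assumes each row of the original column counts as one "document" and that a `doc_id` column identifies it. Both `doc_id` and the three toy sentences are hypothetical and not from the book; the Austen data would first need a row identifier of its own. The sketch uses `cast_sparse()` from tidytext to build a sparse document-by-bigram matrix, which is only one of several ways this could be done.

```r
library(dplyr)
library(tibble)
library(tidytext)
library(Matrix)

# Toy input: one row per "document" (hypothetical data for illustration)
toy <- tibble(
  doc_id = c("doc1", "doc2", "doc3"),
  txt    = c("the quick brown fox",
             "the quick red fox",
             "a slow brown dog")
)

bigram_matrix <- toy %>%
  # one row per (document, bigram) occurrence
  unnest_tokens(bigram, txt, token = "ngrams", n = 2) %>%
  # keep each document/bigram pair once so the indicator is 0/1, not a count
  distinct(doc_id, bigram) %>%
  mutate(present = 1) %>%
  # rows = documents, columns = bigrams, missing pairs are implicit zeros
  cast_sparse(doc_id, bigram, present)

# Dense view for inspection; keep the sparse form for large corpora
as.matrix(bigram_matrix)
```

Keeping the result sparse matters for real text, since most bigrams occur in very few rows; many modeling functions accept a sparse matrix directly, though whether a particular classifier does is something to check.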