Writing code to do word counts for a large corpus

Have you looked at the "Analyzing word and document frequency" chapter of Tidy Text Mining? It takes you through the process of getting word counts step-by-step in a really nice way.
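As a rough idea of what that workflow looks like, here is a minimal sketch using dplyr and tidytext. The `corpus` data frame and its `doc` and `text` columns are made up for illustration, so swap in your own data; this is one common way to do it, not necessarily what will suit your corpus.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy corpus standing in for your real documents
corpus <- tibble(
  doc  = c("a", "b"),
  text = c("The cat sat on the mat.", "The dog sat.")
)

word_counts <- corpus %>%
  unnest_tokens(word, text) %>%  # one lowercased word per row, punctuation dropped
  count(word, sort = TRUE)       # occurrences per word across the whole corpus

word_counts
```

If you need counts per document instead, `count(doc, word, sort = TRUE)` should give you one row per document-word pair.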

Right now, you have quite a bit of code here, and it's not immediately clear which part is problematic. Narrowing the code down to that part is a key piece of a minimal reproducible example. (Note also that it's best to leave rm(list=ls()) out of your example, since it would remove everything from the environment of anyone else trying to reproduce your issue.)

For learning how to make a reprex, check out the community reprex FAQ. There's also a video tutorial that will take you through the process.
