Hi, a couple of prelims.
It's considered bad form to include
in sample code.
Putting the example in reproducible-example form, called a reprex, cuts down on cut-and-paste errors and makes it possible to follow the code without having to run it.
My suggestion is that the approach being taken is too granular. The placeholder data doesn't need to be random, merely representative of the values on which sentiment analysis of your real data can be run.
Consider this example, starting from
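library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> filter, lag
#> The following objects are masked from 'package:base':
#> intersect, setdiff, setequal, union
library(janeaustenr) # provides prideprejudice
library(tidytext)    # provides unnest_tokens()
# One row per line of the novel
d <- tibble(txt = prideprejudice)
# Re-tokenize into paragraphs; the output column is named `sentence`
placeholder <- d %>% unnest_tokens(sentence, txt, token = "paragraphs")
# Keep the first 2,000 paragraphs, then stack five copies to reach 10,000 rows
placeholder <- placeholder[1:2000, ]
placeholder <- rbind(placeholder, placeholder, placeholder, placeholder, placeholder)
placeholder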
#> # A tibble: 10,000 x 1
#> 1 pride and prejudice
#> 2 by jane austen
#> 3 chapter 1
#> 4 " it is a truth universally acknowledged, that a single man in possession of…
#> 5 however little known the feelings or views of such a man may be on his first…
#> 6 "\"my dear mr. bennet,\" said his lady to him one day, \"have you heard that…
#> 7 mr. bennet replied that he had not.
#> 8 "\"but it is,\" returned she; \"for mrs. long has just been here, and she to…
#> 9 mr. bennet made no answer.
#> 10 "\"do you not want to know who has taken it?\" cried his wife impatiently."
#> # … with 9,990 more rows
Created on 2019-11-21 by the reprex package (v0.3.0)
This gets you a 10,000-row tibble of paragraphs: the first 2,000 paragraphs of the novel, each appearing 5 times.
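If you want a different multiple than 5, the stacked rbind() call can also be written with row indexing. Here is a minimal sketch in base R. The data frame `toy` and its contents are made up for illustration and stand in for the 2,000-row `placeholder` above.

```r
# Toy stand-in for `placeholder`; the three rows are invented for this sketch.
toy <- data.frame(
  sentence = c("first paragraph", "second paragraph", "third paragraph"),
  stringsAsFactors = FALSE
)

# Build an index that walks through all rows, 5 times over.
# This gives the same row order as rbind(toy, toy, toy, toy, toy).
idx <- rep(seq_len(nrow(toy)), times = 5)
repeated <- toy[idx, , drop = FALSE]

# Indexing with repeated row numbers produces row names like 1.1, 1.2, ...
# Reset them to plain 1..n.
rownames(repeated) <- NULL

nrow(repeated)
```

Changing `times` is then the only edit needed to grow or shrink the placeholder data.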