Annotated Text Extraction: extracting paragraphs from a large annotated plain text document

Hi, and welcome!

Please see the FAQ: What's a reproducible example (`reprex`) and how do I do one? A reprex, complete with representative data, will attract more answers, and attract them sooner.

Many people here can help without being deeply knowledgeable about NLP. They outnumber the NLP experts but are unlikely to take on a question that has no reprex.

This almost makes it, but it is missing an essential ingredient: the data represented by the `doc` argument.

Without it, the most I can help with is the bracketing operator.

Basically, it selects parts of an object such as a list or a data frame (a data frame is itself a list of columns). For example:

head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
head(mtcars[1])
#>                    mpg
#> Mazda RX4         21.0
#> Mazda RX4 Wag     21.0
#> Datsun 710        22.8
#> Hornet 4 Drive    21.4
#> Hornet Sportabout 18.7
#> Valiant           18.1
mtcars[1,1]
#> [1] 21

Created on 2020-04-06 by the reprex package (v0.3.0)

Lists are objects that can contain other objects nested within them, and those nested objects can in turn be lists with contents of their own.

So for a list of lists, single brackets return a sub-list that still wraps its contents, while double brackets extract the element itself, which in this case is the inner list.
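
To make the difference concrete, here is a minimal sketch using a made-up nested list. The name `doc_parts` and its contents are assumptions for illustration only; they are not your `doc` object, whose structure I can't see.

```r
# Hypothetical nested list, invented purely for illustration
doc_parts <- list(
  intro = list(title = "Introduction", paragraphs = c("First.", "Second.")),
  body  = list(title = "Body", paragraphs = c("Third."))
)

# Single brackets return a sub-list: still a list, here of length 1
one <- doc_parts[1]
class(one)   # "list"
names(one)   # "intro"

# Double brackets extract the element itself: the inner list
inner <- doc_parts[[1]]
names(inner) # "title" "paragraphs"

# Chain double brackets (or use $) to reach values nested further down
doc_parts[[1]][["title"]]      # "Introduction"
doc_parts$body$paragraphs[1]   # "Third."

# The same rule applies to data frames, which are lists of columns
class(mtcars[1])    # "data.frame"
class(mtcars[[1]])  # "numeric"
```

If your `doc` turns out to be a list of lists, the same pattern should apply, but a reprex with the actual data is needed to say for sure.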