Corpus definition for text analysis

Dk_2810 · May 3, 2022, 1:04pm

Hello everyone,

I am still a beginner with R, so please excuse what is probably an easy question.
I am using the code:
textstat_readability(txt, measure = "all")
to measure some text statistics of txt files (e.g. FOG, number of words, number of sentences, etc.). Here, txt is a placeholder for the corpus I need.

The following code reads my files into a table, with the actual text I want to analyze in the "text" column:
readtext(paste0("~/Documents/Master/Masterarbeit/Datenset/Neu/Test/*"))
readtext object consisting of 2 documents and 0 docvars.

Description: df [2 × 2]

  doc_id                                     text
1 45404-12In-13348776D3799406720P-Gl.txt     "" Pierson "..."
2 80749-16In-18733768Y12826817152N-Gl.txt    ""\xff\xfeT\n\n\n\n\n\n\n"..."

Unfortunately, when I try to define this as a corpus for use in the textstat_readability function with:
Example_Corpus <- Corpus(readtext(paste0("~/Documents/Master/Masterarbeit/Datenset/Neu/Test/*")))
I get the error:
inherits(x, "Source") is not TRUE

Ideally, the end result would be a table with the file names in the first column, followed by the various textstat_readability measures.

I would greatly appreciate any help. Thank you!

Best, Daniel

nirgrahamuk · May 3, 2022, 5:16pm

Is Corpus coming from the tm package? I'll assume so.
The examples I've seen show that its main argument can be set up with a directory source, which would seem to make your use of readtext redundant. But I'm not familiar with readtext and don't know what package it's from.
I would try:

(Example_Corpus <- Corpus(DirSource("~/Documents/Master/Masterarbeit/Datenset/Neu/Test")))
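If textstat_readability comes from quanteda.textstats and readtext from the readtext package (an assumption, since the thread never names the packages), the function expects a quanteda corpus rather than a tm one. The error arises because tm's Corpus() wants a Source object. A minimal sketch of that route, with example_corpus and readability as illustrative names:

```r
# Assumption: readtext() is from the readtext package and
# textstat_readability() is from quanteda.textstats.
library(readtext)
library(quanteda)
library(quanteda.textstats)

# Read every file in the folder; doc_id is taken from the file names.
files <- readtext("~/Documents/Master/Masterarbeit/Datenset/Neu/Test/*")

# quanteda's corpus() (lower-case c) accepts a readtext object directly,
# unlike tm's Corpus(), which requires a Source such as DirSource().
example_corpus <- corpus(files)

# Returns a data frame with one row per document; the document name
# comes first, followed by the readability measures.
readability <- textstat_readability(example_corpus, measure = "all")
head(readability)
```

The "\xff\xfe" at the start of the second text looks like a UTF-16 byte order mark, so that file may not be UTF-8. If the text comes out garbled, passing an encoding argument to readtext (for example encoding = "UTF-16LE") might help, but this is a guess based only on the printed output.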
