R Edgar 10-K annual report crawling

I want to get EDGAR 10-K annual report data by crawling.
My questions are:

  1. Is it possible to get data from EDGAR by crawling?
  2. If it is possible, how can I do it?

This can be done with the {edgar} package, but how hard it is depends on whether you already have the central index key (CIK) for the target registrant, know its fiscal year, can stay under the SEC's rate-throttling limits, and understand how to parse the EDGAR data file structure. Unless the {edgar} package has since fixed an issue with how it implements the SEC's API, it may be necessary to do part of the processing in Python, using {reticulate} and installing the Python package that gets around this.
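To make the moving parts concrete, here is a minimal sketch in plain Python (standard library only, so it could also be called through {reticulate}). It is not how {edgar} works internally. It assumes you already know the CIK and the year and quarter in which the 10-K was filed, which is usually after the fiscal year ends. It relies on the quarterly `master.idx` files under `https://www.sec.gov/Archives/edgar/full-index/`, which are pipe-delimited. The names `index_url`, `parse_master_index`, and `fetch` are made up for this example, as is the sample registrant.

```python
import time
import urllib.request
from typing import List, NamedTuple

ARCHIVES = "https://www.sec.gov/Archives/"


class Filing(NamedTuple):
    cik: int
    company: str
    form_type: str
    date_filed: str
    url: str


def index_url(year: int, quarter: int) -> str:
    """URL of the quarterly master index (filing year, not fiscal year)."""
    return f"{ARCHIVES}edgar/full-index/{year}/QTR{quarter}/master.idx"


def parse_master_index(text: str, cik: int, form_type: str = "10-K") -> List[Filing]:
    """Keep only rows for one CIK and one form type.

    Data rows look like: CIK|Company Name|Form Type|Date Filed|Filename
    """
    filings = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        # Skip the descriptive header, the column names, and the dashed rule.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        row_cik, company, form, date_filed, filename = parts
        if int(row_cik) == cik and form == form_type:
            filings.append(
                Filing(int(row_cik), company, form, date_filed, ARCHIVES + filename)
            )
    return filings


def fetch(url: str, user_agent: str, pause: float = 0.2) -> str:
    """Download one file, identifying yourself and pausing afterwards.

    The SEC asks for a descriptive User-Agent (name and email) and throttles
    heavy traffic; the 0.2 s pause is an assumed, conservative choice.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("latin-1")
    time.sleep(pause)
    return body
```

Usage would look something like `parse_master_index(fetch(index_url(2023, 1), "Your Name you@example.com"), cik=1234567)`, where the CIK and contact string are placeholders. Each returned `url` points at the full plain-text submission for that filing.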

Before embarking on a toolchain for the extraction, however, be sure you clearly understand what you are going to do with the extracted files. If you only need PDFs, that won't be much of a problem, but if you plan on extracting, say, specific sections, be aware that the plain text files are generally 90+% HTML markup.
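To give a feel for the markup problem, here is a rough sketch of stripping tags from a filing with Python's built-in `html.parser`. It is only a starting point: real filings also contain SGML wrappers, inline XBRL, and embedded binary exhibits that this does not handle. `TextExtractor`, `html_to_text`, and `markup_share` are names invented for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring tags and script/style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(markup: str) -> str:
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.chunks)


def markup_share(markup: str) -> float:
    """Fraction of the characters that are not visible text."""
    if not markup:
        return 0.0
    return 1 - len(html_to_text(markup)) / len(markup)
```

Running `markup_share` on a downloaded filing is a quick way to check how much of the file is markup before you invest in section-level parsing.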
