Open source data extractor

12/18/2022

Information disclosure through EDGAR has been a requirement for most publicly listed or registered investment companies for the last quarter century. EDGAR now contains terabytes of documents and data, including press releases, annual corporate filings, executive employment agreements, asset-backed security (ABS) performance, and investment company holdings.

Over the last two decades, researchers around the world and in many disciplines have analyzed this data to ask and answer many important questions ( ). However, despite the breadth of this research, it is still difficult for many scholars to carry out or reproduce research based on EDGAR. OpenEDGAR is an open source Python framework designed to address these problems of access and reproducibility.

Previously, researchers would each spend time or money redeveloping the same data retrieval and parsing code. OpenEDGAR allows the community of researchers and developers to share the costs and benefits of this core functionality, increasing access to this research data and lowering the cost of reproducing important research. We believe that this data, especially when combined with the growing number of open source resources for natural language processing and machine learning, can unlock answers to many important research questions ( ).

LexPredict first began archiving and indexing data from EDGAR in 2013 in order to develop corpora of legal and regulatory text for natural language processing and machine learning tasks. Over time, the scripts behind this work developed into a set of backend services for a data product, the LexPredict Agreement Database, released in 2015. On December 30, 2016, the SEC decommissioned FTP distribution and enabled its new HTTPS delivery mechanism, and much of OpenEDGAR was rewritten and modernized at that time. LexPredict announced the open sourcing of OpenEDGAR in May 2018.

OpenEDGAR is released under the MIT license, which permits commercial use and is compatible with the GPL if desired. Support is provided through GitHub issue tracking at.

OpenEDGAR is designed to provide an open source Python framework for working with EDGAR data at any scale. It accomplishes this goal by building on top of high-quality open source packages and through careful architecture choices that let researchers attack problems both large and small. As we have outlined in allied work ( ), our guiding package selection and design principles are as follows:

Standard open source licensing: We strongly prefer dependencies with standard open source licensing options like MIT, Apache, or GPL-family licenses.

High level of maturity: We strongly prefer dependencies with mature code bases, including years of development and testing.
