
Project code organisation

Main code deliverables

exploratory/Clustering analysis and community detection.ipynb: the combination notebook, i.e. the final notebook that brings together all the intermediate work.

requirements.txt: all packages that must be installed to run the downloader, the parser and the notebooks.

Data scraping, parsing and combining code

src/

form_13f_downloader.py: downloads the 13F forms from the sec.gov website.
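The README does not say how form_13f_downloader.py locates the filings. One common route, assumed here, is EDGAR's quarterly master index (`edgar/full-index/{year}/QTR{n}/master.idx`), a pipe-delimited list of every filing in that quarter. The sketch below uses hypothetical helpers `master_index_url` and `filings_of_type` to build that URL and keep only the 13F-HR rows; the HTTP download itself is left out.

```python
"""Hypothetical sketch: find 13F-HR filings in an EDGAR quarterly master index.

Assumption: the downloader starts from EDGAR's master.idx files. The project
README does not describe how form_13f_downloader.py actually finds filings.
"""

EDGAR_ARCHIVES = "https://www.sec.gov/Archives/"


def master_index_url(year: int, quarter: int) -> str:
    """Return the URL of EDGAR's master index for one calendar quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1-4")
    return f"{EDGAR_ARCHIVES}edgar/full-index/{year}/QTR{quarter}/master.idx"


def filings_of_type(index_text: str, form_type: str = "13F-HR") -> list[dict]:
    """Parse master.idx rows and keep those of a single form type.

    Data rows look like: CIK|Company Name|Form Type|Date Filed|Filename
    The header block and the dashed separator are skipped because they do
    not split into five fields with a numeric CIK.
    """
    filings = []
    for line in index_text.splitlines():
        parts = line.split("|")
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        cik, company, ftype, date_filed, filename = parts
        if ftype == form_type:
            filings.append({
                "cik": cik,
                "company": company,
                "form_type": ftype,
                "date_filed": date_filed,
                "url": EDGAR_ARCHIVES + filename,
            })
    return filings
```

If you fetch these files yourself, note that the SEC asks automated clients to identify themselves with a descriptive User-Agent header. Amended reports use a separate form type (13F-HR/A), so pass that explicitly if you want them too.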

form_13f_parser: parses both the XML-formatted and the tabular-formatted 13F files and saves their contents in data/all_submission_files.xlsx.
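For the XML half of the parser, a minimal sketch with the standard library's `xml.etree.ElementTree` could look like the following. It assumes the usual 13F information-table element names (`infoTable`, `nameOfIssuer`, `titleOfClass`, `cusip`, `value`, `sshPrnamt`, `sshPrnamtType`) and matches on local tag names so that namespace prefixes do not matter. `parse_information_table` is a hypothetical name, not the project's function, and the tabular (plain-text) format is not covered.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop an XML namespace: '{http://...}cusip' -> 'cusip'."""
    return tag.rsplit("}", 1)[-1]


def parse_information_table(xml_text: str) -> list[dict]:
    """Return one dict per <infoTable> holding in a 13F information table.

    Sketch only: assumes the standard 13F element names listed above and
    ignores fields such as voting authority or investment discretion.
    """
    root = ET.fromstring(xml_text)
    holdings = []
    for elem in root.iter():
        if _local(elem.tag) != "infoTable":
            continue
        # Map every element under this holding to its stripped text,
        # keyed by local name (nested sshPrnamt is found this way too).
        fields = {_local(child.tag): (child.text or "").strip() for child in elem.iter()}
        holdings.append({
            "issuer": fields.get("nameOfIssuer", ""),
            "title_of_class": fields.get("titleOfClass", ""),
            "cusip": fields.get("cusip", ""),
            "value": int(fields.get("value") or 0),
            "shares": int(fields.get("sshPrnamt") or 0),
            "share_type": fields.get("sshPrnamtType", ""),
        })
    return holdings
```

The unit of `value` has changed over time (older filings report it in thousands of dollars), so check the filing period before summing values across years.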

cusip_to_ticker_converter.py: uses api.openfigi.com and marketwatch.com to look up, for each CUSIP in the list, the corresponding ticker symbol and additional company information. The ticker symbols are stored in data/all_submission_files2.xlsx; the full metadata is stored in data/stock_info.json.
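The OpenFIGI part of this lookup can be sketched with the standard library alone. The sketch assumes OpenFIGI's v3 mapping endpoint, which accepts a JSON list of jobs such as `{"idType": "ID_CUSIP", "idValue": ...}` and returns one result per job, in the same order, holding either a `data` list or a `warning`. The helper names, the default chunk size of 10 and the choice to keep the first match's ticker are assumptions; check OpenFIGI's current documentation for batch and rate limits. The marketwatch.com step is not shown.

```python
import json
import urllib.request

OPENFIGI_URL = "https://api.openfigi.com/v3/mapping"


def cusip_jobs(cusips, chunk_size=10):
    """Yield lists of CUSIP mapping jobs, at most chunk_size per request.

    The chunk size is an assumption; OpenFIGI's limits depend on whether
    an API key is used.
    """
    jobs = [{"idType": "ID_CUSIP", "idValue": c} for c in cusips]
    for start in range(0, len(jobs), chunk_size):
        yield jobs[start:start + chunk_size]


def post_mapping(jobs, api_key=None):
    """POST one batch of jobs and return the decoded JSON result list."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-OPENFIGI-APIKEY"] = api_key
    request = urllib.request.Request(
        OPENFIGI_URL,
        data=json.dumps(jobs).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def tickers_from_response(jobs, results):
    """Pair each job with its result (same order) and keep the first ticker.

    Jobs that returned a warning or an empty match list map to None.
    """
    tickers = {}
    for job, result in zip(jobs, results):
        matches = result.get("data") or []
        tickers[job["idValue"]] = matches[0].get("ticker") if matches else None
    return tickers


def lookup_tickers(cusips, api_key=None):
    """Hypothetical driver: map every CUSIP to a ticker (or None)."""
    tickers = {}
    for jobs in cusip_jobs(cusips):
        tickers.update(tickers_from_response(jobs, post_mapping(jobs, api_key)))
    return tickers
```

A CUSIP often maps to several listings on different exchanges, so a more careful version would filter the matches (for example on `exchCode`) rather than take the first one.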

exploratory/Read_the_extra_data.ipynb: reads data/stock_info.json, parses each company description to extract the year of foundation, and saves everything in data/investee_info.xlsx.
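Extracting the year of foundation could be done with a regular expression over each description. The sketch below is one possible approach, assuming descriptions use phrasings such as "founded in 1998" or "incorporated in 1987"; the pattern and the `year_of_foundation` helper are hypothetical, and the notebook may do this differently.

```python
import re

# Hypothetical pattern: a founding keyword followed, within the same
# sentence (no intervening period), by a four-digit year from 1600 to 2099.
_FOUNDED = re.compile(
    r"\b(?:founded|incorporated|established)\b[^.]*?\b(1[6-9]\d{2}|20\d{2})\b",
    re.IGNORECASE,
)


def year_of_foundation(description):
    """Return the first founding year mentioned in a description, or None."""
    match = _FOUNDED.search(description or "")
    return int(match.group(1)) if match else None
```

Because the pattern stops at the first period, a sentence such as "founded as Example Inc. in 1990" is missed; such cases would need a looser pattern or manual review.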

test/: unit tests for the 13F form parser.

Supporting tools and intermediate notebook versions

cleanup_notebook.sh: script that removes all output from the notebooks; run it before committing changes to the git repository.
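The shell script's contents are not shown here. As a rough Python equivalent, clearing a notebook amounts to emptying `outputs` and resetting `execution_count` on every code cell of the .ipynb JSON (nbformat 4 layout assumed). `clear_outputs` and `clear_notebook_file` are hypothetical names.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Strip outputs and execution counts from all code cells, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_notebook_file(path) -> None:
    """Rewrite one .ipynb file on disk with its outputs removed."""
    path = Path(path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    clear_outputs(notebook)
    # indent=1 approximates the layout Jupyter writes, to keep git diffs small.
    path.write_text(json.dumps(notebook, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
```

For example, `for p in Path("exploratory").glob("*.ipynb"): clear_notebook_file(p)` would clean every notebook in that folder.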

exploratory/

exploratory/exploratory_data_analysis.ipynb: explores the 13F forms that we collected.

exploratory/networkX_community_detection_yearOfFoundation.ipynb, exploratory/networkX_community_detection_sector.ipynb: implement community detection with NetworkX, based on the investees' year of foundation and on their industry/sector, respectively.
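One way to set up such an analysis, sketched here as an assumption rather than a description of the notebooks: build an investor-investee bipartite graph from the 13F holdings, project it onto the investees (edge weight = number of shared investors), detect communities by greedy modularity maximisation, and compare them with the partition induced by an attribute such as sector or founding year. The helpers `investee_graph` and `compare_with_attribute` are hypothetical; the NetworkX functions they call are standard.

```python
import networkx as nx
from networkx.algorithms import bipartite, community


def investee_graph(holdings):
    """Project (investor, investee) pairs onto an investee-investee graph.

    Assumes investor and investee names never collide. Edge weights count
    how many investors two investees have in common.
    """
    holdings = list(holdings)
    investees = {investee for _, investee in holdings}
    B = nx.Graph()
    B.add_nodes_from({investor for investor, _ in holdings}, bipartite=0)
    B.add_nodes_from(investees, bipartite=1)
    B.add_edges_from(holdings)
    return bipartite.weighted_projected_graph(B, investees)


def compare_with_attribute(G, attribute):
    """Detect communities and compare them with a partition by attribute.

    attribute maps each investee to a value such as its sector or founding
    year; it must have an entry for every node of G.
    """
    detected = community.greedy_modularity_communities(G, weight="weight")
    groups = {}
    for node in G:
        groups.setdefault(attribute[node], set()).add(node)
    by_attribute = list(groups.values())
    return {
        "detected": sorted(sorted(c) for c in detected),
        "modularity_detected": community.modularity(G, detected, weight="weight"),
        "modularity_attribute": community.modularity(G, by_attribute, weight="weight"),
    }
```

If the attribute partition scores a modularity close to that of the detected communities, that suggests the attribute accounts for much of the co-investment structure.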

clustering.py: supporting code for the k-means clustering notebook.
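The README describes clustering.py only by its role. For readers unfamiliar with k-means, a minimal NumPy version of Lloyd's algorithm is shown below. The `kmeans` function, its random initialisation and its stopping rule are illustrative assumptions; the notebook may well rely on a library implementation such as scikit-learn's `KMeans` instead.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    Returns (labels, centroids). Initial centroids are k distinct data
    points drawn at random (an illustrative choice, not k-means++).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    def assign(C):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Move each centroid to the mean of its points; keep empty ones in place.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so labels always match the returned centroids.
    return assign(centroids), centroids
```

Lloyd's algorithm only finds a local optimum that depends on the starting centroids, which is why library implementations usually run several initialisations and keep the best result.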