Restructure SEC company information tables #4079
Draft
+665
−1,260
Overview
Closes #4078. This is the first of two (or more) SEC table restructuring PRs. It handles the quarterly filings table and the company information tables. It doesn't include the ownership tables.
What problem does this address?
Restructures the SEC tables so that they are better normalized and easier to use.
What did you change?
core_sec10k__quarterly_filings
This table is largely the same. Only minor changes were made.
- filing_date column description to indicate that it's daily frequency
- report_date is the quarter that the filing_date pertains to
- exhibit_21_version with a regex - enforce this with a constraint in the field metadata?
Company Information Tables
raw_sec10k__quarterly_company_information
- core_sec10k__quarterly_company_information to be a raw table
core_sec10k__quarterly_company_information
- report_date and central_index_key the primary key
Questions:
- central_index_key and report_date are not the natural primary key because there are slight differences in headers harvested from different filings. I prioritized headers from filings where the filer is the same as the record we're harvesting but we do lose some records by forcing the CIK + report date primary key. Any reason to leave filename as the primary key?
out_sec10k__quarterly_company_information
- utility_id_eia and utility_name_eia onto the core company information table.
core_sec10k__changelog_company_name
- central_index_key, report_date, name_change_date, former_conformed_name, current_name out of raw company information table and see if central_index_key, report_date, name_change_date is a natural primary key
Documentation
Make sure to update relevant aspects of the documentation.
Tasks
Testing
How did you make sure this worked? How can a reviewer verify this?
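One way a reviewer might check the primary key questions raised above is sketched below. It is not the code in this PR. It is a minimal, standard-library illustration of two steps: finding rows that collide on a candidate key, and keeping one row per key while preferring headers harvested from the company's own filing. The filer_cik field, both helper functions, and the sample rows are hypothetical and exist only for the example.

```python
from collections import Counter


def duplicated_keys(records, key_cols):
    """Return {key tuple: count} for every key that occurs more than once."""
    counts = Counter(tuple(rec[col] for col in key_cols) for rec in records)
    return {key: n for key, n in counts.items() if n > 1}


def dedupe_prefer_own_filing(records, key_cols):
    """Keep one record per key, preferring rows from the company's own filing.

    Assumes a hypothetical `filer_cik` field that identifies who submitted
    the filing the header was harvested from.
    """
    best = {}
    for rec in records:
        key = tuple(rec[col] for col in key_cols)
        is_own = rec["filer_cik"] == rec["central_index_key"]
        kept = best.get(key)
        kept_is_own = kept is not None and kept["filer_cik"] == kept["central_index_key"]
        # Take the first row seen for a key; replace it only if the new row
        # comes from the company's own filing and the kept row does not.
        if kept is None or (is_own and not kept_is_own):
            best[key] = rec
    return list(best.values())


# Made-up sample rows: two filings report slightly different headers for the
# same company and report date.
raw = [
    {"filename": "a.txt", "central_index_key": "0000000001",
     "report_date": "2020-01-01", "filer_cik": "0000000002",
     "company_name": "ACME CORP"},
    {"filename": "b.txt", "central_index_key": "0000000001",
     "report_date": "2020-01-01", "filer_cik": "0000000001",
     "company_name": "Acme Corp."},
    {"filename": "c.txt", "central_index_key": "0000000003",
     "report_date": "2020-01-01", "filer_cik": "0000000003",
     "company_name": "Beta Inc"},
]
pk = ["central_index_key", "report_date"]

print("colliding keys in raw:", duplicated_keys(raw, pk))
core = dedupe_prefer_own_filing(raw, pk)
assert not duplicated_keys(core, pk), "candidate primary key is not unique"
print(f"dropped {len(raw) - len(core)} of {len(raw)} records")
```

The same duplicated_keys check could be pointed at central_index_key, report_date, name_change_date to see whether that combination is a natural primary key for the changelog table, and the dropped-record count gives a rough measure of what is lost by forcing the CIK + report date key.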
To-do list