XBRL Extractor badzipfile error #251

Open
broscious4peg opened this issue Aug 1, 2024 · 3 comments

broscious4peg commented Aug 1, 2024

I am encountering a BadZipFile error when loading the taxonomy file for FERC Form 1. I am using the taxonomy file from the FERC website:
https://ecollection.ferc.gov/taxonomyHistory

Please let me know ASAP if you have any comments or ideas, and we can get to talking!

Error:

C:\Users\PEG Intern>xbrl_extract "C:\Users\PEG Intern\downloads\Puget Sound Files" --db-path "ferc1-2021-sample.sqlite" --taxonomy "C:\Users\PEG Intern\Downloads\Form 1_2023-04-01_976 (1).zip"
2024-08-01 15:18:27 [ INFO] catalystcoop.ferc_xbrl_extractor.xbrl:247 Parsing taxonomy from Form 1_2023-04-01_976/
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Scripts\xbrl_extract.exe\__main__.py", line 7, in <module>
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\cli.py", line 156, in main
    return run_main(**vars(parse()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\cli.py", line 134, in run_main
    extracted = xbrl.extract(
                ^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\xbrl.py", line 58, in extract
    table_defs = get_fact_tables(
                 ^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\xbrl.py", line 254, in get_fact_tables
    taxonomy = Taxonomy.from_source(f, entry_point=taxonomy_entry_point)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\taxonomy.py", line 251, in from_source
    taxonomy, view = load_taxonomy_from_archive(taxonomy_source, entry_point)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\arelle_interface.py", line 57, in load_taxonomy_from_archive
    file_source = FileSource.openFileSource(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\arelle\FileSource.py", line 44, in openFileSource
    filesource.openZipStream(sourceZipStream)
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\arelle\FileSource.py", line 351, in openZipStream
    self.fs = zipfile.ZipFile(sourceZipStream, mode="r")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\zipfile\__init__.py", line 1349, in __init__
    self._RealGetContents()
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\zipfile\__init__.py", line 1416, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
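
Since the traceback ends in the standard library's zipfile module, one quick first check is whether the downloaded archive is a readable zip at all. The following is a minimal sketch using only the standard library. The helper name describe_archive is made up for illustration and is not part of ferc_xbrl_extractor or Arelle.

```python
import zipfile
from pathlib import Path

# Local file header signature that starts most non-empty zip archives.
ZIP_MAGIC = b"PK\x03\x04"


def describe_archive(path):
    """Report whether `path` looks like a readable zip archive.

    Hypothetical helper for illustration only; it is not part of
    ferc_xbrl_extractor or Arelle.
    """
    path = Path(path)
    with path.open("rb") as f:
        head = f.read(4)
    report = {
        "size_bytes": path.stat().st_size,
        "starts_with_zip_magic": head == ZIP_MAGIC,
        "is_zipfile": zipfile.is_zipfile(path),
        "first_bad_member": None,
    }
    if report["is_zipfile"]:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first corrupt member, or None.
            report["first_bad_member"] = zf.testzip()
    return report
```

Calling describe_archive(r"C:\Users\PEG Intern\Downloads\Form 1_2023-04-01_976 (1).zip") and getting is_zipfile equal to False would suggest an incomplete or mislabeled download. If it is True, the file on disk is probably fine and the problem more likely lies in how the archive is handed to Arelle.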

@zaneselvans
Member

Is your error reproducible? We've seen this error very sporadically, both with this data source and some others, and it seems to be random: it works roughly 99.9% of the time.

If it is reproducible:

  • What version of the ferc-xbrl-extractor package are you using?
  • What XBRL inputs are you using?
  • What version of Arelle are you using?
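
A minimal sketch for collecting the version information asked for above, using importlib.metadata from the standard library. The distribution names in CANDIDATES are assumptions about how the two packages are published; adjust them to whatever pip list shows in your environment.

```python
import sys
from importlib import metadata

# Assumed distribution names; check `pip list` and adjust if they differ.
CANDIDATES = ("catalystcoop.ferc-xbrl-extractor", "arelle-release")


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is not found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_report(dist_names=CANDIDATES):
    """Map 'python' and each distribution name to its version (None if missing)."""
    report = {"python": sys.version.split()[0]}
    for name in dist_names:
        report[name] = installed_version(name)
    return report
```

Pasting the output of print(version_report()) into the issue answers the first and third questions in one go. A None value means that name was not found, not necessarily that the package is missing.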

Also, are you just trying to access the FERC Form 1 data more generally? We publish complete extracted versions of the FERC Forms 1, 2, 6, 60, and 714. See the Nightly Builds section of the PUDL Data Access docs. The 2023 FERC Form 1 data is included as of 2 weeks ago.

If you'd like to take the data for a spin without needing to set anything up, you can also go play with our example notebooks on Kaggle. The data there is updated once a week, and will also have the 2023 FERC data.

Also, depending on what data you are trying to access in the Form 1, you may want to look at the tables which we've cleaned up and integrated into our main PUDL Database. It's only a few dozen out of the many that are available in the XBRL derived SQLite database, but they're way easier to work with, and are also integrated with the older DBF data going back to 1994.

@broscious4peg
Author

Thanks for responding to this. I am looking for access to the FERC Form 1 data for the years 2020-2023. Where could I find the database for all of this?

@zaneselvans
Member

Download links can be found in the nightly builds section of the Data Access documentation.

I would recommend first looking at the FERC Form 1 tables which have been integrated into our main PUDL database, since it covers all years of data (1994-2023) and is much cleaner and more usable than the original DBF and XBRL data. However, there are only a couple dozen tables in there, so what you need may not be in there. Any table whose name contains ferc1 will be derived from the FERC Form 1.
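
Once a SQLite file has been downloaded, the naming rule above can be applied directly. This is a minimal sketch with the standard library's sqlite3 module. The helper name tables_matching is made up for illustration, and the match is case-sensitive.

```python
import sqlite3


def tables_matching(conn, fragment):
    """Return names of tables in the SQLite database whose name contains `fragment`.

    Hypothetical helper for illustration. instr() is used instead of LIKE so
    that underscores in `fragment` are not treated as wildcards.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND instr(name, ?) > 0 "
        "ORDER BY name",
        (fragment,),
    ).fetchall()
    return [name for (name,) in rows]
```

For example, after conn = sqlite3.connect("pudl.sqlite") (the file name here is an assumption; use the path of the file you downloaded), tables_matching(conn, "ferc1") lists the candidate tables to inspect.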

If the table(s) you need have not been fully integrated into PUDL, then you will need to access the SQLite DBs that we produce which are just conversions of the old DBF and newer XBRL data formats into a modern relational database format:

You can also browse these databases online first if you want:
