Add sanitisers, percentage processing and other minor changes #51

Merged
merged 20 commits into from
Oct 21, 2024

@prakaa prakaa commented Oct 10, 2024

Major functionality

  1. Sanitisation

    • column names
    • values
      • attempt conversion to numeric, otherwise:
        • replace newlines with spaces
        • collapse repeated spaces into a single space
        • strip leading and trailing whitespace
        • remove trailing asterisks
        • remove trailing footnotes
        • remove thousands commas
        • remove notes
      • then re-attempt conversion
  2. Postprocessing data to convert any % values to a number between 0 and 100
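The value sanitisation and percentage steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `sanitisers.py`: the function names `to_number`, `sanitise_value` and `percent_to_number` are hypothetical, trailing footnotes are assumed to look like `[1]`, percentage values are assumed to arrive as strings ending in `%`, and note removal is left out because the PR does not describe what a note looks like.

```python
import re


def to_number(text):
    """Return text as a float, or None if it cannot be converted."""
    try:
        return float(text)
    except ValueError:
        return None


def sanitise_value(value):
    """Attempt numeric conversion, clean the string, then re-attempt."""
    if not isinstance(value, str):
        return value
    number = to_number(value)
    if number is not None:
        return number
    cleaned = value.replace("\n", " ")          # newlines -> spaces
    cleaned = re.sub(r" {2,}", " ", cleaned)    # collapse repeated spaces
    cleaned = cleaned.strip()                   # leading/trailing whitespace
    cleaned = re.sub(r"\*+$", "", cleaned).strip()  # trailing asterisks
    # Assumption: footnote markers are bracketed digits such as "[1]".
    cleaned = re.sub(r"\s*\[\d+\]$", "", cleaned)
    # Thousands commas: a comma between a digit and a group of three digits.
    cleaned = re.sub(r"(?<=\d),(?=\d{3})", "", cleaned)
    number = to_number(cleaned)
    return cleaned if number is None else number


def percent_to_number(value):
    """Assumption: percentages are strings like '45%'; return 45.0."""
    if isinstance(value, str) and value.strip().endswith("%"):
        number = to_number(value.strip()[:-1])
        if number is not None:
            return number
    return value
```

Values that are still not numeric after cleaning are returned as the cleaned string, so text columns keep their content while losing the formatting debris.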

Minor functionality

  1. Add suggestion of closest table name if the user-supplied table name is not an available table
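One way to produce such a suggestion is with `difflib.get_close_matches` from the standard library. The sketch below is an assumption about the approach, not the parser's actual code; `check_table_name` and the table names in the usage note are hypothetical.

```python
from difflib import get_close_matches


def check_table_name(name, available_tables):
    """Return name if it is available, else raise with the closest match."""
    if name in available_tables:
        return name
    message = f"'{name}' is not an available table."
    # n=1 keeps only the best candidate; the default cutoff of 0.6 applies.
    suggestions = get_close_matches(name, available_tables, n=1)
    if suggestions:
        message += f" Did you mean '{suggestions[0]}'?"
    raise ValueError(message)
```

Calling `check_table_name("build_cost", ["retirement_costs", "build_costs", "fuel_prices"])` raises a `ValueError` whose message suggests `build_costs`.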

Bug fixes

  1. ValueErrors were not being raised in parser.py

Minor changes

  • ruff import reorganisation
  • change use of type() to isinstance()
  • Add warning about sanitisation to README
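On the `type()` to `isinstance()` change: the PR does not show the affected call sites, so the following is only a generic illustration of why the two checks differ. The helper names are hypothetical.

```python
from collections import OrderedDict


def is_dict_strict(obj):
    """Exact type comparison: subclasses of dict are rejected."""
    return type(obj) is dict


def is_dict(obj):
    """isinstance check: subclasses of dict are accepted."""
    return isinstance(obj, dict)
```

An `OrderedDict` passes `is_dict` but fails `is_dict_strict`, which is usually why `isinstance()` is the preferred check.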

@prakaa prakaa requested a review from nick-gorman October 10, 2024 04:13

codecov bot commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 93.54839% with 10 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/isp_workbook_parser/parser.py | 86.53% | 6 Missing and 1 partial ⚠️ |
| src/isp_workbook_parser/read_table.py | 96.15% | 1 Missing and 1 partial ⚠️ |
| src/isp_workbook_parser/sanitisers.py | 97.77% | 0 Missing and 1 partial ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| src/isp_workbook_parser/__init__.py | 100.00% <100.00%> (ø) |
| src/isp_workbook_parser/config_model.py | 100.00% <100.00%> (ø) |
| src/isp_workbook_parser/sanitisers.py | 97.77% <97.77%> (ø) |
| src/isp_workbook_parser/read_table.py | 94.38% <96.15%> (-0.25%) ⬇️ |
| src/isp_workbook_parser/parser.py | 86.61% <86.53%> (-1.21%) ⬇️ |

@prakaa prakaa changed the title Add sanitisers and other minor changes Add sanitisers, percentage processing and other minor changes Oct 17, 2024

@nick-gorman nick-gorman left a comment

Looks really good. The changes to the output should make many of the tables much more usable.

See a few minor comments on specific files.

@prakaa prakaa merged commit 5956f01 into main Oct 21, 2024
15 checks passed
@prakaa prakaa deleted the sanitisation branch October 21, 2024 02:10