Changes to OSD formatting #64
Comments
Thanks for spelling this out here. I was aware of the changes to the Part 614 OSD section but hadn't noticed these minor formatting changes. Item 1 should be easily fixable right now by adding
Seriously? We handle this in some instances (for Typical Pedon specifically). But this very well could break tons of things with little to no benefit. I want to hold off on any changes to the codebase until we actually see these changes coming in via OSDRegistry. No need to change anything unless it is causing parsing problems.
Yes, these changes, if implemented, will change the encoding (or inferred encoding) of the files. Currently the HTML has no declared encoding, but the W3C validator detects it as windows-1252, e.g. https://validator.w3.org/nu/?doc=https%3A%2F%2Fsoilseries.sc.egov.usda.gov%2FOSD_Docs%2Fb%2FBOOMER.html
Oops, my mistake: if the encoding is indeed intended to be "windows-1252", then the em dash is included in that set.
Anticipated changes in OSD Style for #64 item 1
Closing this issue as there have been no significant systematic changes to OSD formatting. We can address specific problems if/when they trickle in.
Several changes to the OSD formatting standards (NSSH) may cause further inconsistency among OSD formatting styles encountered within the entire collection.
1. Conversion of doubled hyphen-minus delimiters (`--`) to em dash (`—`) in all sections. This is most likely to affect parsing of the TYPICAL PEDON section. See `.extractHzData()`.
2. Section headers will now use title case: "TYPICAL PEDON:" → "Typical Pedon:". This may affect all parsing related to finding section headers, and the downstream use of list element names if those are changed to match.
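
As a rough illustration of how a parser could tolerate both conventions, here is a minimal Python sketch. It is not the project's code: `normalize_delimiters()` and `find_section_header()` are hypothetical helper names, and the sample strings are invented for the example rather than taken from a real OSD.

```python
import re

EM_DASH = "\u2014"  # the character the new style is expected to use


def normalize_delimiters(text: str) -> str:
    """Map em dashes back to the doubled hyphen-minus used in older OSDs."""
    return text.replace(EM_DASH, "--")


def find_section_header(text: str, header: str):
    """Find `header:` at the start of a line, ignoring case.

    Returns a re.Match, or None when the header is absent.
    """
    pattern = re.compile(
        r"^\s*" + re.escape(header) + r"\s*:",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.search(text)


# Invented sample text in the old and (anticipated) new styles.
old_style = "TYPICAL PEDON: Example loam--forested.\nA--0 to 5 inches; brown loam"
new_style = "Typical Pedon: Example loam\u2014forested.\nA\u20140 to 5 inches; brown loam"

# After normalization, the new-style body matches the old-style delimiters.
print(normalize_delimiters(new_style))
# Either header casing is found by the same case-insensitive search.
print(find_section_header(old_style, "Typical Pedon") is not None)
print(find_section_header(new_style, "TYPICAL PEDON") is not None)
```

Normalizing up front would leave downstream patterns that expect `--` untouched, which fits the concern about not changing the codebase more than necessary.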
It is not clear whether the encoding of the text or HTML files will change. Will the new files be Unicode?
The TYPICAL PEDON section is also modified such that the short narrative is on its own line:
Ideas on checking encoding of text files. I have no idea if this will change, or how the download process modifies (?) the encoding.
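
One possible check, sketched in Python under the assumption that files are either plain ASCII, UTF-8, or windows-1252 (the encoding the W3C validator reported). The helper names `guess_encoding()` and `decode_osd()` are hypothetical, and this is a heuristic rather than a definitive detector: an em dash is the single byte `0x97` in windows-1252 but the three bytes `E2 80 94` in UTF-8, so a strict UTF-8 decode tells the two apart.

```python
def guess_encoding(raw: bytes) -> str:
    """Heuristically label bytes as ascii, utf-8, or windows-1252."""
    try:
        raw.decode("ascii")
        return "ascii"  # pure ASCII is valid under all three encodings
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")  # strict: fails on a lone 0x97 byte
        return "utf-8"
    except UnicodeDecodeError:
        return "windows-1252"  # assumed fallback, not verified


def decode_osd(raw: bytes) -> str:
    """Decode bytes using the guessed encoding."""
    enc = guess_encoding(raw)
    codec = "cp1252" if enc == "windows-1252" else enc
    # errors="replace" guards against the few bytes cp1252 leaves undefined.
    return raw.decode(codec, errors="replace")


utf8_bytes = "A\u20140 to 5 inches".encode("utf-8")
cp1252_bytes = b"A\x970 to 5 inches"

print(guess_encoding(utf8_bytes))
print(guess_encoding(cp1252_bytes))
# Both byte strings decode to the same text.
print(decode_osd(utf8_bytes) == decode_osd(cp1252_bytes))
```

This only inspects bytes already on disk; whether the download process re-encodes the files is a separate question that the sketch does not answer.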