Better Whitespace Handling #18

Open
gleporeNARA opened this issue Jan 7, 2021 · 0 comments

When developing signatures for text-based formats it would be useful to have a built-in way to handle whitespace, and potentially line breaks as well.

Many programming languages are whitespace agnostic: whitespace does not affect how the program is processed. Python is one exception.

Consider the following excerpts from files in the Smart Game Format (SGF, https://www.red-bean.com/sgf/):

(
;GM[1]FF[3]

(;GM[1]FF[3]

( ;GM[1]FF[3]

Each file contains the same code; however, the first example has a line break (and possibly other whitespace) after the opening parenthesis, the second example has no whitespace, and the third example has a single space between the parenthesis and the semicolon.

Functionally, all three excerpts are valid (as they would be with HTML, Perl, etc.), but the PRONOM signatures for all three would be different.
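To make the difference concrete, here is a small sketch that prints the three excerpts as hex, which is roughly what a fixed byte-sequence signature would have to encode. It assumes ASCII text and a bare "\n" line break in the first excerpt; a real file could just as well use "\r\n" or have trailing spaces, which would give yet another sequence.

```python
# The three excerpts as raw bytes.
# Assumption: ASCII encoding and a plain "\n" line break in the first one.
excerpts = [
    b"(\n;GM[1]FF[3]",
    b"(;GM[1]FF[3]",
    b"( ;GM[1]FF[3]",
]


def to_hex(data: bytes) -> str:
    """Return the bytes as an uppercase hex string."""
    return data.hex().upper()


hex_forms = [to_hex(e) for e in excerpts]

for text, hexed in zip(excerpts, hex_forms):
    print(f"{text!r:22} -> {hexed}")
```

The three hex strings diverge right after the leading `28` (the opening parenthesis), so a single fixed sequence cannot cover all of them.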

I'm thinking of a new signature value which indicates "some number of blank spaces, tabs, and/or linebreaks here".
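As a rough illustration of the behaviour I have in mind (this is a Python regular expression, not PRONOM signature syntax), the proposed value would act like the `WS` token below. The names `WS`, `SGF_START`, and `looks_like_sgf` are made up for this sketch, and where whitespace should be allowed is my assumption, not something taken from the SGF specification.

```python
import re

# Hypothetical stand-in for the proposed signature value:
# "some number (possibly zero) of spaces, tabs, and/or line breaks here".
WS = rb"[ \t\r\n]*"

# Assumed placement: optional whitespace after "(", after ";",
# and between the two properties.
SGF_START = re.compile(rb"\(" + WS + rb";" + WS + rb"GM\[1\]" + WS + rb"FF\[3\]")


def looks_like_sgf(data: bytes) -> bool:
    """Return True if data begins with the whitespace-tolerant pattern."""
    return SGF_START.match(data) is not None


samples = [
    b"(\n;GM[1]FF[3]",
    b"(;GM[1]FF[3]",
    b"( ;GM[1]FF[3]",
]

for sample in samples:
    print(f"{sample!r:22} -> {looks_like_sgf(sample)}")
```

With a token like this, one pattern covers all three excerpts, which is what a single signature would need to do.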

Does this make sense, or am I missing an easier way to create signatures that cover all of the above possibilities (plus all the others allowed in many text-based formats)?
