require-ascii doesn’t do what it says on the tin #104

Open
Jayman2000 opened this issue Apr 25, 2022 · 0 comments


According to the README:

require-ascii

What it does

Requires that text files have ascii-encoding, including the
extended ascii set.
This is useful to detect files that have unicode characters.

require-ascii will fail on files that are encoded in extended ASCII if:

  1. the file uses characters in the 128–255 range, and
  2. those characters aren’t followed by other characters that coincidentally make the sequence valid UTF-8 (see this table).
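As a minimal sketch of those two conditions, assuming "extended ASCII" means cp437 and that the check amounts to decoding the file as UTF-8 (`is_valid_utf8` is a hypothetical helper, not part of require-ascii):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# A lone byte in the 128-255 range: valid cp437, but not valid UTF-8.
lone = bytes([0x82])
# Two such bytes that coincidentally form a valid UTF-8 sequence.
pair = bytes([0xC3, 0xA9])

for data in (lone, pair):
    # ascii() keeps the output printable on any console encoding.
    print(data, ascii(data.decode("cp437")), is_valid_utf8(data))
```

Both byte strings are valid cp437, but only the second also happens to be valid UTF-8, so only the first would be expected to trip a UTF-8-based check.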

This script will generate a bunch of files that contain valid extended ASCII but fail when tested by require-ascii:

# The README links to <https://theasciicode.com.ar/>. There are many different
# ways you could extend ASCII, but that site in particular says "In 1981,
# IBM developed an extension of 8-bit ASCII code, called 'code page 437'..."
extended_ascii = "cp437"

for code_point in range(128, 256):
	# Create a file that should pass require-ascii, but won't.
	with open(f"{code_point}.cp437.txt", mode='wb') as file:
		file.write(code_point.to_bytes(1, 'little'))
	# Make sure that that file really does contain valid extended ASCII.
	with open(f"{code_point}.cp437.txt", mode='rt', encoding=extended_ascii) as file:
		# This should cause a UnicodeDecodeError if file contains
		# invalid extended ASCII.
		file.read()

A more accurate description of require-ascii would be:

require-ascii

What it does

Requires that text files use UTF-8 and only use code points ≤ 255.
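A rough sketch of what that proposed description implies, written as a standalone check. The function name `passes_proposed_check` is hypothetical, and this is an illustration of the description above, not require-ascii's actual source:

```python
def passes_proposed_check(data: bytes) -> bool:
    """Sketch: the file must be valid UTF-8 with every code point <= 255."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 at all, e.g. a lone cp437 byte such as 0x82.
        return False
    return all(ord(ch) <= 255 for ch in text)
```

Under this reading, plain ASCII passes, UTF-8 encoded "é" (U+00E9) passes, a lone byte 0x82 fails, and UTF-8 encoded "€" (U+20AC) fails.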
