According to the README:
"Requires that text files have ascii-encoding, including the extended ascii set. This is useful to detect files that have unicode characters."
require-ascii will fail on files that are encoded in extended ASCII if:
the file uses characters in the 128–255 range, and
those characters aren’t followed by other characters that coincidentally make the sequence valid UTF-8 (see this table).
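The two conditions above can be illustrated with a minimal sketch. It assumes cp437 as the "extended ASCII" encoding, and `is_valid_utf8` is a hypothetical helper written for this example; it is not part of require-ascii.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A lone byte in the 128-255 range is a valid cp437 character...
lone = bytes([0x80])
lone_as_cp437 = lone.decode("cp437")
# ...but it is not valid UTF-8: 0x80 is a continuation byte with no lead byte.
lone_is_utf8 = is_valid_utf8(lone)

# Two such bytes can coincidentally form a single valid UTF-8 sequence.
pair = bytes([0xC3, 0xA9])
pair_as_cp437 = pair.decode("cp437")  # two separate cp437 characters
pair_as_utf8 = pair.decode("utf-8")   # one character, U+00E9

print(repr(lone_as_cp437), lone_is_utf8)
print(repr(pair_as_cp437), repr(pair_as_utf8), is_valid_utf8(pair))
```

The same bytes therefore mean different things depending on which decoder reads them, which is why a file can be valid extended ASCII and still be rejected by a check that decodes as UTF-8.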
This script will generate a bunch of files that contain valid extended ASCII but fail when tested by require-ascii:
# The README links to <https://theasciicode.com.ar/>. There's many different
# ways you could extend ASCII, but that site in particular says "In 1981,
# IBM developed an extension of 8-bit ASCII code, called 'code page 437'..."
extended_ascii = "cp437"
for code_point in range(128, 256):
    # Create a file that should pass require-ascii, but won't.
    with open(f"{code_point}.cp437.txt", mode='wb') as file:
        file.write(code_point.to_bytes(1, 'little'))
    # Make sure that that file really does contain valid extended ASCII.
    with open(f"{code_point}.cp437.txt", mode='rt', encoding=extended_ascii) as file:
        # This should cause a UnicodeDecodeError if file contains
        # invalid extended ASCII.
        file.read()
A more accurate description of require-ascii would be:
require-ascii
What it does
Requires that text files use UTF-8 and only use code points ≤ 255.
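As a rough sketch of what that proposed description would mean in practice, the check below decodes the bytes as UTF-8 and then rejects any code point above 255. The function name `passes_proposed_rule` is hypothetical, and this is an illustration of the wording above, not the actual implementation of require-ascii.

```python
def passes_proposed_rule(data: bytes) -> bool:
    """Hypothetical check: valid UTF-8 and every code point is <= 255."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 at all, e.g. a lone cp437 byte such as 0x80.
        return False
    return all(ord(ch) <= 255 for ch in text)


samples = {
    "plain ASCII": b"hello",
    "U+00E9 as UTF-8": "\u00e9".encode("utf-8"),
    "U+20AC as UTF-8": "\u20ac".encode("utf-8"),
    "lone cp437 byte 0x80": bytes([0x80]),
}
for label, data in samples.items():
    print(f"{label}: {passes_proposed_rule(data)}")
```

Under this reading, a file containing U+00E9 encoded as UTF-8 is accepted, while the single byte 0x80 (a valid cp437 character) is rejected because it is not valid UTF-8.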