Latest release v0.7.5 does not include the fix for quoted filenames (for non ASCII filenames) #113
Comments
There were 5 cases of error when processing a commit (with ChatGPT sharing link in commit message):
- 4 caused by a pathname with characters outside 7-bit ASCII, which makes git-diff use the quoted format for pathnames; the unidiff library includes the fix, but it is not yet released (matiasb/python-unidiff#113)
- 1 caused by the change being to a submodule rather than to a file, or to be more exact, moving from one version of the subproject to another (the clone was not done using the --recursive option)
- 1 UnidiffParseError('Target without source: ...') with a creation diff (source is /dev/null) with a quoted destination name (containing spaces)
Lines survival stats: 76.89% lines survived (in 694 commits in 76 projects).
Will prepare a release in the upcoming days 👍
Unfortunately, commit 2771a87 does not fully solve the problem of C-style quoted filenames. It makes unidiff able to parse a patch with quoted filenames, but it then reproduces those filenames in their original quoted format. Shouldn't unidiff decode such filenames? All the code does is make unidiff able to remove the "a/" or "b/" prefix from filenames even when they are in their C-quoted form.
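To make the quoted format concrete, here is a minimal sketch of how a path ends up C-quoted. The helper `c_quote_path` is hypothetical (it is not part of unidiff or git); it only approximates what git does with its default settings, under the assumption that control characters, the double quote, the backslash, and every byte outside 7-bit ASCII get escaped, and that the name is wrapped in double quotes only when something was escaped.

```python
# Hypothetical helper, not part of unidiff or git: approximates git's
# default C-style quoting of pathnames, for illustration only.
_LETTER_ESCAPES = {
    0x07: 'a', 0x08: 'b', 0x09: 't', 0x0A: 'n',
    0x0B: 'v', 0x0C: 'f', 0x0D: 'r',
}


def c_quote_path(path):
    """Return `path` in C-quoted form, or unchanged if nothing needs escaping."""
    out = []
    needs_quotes = False
    for byte in path.encode('utf-8'):
        if byte in _LETTER_ESCAPES:
            # control characters that have a letter escape, e.g. \t
            out.append('\\' + _LETTER_ESCAPES[byte])
            needs_quotes = True
        elif byte in (0x22, 0x5C):
            # double quote and backslash are backslash-escaped
            out.append('\\' + chr(byte))
            needs_quotes = True
        elif byte < 0x20 or byte >= 0x7F:
            # everything else outside printable ASCII: 3-digit octal escape
            out.append('\\%03o' % byte)
            needs_quotes = True
        else:
            out.append(chr(byte))
    quoted = ''.join(out)
    return '"' + quoted + '"' if needs_quotes else quoted


print(c_quote_path('b/Ł.txt'))    # the two UTF-8 bytes of 'Ł' become octal escapes
print(c_quote_path('plain.txt'))  # nothing to escape, so no quotes are added
```

A parser that only strips the "a/" or "b/" prefix would hand back the escaped form of the first name rather than the real filename.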
Here is a bit ugly code that actually tries to decode a C-quoted filename; not tested for Python 2:

def decode_c_quoted_str(text):
    """C-style name unquoting

    See unquote_c_style() function in 'quote.c' file in git/git source code
    https://github.com/git/git/blob/master/quote.c#L401

    This is subset of escape sequences supported by C and C++
    https://learn.microsoft.com/en-us/cpp/c-language/escape-sequences

    :param str text: string which may be c-quoted
    :return: decoded string
    :rtype: str
    """
    # TODO?: Make it a global variable
    escape_dict = {
        'a': '\a',  # Bell (alert)
        'b': '\b',  # Backspace
        'f': '\f',  # Form feed
        'n': '\n',  # New line
        'r': '\r',  # Carriage return
        't': '\t',  # Horizontal tab
        'v': '\v',  # Vertical tab
    }

    quoted = text.startswith('"') and text.endswith('"')
    if quoted:
        text = text[1:-1]  # remove quotes

        buf = bytearray()
        escaped = False  # TODO?: switch to state = 'NORMAL', 'ESCAPE', 'ESCAPE_OCTAL'
        oct_str = ''

        for ch in text:
            if not escaped:
                if ch != '\\':
                    # encode instead of ord(), so unescaped non-ASCII characters also work
                    buf.extend(ch.encode())
                else:
                    escaped = True
                    oct_str = ''
            else:
                if ch in ('"', '\\'):
                    buf.append(ord(ch))
                    escaped = False
                elif ch in escape_dict:
                    buf.append(ord(escape_dict[ch]))
                    escaped = False
                elif '0' <= ch <= '7':  # octal values with first digit over 3 overflow a byte
                    oct_str += ch
                    if len(oct_str) == 3:
                        byte = int(oct_str, base=8)  # byte in octal notation
                        if byte > 255:
                            raise ValueError(f'Invalid octal escape sequence \\{oct_str} in "{text}"')
                        buf.append(byte)
                        escaped = False
                        oct_str = ''
                else:
                    raise ValueError(f'Unexpected character \'{ch}\' in escape sequence when parsing "{text}"')

        if escaped:
            raise ValueError(f'Unfinished escape sequence when parsing "{text}"')

        text = buf.decode()

    return text
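For comparison, a more compact sketch of the same decoding can lean on Python's built-in `unicode_escape` codec. The function name `unquote_git_path` is made up for this example, and the approach is an assumption-laden shortcut rather than a faithful port of git's unquote_c_style(): `unicode_escape` also accepts escapes that git never emits (such as `\x..` or `\u....`), so it is looser than the hand-written state machine.

```python
def unquote_git_path(text):
    """Decode a C-quoted path; return unquoted input unchanged.

    Hypothetical helper for illustration, not part of unidiff.
    """
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        raw = text[1:-1]
        # 1. unicode_escape turns \ooo, \t, \" and \\ into single characters,
        #    mapping each escaped or literal byte to the code point of equal value
        # 2. latin-1 maps those code points back to the original bytes
        # 3. the bytes are then decoded as UTF-8 (assumes UTF-8 filenames)
        return (raw.encode('utf-8')
                   .decode('unicode_escape')
                   .encode('latin-1')
                   .decode('utf-8'))
    return text


print(unquote_git_path('"b/\\305\\201.txt"'))
print(unquote_git_path('plain.txt'))
```

The explicit loop gives precise error messages and rejects anything outside git's escape set, while this version trades that strictness for brevity.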
I was wondering why unidiff fails on changes to files with filenames that include characters outside 7-bit ASCII, and it turns out that the latest release v0.7.5 does not include commit 2771a87 (Support quoted filenames, 2023-06-02).
Could we please get a new release with this fix included?
Thanks in advance.