Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added the newline parameter to PatchSet::from_filename and a test cas… #83

Closed
wants to merge 11 commits into from

Conversation

huichen-cs
Copy link

I wonder if it is a good idea to add an "newline" parameter to PatchSet::from_filename. I am motivated by the following observations.

  • Git internally treats newline as '\n'. However, it doesn't prevent someone checking in a file that has '\r' or '\r\n' to a Git repository. Although it may be argued that it is a Git configuration error on the user's part, there are Git repositories that contains '\r', for instance, line 223 and 238 in a version of the ExpandableDoubleArrayTest.java of the Apache Commons Math project.
  • Although Git-Diff has options to control whether to ignore '\r' or spaces, it does not remove '\r' when generating the diff file. As the result, a diff file may contain '\r'. unidiff cannot parse those diff's. PatchSet::from_filename would throw a UnidiffParseError exception.
  • I realize that we can pass a file to PatchSet::init or create a PatchSet object from a string where we treat '\n' as the newline (turns off python's universal newline handling). However, I struggled to discover that it was the CR that caused my problems since I wished to write a one liner from the ouset (like PathSet.from_filename(diff_file)). I though adding this newline parameter would 1) raise the user's attention about the newline handling and 2) avoid a misuse (at present, we can argue that it is a misuse to do PathSet.from_filename(diff_file) when diff_file is a text file that has CR as a non-line-ending character).

This patch contains two lines of change to parse.py, and a simple test diff file, and a test case in test_parse.py.

Thank you.

@johnyf
Copy link

johnyf commented Sep 22, 2021

Maybe relevant to the point about git configuration and \r characters are .gitattributes files: https://git-scm.com/docs/gitattributes.

@matiasb
Copy link
Owner

matiasb commented Oct 6, 2021

Thanks for the detailed information and the PR, will take a look and give it some thought 👍

Copy link
Owner

@matiasb matiasb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay getting to this (I had a couple of complicated weeks!). At first sight change makes sense, I would split the github actions update to a different PR though.


# Runs a set of commands using the runners shell
- name: Run a multi-line script
run: |
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we don't need this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matiasb ,

Thank you for reviewing this.

You are right. We don't need this in main.yml. My initial intention was not to submit this as a pull request, but to use it as a quick way to run tests with both Python 2 and 3. I'll remove this and commit it to the fork.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to split the work, thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matiasb ,

I have split the work, and submitted two new pull requests. They are pull request #84 and #85 . Thank you.

@@ -0,0 +1,44 @@
# This is a basic workflow to help you get started with Actions

name: CI
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering moving CI to actions, but maybe we can propose this change in a different PR?

Copy link
Author

@huichen-cs huichen-cs Oct 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @matiasb ,

Thanks.

My original intention was to use this to run tests for Python 2.7.

As related, my initial pull request failed the test for Python 2.7 because codecs.open does not have the newline parameter. In a commit that follows, I revise it to use io.open for Python 2.7, and all tests passed.

If you think these changes are OK, I will resubmit the proposals, perhaps, in two pull requests, one is to add the newline parameter, and the other to add the configuration file for Github actions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment motivated by the issues related to Python 2.7. The package unidiff supports Python 2.7:

python-unidiff/setup.py

Lines 48 to 49 in 3353080

"Programming Language :: Python :: 2",
'Programming Language :: Python :: 2.7',

The final release of Python 2.7 was about 2 years ago (PEP 373). The final release of Python 3.6 is expected to be in about 2 months (PEP 494). So it seems that moving to code compatible with only Python 3 would not cause issues.

Supporting only Python 3 will enable using new syntax, for example formatted string literals, parenthesization within with statements, and structural pattern matching (when the supported Python versions will all be >= 3.10).

For the full new syntax of formatted string literals, Python >= 3.8 is required. Nonetheless, this would still mean that three Python versions are supported (3.8, 3.9, 3.10), which aligns with the extent of support for past Python versions in packages like numpy, scipy, and networkx.

(Sidenote: Python 3.9 is in the CI test environments:

but not in the Trove classifiers:

python-unidiff/setup.py

Lines 45 to 54 in 3353080

classifiers=[
'Intended Audience :: Developers',
'Development Status :: 4 - Beta',
"Programming Language :: Python :: 2",
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
],

)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should be ok dropping the Python 2 support. Will do the paperwork (so feel free to not keep things py2 compatible). And right, it seems we need to update the metadata. Thanks 👍

@matiasb
Copy link
Owner

matiasb commented Nov 1, 2021

Closing in favor of the split PRs.

@matiasb matiasb closed this Nov 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants