Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GH action format check files with JuliaFormatter #2074

Closed
wants to merge 2 commits into from

Conversation

Saransh-cpp
Copy link
Member

@Saransh-cpp Saransh-cpp commented Oct 2, 2022

Most of the Julia packages I have seen lack a proper style guide for their codebase. I feel JuliaFormatter coupled with a clever workflow can help us maintain a clean codebase. This PR adds a bot-like workflow that fixes the formatting issues associated with a PR and pushes the fixed files in the same branch.

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable

@Saransh-cpp Saransh-cpp closed this Oct 2, 2022
@Saransh-cpp Saransh-cpp reopened this Oct 2, 2022
@Saransh-cpp Saransh-cpp marked this pull request as draft October 2, 2022 18:20
@ToucheSir
Copy link
Member

Perhaps the title would be better with s/lint/(auto)format/? That is, unless you're planning on adding some actions for static analysis tools as well.

On formatting, we should use the same style across all FluxML repos. Just copying https://github.com/FluxML/Metalhead.jl/blob/master/.JuliaFormatter.toml should be enough, I think.

@Saransh-cpp
Copy link
Member Author

Saransh-cpp commented Oct 3, 2022

Perhaps the title would be better with s/lint/(auto)format/? That is, unless you're planning on adding some actions for static analysis tools as well.

Are you referring to the PR's title or the commit message of the bot's commit? (Should it be - autoformat: format jl files with JuliaFormatter?)

On formatting, we should use the same style across all FluxML repos. Just copying https://github.com/FluxML/Metalhead.jl/blob/master/.JuliaFormatter.toml should be enough, I think.

Thanks! I will add this in!

@Saransh-cpp Saransh-cpp marked this pull request as ready for review October 3, 2022 14:49
@Saransh-cpp
Copy link
Member Author

This works now!

Working on a PR created from the base repository - Saransh-cpp#53
Working on a PR created from a fork - Saransh-cpp#55


Note

  1. This will work once this PR is merged (cannot test it before due to security reasons, explained below).
  2. To accomplish this, the repository secrets must contain a personal access token (PAT) of an account with write access to FluxML. The token should be added to repository secrets with the name - "AUTO_LINT_PAT".

Why is PAT required?

A bot commit without a PAT does not trigger workflows. This means that the reformatted files will not go through the conventional CI, which can potentially be dangerous. Let's say a bug in JuliaFormatter rewrites a file with some broken syntax -> this broken formatting update will not be detected by the CI as the CI will not run on the bot commit.


Is this a security concern?

Since this workflow runs on pull_request_target, the PAT itself cannot be leaked. I have successfully tested this here -

Saransh-cpp#55 (see the files changed in the last commit - I tried printing out the PAT that, in a sense, is an attempt to hack FluxML's private tokens) -> https://github.com/Saransh-cpp/Flux.jl/actions/runs/3175153155/jobs/5172838261 (see how the "HACK" step was not executed at all).

This shows that GitHub will run the "Lint" workflow according to the instructions written in the base FluxML repository and not the fork repository.

Due to this reason, we cannot test this action on this PR (as pull_request_merged does not run the workflow using the instructions written in a fork). For more reference - https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

Having said that, mixing pull_request_target with an explicit PR checkout is not always vulnerable. The workflow may, for example:

  • Reformat and commit the code
  • Checkout both base and head repositories and generate a diff
  • Run grep on the checked out source.

Generally speaking, when the PR contents are treated as passive data, i.e. not in a position of influence over the build/testing process, it is safe. But the repository owners must be extra careful not to trigger any script that may operate on PR controlled contents like in the case of npm install.

runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combining pull_request_target workflow trigger with an explicit checkout of an untrusted PR is a dangerous practice that may lead to repository compromise.

https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

Copy link
Member

@DilumAluthge DilumAluthge Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you already quoted this portion above:

But the repository owners must be extra careful not to trigger any script that may operate on PR controlled contents like in the case of npm install.

This is the problem. Combining pull_request_target with a checkout is a huge footgun. Can it be done safely? Potentially, yes. But it is so easy to make a mistake here. All it takes is for a committer to merge a PR with an innocent looking change (e.g. changing julia to julia --project in the lint workflow), and the repository has been compromised.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recommendation would be to not go with this approach, because it is too easy to end up compromising the repository.

Instead, if the goal is to enforce a common format across all repositories in the FluxML organization, I would recommend:

  1. Have a CI job that checks if the source code is formatted correctly, and errors if any formatting problems are detected.
  2. Use the needs construct to force all other CI jobs to wait for the successful completion of the formatting check.
  3. If the formatting check does not succeed, do not run any other CI jobs.

Copy link
Member

@DilumAluthge DilumAluthge Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, in my above comment, change "potentially compromise" to "compromise". This PR already compromises the repository.

Copy link
Member Author

@Saransh-cpp Saransh-cpp Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @DilumAluthge! Good thing I cross-posted this in ci-dev. I agree, this workflow is potentially risky. If someone merges a PR that changes the workflows then Flux would be at a security risk. Leaving out the PAT completely is equally risky as that would allow the GitHub actions bot to commit anything without the CI running on it.

This PR already compromises the repository.

Could you let me know? Is it that one can write a script to leak the PAT in another workflow? I tried this in one of my PRs, but GitHub does not allow the user to access the PAT -

- name: POTENTIAL HACK
   run: echo ${{ secrets.PAT }}

Screenshot 2022-10-04 at 7 49 00 PM

@ToucheSir
Copy link
Member

After another read-through, I agree with @DilumAluthge's concerns and raise one of my own. If the bot is appending formatting commits to PR branches, how are contributors meant to continue working on their feature branches from another checkout without running into conflicts? #2074 (comment) seems like a more reasonable approach and is already used in projects like https://github.com/JuliaDiff/ChainRulesCore.jl/blob/main/.github/workflows/format.yml. To ensure code is formatted in the remote branch, we could also consider a pre-commit hook.

Are you referring to the PR's title or the commit message of the bot's commit?

Both. Your proposed rename looks fine.

@Saransh-cpp
Copy link
Member Author

A pre-commit hook would ideally solve all of the problems. I will look into how one develops that and how pre-commit.ci can support JuliaFormatter. For the time being, I think the suggestion of adding an action that fails on wrong formatting sounds best.

@Saransh-cpp
Copy link
Member Author

Saransh-cpp commented Oct 5, 2022

is already used in projects like https://github.com/JuliaDiff/ChainRulesCore.jl/blob/main/.github/workflows/format.yml.

ChainRulesCore.jl tries to do something similar but by using an existing tool - reviewdog. As expected, review dog requires a PAT too - see this CI run -

reviewdog: This GitHub token doesn't have write permission of Review API [1], 
so reviewdog will report results via logging command [2] and create annotations similar to
github-pr-check reporter as a fallback.
[1]: https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request_target, 
[2]: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/development-tools-for-github-actions#logging-commands

But printing the diff in the CI is useful too. I think I will use reviewdog without the commenting thing for FluxML too. (@DilumAluthge could you confirm if it is safe to pass in PAT in the reviewdog workflow?)

@Saransh-cpp Saransh-cpp changed the title Add GH Action to lint every PR Add GH action to format files with JuliaFormatter Oct 5, 2022
@mcabbott
Copy link
Member

mcabbott commented Oct 5, 2022

FWIW the formatter on ChainRulesCore.jl seems more irritating than helpful. It clutters actual review with lots of comments telling you that [1,2,3] is wrong. Code by different authors disagreeing on minor things like that seems far down the list of style problems -- e.g. there are some easy-to-read straightforward tests, and some where it's a really clever puzzle to figure out what the test is doing.

An ideal auto-formatter that results in less time spent thinking about spacing might be nice. But we spend very little. And a half-baked one which creates more fiddling seems much worse than where we are now.

Other concerns here are:

  • If we adopt 4 spaces, has someone figured out how to preserve git blame, including on the website?
  • Doing so will mean every open PR needs a pretty big rebase. I presume these would be smaller if it's set to two spaces.
  • Note that it edits examples to e.g. write m = Upsample(; scale = (2, 3)), I'm not sure we want that as the documented thing to type in.
  • It also seems to want a pretty short line length, e.g. this is a lot of lines for a simple thing.
  • Edit: likewise many throw(...) lines. Other grips include breaking a group of one-line definitions as one was too long, also here, replacing in in (f() for _ in 1:n_alts) with = which is just preference but one is better, adding return print(io, ")") which is pretty close to actively misleading. Maybe better config could avoid these?

@ToucheSir
Copy link
Member

IMO the most important part of an autoformatter is that it enforces format(file) is the identity operation for all checked-in source files. Especially on FluxML repos, it can get quite frustrating to make a small change, then spend 2-3 minutes reverting or ignoring parts of the diff because you accidentally (or intentionally) bumped the format shortcut (git add -p helps but does not solve the problem).

If we adopt 4 spaces, has someone figured out how to preserve git blame, including on the website?

GitHub supports using .git-blame-ignore-revs for this. Metalhead uses this post FluxML/Metalhead.jl#168.

@Saransh-cpp
Copy link
Member Author

Maybe better config could avoid these?

Yes! Right now this uses the config specified in Metalhead.jl, but we can change everything according to our preferences and needs.

@Saransh-cpp
Copy link
Member Author

Saransh-cpp commented Oct 6, 2022

TODO

  • add .git-blame-ignore-revs
  • discuss formatting options

@Saransh-cpp Saransh-cpp changed the title Add GH action to format files with JuliaFormatter Add GH action format check files with JuliaFormatter Oct 15, 2022
@Saransh-cpp
Copy link
Member Author

Looks better now. Should the formatting options be discussed in the next community call? Here is the current configuration -

style = "sciml"
whitespace_in_kwargs = true
format_docstrings = true
always_for_in = true
join_lines_based_on_source = true
separate_kwargs_with_semicolon = true
always_use_return = true
margin = 92
indent = 4

@Saransh-cpp
Copy link
Member Author

separate_kwargs_with_semicolon is false now. See #2086.

@Saransh-cpp
Copy link
Member Author

Given that the PR to add pre-commit support to Julia's main repository was closed, and several developers do not like how JuliaFormatter can be noisy and unconfigurable sometimes, I think it would be best to close this PR for now. There have been some initiatives (thank you!) to support Julia on pre-commit, and I am keeping a close eye on this issue - pre-commit/pre-commit#2689 - for further updates.

Pre-commit support will make setting up a formatting workflow extremely easy and reliable. Please let me know if this PR is still required. Closing this for now :(

@ToucheSir
Copy link
Member

I think it would be better if we didn't have to rely on pre-commit (which is a heavy dependency and requires a Python interpreter). A solution which works with pure Julia would be better and should be doable, someone just has to write it.

The other blocker is the JuliaFormatter latency from the command line is really only acceptable on 1.9+. If the binary distribution story can be figured out before 1.9 becomes ubiquitous, then we can consider that route instead.

@Saransh-cpp
Copy link
Member Author

I hadn't factored in adding pre-commit as a dependency.

Trying out JuliaFormatter after 1.9 is out of Beta sounds great. I think it would be better to create a new PR when 1.9 is out. Resolving conflicts in this PR is getting more and more difficult with every update in Flux's codebase 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants