Add Pre-commit Copyright Check (#181)
* add auto copyright scripts

Signed-off-by: YanxuanLiu <[email protected]>

* add license head check

Signed-off-by: YanxuanLiu <[email protected]>

* change json path

Signed-off-by: YanxuanLiu <[email protected]>

* fix path

Signed-off-by: YanxuanLiu <[email protected]>

* change license config to regex

Signed-off-by: YanxuanLiu <[email protected]>

* add slash

Signed-off-by: YanxuanLiu <[email protected]>

* change back to license tt

Signed-off-by: YanxuanLiu <[email protected]>

* change license path

Signed-off-by: YanxuanLiu <[email protected]>

* file name

Signed-off-by: YanxuanLiu <[email protected]>

* change license content

Signed-off-by: YanxuanLiu <[email protected]>

* change license content

Signed-off-by: YanxuanLiu <[email protected]>

* add exclude for md

Signed-off-by: YanxuanLiu <[email protected]>

* change include and exclude

Signed-off-by: YanxuanLiu <[email protected]>

* exclude all md

Signed-off-by: YanxuanLiu <[email protected]>

* add python script

Signed-off-by: YanxuanLiu <[email protected]>

* add more excludes

Signed-off-by: YanxuanLiu <[email protected]>

---------

Signed-off-by: YanxuanLiu <[email protected]>
YanxuanLiu authored Feb 1, 2024
1 parent e3cfd1c commit df7fa26
Showing 5 changed files with 211 additions and 0 deletions.
67 changes: 67 additions & 0 deletions .github/workflows/license-check/license-check.py
@@ -0,0 +1,67 @@
#!/usr/bin/env python

# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""A license header check
The tool checks if all files in the repo contain the license header.
NOTE: this script is for github actions only, you should not use it anywhere else.
"""
import os
import sys
import glob
from argparse import ArgumentParser

# Configs of the license header check
includes = ['**/*']
excludes = ['**/*.md', 'NOTICE', 'TPC EULA.txt', '**/*.patch']
LICENSE_PATTERN = "Licensed under the Apache License"

def filterFiles(repo_path):
    files = []
    files_excluded = []
    os.chdir(repo_path)
    for pattern in includes:
        files.extend([ f for f in glob.glob(repo_path + '/' + pattern, recursive=True) if os.path.isfile(f)])
    for pattern in excludes:
        files_excluded.extend([ f for f in glob.glob(repo_path + '/' + pattern, recursive=True) if os.path.isfile(f)])
    return list(set(files) - set(files_excluded)), list(set(files_excluded))

def checkLicenseHeader(files):
    no_license_files = []
    for file in files:
        with open(file, 'r') as f:
            print("Checking file: {}".format(file))
            content = f.read()
            if LICENSE_PATTERN not in content:
                no_license_files.append(file)
    return no_license_files

if __name__ == '__main__':
    try:
        repo_path = '.'
        files, files_excluded = filterFiles(repo_path)
        no_license_files = checkLicenseHeader(files)
        warning_message = ""
        for file in files_excluded:
            warning_message += "WARNING: {} is excluded from this check.\n".format(file)
        print(warning_message)
        if no_license_files:
            error_message = ""
            for file in no_license_files:
                error_message += "ERROR: {} does not contain license header.\n".format(file)
            raise Exception(error_message)
    except Exception as e:
        print(e)
        sys.exit(1)
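
The include/exclude logic of `license-check.py` boils down to a set difference between two glob expansions. The following is a minimal, self-contained sketch of that idea and not the script from this commit: the names `filter_files` and `demo`, the temporary directory, and the sample file names are all hypothetical and exist only for illustration.

```python
import glob
import os
import tempfile


def filter_files(repo_path, includes, excludes):
    """Return (files_to_check, files_excluded) as sorted lists of paths."""
    def expand(patterns):
        matched = set()
        for pattern in patterns:
            # glob.escape protects special characters in the repo path itself.
            full_pattern = os.path.join(glob.escape(repo_path), pattern)
            for path in glob.glob(full_pattern, recursive=True):
                if os.path.isfile(path):
                    matched.add(path)
        return matched

    included = expand(includes)
    excluded = expand(excludes)
    # Anything matched by an exclude pattern is dropped from the check.
    return sorted(included - excluded), sorted(excluded)


def demo():
    """Build a throwaway tree (hypothetical file names) and filter it."""
    with tempfile.TemporaryDirectory() as repo:
        os.makedirs(os.path.join(repo, "docs"))
        for rel in ("a.py", "README.md", os.path.join("docs", "guide.md")):
            with open(os.path.join(repo, rel), "w") as f:
                f.write("x")
        files, excluded = filter_files(repo, ["**/*"], ["**/*.md"])
        return ([os.path.relpath(p, repo) for p in files],
                [os.path.relpath(p, repo) for p in excluded])


if __name__ == "__main__":
    print(demo())
```

Note that Python's `glob` does not match names beginning with a dot by default, so with a pattern such as `**/*` files inside dot-directories like `.github/` are skipped unless a pattern names them explicitly.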
33 changes: 33 additions & 0 deletions .github/workflows/license-header-check.yml
@@ -0,0 +1,33 @@
# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# A workflow to check license header of files

name: Check License Headers

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  check-headers:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: license-check job
        run: |
          set -x
          python3 .github/workflows/license-check/license-check.py
23 changes: 23 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,23 @@
# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

repos:
  - repo: local
    hooks:
      - id: auto-copyrighter
        name: Update copyright year
        entry: scripts/auto-copyrighter.sh
        language: script
        pass_filenames: true
        verbose: true
47 changes: 47 additions & 0 deletions CONTRIBUTING.md
@@ -131,5 +131,52 @@ By making a contribution to this project, I certify that:
this project or the open source license(s) involved.
```

### Pre-commit hooks

We provide a basic config, `.pre-commit-config.yaml`, for [pre-commit](https://pre-commit.com/) to
automate some aspects of the development process. As a convenience, you can enable automatic
copyright year updates by following the installation instructions on the
[pre-commit homepage](https://pre-commit.com/).

First install `pre-commit` itself using the method most suitable for your development
environment. Then run `pre-commit install` to enable it in your local git
repository. Passing `--allow-missing-config` makes it easy to work with older branches
that do not have `.pre-commit-config.yaml`:

```bash
pre-commit install --allow-missing-config
```

Then set the following environment variable:

```bash
export SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER=ON
```
The default value of `SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER` is `OFF`.

When the automatic copyright updater is enabled and you modify a file whose copyright header
carries a prior year, the header is updated to the current year on `git commit`.
However, because the hook modifies files, this aborts the
[commit process](https://github.com/pre-commit/pre-commit/issues/532)
with the following error message:
```
Update copyright year....................................................Failed
- hook id: auto-copyrighter
- duration: 0.01s
- files were modified by this hook
```
You can confirm that the update has actually happened by either inspecting its effect with
`git diff` first or simply re-executing `git commit` right away. The second time no file
modification should be triggered by the copyright year update hook and the commit should succeed.

There is a known issue for macOS users who use the default (BSD) version of `sed`: it treats the
argument following `-i` as a backup suffix, so the copyright update script may fail and generate
an unexpected file named `source-file-E`. As a workaround, please install GNU sed:

```bash
brew install gnu-sed
# then put it first on PATH so it becomes the default sed for your shell
# (on Apple Silicon, Homebrew installs under /opt/homebrew instead of /usr/local)
export PATH="/usr/local/opt/gnu-sed/libexec/gnubin:$PATH"
```

## Attribution
Portions adopted from https://github.com/NVIDIA/spark-rapids/blob/main/CONTRIBUTING.md
41 changes: 41 additions & 0 deletions scripts/auto-copyrighter.sh
@@ -0,0 +1,41 @@
#!/bin/bash

# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER=${SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER:-OFF}

case "$SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER" in

    OFF)
        echo "Copyright updater is DISABLED. Automatic Copyright Updater can be enabled/disabled by setting \
SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER=ON or SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER=OFF, \
correspondingly"
        exit 0
    ;;

    ON)
    ;;

    *)
        echo "Invalid value of SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER=$SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER.
Only ON or OFF are allowed"
        exit 1
    ;;
esac

set -x
echo "$@" | xargs -L1 sed -i -E \
"s/Copyright *\(c\) *([0-9,-]+)*-([0-9]{4}), *NVIDIA *CORPORATION/Copyright (c) \\1-`date +%Y`, NVIDIA CORPORATION/; /`date +%Y`/! s/Copyright *\(c\) ([0-9]{4}), *NVIDIA *CORPORATION/Copyright (c) \\1-`date +%Y`, NVIDIA CORPORATION/"
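
To make the effect of the two `sed` substitutions easier to follow, here is a rough Python approximation of what they do to a single line. It is a sketch under stated assumptions and not part of the commit: `update_copyright_line`, `RANGE_RE`, and `SINGLE_RE` are hypothetical names, the year is passed in explicitly instead of coming from `date +%Y`, and the first pattern uses a single `([0-9,-]+)` group where the script writes `([0-9,-]+)*`.

```python
import re

# Header that already ends in a year range, e.g. "2022-2023":
# keep everything before the last dash and replace the final year.
RANGE_RE = re.compile(
    r"Copyright *\(c\) *([0-9,-]+)-[0-9]{4}, *NVIDIA *CORPORATION")

# Header with a single year, e.g. "2023": turn it into a range.
SINGLE_RE = re.compile(
    r"Copyright *\(c\) ([0-9]{4}), *NVIDIA *CORPORATION")


def update_copyright_line(line, year):
    """Approximate the two sed expressions for one line of text."""
    year = str(year)
    replacement = r"Copyright (c) \g<1>-" + year + ", NVIDIA CORPORATION"
    # First expression: extend an existing range to the given year.
    line = RANGE_RE.sub(replacement, line, count=1)
    # Second expression: applied only to lines that do not mention the
    # given year, mirroring the `/YEAR/!` address in the sed command.
    if year not in line:
        line = SINGLE_RE.sub(replacement, line, count=1)
    return line


if __name__ == "__main__":
    for sample in (
        "# Copyright (c) 2022-2023, NVIDIA CORPORATION.",
        "# Copyright (c) 2023, NVIDIA CORPORATION.",
        "# Copyright (c) 2024, NVIDIA CORPORATION.",
    ):
        print(update_copyright_line(sample, 2024))
```

Under these assumptions, a header that already carries the given year is left unchanged, which is why a second `git commit` is not expected to trigger another modification.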
