Add Pre-commit Copyright Check #181

Merged
merged 16 commits into from
Feb 1, 2024
67 changes: 67 additions & 0 deletions .github/workflows/license-check/license-check.py
@@ -0,0 +1,67 @@
#!/usr/bin/env python

# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""A license header check
Checks that every file in the repo contains the license header.
NOTE: This script is intended for GitHub Actions only; do not use it anywhere else.
"""
import os
import sys
import glob

# Configs of the license header check
includes = ['**/*']
excludes = ['**/*.md', 'NOTICE', 'TPC EULA.txt', '**/*.patch']
LICENSE_PATTERN = "Licensed under the Apache License"

def filterFiles(repo_path):
    """Return (files to check, excluded files) found under repo_path."""
    files = []
    files_excluded = []
    os.chdir(repo_path)
    for pattern in includes:
        files.extend([f for f in glob.glob(repo_path + '/' + pattern, recursive=True) if os.path.isfile(f)])
    for pattern in excludes:
        files_excluded.extend([f for f in glob.glob(repo_path + '/' + pattern, recursive=True) if os.path.isfile(f)])
    return list(set(files) - set(files_excluded)), list(set(files_excluded))

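def checkLicenseHeader(files):
    """Return the files whose content does not contain LICENSE_PATTERN."""
    no_license_files = []
    for file in files:
        # errors='ignore' keeps binary or non-UTF-8 files from aborting the check
        with open(file, 'r', errors='ignore') as f:
            print("Checking file: {}".format(file))
            content = f.read()
            if LICENSE_PATTERN not in content:
                no_license_files.append(file)
    return no_license_files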

if __name__ == '__main__':
    try:
        repo_path = '.'
        files, files_excluded = filterFiles(repo_path)
        no_license_files = checkLicenseHeader(files)
        warning_message = ""
        for file in files_excluded:
            warning_message += "WARNING: {} is excluded from this check.\n".format(file)
        print(warning_message)
        if no_license_files:
            error_message = ""
            for file in no_license_files:
                error_message += "ERROR: {} does not contain license header.\n".format(file)
            raise Exception(error_message)
    except Exception as e:
        print(e)
        sys.exit(1)
33 changes: 33 additions & 0 deletions .github/workflows/license-header-check.yml
@@ -0,0 +1,33 @@
# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# A workflow to check license header of files

name: Check License Headers

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  check-headers:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: license-check job
        run: |
          set -x
          python3 .github/workflows/license-check/license-check.py
23 changes: 23 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,23 @@
# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

repos:
  - repo: local
    hooks:
      - id: auto-copyrighter
        name: Update copyright year
        entry: scripts/auto-copyrighter.sh
        language: script
        pass_filenames: true
        verbose: true
47 changes: 47 additions & 0 deletions CONTRIBUTING.md
@@ -131,5 +131,52 @@ By making a contribution to this project, I certify that:
this project or the open source license(s) involved.
```

### Pre-commit hooks

We provide a basic `.pre-commit-config.yaml` for [pre-commit](https://pre-commit.com/) to
automate some aspects of the development process. As a convenience, you can enable automatic
copyright year updates by following the installation instructions on the
[pre-commit homepage](https://pre-commit.com/).

To this end, first install `pre-commit` itself using the method most suitable for your development
environment. Then run `pre-commit install` to enable it in your local git
repository. Passing `--allow-missing-config` makes it easy to work with older branches
that do not have `.pre-commit-config.yaml`.

```bash
pre-commit install --allow-missing-config
```

Then enable the copyright updater by setting the environment variable:

```bash
export SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER=ON
```
The default value of `SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER` is `OFF`.

When the automatic copyright updater is enabled and you modify a file whose copyright header
ends in a prior year, the header is updated to the current year automatically on `git commit`.
However, because the hook modifies the file, it aborts the
[commit process](https://github.com/pre-commit/pre-commit/issues/532)
with the following error message:
```
Update copyright year....................................................Failed
- hook id: auto-copyrighter
- duration: 0.01s
- files were modified by this hook
```
You can confirm that the update has actually happened by either inspecting its effect with
`git diff` first or simply re-executing `git commit` right away. The second time no file
modification should be triggered by the copyright year update hook and the commit should succeed.
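
To preview what the hook would do to a given header line, the following is a minimal Python sketch
of the two substitutions in `scripts/auto-copyrighter.sh`. It is an illustration, not part of the
hook: the function name `update_copyright_line` and the `year` parameter are hypothetical, and the
range pattern is slightly simplified (the leading year list is required rather than optional).

```python
import re
from datetime import date

# Rule 1: a header that already ends in a year range, e.g. "2020-2023".
RANGE_RE = re.compile(
    r"Copyright *\(c\) *([0-9,-]+)-([0-9]{4}), *NVIDIA *CORPORATION"
)
# Rule 2: a header with a single year, e.g. "2023".
SINGLE_RE = re.compile(
    r"Copyright *\(c\) ([0-9]{4}), *NVIDIA *CORPORATION"
)


def update_copyright_line(line, year=None):
    """Return `line` with its NVIDIA copyright year extended to `year`.

    `year` defaults to the current year, mirroring `date +%Y` in the script.
    """
    year = str(year if year is not None else date.today().year)

    def replacement(match):
        # Keep the leading year(s) and make the range end in the current year.
        return "Copyright (c) {}-{}, NVIDIA CORPORATION".format(match.group(1), year)

    # Rule 1 always applies: replace the end of an existing range.
    line = RANGE_RE.sub(replacement, line)
    # Rule 2 applies only to lines that do not already mention the current year.
    if year not in line:
        line = SINGLE_RE.sub(replacement, line)
    return line


if __name__ == "__main__":
    for header in (
        "# Copyright (c) 2023, NVIDIA CORPORATION.",
        "# Copyright (c) 2020-2023, NVIDIA CORPORATION.",
        "# Copyright (c) 2024, NVIDIA CORPORATION.",
    ):
        print(update_copyright_line(header, year=2024))
```

With `year=2024`, the first two headers become `2023-2024` and `2020-2024`, while the third is left
unchanged because it already names the current year.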

There is a known issue for macOS users who use the default version of `sed`. The copyright update
script may fail and generate an unexpected file named `source-file-E`, because BSD `sed` interprets
the `-E` that follows `-i` as a backup-file suffix. As a workaround, please install GNU sed:

```bash
brew install gnu-sed
# and put it first on PATH to make it the default sed for your shell
export PATH="$(brew --prefix gnu-sed)/libexec/gnubin:$PATH"
```

## Attribution
Portions adopted from https://github.com/NVIDIA/spark-rapids/blob/main/CONTRIBUTING.md
41 changes: 41 additions & 0 deletions scripts/auto-copyrighter.sh
@@ -0,0 +1,41 @@
#!/bin/bash

# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER=${SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER:-OFF}

case "$SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER" in

    OFF)
        echo "Copyright updater is DISABLED. Automatic Copyright Updater can be enabled/disabled by setting \
SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER=ON or SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER=OFF, \
correspondingly"
        exit 0
        ;;

    ON)
        ;;

    *)
        echo "Invalid value of SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER=$SPARK_RAPIDS_BENCHMARKS_AUTO_COPYRIGHTER.
Only ON or OFF are allowed"
        exit 1
        ;;
esac

set -x
echo "$@" | xargs -L1 sed -i -E \
"s/Copyright *\(c\) *([0-9,-]+)*-([0-9]{4}), *NVIDIA *CORPORATION/Copyright (c) \\1-`date +%Y`, NVIDIA CORPORATION/; /`date +%Y`/! s/Copyright *\(c\) ([0-9]{4}), *NVIDIA *CORPORATION/Copyright (c) \\1-`date +%Y`, NVIDIA CORPORATION/"