ready for review
LuminaScript committed Jul 22, 2024
1 parent 3f02a06 commit 5cd060c
Showing 15 changed files with 1,823 additions and 0 deletions.
90 changes: 90 additions & 0 deletions README.md
@@ -0,0 +1,90 @@
# Change-impact-analysis Tool

The Change Impact Analysis Tool generates a visual report of the changes in both header files and source code between two Linux versions (tags in the Linux kernel repository: old_tag and new_tag). It helps developers see what changed since the old version.

The diff report covers only the files from the Linux repository that are actually compiled into the kernel, which keeps the report focused on compile-time source code.

## Table of Contents

- [How to Use](#how-to-use)
- [Files Generated](#files-generated)
- [Structure of the Tool](#structure-of-the-tool)
  - [I. Compilation File List Generation](#i-compilation-file-list-generation)
  - [II. Git Diff Report Generation](#ii-git-diff-report-generation)
  - [III. Commit Metadata Retrieval](#iii-commit-metadata-retrieval)
  - [IV. Web Script Generation](#iv-web-script-generation)

## How to Use

To use this tool on Linux (Ubuntu and Debian are supported), follow these steps:

Clone the repository:

```bash
git clone <repository_url>
```

Navigate to the cloned repository:

```bash
cd <repository_directory>
```

Execute the tool by specifying the old and new tags:

```bash
./run_tool.sh <old_tag> <new_tag>
```

## Files Generated

**/build_data:**

- `sourcefile.txt` - List of all built source code files
- `headerfile.txt` - List of all built dependency files
- `git_diff_sourcefile.txt` - Git diff report for source code files
- `git_diff_headerfile.txt` - Git diff report for dependency files
- `tokenize_header.json` - Metadata for commit git diff for dependency files
- `tokenize_source.json` - Metadata for commit git diff for source files

**/web_source_codes:**

- `index.html` - Open this file to view the report

## Structure of the Tool

The tool operates through a structured process to generate a comprehensive change impact analysis report. Here's a detailed breakdown of its operation:

### I. Compilation File List Generation

#### Header File

During Linux kernel compilation, `Makefile.build` invokes `fixdep` (built from `$K/scripts/basic/fixdep.c`) to generate a `.cmd` file for each source file, recording the dependency information collected during compilation.

This tool applies a patch (`patch.file`) to `scripts/basic/fixdep.c` so that, while the kernel is being built, it also writes this dependency information to a **list of header files**.

#### Source code

This tool leverages the `$K/scripts/clang-tools/gen_compile_commands.py` script to generate a `compile_commands.json` file. This file documents all source files involved in the compilation process. The `gen_compile_commands.py` script traverses each `.cmd` file to aggregate the list of source files.

Then, the tool invokes `parse_json.py` to parse `compile_commands.json`, generating **a list of source files**.
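
The parsing step can be pictured with the following minimal sketch. It is not the actual `parse_json.py` (which is not shown here); it only assumes the standard `compile_commands.json` layout, where each entry carries a `file` key. The function name `list_source_files` and the `repo_root` argument are hypothetical.

```python
import json
from pathlib import PurePosixPath


def list_source_files(compile_commands, repo_root):
    """Return sorted, de-duplicated source paths relative to repo_root.

    compile_commands: parsed content of compile_commands.json (list of dicts).
    repo_root: absolute path of the kernel checkout (hypothetical argument).
    """
    root = PurePosixPath(repo_root)
    sources = set()
    for entry in compile_commands:
        path = PurePosixPath(entry["file"])
        # Entries may hold absolute paths; make them repository-relative
        # so they can later be passed to `git diff -- <path>`.
        if path.is_absolute():
            try:
                path = path.relative_to(root)
            except ValueError:
                continue  # file lives outside the checkout; skip it
        sources.add(str(path))
    return sorted(sources)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = json.loads("""
    [
      {"directory": "/work/linux", "command": "gcc -c init/main.c",
       "file": "/work/linux/init/main.c"},
      {"directory": "/work/linux", "command": "gcc -c kernel/fork.c",
       "file": "/work/linux/kernel/fork.c"}
    ]
    """)
    for name in list_source_files(sample, "/work/linux"):
        print(name)
```

Storing repository-relative paths matters because the later diff step addresses files by their path inside the repository.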

### II. Git Diff Report Generation

Using the file lists, the tool generates two separate git diff reports (a dependency diff report and a source diff report) for updates from **old_tag** to **new_tag**.
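
As a rough illustration, the per-file diff loop could be expressed in Python as below. The tool itself does this in `build_collect_diff.sh`; the helper names `build_diff_command` and `collect_diffs` are hypothetical, and the sketch assumes it runs inside a kernel checkout where both tags exist.

```python
import subprocess


def build_diff_command(old_tag, new_tag, path):
    """Return the argv for diffing one file between two tags."""
    # `--` separates revisions from paths so a file name is never
    # mistaken for a revision.
    return ["git", "diff", old_tag, new_tag, "--", path]


def collect_diffs(old_tag, new_tag, paths, repo_dir="."):
    """Return {path: diff_text} for every listed file that changed."""
    report = {}
    for path in paths:
        result = subprocess.run(
            build_diff_command(old_tag, new_tag, path),
            cwd=repo_dir, capture_output=True, check=True,
            encoding="utf-8", errors="replace",
        )
        if result.stdout:  # unchanged files produce no output
            report[path] = result.stdout
    return report
```

Calling `collect_diffs("v6.9", "v6.10", paths)` once with the source list and once with the header list would yield the two reports.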

### III. Commit Metadata Retrieval

Based on the git diff reports, the tool retrieves commit metadata for each newly added line in the reports.

- **Tokenization**: If multiple commits modify a single line between the two tags, the tool splits the line into smaller parts (tokens) and associates commit information with the relevant tokens. The tokenization results are stored in JSON files.
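
One way to picture token-level attribution is the sketch below. It is an illustration under assumptions, not the tool's actual algorithm: it splits lines on whitespace and uses `difflib.SequenceMatcher` to carry ownership forward, and the function name `attribute_tokens` and the commit ids are made up.

```python
from difflib import SequenceMatcher


def attribute_tokens(versions):
    """Map each token of the newest line to the commit that introduced it.

    versions: list of (commit_id, line_text) pairs, oldest first.
    Returns a list of (token, commit_id) pairs for the newest line.
    """
    first_commit, first_line = versions[0]
    tokens = first_line.split()
    owners = [first_commit] * len(tokens)

    for commit, line in versions[1:]:
        new_tokens = line.split()
        # By default every token belongs to the commit that wrote this version.
        new_owners = [commit] * len(new_tokens)
        matcher = SequenceMatcher(a=tokens, b=new_tokens, autojunk=False)
        # Tokens that survive unchanged keep their earlier owner.
        for block in matcher.get_matching_blocks():
            for offset in range(block.size):
                new_owners[block.b + offset] = owners[block.a + offset]
        tokens, owners = new_tokens, new_owners

    return list(zip(tokens, owners))


if __name__ == "__main__":
    history = [
        ("aaa111", "int x = 0;"),
        ("bbb222", "int x = 1;"),
        ("ccc333", "long x = 1;"),
    ]
    for token, commit in attribute_tokens(history):
        print(token, commit)
```

In this made-up history, `long` is attributed to the last commit, `1;` to the middle one, and `x` and `=` to the first.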

### IV. Web Script Generation

Using the git diff reports and metadata stored in JSON files, the tool generates a web report displaying the changes.

The web report consists of three HTML files:

- `index.html`: the entry page, with links to:
- `sourcecode.html`: renders the content of the source diff report, with an embedded URL and an on-hover metadata box for each newly added line/token in new_tag.
- `header.html`: renders the content of the dependency diff report, with an embedded URL and an on-hover metadata box for each newly added line/token in new_tag.
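
The per-token markup might look like the following sketch. It is only an assumption about the shape of the output: it uses a plain `title` attribute as a stand-in for the on-hover metadata box, and `render_token` and its parameters are hypothetical names.

```python
from html import escape


def render_token(token, commit_url, author, summary):
    """Wrap one added token in a link whose tooltip shows commit metadata."""
    tooltip = escape(f"{author}: {summary}", quote=True)
    return (
        f'<a href="{escape(commit_url, quote=True)}" title="{tooltip}">'
        f"{escape(token)}</a>"
    )


if __name__ == "__main__":
    # Placeholder URL and metadata, for illustration only.
    print(render_token("x<1", "https://example.org/c/abc", "A Dev", "fix bug"))
```

Escaping both the token and the metadata keeps diff content such as `<` or `"` from breaking the generated page.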
162 changes: 162 additions & 0 deletions build_scripts/build_collect_diff.sh
@@ -0,0 +1,162 @@
#!/bin/bash
#
# Script to build the kernel, collect compiled file lists using modified kernel scripts,
# and generate a git diff report based on the collected lists.

set -e

DEFAULT_TAG1="v6.9"
DEFAULT_TAG2="v6.10"
TAG1="${TAG1_ENV:-$DEFAULT_TAG1}"
TAG2="${TAG2_ENV:-$DEFAULT_TAG2}"

# check and install gcc-11 and libssl-dev if not already installed
install_package_safe() {
    if ! command -v gcc-11 &> /dev/null; then
        sudo apt-get update
        sudo apt-get install -y gcc-11
    else
        echo "GCC-11 is already installed."
    fi
    # libssl-dev is a library package, not an executable, so query dpkg
    # instead of using `command -v`.
    if ! dpkg -s libssl-dev &> /dev/null; then
        sudo apt-get update
        sudo apt-get install -y libssl-dev
    else
        echo "libssl-dev is already installed."
    fi
}

# safely apply a patch to linux kernel
apply_patch() {
    # shellcheck disable=SC2154
    local patch_path="$root_path/scripts/change-impact-tools/fixdep-patch.file"

    # Stash local changes, if any
    if ! git diff --quiet; then
        git stash
    fi

    # Abort `git am` only if there is a patch being applied
    if git am --show-current-patch &> /dev/null; then
        git am --abort
    fi
    echo "path check: $(pwd)"
    git apply < "$patch_path"
    echo "applied the git patch"
    echo "path check: $(pwd)"
}

# parse the JSON file
parse_source_json_file() {
    local python_path="$root_path/scripts/change-impact-tools/build_scripts/parse_json.py"
    # shellcheck disable=SC2154
    local cloned_repo_name="/$clone_dir/"
    local input_path="$root_path/scripts/clang-tools/compile_commands.json"
    local output_path="$root_path/scripts/change-impact-tools/build_data/sourcefile.txt"

    display_file_head "$root_path/scripts/clang-tools" "compile_commands.json" 3
    python3 "$python_path" "$cloned_repo_name" "$input_path" "$output_path"
    display_file_head "$root_path/scripts/change-impact-tools/build_data" "sourcefile.txt" 3
}

# generate the build file list after building the kernel
generate_compiled_file_lists() {
    # Generate compiled source files list
    local json_output_path="$root_path/scripts/clang-tools/compile_commands.json"
    echo "path check: $(pwd)"
    python3 scripts/clang-tools/gen_compile_commands.py -o "$json_output_path"

    parse_source_json_file
    echo "source compiled filelist generated to sourcefile.txt"

    # Generate compiled header files list
    local output_list="$root_path/scripts/change-impact-tools/build_data/headerfile.txt"
    local output_json="$root_path/scripts/change-impact-tools/build_data/source_dep.json"
    local dep_path="dependency_file.txt"
    local python_tool_path="$root_path/scripts/change-impact-tools/build_scripts/parse_dep_list.py"

    python3 "$python_tool_path" "$dep_path" "$output_json" "$output_list"
    echo "dependency compiled filelist generated to headerfile.txt"
}

# clean up the working directory
cleanup_working_directory() {
    git reset --hard
    git clean -fdx
}

# generate diff for build between TAG1 and TAG2
generate_git_diff() {
    # collect and setup input & output file
    local file_type=${1:-source}
    local root_build_data_path="$root_path/scripts/change-impact-tools/build_data"
    local diff_input="$root_build_data_path/sourcefile.txt"
    local diff_output="$root_build_data_path/filtered_diff_source.txt"

    if [ "$file_type" = "header" ]; then
        echo "[generate_git_diff] Generating dependency git diff report ..."
        diff_input="$root_build_data_path/headerfile.txt"
        diff_output="$root_build_data_path/filtered_diff_header.txt"
    else
        echo "[generate_git_diff] Generating source git diff report ..."
    fi

    # Start from an empty report so reruns do not append to stale output
    : > "$diff_output"

    while IFS= read -r file
    do
        # Only diff files that still exist in TAG2
        if git show "$TAG2:$file" &> /dev/null; then
            local diff_result
            diff_result=$(git diff "$TAG1" "$TAG2" -- "$file")
            if [[ -n "$diff_result" ]]; then
                {
                    echo "Diff for $file"
                    echo "$diff_result"
                    echo ""
                } >> "$diff_output"
            fi
        fi
    done < "$diff_input"
    echo "[generate_git_diff] Git diff report for $file_type files saved to build_data"
}


if [ $# -eq 2 ]; then
    TAG1="$1"
    TAG2="$2"
fi

# Fetch tags from the repository
git fetch --tags
echo "Generating source file list for $TAG1"
git checkout "$TAG1"
echo "starting to run make olddefconfig"
make olddefconfig
echo "finished make olddefconfig"


# Preparation before running make
apply_patch
install_package_safe

# Build linux kernel
echo "the current os-linux version: "
cat /etc/os-release

echo "start running make"
make HOSTCC=gcc-11 CC=gcc-11
echo "finished compiling the kernel using gcc-11"


# Collect build metadata
generate_compiled_file_lists

# Generate git diff report
generate_git_diff source
generate_git_diff header

# Clean up the working directory
cleanup_working_directory
20 changes: 20 additions & 0 deletions build_scripts/git_shortlog.sh
@@ -0,0 +1,20 @@
#!/bin/bash
#
# Fetch name and email information for linux kernel contributors

DEFAULT_TAG="v6.10"
TAG="${1:-$DEFAULT_TAG}"
git checkout "$TAG"

echo "Starting to generate the email name list ..."
# shellcheck disable=SC2154
git shortlog -e -s -n HEAD > "$curr_dir"/build_data/name_list.txt

# shellcheck disable=SC2154
if [ -s "$curr_dir"/build_data/name_list.txt ]; then
    echo "build_data/name_list.txt created successfully"
else
    echo "build_data/name_list.txt is empty or not created"
fi

echo "Finished generating name list"
71 changes: 71 additions & 0 deletions build_scripts/parse_dep_list.py
@@ -0,0 +1,71 @@
"""
The script parses the dependency list generated by patching `fixdep.c`.
This script takes three arguments:
    1. The path of dependency list
    2. The output path for a json file
    3. The output path for the list of header files.
Usage:
    parse_dep_list.py <dep_list_path> <output_json_path>
                      <output_header_file_list_path>
"""
import re
import argparse
import json

# Regular expression patterns
source_file_pattern = re.compile(r'^source file := (.+)$')

# Function to parse the input data


def parse_dependencies(dep_list_file, output_json, output_dep_list):
    """Parse dependency file generated by 'fixdep.c'."""
    dependencies = []
    dep_set = set()
    current_source_file = None

    for line in dep_list_file:
        line = line.strip()
        if not line:
            continue

        source_match = source_file_pattern.match(line)
        if source_match:
            current_source_file = source_match.group(1)
            dependencies.append({
                'source_file': current_source_file,
                'dependency_files': []
            })
        elif current_source_file is not None:
            dependencies[-1]['dependency_files'].append(line)
            dep_set.add(line)
        # Lines seen before the first "source file :=" entry have no owner
        # and are skipped instead of raising IndexError.

    # Write dependency list to output file (sorted for reproducible output)
    with open(output_dep_list, 'w', encoding='utf-8') as output_list_file:
        for header_file in sorted(dep_set):
            output_list_file.write(header_file + '\n')

    # Dump dependencies into JSON file
    with open(output_json, 'w', encoding='utf-8') as json_file:
        json.dump(dependencies, json_file, indent=4)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Process dependency list generated while compiling kernel.")
    parser.add_argument('input_file', type=str,
                        help="Path to input dependency file")
    parser.add_argument('output_json', type=str,
                        help="Path to output JSON file")
    parser.add_argument('output_header_list', type=str,
                        help="Path to output dependency list file")

    args = parser.parse_args()

    with open(args.input_file, 'r', encoding='utf-8') as input_file:
        parse_dependencies(input_file, args.output_json,
                           args.output_header_list)

    print("Dependency parsing complete.")