Resolve dependabot alerts #3: refactor git clone implementation.
eli64s committed Aug 30, 2023
1 parent b63bab2 commit b013707
Showing 5 changed files with 117 additions and 24 deletions.
51 changes: 42 additions & 9 deletions CHANGELOG.md
@@ -1,7 +1,7 @@
<!--
## [Unreleased]
### ➕ Added
### 🛠 Changed
### 🚀 New Features and Enhancements
### 🛠 Changes
### ⚙️ Deprecated
### 🗑 Removed
### 🐛 Bug Fixes
@@ -14,6 +14,39 @@ All notable changes to the *readme-ai* project will be documented in this file.

---

## [v0.0.7] - *2023-08-30*

⚠️ Release v0.0.7 addresses a security vulnerability in cloning git repositories via the *GitPython* package on Windows systems. This vulnerability could allow arbitrary command execution if code is run from a directory containing a malicious `git.exe` or `git` executable.

### 🔐 Security Fixes
#### *Arbitrary Command Execution Mitigation*

- Dependabot Alert [#3](https://github.com/eli64s/readme-ai/security/dependabot/3): GitPython untrusted search path on Windows systems leading to arbitrary code execution.
- The previous git clone implementation did not pin the git executable, so on Windows the `git` command could resolve to an executable in the current working directory. This poses a security risk, as cloning could run an arbitrary `git` binary planted in a malicious repository.
```python
git.Repo.clone_from(repo_path, temp_dir, depth=1)
```
- Updated the call to pass an `env` argument that explicitly sets the absolute path of the git executable. This ensures that the git executable used to clone the repository is the one installed on the system path, not one located in the current working directory.
```python
git.Repo.clone_from(repo_path, temp_dir, depth=1, env=env)
```
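
The fix described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `build_clone_env` is a hypothetical helper name, and it only builds the environment mapping without invoking git. It copies the current environment, rejects relative paths (which would resolve against the current working directory), and pins `GIT_PYTHON_GIT_EXECUTABLE` to an absolute path.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def build_clone_env(
    git_exec_path: Path, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with the git executable pinned.

    `build_clone_env` is a hypothetical helper used for illustration only.
    """
    git_exec_path = Path(git_exec_path)
    # A relative path would be resolved against the current working
    # directory, which is exactly what the fix is meant to avoid.
    if not git_exec_path.is_absolute():
        raise ValueError(f"Expected an absolute path, got: {git_exec_path}")

    # Copy so the caller's environment is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_PYTHON_GIT_EXECUTABLE"] = str(git_exec_path)
    return env
```

The resulting mapping can be passed as `env=` to `git.Repo.clone_from`. GitPython also provides `git.refresh(path)` for setting the executable explicitly, which may be worth evaluating alongside this approach.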
### 🚀 New Features and Enhancements

#### *Code Modularity*

- Introduced three methods to help isolate the Git executable discovery and validation logic.
- `find_git_executable()`: Determines the absolute path of the Git executable.
- `validate_git_executable()`: Validates the found Git executable path.
- `validate_file_permissions()`: Validates the file permissions of the cloned repository.
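
A stricter variant of the discovery step could look like the sketch below. It assumes a hypothetical helper, `find_executable_on_path`, that skips empty and relative `PATH` entries (both resolve against the current working directory) and accepts only regular files that are executable. It illustrates the idea only and is not the code in this commit; on Windows, a lookup like this would also need to account for the `.exe` suffix.

```python
import os
from pathlib import Path
from typing import Optional


def find_executable_on_path(
    name: str, search_path: Optional[str] = None
) -> Optional[Path]:
    """Return the first executable file called `name` in an absolute PATH entry.

    `find_executable_on_path` is a hypothetical helper for illustration only.
    """
    raw = os.environ.get("PATH", "") if search_path is None else search_path
    for entry in raw.split(os.pathsep):
        # Empty and relative entries point at the current working directory,
        # so they are skipped rather than trusted.
        if not entry or not Path(entry).is_absolute():
            continue
        candidate = Path(entry) / name
        # Require a regular file that the current user may execute.
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Calling `find_executable_on_path("git")` returns a `Path` when a match exists and `None` otherwise, so the result can be handed to a validation step before cloning.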

#### *File Permission Checks*

- For Unix systems, added checks to ensure the permissions of the cloned repository are set to `0o700`. This is a best practice for secure temporary directories and prevents unauthorized users from accessing the directory.
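
The check can be illustrated with the standard library alone. In this sketch, `has_owner_only_permissions` is a hypothetical name; it extracts the permission bits with `stat.S_IMODE` and compares them against `0o700`. It assumes a Unix-like system, since permission bits are reported differently on Windows.

```python
import stat
import tempfile
from pathlib import Path


def has_owner_only_permissions(directory: Path) -> bool:
    """Return True if the directory mode is exactly 0o700.

    `has_owner_only_permissions` is a hypothetical helper for illustration.
    """
    # S_IMODE strips the file-type bits and keeps the permission bits.
    mode = stat.S_IMODE(Path(directory).stat().st_mode)
    return mode == 0o700


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as temp_dir:
        # Report whether the freshly created temporary directory is owner-only.
        print(has_owner_only_permissions(Path(temp_dir)))
```

Python's `tempfile.mkdtemp` documentation states that the directory it creates is accessible only by the creating user, so on Unix this check is expected to pass for a fresh temporary directory.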

⚠️ These updates aim to mitigate the vulnerability raised in Dependabot alert [#3](https://github.com/eli64s/readme-ai/security/dependabot/3). Users are advised to update *readme-ai* to the latest version, i.e., `pip install --upgrade readmeai`. Please be mindful of this vulnerability and use caution when cloning repositories from untrusted sources, especially on Windows.

---

## [v0.0.6] - *2023-08-29*

### 🐛 Bug Fixes
@@ -26,7 +59,7 @@ All notable changes to the *readme-ai* project will be documented in this file.

## [v0.0.5] - *2023-07-31*

### ➕ Added
### 🚀 New Features and Enhancements

- Add [.dockerignore](./.dockerignore) file to exclude unnecessary files from the Docker image.

@@ -48,7 +81,7 @@ All notable changes to the *readme-ai* project will be documented in this file.

## [v0.0.4] - *2023-07-30*

### ➕ Added
### 🚀 New Features and Enhancements

- Publish *readme-ai* CLI to PyPI under the module name [readmeai](https://pypi.org/project/readmeai/).
- Refactored the codebase to use [Click](https://click.palletsprojects.com/en/8.1.x/), migrating from argparse.
@@ -75,7 +108,7 @@ All notable changes to the *readme-ai* project will be documented in this file.

## [v0.0.3] - *2023-06-29*

### ➕ Added
### 🚀 New Features and Enhancements

- Add [pydantic](https://pydantic-docs.helpmanual.io/) to validate the user's repository and api key inputs.
- Validation was moved from *main.py* to *conf.py*.
@@ -91,12 +124,12 @@ All notable changes to the *readme-ai* project will be documented in this file.

## [v0.0.2] - *2023-06-28*

### ➕ Added
### 🚀 New Features and Enhancements

- Add [CHANGELOG.md](./CHANGELOG.md) to track changes to the project.
- Add new directory [examples/video](./examples/video) to store mp4 videos to demonstrate the *readme-ai* tool.

### 🛠 Changed
### 🛠 Changes

- Update [Makefile](./Makefile) and [setup.sh](./setup/setup.sh) to use *poetry* for dependency management.

@@ -109,10 +142,10 @@ All notable changes to the *readme-ai* project will be documented in this file.

## [v0.0.1] - *2023-06-28*

### ➕ Added
### 🚀 New Features and Enhancements
- Initial release of *readme-ai* v0.0.1

### 🛠 Changed
### 🛠 Changes

- Refine the markdown template structure to be more readable.

6 changes: 3 additions & 3 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "readmeai"
version = "0.3.1"
version = "0.3.015"
description = "🚀 Generate awesome README.md files from the terminal, powered by OpenAI's GPT language model APIs 💫"
authors = ["Eli <[email protected]>"]
license = "MIT"
8 changes: 4 additions & 4 deletions readmeai/preprocess.py
@@ -48,13 +48,13 @@ def __init__(
self.language_setup = language_setup
self.encoding_name = config.api.encoding

def analyze(self, root_path: str, is_remote: bool = False) -> List[Dict]:
def analyze(self, repo_path: str, is_remote: bool = False) -> List[Dict]:
"""Analyzes a local or remote git repository."""
with tempfile.TemporaryDirectory() as temp_dir:
if is_remote:
utils.clone_repository(root_path, temp_dir)
root_path = temp_dir
contents = self.generate_contents(root_path)
utils.clone_repository(repo_path, temp_dir)
repo_path = temp_dir
contents = self.generate_contents(repo_path)
contents = self.tokenize_content(contents)
contents = self.process_language_mapping(contents)
return contents
74 changes: 67 additions & 7 deletions readmeai/utils.py
@@ -1,21 +1,81 @@
"""Utility methods for the readme-ai application."""

import os
import platform
import re
from pathlib import Path
from typing import List
from typing import List, Optional

import git
from tiktoken import get_encoding

from . import conf
from . import conf, logger

logger = logger.Logger(__name__)

def clone_repository(url: str, repo_path: Path) -> None:

def clone_repository(repo_path: str, temp_dir: Path) -> None:
"""Clone a repository to a temporary directory."""
git_exec_path = find_git_executable()

validate_git_executable(git_exec_path)

env = os.environ.copy()
env["GIT_PYTHON_GIT_EXECUTABLE"] = str(git_exec_path)

try:
git.Repo.clone_from(url, repo_path, depth=1)
except git.exc.GitCommandError as exc:
raise ValueError(f"Error cloning repository: {exc}") from exc
git.Repo.clone_from(repo_path, temp_dir, depth=1, env=env)
logger.info(f"Successfully cloned {repo_path} to {temp_dir}.")

except git.GitCommandError as excinfo:
raise ValueError(f"Git clone error: {excinfo}") from excinfo

except Exception as excinfo:
raise RuntimeError(f"Error cloning git repository: {excinfo}") from excinfo

validate_file_permissions(temp_dir)


def find_git_executable() -> Optional[Path]:
    """Find the path to the git executable, if available."""

    git_exec_path = os.environ.get("GIT_PYTHON_GIT_EXECUTABLE")

    if git_exec_path:
        return Path(git_exec_path)

    # For Windows, set default known location for git executable
    if platform.system() == "Windows":
        default_windows_path = Path("C:\\Program Files\\Git\\cmd\\git.EXE")
        if default_windows_path.exists():
            return default_windows_path

    # For other OS (including Linux), set executable by looking into PATH
    paths = os.environ["PATH"].split(os.pathsep)
    for path in paths:
        git_path = Path(path) / "git"
        if git_path.exists():
            return git_path

    return None


def validate_git_executable(git_exec_path: Optional[str]) -> None:
    """Validate the path to the git executable."""
    if not git_exec_path or not Path(git_exec_path).exists():
        raise ValueError(f"Git executable not found at {git_exec_path}")


def validate_file_permissions(temp_dir: Path) -> None:
    """Validates file permissions of the cloned repository."""
    if platform.system() != "Windows":
        if isinstance(temp_dir, str):
            temp_dir = Path(temp_dir)
        permissions = temp_dir.stat().st_mode & 0o777
        if permissions != 0o700:
            raise ValueError(
                "Error: file permissions of cloned repository must be set to 0o700."
            )


def get_github_file_link(file: str, user_repo_name: str) -> str:
@@ -32,7 +92,7 @@ def get_user_repository_name(url) -> str:
username, reponame = match.groups()
return f"{username}/{reponame}"
else:
return "Invalid remote git URL."
raise ValueError("Error: invalid remote repository URL.")


def adjust_max_tokens(max_tokens: int, prompt: str, target: str = "Hello!") -> int:
