
docs: update documentation and readme with recent changes
davhofer committed Sep 11, 2024
1 parent 9d4d39f commit 806e7e6
Showing 5 changed files with 62 additions and 53 deletions.
30 changes: 9 additions & 21 deletions README.md
@@ -1,8 +1,8 @@
# py2spack: Automating conversion of standard python packages to Spack package recipes

Github repository for the CSCS internship project with the goal of developing a Python tool for automatically generating Spack package recipes based on existing Python packages, with the ability to handle direct and transitive dependencies and flexible versions.

For more information, see the [Documentation](#Documentation).

## Installation

@@ -33,10 +33,7 @@ The package is still in development and not yet published to PyPI. It can however
## Usage

```
usage: py2spack [-h] [--max-conversions MAX_CONVERSIONS] [--versions-per-package VERSIONS_PER_PACKAGE] [--repo REPO] [--allow-duplicate] package [--ignore [IGNORE ...]]
CLI for converting a python package and its dependencies to Spack.
@@ -48,16 +45,11 @@ options:
--max-conversions MAX_CONVERSIONS
Maximum number of packages that are converted
--versions-per-package VERSIONS_PER_PACKAGE
Versions per package to be downloaded and converted
--repo REPO Name of or full path to local Spack repository where packages should be saved
--ignore [IGNORE ...]
List of packages to ignore. Must be specified last (after <package> argument) for the command to work
--allow-duplicate Convert the package, even if a package of the same name already exists in some Spack repo. Will NOT overwrite the existing package. Only applies to the main package to be converted, not to dependencies.
```

### Conversion from PyPI
@@ -95,16 +87,12 @@ You can then browse it locally, e.g.
firefox _build/html/index.html
```

## Testing

After installing the package, the tests can be run from the project root directory as follows:

```bash
python -m pytest
```

## Important links

- [Detailed Project Description](<CSCS Internship Project Description.md>)
- [Wiki](https://github.com/davhofer/py2spack/wiki)
- [Changelog](CHANGELOG.md)
Installation tests for converted packages are run through GitHub Actions in a Docker container, see `.github/workflows/run-installation-tests.yaml`.
19 changes: 12 additions & 7 deletions docs/implementation.md
@@ -1,4 +1,3 @@

## Implementation

### Modules overview
@@ -22,6 +21,10 @@ Utilities for converting python packaging requirements, dependencies, versions,

Utilities for parsing pyproject.toml files. The code is partially adapted from [pyproject-metadata](https://github.com/pypa/pyproject-metadata) on GitHub, with customized error handling.

#### cmake_conversion.py

Utilities for parsing CMakeLists.txt files and converting specified dependencies to Spack.

#### cli.py

Makes the main `core.convert_package` method usable and configurable from the command line.
@@ -30,26 +33,28 @@ Makes the main `core.convert_package` method usable and configurable from the command line.

Various general utilities for file handling and downloading.

#### spack_utils.py

Various utilities for interacting with the local Spack installation.

### Main program

The main program is executed by the method `core.convert_package`. It performs, or at least initiates, all of the steps described in the workflow in the initial section of this page. Its arguments are

- `name`: The name of the package to be converted. If `name` is a GitHub repository URL or a string of the form "user/repo-name", the package will be converted from GitHub instead of PyPI.
- `max_conversions`: The maximum number of packages that will be converted. If this limit is reached, execution stops even if there are still unconverted dependencies. Default: `10`
- `versions_per_package`: How many versions (at most) will be converted per package. Default: `10`
- `repo_path`: Path to the local Spack repository where converted packages will be stored. If None is specified, the user will be prompted to provide a path or choose a repo from the local Spack repositories.
- `ignore`: A list of packages that will not be converted

- `allow_duplicate`: Boolean flag to convert the package even if a package of the same name already exists in some Spack repo. This will NOT overwrite the existing package. Only applies to the main package to be converted, not to dependencies.
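
The mapping from the CLI options to these arguments can be sketched with `argparse`. This is a hypothetical illustration based on the help text, not py2spack's actual `cli.py`; option names follow the usage string, and the defaults are assumptions.

```python
import argparse

# Hypothetical sketch of the CLI surface described above; py2spack's real
# cli.py may differ in option names, defaults, and help strings.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="CLI for converting a python package and its dependencies to Spack."
    )
    parser.add_argument("package", help="Name of the package to be converted")
    parser.add_argument("--max-conversions", type=int, default=10)
    parser.add_argument("--versions-per-package", type=int, default=10)
    parser.add_argument("--repo", default=None)
    parser.add_argument("--allow-duplicate", action="store_true")
    # nargs="*" greedily consumes the remaining arguments, which is why
    # --ignore must be specified last, after the positional <package>.
    parser.add_argument("--ignore", nargs="*", default=[])
    return parser

args = build_parser().parse_args(["black", "--repo", "my-repo", "--ignore", "tomli", "colorama"])
print(args.package, args.repo, args.ignore)
```

Placing `--ignore` before the positional `package` would make it swallow the package name, which is consistent with the help text's requirement that it be specified last.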

The method maintains a queue of packages yet to be converted, which initially just holds the package specified by the user via the CLI. In each iteration, it pops the next package name from the queue and tries to convert it. If conversion was successful, it creates a new directory in the chosen Spack repository and writes the recipe to a new `package.py` file there. It then goes through all of the package's dependencies and checks that each one is not already in the queue, not in the ignore list, does not already exist in a Spack repo, and has not previously failed conversion. If all of these hold, it adds the dependency to the queue. To check whether a package `my-package` already exists in Spack, it uses the `spack list` command, which searches all local Spack repositories.
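
The loop just described can be sketched as a breadth-first traversal over the dependency graph. This is an illustration of the described behavior, not py2spack's actual code; the helpers `convert_one` and `exists_in_spack` are assumed stand-ins for the conversion step and the `spack list` lookup.

```python
from collections import deque

def convert_all(root, ignore, max_conversions, exists_in_spack, convert_one):
    """Sketch of the conversion loop (assumed structure, not py2spack's code).

    convert_one(name) stands in for converting and writing package.py; it
    returns the package's dependency names on success, or None on failure.
    exists_in_spack(name) stands in for the `spack list` lookup.
    """
    queue = deque([root])
    queued = {root}
    converted, failed = [], []
    while queue and len(converted) < max_conversions:
        name = queue.popleft()
        deps = convert_one(name)
        if deps is None:
            failed.append(name)
            continue
        converted.append(name)
        for dep in deps:
            # only enqueue deps that are new, not ignored, not in Spack,
            # and not already attempted-and-failed
            if (dep not in queued and dep not in ignore
                    and not exists_in_spack(dep) and dep not in failed):
                queue.append(dep)
                queued.add(dep)
    return converted, failed

# toy dependency graph: a -> b, c; c -> d
deps_map = {"a": ["b", "c"], "b": [], "c": ["d"], "d": []}
converted, failed = convert_all("a", set(), 10, lambda n: False, deps_map.get)
print(converted, failed)
```

Note that `deps_map.get` returns `None` for unknown packages, which models a failed conversion in this sketch.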

In the end, the method prints a small summary with all converted packages, packages that failed to convert, and unconverted dependencies not found in Spack. For each of the unconverted dependencies, it will also display a flag if there are dependency conflicts, errors or other information in the `package.py` file that require manual review.

### Package providers interface

In order to resolve dependencies, discover existing versions, and obtain source distributions and `pyproject.toml` files, we use the `PackageProvider` interface. It is defined as a Python Protocol and contains the methods `package_exists`, `get_versions`, `get_pyproject`, `get_sdist_hash`, and `get_file_content_from_sdist`. Any implementation must also implement these methods. Currently there are two implementations, `PyPIProvider` and `GitHubProvider`.
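
As a rough sketch, such a Protocol might look as follows. The method signatures and return types here are assumptions for illustration; the real interface's types may differ.

```python
from typing import Protocol, runtime_checkable

# Sketch of the PackageProvider Protocol described above; exact
# signatures in py2spack may differ.
@runtime_checkable
class PackageProvider(Protocol):
    def package_exists(self, name: str) -> bool: ...
    def get_versions(self, name: str) -> list[str]: ...
    def get_pyproject(self, name: str, version: str) -> dict: ...
    def get_sdist_hash(self, name: str, version: str) -> str: ...
    def get_file_content_from_sdist(self, name: str, version: str, path: str) -> str: ...

class DummyProvider:
    """Minimal structural implementation used only to illustrate the Protocol."""
    def package_exists(self, name): return name == "example"
    def get_versions(self, name): return ["1.0.0"]
    def get_pyproject(self, name, version): return {"project": {"name": name}}
    def get_sdist_hash(self, name, version): return "deadbeef"
    def get_file_content_from_sdist(self, name, version, path): return ""

# structural typing: any class with these methods satisfies the Protocol
print(isinstance(DummyProvider(), PackageProvider))
```

Because the Protocol is structural, `PyPIProvider` and `GitHubProvider` need not inherit from it, only provide the methods.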

#### PyPIProvider

4 changes: 2 additions & 2 deletions docs/overview.md
@@ -1,6 +1,6 @@
## Overview

py2spack is a tool for automatically generating Spack package recipes based on existing Python packages, with the ability to handle direct and transitive dependencies and flexible versions.
Its main goal is to support users and developers in writing custom `package.py` recipes for existing packages (that could be installed via pip) and automate as much of this process as possible. Conversion for pure python packages should generally work out-of-the-box, meaning that the generated Spack package can be installed without further changes, **but it's always recommended to double-check the `package.py` files for errors or open FIXMEs.**

Conversion of python extensions, python bindings for compiled libraries, or any sort of python package that also includes compiled code like C++ is more complicated and in general not completely automatable. The objective there is to support the user as much as possible by providing hints and suggestions for external non-python dependencies, version constraints etc., but manual review of the generated `package.py` is **always required** (normal python dependencies should still be converted automatically and correctly).
@@ -17,6 +17,6 @@ Any build-backend specifying its metadata and dependencies using the standard pyproject
- flit/flit-core
- setuptools >= 61.0.0, with only a `pyproject.toml` file and no `setup.py` or `setup.cfg`

#### Complex/compiled builds:

- scikit-build-core: py2spack parses potential dependencies and version constraints for non-python dependencies from `CMakeLists.txt` and adds the converted dependencies to the generated package as comments. They are an approximation of the actual dependencies and serve as suggestions for the user to make conversion easier, but manual review is always required. For simple packages, uncommenting the generated suggestions can already be enough for successful conversion.
31 changes: 17 additions & 14 deletions docs/usage.md
@@ -1,22 +1,25 @@
## Usage

```
usage: py2spack [-h] [--max-conversions MAX_CONVERSIONS] [--versions-per-package VERSIONS_PER_PACKAGE] [--repo REPO] [--allow-duplicate] package [--ignore [IGNORE ...]]
CLI for converting a python package and its dependencies to Spack.
positional arguments:
package Name of the package to be converted
options:
-h, --help show this help message and exit
--max-conversions MAX_CONVERSIONS
Maximum number of packages that are converted
--versions-per-package VERSIONS_PER_PACKAGE
Versions per package to be downloaded and converted
--repo REPO Name of or full path to local Spack repository where packages should be saved
--ignore [IGNORE ...]
List of packages to ignore. Must be specified last (after <package> argument) for the command to work
--allow-duplicate Convert the package, even if a package of the same name already exists in some Spack repo. Will NOT overwrite the existing package. Only applies to the main package to be converted, not to dependencies.
```


### Conversion from PyPI

```bash
31 changes: 22 additions & 9 deletions docs/workflow.md
@@ -2,25 +2,25 @@

The high-level workflow, i.e. what happens when you convert a package with py2spack, e.g. after running `py2spack my-package`, is as follows:

1. Ask the user to provide a local Spack repository (if not specified via CLI) where converted packages will be stored.
2. Check if `my-package` exists on PyPI (or on GitHub if it is a repository URL; we assume a PyPI package here).
3. Discover available versions and source distributions on PyPI.
4. For each version (or at most `--versions-per-package` many):
1. Download the source distribution
2. Parse the pyproject.toml file
3. Convert metadata and dependencies from the python packaging format to their Spack equivalent, where possible.
4. Depending on the build-backend, if supported, check for non-python dependencies and build steps and try to convert those.
5. Combine the dependencies of all downloaded versions and simplify the specs/constraints.
6. Check for conflicts/unsatisfiable dependency requirements.
7. If successful, save the converted package to the local Spack repository (by creating a directory `packages/py-my-package/` and writing a `package.py` file inside of it).
8. Check for dependencies of `my-package` that are not in Spack/converted/in the queue yet. Add those to the conversion queue.
9. Repeat from 2. with the next package from the queue, until no dependencies are left or `--max-conversions` is reached.
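
Step 5 above, combining the dependencies of all downloaded versions, can be illustrated with a small sketch. The data shapes here are assumptions for illustration, not py2spack's actual representation: each downloaded package version maps to the list of Spack dependency names it declares, and the combined result records which versions need which dependency.

```python
# Illustrative sketch of step 5 (assumed data shapes, not py2spack's real code):
# group dependencies by the package versions that declare them, so a recipe
# can emit one depends_on() per group, with a when= clause where needed.
def combine_dependencies(deps_per_version: dict[str, list[str]]) -> dict[str, list[str]]:
    combined: dict[str, list[str]] = {}
    for version, deps in deps_per_version.items():
        for dep in deps:
            combined.setdefault(dep, []).append(version)
    return combined

deps = {
    "2.0": ["py-requests", "py-tomli"],
    "2.1": ["py-requests"],
}
for dep, versions in combine_dependencies(deps).items():
    if set(versions) == set(deps):
        print(f'depends_on("{dep}")')  # needed by every downloaded version
    else:
        spec = ",".join(sorted(versions))
        print(f'depends_on("{dep}", when="@{spec}")')
```

A dependency declared by every downloaded version becomes an unconditional `depends_on`; one declared by only some versions is restricted with a `when=` clause.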

### Package conversion

This section describes how an individual python package is converted to Spack.

Conversion of a package, provided as a string `name`, is handled by the method `core._convert_single`. It first checks whether it is dealing with a GitHub package by calling the `GitHubProvider.package_exists(name)` method. After deciding on the right provider, it uses it to obtain a list of all available package versions. For each version, the `pyproject.toml` contents are loaded and parsed, resulting in one `core.PyProject` object each.
This list of `PyProject`s is then passed to `core.SpackPyPkg.build_from_pyprojects`. There, the newest version of the package is used to convert the general metadata. This metadata includes

- the package name
@@ -59,11 +59,24 @@ Most (recoverable) errors that occur at any point during the conversion process,

### Python extensions

Build processes and their specifications for such packages can be very involved, and converting them accurately is generally not fully automatable. Our goal is therefore to provide the user with hints and suggestions where possible, while still requiring them to review and fine-tune the `package.py` file manually. The information we want to extract and add to the Spack package is mainly the declared dependencies and their version or other constraints.

We currently only support python extensions (python packages including/providing bindings for compiled code like C++) using the [scikit-build-core](https://scikit-build-core.readthedocs.io/en/latest/) backend.

> This could be extended to other very similar build backends, e.g. meson-python.
#### scikit-build-core

Since scikit-build-core is a wrapper around cmake, python packages with this backend come with at least one `CMakeLists.txt` file. In order to get the dependencies, we parse the file and look at every `cmake_minimum_required` and `find_package` statement, and convert the corresponding dependency to a Spack Spec. We also recursively search any included subdirectory (from `add_subdirectory`) for additional CMakeLists.txt files which we process identically. Note that we ignore any conditional statements that would lead to the inclusion/exclusion of dependencies under certain conditions - we assume every dependency to be included.
Finally, we display all converted dependencies in the `package.py` as comments, letting the user choose whether to use, modify, or delete them. To each dependency we also add the source(s) where it was specified, i.e. the path to the `CMakeLists.txt` file and the line number. There can be multiple sources for a single dependency if it was specified multiple times; if the specified dependency versions differ between sources, they are added behind the source. This source information should make it easier for the user to check and edit the dependencies.
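
A minimal, regex-based sketch of this scan is shown below. It is only an illustration: py2spack's actual parser is more thorough (it handles multi-line statements, `add_subdirectory` recursion, and richer version specs), but the output format mirrors the commented suggestions described above.

```python
import re

# Hedged sketch of the CMakeLists.txt dependency scan described above.
FIND_PACKAGE = re.compile(r'find_package\(\s*(\w+)(?:\s+([0-9.]+))?', re.IGNORECASE)
CMAKE_MIN = re.compile(r'cmake_minimum_required\(\s*VERSION\s+([0-9.]+)', re.IGNORECASE)

def scan_cmakelists(text: str, path: str = "CMakeLists.txt") -> list[str]:
    """Return commented depends_on() suggestions with their source locations."""
    suggestions = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = CMAKE_MIN.search(line)
        if m:
            suggestions.append(f'# depends_on("cmake@{m.group(1)}:")  # {path}, line {lineno}')
        m = FIND_PACKAGE.search(line)
        if m:
            name, version = m.group(1).lower(), m.group(2)
            spec = f"{name}@{version}:" if version else name
            suggestions.append(f'# depends_on("{spec}")  # {path}, line {lineno}')
    return suggestions

text = """cmake_minimum_required(VERSION 3.15)
project(demo)
find_package(pybind11 CONFIG REQUIRED)
find_package(Boost 1.74)
"""
print("\n".join(scan_cmakelists(text)))
```

As in the real tool, a `find_package` without a version yields an unversioned suggestion, while a minimum version becomes an open-ended `@version:` range.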

For example, if the dependency `test` was declared two times, once in `CMakeLists.txt` as `find_package(test)` and once in `ext/CMakeLists.txt` as `find_package(test 4.2)`, the resulting section in the `package.py` file would look like this:

```python
# depends_on("test")
# CMakeLists.txt, line 15
# ext/CMakeLists.txt, line 123 ([email protected])
```

### Package naming conventions

