CYT 645 Fill in RTD site getting started, usage, configuration, and plugin page TODOs #82

Merged
merged 11 commits into from
Nov 27, 2023
203 changes: 199 additions & 4 deletions docs/config.md
@@ -2,18 +2,213 @@

## Build configuration file

TODO: add details about making a config file
A configuration file describes the sample that surfactant should gather information from. Example JSON configuration files can be found in the examples folder of this repository.

**extractPaths**: (required) a list of paths to the sample folders, each given either as an absolute path or as a path relative to the current working directory that `surfactant` is run from. Each entry must be a folder, not a file (Note that even on Windows, Unix style `/` directory separators should be used in paths)\
**archive**: (optional) the full path, including file name, of the zip, exe installer, or other archive file that the folders in **extractPaths** were extracted from. This is used to collect metadata about the overall sample, and its software entry is given a "Contains" relationship to every software entry found in the various **extractPaths**\
**installPrefix**: (optional) where the files in **extractPaths** would be located if installed correctly on an actual system, e.g. "C:/", "C:/Program Files/", etc. (Note that even on Windows, Unix style `/` directory separators should be used in the path). If not given, the **extractPaths** are used as the install paths
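
As a rough sketch, a configuration file with these keys can also be generated from a script. The helper `write_config`, the file name `sample_config.json`, and the sample paths are hypothetical; only the three key names and the use of `/` separators come from the description above.

```python
import json
from pathlib import Path


def write_config(path, extract_paths, archive=None, install_prefix=None):
    """Write a single-entry configuration file (hypothetical helper)."""
    entry = {"extractPaths": list(extract_paths)}  # required key
    if archive is not None:
        entry["archive"] = archive  # optional key
    if install_prefix is not None:
        entry["installPrefix"] = install_prefix  # optional key
    config = [entry]  # the top level of the file is a list of entries
    Path(path).write_text(json.dumps(config, indent=2))
    return config


# Hypothetical sample: a zip extracted to /home/samples/sample,
# whose contents would normally be installed under C:/
config = write_config(
    "sample_config.json",
    ["/home/samples/sample"],
    archive="/home/samples/sample.zip",
    install_prefix="C:/",
)
```

Leaving out `archive` and `install_prefix` produces the minimal form with only `extractPaths`.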

## Example configuration files

Let's say you have a .tar.gz file that you want to run surfactant on. For this example, we will be using the HELICS release .tar.gz file. In this scenario, the absolute path for this file is /home/samples/helics.tar.gz. Upon extracting this file, we get a helics folder with 4 sub-folders: bin, include, lib64, and share.

### Example 1: Simple Configuration File

TODO: Add details
If we want to include only the folders that contain binary files to analyze, our most basic configuration would be:

```json
[
  {
    "extractPaths": ["/home/samples/helics/bin", "/home/samples/helics/lib64"]
  }
]
```

The resulting SBOM would be structured like this:

```json
{
  "software": [
    {
      "UUID": "abc1",
      "fileName": ["helics_binary"],
      "installPath": ["/home/samples/helics/bin/helics_binary"],
      "containerPath": null
    },
    {
      "UUID": "abc2",
      "fileName": ["lib1.so"],
      "installPath": ["/home/samples/helics/lib64/lib1.so"],
      "containerPath": null
    }
  ],
  "relationships": [
    {
      "xUUID": "abc1",
      "yUUID": "abc2",
      "relationship": "Uses"
    }
  ]
}
```

### Example 2: Detailed Configuration File

TODO: Add details
A more detailed configuration file might look like the example below. The resulting SBOM would have a software entry for helics.tar.gz with a "Contains" relationship to all binaries found in the extractPaths. Providing an installPrefix of `/` and an extractPaths entry of `/home/samples/helics` allows surfactant to correctly assign the install paths in the SBOM for binaries in the subfolders as `/bin` and `/lib64`.

```json
[
  {
    "archive": "/home/samples/helics.tar.gz",
    "extractPaths": ["/home/samples/helics"],
    "installPrefix": "/"
  }
]
```

The resulting SBOM would be structured like this:

```json
{
  "software": [
    {
      "UUID": "abc0",
      "fileName": ["helics.tar.gz"],
      "installPath": null,
      "containerPath": null
    },
    {
      "UUID": "abc1",
      "fileName": ["helics_binary"],
      "installPath": ["/bin/helics_binary"],
      "containerPath": ["abc0/bin/helics_binary"]
    },
    {
      "UUID": "abc2",
      "fileName": ["lib1.so"],
      "installPath": ["/lib64/lib1.so"],
      "containerPath": ["abc0/lib64/lib1.so"]
    }
  ],
  "relationships": [
    {
      "xUUID": "abc0",
      "yUUID": "abc1",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc0",
      "yUUID": "abc2",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc1",
      "yUUID": "abc2",
      "relationship": "Uses"
    }
  ]
}
```
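
The install path mapping in Example 2 can be sketched as a prefix swap: the part of a file's path that matches the extractPaths entry is replaced by the installPrefix. This is only an illustration of the idea and not surfactant's actual implementation; the function name `to_install_path` is hypothetical.

```python
import posixpath


def to_install_path(file_path, extract_path, install_prefix):
    """Swap the extract path prefix of file_path for the install prefix."""
    # Path of the file relative to the folder it was extracted into
    relative = posixpath.relpath(file_path, start=extract_path)
    # Re-root that relative path under the install prefix
    return posixpath.join(install_prefix, relative)


binary_path = to_install_path(
    "/home/samples/helics/bin/helics_binary", "/home/samples/helics", "/"
)
library_path = to_install_path(
    "/home/samples/helics/lib64/lib1.so", "/home/samples/helics", "/"
)
print(binary_path)   # /bin/helics_binary
print(library_path)  # /lib64/lib1.so
```

`posixpath` is used so that `/` separators are kept on every platform, matching the note about Unix style separators in the configuration keys.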

### Example 3: Adding Related Binaries

TODO: Add details
If our sample helics tar.gz file came with a related tar.gz file to install a plugin extension module (extracted into a helics_plugin folder that contains bin and lib64 subfolders), we could add that into the configuration file as well:

```json
[
  {
    "archive": "/home/samples/helics.tar.gz",
    "extractPaths": ["/home/samples/helics"],
    "installPrefix": "/"
  },
  {
    "archive": "/home/samples/helics_plugin.tar.gz",
    "extractPaths": ["/home/samples/helics_plugin"],
    "installPrefix": "/"
  }
]
```

The resulting SBOM would be structured like this:

```json
{
  "software": [
    {
      "UUID": "abc0",
      "fileName": ["helics.tar.gz"],
      "installPath": null,
      "containerPath": null
    },
    {
      "UUID": "abc1",
      "fileName": ["helics_binary"],
      "installPath": ["/bin/helics_binary"],
      "containerPath": ["abc0/bin/helics_binary"]
    },
    {
      "UUID": "abc2",
      "fileName": ["lib1.so"],
      "installPath": ["/lib64/lib1.so"],
      "containerPath": ["abc0/lib64/lib1.so"]
    },
    {
      "UUID": "abc3",
      "fileName": ["helics_plugin.tar.gz"],
      "installPath": null,
      "containerPath": null
    },
    {
      "UUID": "abc4",
      "fileName": ["helics_plugin"],
      "installPath": ["/bin/helics_plugin"],
      "containerPath": ["abc3/bin/helics_plugin"]
    },
    {
      "UUID": "abc5",
      "fileName": ["lib_plugin.so"],
      "installPath": ["/lib64/lib_plugin.so"],
      "containerPath": ["abc3/lib64/lib_plugin.so"]
    }
  ],
  "relationships": [
    {
      "xUUID": "abc1",
      "yUUID": "abc2",
      "relationship": "Uses"
    },
    {
      "xUUID": "abc4",
      "yUUID": "abc5",
      "relationship": "Uses"
    },
    {
      "xUUID": "abc5",
      "yUUID": "abc2",
      "relationship": "Uses"
    },
    {
      "xUUID": "abc0",
      "yUUID": "abc1",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc0",
      "yUUID": "abc2",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc3",
      "yUUID": "abc4",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc3",
      "yUUID": "abc5",
      "relationship": "Contains"
    }
  ]
}
```

NOTE: These examples have been simplified to show differences in output based on configuration.
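
One thing Example 3 makes visible is a dependency that crosses archives: `lib_plugin.so` from the plugin archive uses `lib1.so` from the main archive. A minimal sketch for finding such cases in the simplified output is shown below; the function name `cross_archive_uses` is hypothetical, and the input is just the `relationships` list from Example 3.

```python
def cross_archive_uses(sbom):
    """Return (xUUID, yUUID) pairs of "Uses" relationships that span two archives."""
    # Map each contained entry to the archive entry that contains it
    container_of = {
        rel["yUUID"]: rel["xUUID"]
        for rel in sbom["relationships"]
        if rel["relationship"] == "Contains"
    }
    return [
        (rel["xUUID"], rel["yUUID"])
        for rel in sbom["relationships"]
        if rel["relationship"] == "Uses"
        and container_of.get(rel["xUUID"]) != container_of.get(rel["yUUID"])
    ]


# Relationships copied from the Example 3 output
example_sbom = {
    "relationships": [
        {"xUUID": "abc1", "yUUID": "abc2", "relationship": "Uses"},
        {"xUUID": "abc4", "yUUID": "abc5", "relationship": "Uses"},
        {"xUUID": "abc5", "yUUID": "abc2", "relationship": "Uses"},
        {"xUUID": "abc0", "yUUID": "abc1", "relationship": "Contains"},
        {"xUUID": "abc0", "yUUID": "abc2", "relationship": "Contains"},
        {"xUUID": "abc3", "yUUID": "abc4", "relationship": "Contains"},
        {"xUUID": "abc3", "yUUID": "abc5", "relationship": "Contains"},
    ]
}

print(cross_archive_uses(example_sbom))  # [('abc5', 'abc2')]
```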
48 changes: 45 additions & 3 deletions docs/getstarted.md
@@ -2,18 +2,60 @@

## Installation

TODO: Installation steps
### For Users:

1. Create a virtual environment with Python >= 3.8 (optional, but recommended)

```bash
python -m venv cytrics_venv
source cytrics_venv/bin/activate
```

2. Install Surfactant with pip

```bash
pip install surfactant
```

### For Developers:

1. Create a virtual environment with Python >= 3.8 (optional, but recommended)

```bash
python -m venv cytrics_venv
source cytrics_venv/bin/activate
```

2. Clone sbom-surfactant

```bash
git clone git@github.com:LLNL/Surfactant.git
```

3. From the root of the cloned repository, create an editable surfactant install (changes to code will take effect immediately):

```bash
pip install -e .
```

To install optional dependencies required for running pytest and pre-commit:

```bash
pip install -e ".[test,dev]"
```

## Understanding the SBOM Output

### Software

TODO: Section information
This section contains a list of entries, one for each piece of software found in the sample. Metadata such as file size, vendor, and version is included in each entry, along with a UUID that uniquely identifies the software entry.

### Relationships

TODO: Section information
This section contains information on how the software entries in the previous section are linked.

**Uses**: this relationship type means that x software uses y software, i.e. y is a helper module to x\
**Contains**: this relationship type means that x software contains y software (often x software is an installer or archive such as a zip file)

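
To make the x/y direction concrete, the sketch below turns each relationship into a short sentence by looking up the file names of the two UUIDs. The helper name `describe_relationships` is hypothetical, and the sample data is a cut-down SBOM in the simplified shape used in the configuration examples (real output carries more metadata).

```python
def describe_relationships(sbom):
    """Render each relationship as '<x file name> <uses|contains> <y file name>'."""
    # Look up the first file name recorded for each software entry
    names = {entry["UUID"]: entry["fileName"][0] for entry in sbom["software"]}
    return [
        f'{names[rel["xUUID"]]} {rel["relationship"].lower()} {names[rel["yUUID"]]}'
        for rel in sbom["relationships"]
    ]


sample_sbom = {
    "software": [
        {"UUID": "abc0", "fileName": ["helics.tar.gz"]},
        {"UUID": "abc1", "fileName": ["helics_binary"]},
        {"UUID": "abc2", "fileName": ["lib1.so"]},
    ],
    "relationships": [
        {"xUUID": "abc0", "yUUID": "abc1", "relationship": "Contains"},
        {"xUUID": "abc1", "yUUID": "abc2", "relationship": "Uses"},
    ],
}

for line in describe_relationships(sample_sbom):
    print(line)
# helics.tar.gz contains helics_binary
# helics_binary uses lib1.so
```
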
### Observations

TODO: Section information
42 changes: 38 additions & 4 deletions docs/plugins.md
@@ -1,17 +1,51 @@
# Plugins

TODO: About the plugin system
The surfactant plugin system uses the [pluggy](https://pluggy.readthedocs.io/en/stable) module. This module is used by projects such as pytest and tox for their plugin systems; installing and writing plugins for surfactant is similar to using plugins for those projects. Most of the core surfactant functionality is also implemented as plugins (see [surfactant/output](https://github.com/LLNL/Surfactant/tree/main/surfactant/output), [surfactant/infoextractors](https://github.com/LLNL/Surfactant/tree/main/surfactant/infoextractors), [surfactant/filetypeid](https://github.com/LLNL/Surfactant/tree/main/surfactant/filetypeid), and [surfactant/relationships](https://github.com/LLNL/Surfactant/tree/main/surfactant/relationships)).

## Creating a Plugin

### Step 1: Write Plugin

TODO: Function implementation instructions
In order to create a plugin, you will need to write your implementation for one or more of the functions in the [hookspecs.py](https://github.com/LLNL/Surfactant/tree/main/surfactant/plugin/hookspecs.py) file. Which functions you implement will depend on the goals of your plugin.

#### Brief overview of functions

[identify_file_type](https://github.com/LLNL/Surfactant/tree/main/surfactant/plugin/hookspecs.py#L15)
- Return a string representation of the type of file passed in

[extract_file_info](https://github.com/LLNL/Surfactant/tree/main/surfactant/plugin/hookspecs.py#L29)
- Determine how file info is extracted

[establish_relationships](https://github.com/LLNL/Surfactant/tree/main/surfactant/plugin/hookspecs.py#L47)
- Determine how relationships are established between the software/metadata passed to it

[write_sbom](https://github.com/LLNL/Surfactant/tree/main/surfactant/plugin/hookspecs.py#L70)
- Determine the format in which the SBOM is written to file

[read_sbom](https://github.com/LLNL/Surfactant/tree/main/surfactant/plugin/hookspecs.py#L80)
- If reading from input SBOMs, specify the format of the input SBOMs
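
As a loose illustration of the first hook, the sketch below returns a string for files that start with the ELF magic bytes and `None` otherwise. It is written as a plain function so that it runs without surfactant installed; in a real plugin it would carry the `@surfactant.plugin.hookimpl` decorator. The parameter name `filepath`, the `Optional[str]` return type, and the `"ELF"` label are assumptions here and should be checked against hookspecs.py.

```python
from typing import Optional

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def identify_file_type(filepath: str) -> Optional[str]:
    """Return "ELF" for ELF binaries, or None if the type is not recognized."""
    try:
        with open(filepath, "rb") as f:
            header = f.read(len(ELF_MAGIC))
    except OSError:
        # Unreadable files are treated as unrecognized (assumed behavior)
        return None
    return "ELF" if header == ELF_MAGIC else None
```

Returning `None` for unrecognized files is a design choice in this sketch, made so that a file the function does not know about is simply left unclassified.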

### Step 2: Write .toml File

TODO: Plugin metadata details
Once you have written your plugin, you will need to write a pyproject.toml file. Include any relevant project metadata/dependencies for your plugin, as well as an entry-point specification (example below) to make the plugin discoverable by surfactant. Once you write your .toml file, you can `pip install .` your plugin.
More information on entry points can be found in the [setuptools entry points documentation](https://setuptools.pypa.io/en/latest/userguide/entry_point.html#entry-points-syntax).

#### Example

TODO: Example .toml files
#### sampleplugin.py
```python
import surfactant.plugin
from surfactant.sbomtypes import SBOM

@surfactant.plugin.hookimpl
def write_sbom(sbom: SBOM, outfile) -> None:
    # Serialize the SBOM to JSON and write it to the output file object
    outfile.write(sbom.to_json(indent=10))
```
#### pyproject.toml
```toml
# ... generic pyproject info ...
[project.entry-points."surfactant"]
sampleplugin = "sampleplugin"
```
From the same folder as your sampleplugin files, run `pip install .` to install your plugin and surfactant will automatically load and use the plugin.
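
To check that an installed plugin is visible, the entry points registered under the `surfactant` group can be listed with the standard library. This is a sketch for inspection only and is not how surfactant itself is documented to load plugins; the function name `list_surfactant_entry_points` is hypothetical.

```python
from importlib.metadata import entry_points


def list_surfactant_entry_points():
    """Return {entry point name: module} for the "surfactant" group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group
        selected = eps.select(group="surfactant")
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name
        selected = eps.get("surfactant", [])
    return {ep.name: ep.value for ep in selected}


print(list_surfactant_entry_points())
```

After installing the example above, the result would be expected to include a `sampleplugin` key; with no plugins installed the dictionary is empty.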

Another example can be found in the [surfactantplugin-checksec.py](https://github.com/LLNL/Surfactant/tree/main/surfactantplugin-checksec.py) folder. There you can see the [pyproject.toml](https://github.com/LLNL/Surfactant/tree/main/surfactantplugin-checksec.py/pyproject.toml) file with the `[project.entry-points."surfactant"]` entry. In the [surfactantplugin_checksec.py](https://github.com/LLNL/Surfactant/tree/main/surfactantplugin-checksec.py/surfactantplugin_checksec.py) file, you can identify the hooked functions by the `@surfactant.plugin.hookimpl` decorator.
24 changes: 21 additions & 3 deletions docs/usage.md
@@ -2,12 +2,30 @@

## Identify Sample File

TODO: Information about downloadable files to test on
In order to test out surfactant, you will need a sample file/folder. If you don't have one on hand, you can download and use the portable .zip file from <https://github.com/ShareX/ShareX/releases> or the Linux .tar.gz file from <https://github.com/GMLC-TDC/HELICS/releases>.

## Running Surfactant

TODO: List options and commands
```bash
$ surfactant generate [OPTIONS] CONFIG_FILE SBOM_OUTFILE [INPUT_SBOM]
```

**CONFIG_FILE**: (required) the config file created earlier that contains the information on the sample\
**SBOM_OUTFILE**: (required) the desired name of the output file\
**INPUT_SBOM**: (optional) a base SBOM; use with care, as relationships may be incorrect when files are installed on different systems\
**--skip_gather**: (optional) skips the gathering of information on files and adding software entries\
**--skip_relationships**: (optional) skips the adding of relationships based on metadata\
**--skip_install_path**: (optional) skips including an install path for the files discovered. This may cause "Uses" relationships to also not be generated\
**--recorded_institution**: (optional) the name of the institution collecting the SBOM data (default: LLNL)\
**--output_format**: (optional) changes the output format for the SBOM (given as full module name of a surfactant plugin implementing the `write_sbom` hook)\
**--input_format**: (optional) specifies the format of the input SBOM if one is being used (default: cytrics) (given as full module name of a surfactant plugin implementing the `read_sbom` hook)\
**--help**: (optional) show the help message and exit
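
When surfactant is driven from a script, the command line can be assembled as a list of arguments. The sketch below only builds the argument list, following the `[OPTIONS] CONFIG_FILE SBOM_OUTFILE [INPUT_SBOM]` order from the usage line; the helper name `build_generate_command` and the file names are hypothetical, and only flags listed above are used.

```python
import shlex


def build_generate_command(config_file, sbom_outfile, input_sbom=None,
                           skip_gather=False, skip_relationships=False):
    """Assemble the argument list for `surfactant generate` (hypothetical helper)."""
    cmd = ["surfactant", "generate"]
    # Options come first, as in the usage line
    if skip_gather:
        cmd.append("--skip_gather")
    if skip_relationships:
        cmd.append("--skip_relationships")
    # Required positional arguments, then the optional base SBOM
    cmd += [config_file, sbom_outfile]
    if input_sbom is not None:
        cmd.append(input_sbom)
    return cmd


cmd = build_generate_command("config.json", "sbom.json", skip_relationships=True)
print(shlex.join(cmd))  # surfactant generate --skip_relationships config.json sbom.json
# With surfactant installed, the list could be passed to subprocess.run(cmd, check=True)
```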


## Merging SBOMs

TODO: Instructions on how to merge
A folder containing multiple separate SBOM JSON files can be combined using merge_sbom.py with a command such as the one below, which gets a list of files using ls and then uses xargs to pass the resulting list of files to merge_sbom.py as arguments.

`ls -d ~/Folder_With_SBOMs/Surfactant-* | xargs -d '\n' python3.8 merge_sbom.py --config_file=merge_config.json --sbom_outfile combined_sbom.json`

If the config file option is given, a top-level system entry will be created that all other software entries are tied to (directly or indirectly based on other relationships). If the UUID specified for the new system entry is empty, a random UUID is generated for it; otherwise the provided UUID is used.
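
On systems without `ls` and `xargs`, the same argument list can be put together in Python. This is a sketch under the assumption that merge_sbom.py accepts the SBOM files as trailing arguments, as in the command above; the helper name `build_merge_command` is hypothetical, and `sys.executable` stands in for the `python3.8` interpreter used there.

```python
import glob
import os
import sys


def build_merge_command(sbom_dir, config_file="merge_config.json",
                        sbom_outfile="combined_sbom.json"):
    """Assemble the merge_sbom.py argument list for all Surfactant-* files in sbom_dir."""
    # Same file selection as `ls -d ~/Folder_With_SBOMs/Surfactant-*`
    pattern = os.path.join(os.path.expanduser(sbom_dir), "Surfactant-*")
    sbom_files = sorted(glob.glob(pattern))
    return [
        sys.executable, "merge_sbom.py",
        f"--config_file={config_file}",
        "--sbom_outfile", sbom_outfile,
        *sbom_files,  # xargs appends the file list at the end in the same way
    ]


command = build_merge_command("~/Folder_With_SBOMs")
# The list could then be run with subprocess.run(command, check=True)
```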