
Commit

Update default model (#15)
This PR updates the default model to [BEE-spoke-data/pegasus-x-base-synthsumm_open-16k](https://huggingface.co/BEE-spoke-data/pegasus-x-base-synthsumm_open-16k), which is a better generalist model than the [previous default](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary).

Additionally, this PR includes:

- minor bug fixes
- code cleanup
- README update

---------
pszemraj authored Nov 2, 2024
1 parent 82bafca commit 8ecf04b
Showing 7 changed files with 121 additions and 127 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file. Dates are d

Generated by [`auto-changelog`](https://github.com/CookPete/auto-changelog).

#### [v0.2.1](https://github.com/pszemraj/textsum/compare/v0.2.0...v0.2.1)

> 18 February 2024
- Batch processing [`#13`](https://github.com/pszemraj/textsum/pull/13)

#### [v0.2.0](https://github.com/pszemraj/textsum/compare/v0.1.5...v0.2.0)

> 8 July 2023
90 changes: 48 additions & 42 deletions README.md
@@ -1,15 +1,3 @@
<!-- These are examples of badges you might want to add to your README:
please update the URLs accordingly
[![Built Status](https://api.cirrus-ci.com/github/<USER>/textsum.svg?branch=main)](https://cirrus-ci.com/github/<USER>/textsum)
[![ReadTheDocs](https://readthedocs.org/projects/textsum/badge/?version=latest)](https://textsum.readthedocs.io/en/stable/)
[![Coveralls](https://img.shields.io/coveralls/github/<USER>/textsum/main.svg)](https://coveralls.io/r/<USER>/textsum)
[![PyPI-Server](https://img.shields.io/pypi/v/textsum.svg)](https://pypi.org/project/textsum/)
[![Conda-Forge](https://img.shields.io/conda/vn/conda-forge/textsum.svg)](https://anaconda.org/conda-forge/textsum)
[![Monthly Downloads](https://pepy.tech/badge/textsum/month)](https://pepy.tech/project/textsum)
[![Twitter](https://img.shields.io/twitter/url/http/shields.io.svg?style=social&label=Twitter)](https://twitter.com/textsum)
-->

# textsum

<a href="https://colab.research.google.com/gist/pszemraj/ff8a8486dc3303199fe9c9790a606fff/textsum-summarize-text-files-example.ipynb">
@@ -23,7 +11,8 @@
This package provides easy-to-use interfaces for using summarization models on text documents of arbitrary length. Currently implemented interfaces include a python API, CLI, and a shareable demo app.

For details, explanations, and docs, see the [wiki](https://github.com/pszemraj/textsum/wiki)
> [!TIP]
> For additional details, explanations, and docs, see the [wiki](https://github.com/pszemraj/textsum/wiki)
---

@@ -98,15 +87,16 @@ pip install -e .[all]

The package also supports a number of optional extra features, which can be installed as follows:

- `8bit`: Install with `pip install -e .[8bit]`
- `optimum`: Install with `pip install -e .[optimum]`
- `PDF`: Install with `pip install -e .[PDF]`
- `app`: Install with `pip install -e .[app]`
- `unidecode`: Install with `pip install -e .[unidecode]`
- `8bit`: Install with `pip install "textsum[8bit]"`
- `optimum`: Install with `pip install "textsum[optimum]"`
- `PDF`: Install with `pip install "textsum[PDF]"`
- `app`: Install with `pip install "textsum[app]"`
- `unidecode`: Install with `pip install "textsum[unidecode]"`

Read below for more details on how to use these features.
Replace `textsum` in the command with `.` and add the `-e` flag (e.g., `pip install -e ".[8bit]"`) if installing from source. Read below for more details on how to use these features.
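Multiple extras can also be combined in a single command; a minimal sketch using standard pip extras syntax (the combination shown is arbitrary):

```bash
# from PyPI, with the 8-bit and PDF extras
pip install "textsum[8bit,PDF]"

# or, from a source checkout, in editable mode
pip install -e ".[8bit,PDF]"
```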

> _Note:_ The `unidecode` extra is a GPL-licensed dependency that is not included by default with the `clean-text` python package. While it can be used for text cleaning pre-summarization, it generally should not make a significant difference in most use cases.
> [!TIP]
> The `unidecode` extra is a GPL-licensed dependency not included by default with the `clean-text` package. Installing it should improve the cleaning of noisy input text, but it should not make a significant difference in most use cases.
## Usage

@@ -141,13 +131,13 @@ print(f'summary saved to {out_path}')

### CLI

To summarize a directory of text files, run the following command:
To summarize a directory of text files, run the following command in your terminal:

```bash
textsum-dir /path/to/dir
```

A full list:
There are many CLI flags available. A full list:


<details>
@@ -187,28 +177,27 @@ Some useful options are:

Arguments:

- `input_dir`: The directory containing the input text files to be summarized.
- `--model`: The model name or path to use for summarization. (Optional)
- `--shuffle`: Shuffle the input files before processing. (Optional)
- `--skip_completed`: Skip already completed files in the output directory. (Optional)
- `--batch_length`: The maximum length of each input batch. Default is 4096. (Optional)
- `--output_dir`: The directory to write the summarized output files. Default is `./summarized/`. (Optional)

For more information, run the following:
To see all available options, run the following command:

```bash
textsum-dir --help
```

### Demo App

For convenience, a UI demo[^1] is provided using [gradio](https://gradio.app/). To ensure you have the dependencies installed, clone the repo and run the following command:
For convenience, a UI demo[^1] is provided using [gradio](https://gradio.app/). To ensure you have the dependencies installed, run the following command:

```bash
pip install textsum[app]
```

To run the demo, run the following command:
To launch the demo, run:

```bash
textsum-ui
@@ -220,7 +209,7 @@ This will start a local server that you can access in your browser & a shareable

## Models

Summarization is a memory-intensive task, and the [default model is relatively small and efficient](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary) for long-form text summarization. If you want to use a bigger model, you can specify the `model_name_or_path` argument when instantiating the `Summarizer` class.
Summarization is a memory-intensive task, and the [default model is relatively small and efficient](https://huggingface.co/BEE-spoke-data/pegasus-x-base-synthsumm_open-16k) for long-form text summarization. If you want to use a different model, you can specify the `model_name_or_path` argument when instantiating the `Summarizer` class.

```python
summarizer = Summarizer(model_name_or_path='pszemraj/long-t5-tglobal-xl-16384-book-summary')
@@ -240,70 +229,87 @@ Any [text-to-text](https://huggingface.co/models?filter=text2text) or [summariza

### Parameters

Memory usage can also be reduced by adjusting the parameters for inference. This is discussed in detail in the [project wiki](https://github.com/pszemraj/textsum/wiki).
Memory usage can also be reduced by adjusting the [parameters for inference](https://huggingface.co/docs/transformers/generation_strategies#beam-search-decoding). This is discussed in detail in the [project wiki](https://github.com/pszemraj/textsum/wiki).

tl;dr for this README: use the `summarizer.set_inference_params()` and `summarizer.get_inference_params()` methods to adjust the parameters for inference from either a python `dict` or a JSON file.
> [!IMPORTANT]
> tl;dr: use the `summarizer.set_inference_params()` and `summarizer.get_inference_params()` methods to adjust the inference parameters, passing either a python `dict` or a JSON file.
Support for `GenerationConfig` as the primary method to adjust inference parameters is planned for a future release.
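A minimal sketch of that workflow (the exact call signature may vary, see the wiki; the parameter names below are standard Hugging Face generation settings used here as placeholders):

```python
from textsum.summarize import Summarizer

summarizer = Summarizer()

# inspect the currently active inference parameters
print(summarizer.get_inference_params())

# update a few of them from a python dict (a JSON file path can also be passed)
summarizer.set_inference_params({"num_beams": 2, "no_repeat_ngram_size": 3})
```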

### 8-bit Quantization & TensorFloat32

Some methods of reducing memory usage _if you have compatible hardware_ include loading the model in 8-bit precision via [LLM.int8](https://arxiv.org/abs/2208.07339) and using the `--tf32` flag to use TensorFloat32 precision. See the [transformers docs](https://huggingface.co/docs/transformers/perf_infer_gpu_one#efficient-inference-on-a-single-gpu) for more details on how this works. Using LLM.int8 requires the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) package, which can either be installed directly or via the `textsum[8bit]` extra:
Some methods of efficient inference[^2] include loading the model in 8-bit precision via [LLM.int8](https://arxiv.org/abs/2208.07339) (_reduces memory usage_) and enabling TensorFloat32 precision in the torch backend (_reduces latency_). See the [transformers docs](https://huggingface.co/docs/transformers/perf_infer_gpu_one#efficient-inference-on-a-single-gpu) for more details. Using LLM.int8 requires the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) package, which can either be installed directly or via the `textsum[8bit]` extra:

[^2]: If you have compatible hardware; in general, Ampere (RTX 30XX) and newer GPUs are recommended.

```bash
pip install textsum[8bit]
```

To use these options, use the `-8bit` and `--tf32` flags when using the CLI:
To use these options, use the `--load_in_8bit` and `--tf32` flags when using the CLI:

```bash
textsum-dir /path/to/dir -8bit --tf32
textsum-dir /path/to/dir --load_in_8bit --tf32
```

Or in python, using the `load_in_8bit` argument:
Or in Python, using the `load_in_8bit` argument:

```python
summarizer = Summarizer(load_in_8bit=True)
```

If using the python API, it's better to initiate tf32 yourself; see [here](https://huggingface.co/docs/transformers/perf_train_gpu_one#tf32) for how.
If using the Python API, either [manually activate tf32](https://huggingface.co/docs/transformers/perf_train_gpu_one#tf32) or use the `check_ampere_gpu()` function from `textsum.utils` **before initializing the `Summarizer` class**:

```python
from textsum.utils import check_ampere_gpu
check_ampere_gpu() # automatically enables TF32 if Ampere+ available
summarizer = Summarizer(load_in_8bit=True)
```
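Alternatively, a minimal sketch of the manual route from the linked docs (these settings only take effect on Ampere or newer GPUs):

```python
import torch

# enable TF32 for matmuls and cuDNN kernels before creating the Summarizer
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```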

### Using Optimum ONNX Runtime

> ⚠️ **Note:** This feature is experimental and might not work as expected. Use at your own risk. ⚠️🧪
> [!CAUTION]
> This feature is experimental and might not work as expected. Use at your own risk. ⚠️🧪
ONNX Runtime is a performance-focused inference engine for ONNX models. It can be used to enhance the speed of model predictions, especially on Windows and in environments where GPU acceleration is not available. If you want to use ONNX runtime for inference, you need to set `optimum_onnx=True` when initializing the `Summarizer` class.
ONNX Runtime is a performance-oriented inference engine for ONNX models. It can be used to increase the speed of model inference, especially on Windows and in environments where GPU acceleration is not available. If you want to use ONNX runtime for inference, you need to set `optimum_onnx=True` when initializing the `Summarizer` class.

First, install with `pip install textsum[optimum]`. Then, you can use the following code to initialize the `Summarizer` class with ONNX runtime:
First, install with `pip install textsum[optimum]`. Then initialize the `Summarizer` class with ONNX runtime:

```python
summarizer = Summarizer(optimum_onnx=True)
summarizer = Summarizer(model_name_or_path="onnx-compatible-model-name", optimum_onnx=True)
```

It will automatically convert the model if it has not been converted to ONNX yet.

**Notes:**

1. ONNX Runtime with CUDA requires an additional package; manually install `onnxruntime-gpu` (see the command below) if you plan to run ONNX models on a GPU.
2. Using ONNX Runtime might lead to different behavior in certain models. It is recommended to compare outputs with and without ONNX Runtime on **the same input text** before using it for anything important.
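For example (assumed setup; `onnxruntime-gpu` is the upstream ONNX Runtime GPU build, installed separately from this package's extras):

```bash
pip install onnxruntime-gpu
```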

### Force Cache

By default, the summarization model uses past computations to speed up decoding. If you want to force the model to always use cache irrespective of the model's default behavior, you can set `force_cache=True` when initializing the `Summarizer` class.
> [!CAUTION]
> Setting `force_cache=True` might lead to different behavior in certain models. Test the model with and without `force_cache` on **the same input text** before using it for anything important.
Using the cache speeds up autoregressive generation by avoiding recomputing attention for tokens that have already been generated. If you want to force the model to always use cache irrespective of the model's default behavior[^3], you can set `force_cache=True` when initializing the `Summarizer` class.

[^3]: `use_cache` can sometimes be disabled during training (e.g., when gradient checkpointing is enabled) and, if not re-enabled, will result in slower inference times.

```python
summarizer = Summarizer(force_cache=True)
```

**Note:** Setting `force_cache=True` might lead to different behavior in certain models.

### Compile Model

By default, the model isn't compiled for efficient inference. If you want to compile the model for faster inference times, you can set `compile_model=True` when initializing the `Summarizer` class.
If you want to [compile the model](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) for faster inference times, you can set `compile_model=True` when initializing the `Summarizer` class.

```python
summarizer = Summarizer(compile_model=True)
```

**Note:** Compiling the model might not be supported on all platforms and requires pytorch > 2.0.0.
> [!NOTE]
> Compiling the model might not be supported on all platforms and requires PyTorch >= 2.0.0.
---

4 changes: 2 additions & 2 deletions setup.cfg
@@ -45,9 +45,9 @@ install_requires =
fire
natsort
nltk
torch
torch>=2.0.0
tqdm
transformers>=4.26.0
transformers>=4.46.0

[options.packages.find]
where = src
1 change: 1 addition & 0 deletions src/textsum/__init__.py
@@ -2,6 +2,7 @@
textsum - a package for summarizing text
"""

import sys

if sys.version_info[:2] >= (3, 8):
8 changes: 4 additions & 4 deletions src/textsum/cli.py
@@ -16,13 +16,13 @@

import textsum
from textsum.summarize import Summarizer
from textsum.utils import enable_tf32, setup_logging
from textsum.utils import check_ampere_gpu, setup_logging


def main(
input_dir: str,
output_dir: Optional[str] = None,
model: str = "pszemraj/long-t5-tglobal-base-16384-book-summary",
model: str = "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k",
no_cuda: bool = False,
tf32: bool = False,
force_cache: bool = False,
@@ -53,7 +53,7 @@ def main(
Args:
input_dir (str, required): The directory containing the input files.
output_dir (str, optional): Directory to write the output files. If None, writes to input_dir/summarized.
model (str, optional): The name of the model to use for summarization. Default: "pszemraj/long-t5-tglobal-base-16384-book-summary".
model (str, optional): The name of the model to use for summarization. Default: "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k".
no_cuda (bool, optional): Flag to not use cuda if available. Default: False.
tf32 (bool, optional): Enable tf32 data type for computation (requires ampere series GPU or newer). Default: False.
force_cache (bool, optional): Force the use_cache flag to True in the Summarizer. Default: False.
@@ -98,7 +98,7 @@ def main(
}

if tf32:
enable_tf32() # enable tf32 for computation
check_ampere_gpu() # enable tf32 for computation

summarizer = Summarizer(
model_name_or_path=model,
31 changes: 18 additions & 13 deletions src/textsum/summarize.py
@@ -26,7 +26,7 @@

class Summarizer:
"""
Summarizer - utility class for summarizing long text using a pretrained model
Summarizer - utility class for summarizing long text using a pretrained text2text model
"""

settable_inference_params = [
@@ -44,10 +44,10 @@ class Summarizer:

def __init__(
self,
model_name_or_path: str = "pszemraj/long-t5-tglobal-base-16384-book-summary",
model_name_or_path: str = "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k",
use_cuda: bool = True,
is_general_attention_model: bool = True,
token_batch_length: int = 2048,
token_batch_length: int = 4096,
batch_stride: int = 16,
max_length_ratio: float = 0.25,
load_in_8bit: bool = False,
@@ -60,17 +60,17 @@ def __init__(
f"""
__init__ - initialize the Summarizer class
:param str model_name_or_path: the name or path of the model to load, defaults to "pszemraj/long-t5-tglobal-base-16384-book-summary"
:param bool use_cuda: whether to use cuda, defaults to True
:param str model_name_or_path: name or path of the model to load, defaults to "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k"
:param bool use_cuda: whether to use cuda if available, defaults to True
:param bool is_general_attention_model: whether the model is a general attention model, defaults to True
:param int token_batch_length: the amount of tokens to process in a batch, defaults to 2048
:param int token_batch_length: number of tokens to split the text into for batch summaries, defaults to 4096
:param int batch_stride: the amount of tokens to stride the batch by, defaults to 16
:param float max_length_ratio: the ratio of the token_batch_length to use as the max_length for the model, defaults to 0.25
:param bool load_in_8bit: whether to load the model in 8bit precision (LLM.int8), defaults to False
:param bool compile_model: whether to compile the model (pytorch 2.0+ only), defaults to False
:param bool optimum_onnx: whether to load the model in ONNX Runtime, defaults to False
:param bool force_cache: whether to force the model to use cache, defaults to False
:param bool disable_progress_bar: whether to disable the progress bar, defaults to False
:param float max_length_ratio: ratio of the token_batch_length to calculate max_length (of outputs), defaults to 0.25
:param bool load_in_8bit: load the model in 8bit precision (LLM.int8), defaults to False
:param bool compile_model: compile the model (pytorch 2.0+ only), defaults to False
:param bool optimum_onnx: load the model in ONNX Runtime, defaults to False
:param bool force_cache: force the model to use cache in generation, defaults to False
:param bool disable_progress_bar: disable the per-document progress bar, defaults to False
:param kwargs: additional keyword arguments to pass to the model as inference parameters, any of: {self.settable_inference_params}
"""
self.logger = logging.getLogger(__name__)
@@ -113,6 +113,10 @@ def __init__(
provider=provider,
export=not Path(self.model_name_or_path).is_dir(),
) # if a directory, already exported
self.logger.warning(
"ONNXruntime support is experimental, and functionality may vary per-model. "
"Model outputs should be checked for accuracy"
)
else:
self.model = AutoModelForSeq2SeqLM.from_pretrained(
self.model_name_or_path,
@@ -625,8 +629,9 @@ def __call__(self, input_data, **kwargs):
# or
summary = summarizer("/path/to/textfile.txt")
"""
MAX_FILEPATH_LENGTH = 300 # est
if (
len(str(input_data)) < 1000 # assume > 1000 characters is plaintext
len(str(input_data)) < MAX_FILEPATH_LENGTH
and isinstance(input_data, (str, Path))
and Path(input_data).is_file()
):