Merge pull request #442 from foundation-model-stack/v2.4.0-rc2
chore(release): merge set of changes for v2.4.0
willmj authored Jan 16, 2025
2 parents 3ec30a0 + 75a5ff6 commit 76bd76d
Showing 17 changed files with 459 additions and 72 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/image.yaml
@@ -15,9 +15,8 @@ jobs:
sudo swapoff -a
sudo rm -f /swapfile
sudo apt clean
docker rmi $(docker image ls -aq)
if [ "$(docker image ls -q)" ]; then docker rmi $(docker image ls -aq); fi
df -h
- name: Build image
run: |
docker build -t fms-hf-tuning:dev . -f build/Dockerfile
27 changes: 23 additions & 4 deletions README.md
@@ -1,7 +1,7 @@
# FMS HF Tuning

- [Installation](#installation)
- [Data format](#data-format)
- [Data format support](#data-support)
- [Supported Models](#supported-models)
- [Training](#training)
- [Single GPU](#single-gpu)
@@ -62,13 +62,13 @@ pip install fms-hf-tuning[aim]
For more details on how to enable and use the trackers, please see [the experiment tracking section below](#experiment-tracking).

## Data Support
Users can pass training data in a single file using the `--training_data_path` argument along with other arguments required for various [use cases](#use-cases-supported-with-training_data_path-argument) (see details below) and the file can be in any of the [supported formats](#supported-data-formats). Alternatively, you can use our powerful [data preprocessing backend](./docs/advanced-data-preprocessing.md) to preprocess datasets on the fly.
Users can pass training data as either a single file or a Hugging Face dataset ID using the `--training_data_path` argument along with other arguments required for various [use cases](#use-cases-supported-with-training_data_path-argument) (see details below). If a user chooses to pass a file, it can be in any of the [supported formats](#supported-data-formats). Alternatively, you can use our powerful [data preprocessing backend](./docs/advanced-data-preprocessing.md) to preprocess datasets on the fly.


Below is the list of data use cases supported via the `--training_data_path` argument. For details on our advanced data preprocessing, see [Advanced Data Preprocessing](./docs/advanced-data-preprocessing.md).

## Supported Data Formats
We support the following data formats via `--training_data_path` argument
We support the following file formats via the `--training_data_path` argument:

Data Format | Tested Support
------------|---------------
@@ -77,6 +77,8 @@ JSONL | ✅
PARQUET | ✅
ARROW | ✅

As noted above, we also support passing a Hugging Face dataset ID directly via the `--training_data_path` argument.
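
For illustration, a minimal invocation might look like the sketch below. The `tuning.sft_trainer` entry point, the placeholders, and the extra flags are assumptions made here for illustration; the training sections later in this README show the exact commands. The `--response_template` and `--dataset_text_field` values mirror the ones used in this repository's tests.

```
# Sketch only: pass a local file in one of the supported formats
# (the tuning.sft_trainer entry point and extra flags are illustrative assumptions)
python -m tuning.sft_trainer \
  --model_name_or_path <model name or path> \
  --training_data_path path/to/train_data.jsonl \
  --output_dir <output dir> \
  --response_template "\n\n### Label:" \
  --dataset_text_field output

# Sketch only: pass a Hugging Face dataset ID instead of a file path
python -m tuning.sft_trainer \
  --model_name_or_path <model name or path> \
  --training_data_path <hf dataset id> \
  --output_dir <output dir>
```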

## Use cases supported with `training_data_path` argument

### 1. Data formats with a single sequence and a specified response_template to use for masking on completion.
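
For illustration, such data could be a JSON or JSONL file with a single text field in which everything after the response template is the completion to learn; the field name and template below are taken from this repository's tests and are purely illustrative.

```json
{"output": "### Text: @example your service is down again\n\n### Label: complaint"}
```

Training would then pass `--dataset_text_field "output"` together with `--response_template "\n\n### Label:"` so that only the tokens after the template contribute to the loss.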
@@ -198,6 +200,10 @@ For advanced data preprocessing support including mixing and custom preprocessin
Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA(quantized LoRA) |
-------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
Granite PowerLM 3B | GraniteForCausalLM | ✅* | ✅* | ✅* |
Granite 3.1 1B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
Granite 3.1 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
Granite 3.1 3B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
Granite 3.1 8B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
Granite 3.0 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
Granite 3.0 8B | GraniteForCausalLM | ✅* | ✅* | ✔️ |
GraniteMoE 1B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
@@ -217,7 +223,7 @@ Mixtral 8x7B | Mixtral | ✅ | ✅ | ✅ |
Mistral-7b | Mistral | ✅ | ✅ | ✅ |  
Mistral large | Mistral | 🚫 | 🚫 | 🚫 |

(*) - Supported with `fms-hf-tuning` v2.0.1 or later
(*) - Supported with `fms-hf-tuning` v2.4.0 or later.

(**) - Supported for q, k, v, o layers. `all-linear` target modules do not yet support inference on vLLM.

@@ -742,6 +748,8 @@ The list of configurations for various `fms_acceleration` plugins:
- [attention_and_distributed_packing](./tuning/config/acceleration_configs/attention_and_distributed_packing.py):
- `--padding_free`: technique to process multiple examples in a single batch without adding padding tokens that waste compute.
- `--multipack`: technique for *multi-gpu training* that balances the number of tokens processed on each device, to minimize waiting time.
- [fast_moe_config](./tuning/config/acceleration_configs/fast_moe.py) (experimental):
- `--fast_moe`: trains MoE models in parallel, increasing throughput and decreasing memory usage.

Notes:
* `quantized_lora_config` requires that it be used along with LoRA tuning technique. See [LoRA tuning section](https://github.com/foundation-model-stack/fms-hf-tuning/tree/main?tab=readme-ov-file#lora-tuning-example) on the LoRA parameters to pass.
@@ -760,6 +768,17 @@ Notes:
* Notes on Multipack
- works only for *multi-gpu*.
- currently only includes the version of *multipack* optimized for linear attention implementations like *flash-attn*.
* Notes on Fast MoE
- `--fast_moe` is an integer value that configures the degree of expert parallel sharding (`ep_degree`); see the example after these notes.
- `world_size` must be divisible by the `ep_degree`.
- Running fast moe modifies the model's state dict, so checkpoints must be post-processed using [checkpoint utils](https://github.com/foundation-model-stack/fms-acceleration/blob/main/plugins/accelerated-moe/src/fms_acceleration_moe/utils/checkpoint_utils.py) before running inference (HF, vLLM, etc.).
- The typical usage of this script is to run:
```
python -m fms_acceleration_moe.utils.checkpoint_utils \
<checkpoint file> \
<output file> \
<original model>
```
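A hypothetical end-to-end sketch (the `tuning.sft_trainer` entry point and all placeholders are assumptions for illustration) would be to train with the flag enabled and then post-process the saved checkpoint with the command above before serving it:
```
# Sketch only: train an MoE model with expert-parallel degree 1
# (ep_degree must evenly divide the world size; entry point assumed for illustration)
python -m tuning.sft_trainer \
  --model_name_or_path <MoE model name or path> \
  --training_data_path <training data file> \
  --output_dir <output dir> \
  --fast_moe 1
```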
Note: To pass the above flags via a JSON config, each flag expects its value to be a mixed-type list, so every value must be given as a list. For example:
```json
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -45,7 +45,7 @@ dev = ["wheel>=0.42.0,<1.0", "packaging>=23.2,<25", "ninja>=1.11.1.1,<2.0", "sci
flash-attn = ["flash-attn>=2.5.3,<3.0"]
aim = ["aim>=3.19.0,<4.0"]
mlflow = ["mlflow"]
fms-accel = ["fms-acceleration>=0.1"]
fms-accel = ["fms-acceleration>=0.6"]
gptq-dev = ["auto_gptq>0.4.2", "optimum>=1.15.0"]


8 changes: 8 additions & 0 deletions tests/acceleration/test_acceleration_dataclasses.py
@@ -28,6 +28,7 @@
MultiPack,
PaddingFree,
)
from tuning.config.acceleration_configs.fast_moe import FastMoe, FastMoeConfig
from tuning.config.acceleration_configs.fused_ops_and_kernels import (
FastKernelsConfig,
FusedLoraConfig,
@@ -88,6 +89,13 @@ def test_dataclass_parse_successfully():
)
assert isinstance(cfg.multipack, MultiPack)

# 5. Specifying "--fast_moe" will parse a FastMoe class
parser = transformers.HfArgumentParser(dataclass_types=FastMoeConfig)
(cfg,) = parser.parse_args_into_dataclasses(
["--fast_moe", "1"],
)
assert isinstance(cfg.fast_moe, FastMoe)


def test_two_dataclasses_parse_successfully_together():
"""Ensure that the two dataclasses can parse arguments successfully
160 changes: 157 additions & 3 deletions tests/acceleration/test_acceleration_framework.py
@@ -43,6 +43,7 @@
MultiPack,
PaddingFree,
)
from tuning.config.acceleration_configs.fast_moe import FastMoe, FastMoeConfig
from tuning.config.acceleration_configs.fused_ops_and_kernels import (
FastKernelsConfig,
FusedLoraConfig,
@@ -56,7 +57,8 @@
# for some reason the CI will raise an import error if we try to import
# these from tests.artifacts.testdata
TWITTER_COMPLAINTS_JSON_FORMAT = os.path.join(
os.path.dirname(__file__), "../artifacts/testdata/twitter_complaints_json.json"
os.path.dirname(__file__),
"../artifacts/testdata/json/twitter_complaints_small.json",
)
TWITTER_COMPLAINTS_TOKENIZED = os.path.join(
os.path.dirname(__file__),
@@ -87,6 +89,10 @@
# Third Party
from fms_acceleration_aadp import PaddingFreeAccelerationPlugin

if is_fms_accelerate_available(plugins="moe"):
# Third Party
from fms_acceleration_moe import ScatterMoEAccelerationPlugin


# There are more extensive unit tests in the
# https://github.com/foundation-model-stack/fms-acceleration
@@ -360,7 +366,7 @@ def test_framework_raises_due_to_invalid_arguments(
acceleration_configs_map,
ids=["bitsandbytes", "auto_gptq"],
)
def test_framework_intialized_properly_peft(
def test_framework_initialized_properly_peft(
quantized_lora_config, model_name_or_path, mock_and_spy
):
"""Ensure that specifying a properly configured acceleration dataclass
@@ -412,7 +418,7 @@ def test_framework_intialized_properly_peft(
"and foak plugins"
),
)
def test_framework_intialized_properly_foak():
def test_framework_initialized_properly_foak():
"""Ensure that specifying a properly configured acceleration dataclass
properly activates the framework plugin and runs the train successfully.
"""
@@ -477,6 +483,60 @@ def test_framework_intialized_properly_foak():
assert spy2["get_ready_for_train_calls"] == 1


@pytest.mark.skipif(
not is_fms_accelerate_available(plugins="moe"),
reason="Only runs if fms-accelerate is installed along with accelerated-moe plugin",
)
def test_framework_initialized_properly_moe():
"""Ensure that specifying a properly configured acceleration dataclass
properly activates the framework plugin and runs the train successfully.
"""

with tempfile.TemporaryDirectory() as tempdir:

model_args = copy.deepcopy(MODEL_ARGS)
model_args.model_name_or_path = "Isotonic/TinyMixtral-4x248M-MoE"
model_args.torch_dtype = torch.bfloat16
train_args = copy.deepcopy(TRAIN_ARGS)
train_args.output_dir = tempdir
train_args.save_strategy = "no"
train_args.bf16 = True
data_args = copy.deepcopy(DATA_ARGS)
data_args.training_data_path = TWITTER_COMPLAINTS_JSON_FORMAT
data_args.response_template = "\n\n### Label:"
data_args.dataset_text_field = "output"

# initialize a config
moe_config = FastMoeConfig(fast_moe=FastMoe(ep_degree=1))

# create mocked plugin class for spying
MockedPlugin1, spy = create_mock_plugin_class_and_spy(
"FastMoeMock", ScatterMoEAccelerationPlugin
)

# 1. mock a plugin class
# 2. register the mocked plugins
# 3. call sft_trainer.train
with build_framework_and_maybe_instantiate(
[
(["training.moe.scattermoe"], MockedPlugin1),
],
instantiate=False,
):
with instantiate_model_patcher():
sft_trainer.train(
model_args,
data_args,
train_args,
fast_moe_config=moe_config,
)

# spy inside the train to ensure that the fast moe plugin is called
assert spy["model_loader_calls"] == 1
assert spy["augmentation_calls"] == 0
assert spy["get_ready_for_train_calls"] == 1


@pytest.mark.skipif(
not is_fms_accelerate_available(plugins="aadp"),
reason="Only runs if fms-accelerate is installed along with \
@@ -661,6 +721,100 @@ def test_error_raised_with_fused_lora_enabled_without_quantized_argument():
)


@pytest.mark.skipif(
not is_fms_accelerate_available(plugins="moe"),
reason="Only runs if fms-accelerate is installed along with accelerated-moe plugin",
)
def test_error_raised_with_undividable_fastmoe_argument():
"""
Ensure error is thrown when `--fast_moe` is passed and world_size
is not divisible by ep_degree
"""
with pytest.raises(
AssertionError, match="world size \\(1\\) not divisible by ep_size \\(3\\)"
):
with tempfile.TemporaryDirectory() as tempdir:

model_args = copy.deepcopy(MODEL_ARGS)
model_args.model_name_or_path = "Isotonic/TinyMixtral-4x248M-MoE"
model_args.torch_dtype = torch.bfloat16
train_args = copy.deepcopy(TRAIN_ARGS)
train_args.output_dir = tempdir
train_args.save_strategy = "no"
train_args.bf16 = True
data_args = copy.deepcopy(DATA_ARGS)
data_args.training_data_path = TWITTER_COMPLAINTS_JSON_FORMAT
data_args.response_template = "\n\n### Label:"
data_args.dataset_text_field = "output"

# initialize a config
moe_config = FastMoeConfig(fast_moe=FastMoe(ep_degree=3))

# 1. register the scattermoe plugin
# 2. call sft_trainer.train
with build_framework_and_maybe_instantiate(
[
(["training.moe.scattermoe"], ScatterMoEAccelerationPlugin),
],
instantiate=False,
):
with instantiate_model_patcher():
sft_trainer.train(
model_args,
data_args,
train_args,
fast_moe_config=moe_config,
)


@pytest.mark.skipif(
not is_fms_accelerate_available(plugins="moe"),
reason="Only runs if fms-accelerate is installed along with accelerated-moe plugin",
)
def test_error_raised_fast_moe_with_non_moe_model():
"""
Ensure error is thrown when `--fast_moe` is passed and model is not MoE
"""
with pytest.raises(
AttributeError,
match="'LlamaConfig' object has no attribute 'num_local_experts'",
):
with tempfile.TemporaryDirectory() as tempdir:

model_args = copy.deepcopy(MODEL_ARGS)
model_args.model_name_or_path = "TinyLlama/TinyLlama-1.1B-Chat-v0.3"
model_args.torch_dtype = torch.bfloat16
train_args = copy.deepcopy(TRAIN_ARGS)
train_args.output_dir = tempdir
train_args.save_strategy = "no"
train_args.bf16 = True
data_args = copy.deepcopy(DATA_ARGS)
data_args.training_data_path = TWITTER_COMPLAINTS_JSON_FORMAT
data_args.response_template = "\n\n### Label:"
data_args.dataset_text_field = "output"

# initialize a config
moe_config = FastMoeConfig(fast_moe=FastMoe(ep_degree=1))

# 1. register the scattermoe plugin
# 2. call sft_trainer.train
with build_framework_and_maybe_instantiate(
[
(["training.moe.scattermoe"], ScatterMoEAccelerationPlugin),
],
instantiate=False,
):
with instantiate_model_patcher():
sft_trainer.train(
model_args,
data_args,
train_args,
fast_moe_config=moe_config,
)


@pytest.mark.skipif(
not is_fms_accelerate_available(plugins="foak"),
reason="Only runs if fms-accelerate is installed along with \