adds Granite Liger Kernel option for Granite3.y models #430
base: main
Conversation
Current CI failure seems unrelated to the liger_kernel addition.
Need to test, but this could be very useful if it works correctly.
Currently, correctness tests for the kernels themselves are done in-tree in the liger-kernel repo; they assure correct convergence and logit equivalence after training. Link to test here.

I've also validated training-dynamics equivalence given identical batches. Link to Jira issue here. There's roughly a 1% raw improvement with batches being identical, which is expected; the real benefits will come from larger possible batch sizes! This kernel set is also missing the most important kernel.

I've given the PSAP team a ref to this PR so they can quantify the improved memory headroom and find new `max_batch_lens`.
@JamesKunstle Thanks for adding this; a few comments:
```python
try:
    # Third Party
    from liger_kernel.transformers import apply_liger_kernel_to_granite
except ImportError:
    # Degrade to a no-op when liger-kernel isn't installed.
    apply_liger_kernel_to_granite = lambda *args, **kwargs: None  # pylint: disable=C3001
```
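(With this guard, the trainer can call `apply_liger_kernel_to_granite` unconditionally: if liger-kernel isn't installed, the call silently becomes a no-op instead of raising an `ImportError` at startup.)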
@JamesKunstle We'll want to support other models beyond Granite, and it seems like Liger's API has a number of different functions they expose for common architectures (mistral, llama, etc.). It seems like they provide a way to map directly from the `model_type` field on the model into the Liger kernel to apply. Could we please use that here to support more models?
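For reference, a minimal sketch of that kind of `model_type` dispatch. The per-architecture imports follow Liger's `apply_liger_kernel_to_<arch>` naming, but the mapping table and the `maybe_apply_liger_kernel` helper are illustrative assumptions, not code from this PR (Liger also ships `AutoLigerKernelForCausalLM`, which handles this dispatch internally):

```python
# Illustrative sketch: dispatch on the HF config's model_type instead of
# hard-coding Granite. The mapping below is a hand-rolled assumption.
from transformers import AutoConfig

try:
    # Third Party
    from liger_kernel.transformers import (
        apply_liger_kernel_to_granite,
        apply_liger_kernel_to_llama,
        apply_liger_kernel_to_mistral,
    )

    LIGER_APPLY_FN_BY_MODEL_TYPE = {
        "granite": apply_liger_kernel_to_granite,
        "llama": apply_liger_kernel_to_llama,
        "mistral": apply_liger_kernel_to_mistral,
    }
except ImportError:
    LIGER_APPLY_FN_BY_MODEL_TYPE = {}


def maybe_apply_liger_kernel(model_name_or_path: str, **kwargs) -> bool:
    """Patch the matching architecture with Liger kernels, if supported."""
    model_type = AutoConfig.from_pretrained(model_name_or_path).model_type
    apply_fn = LIGER_APPLY_FN_BY_MODEL_TYPE.get(model_type)
    if apply_fn is None:
        return False
    apply_fn(**kwargs)
    return True
```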
++ certainly; we're only limiting this to Granite to get the PSAP numbers, then we'll expand the integration to other supported architectures.
```diff
@@ -974,6 +995,9 @@ def run_training(torch_args: TorchrunArgs, train_args: TrainingArgs) -> None:
             "The last checkpoint will be saved as 'last_epoch'."
         ),
     )
+
+    # this will work for granite-3.y models but not granite-7b because that's a Llama 2 model arch.
+    parser.add_argument("--enable-granite-liger-kernel", action="store_true")
```
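For context, a hypothetical sketch of the consuming side; where exactly this hook lives relative to `run_training` is an assumption, and `apply_liger_kernel_to_granite` is the guarded import from earlier:

```python
# Hypothetical wiring (not part of this diff): gate the Liger patch on the
# new CLI flag before the model is loaded. The guarded import above makes
# this a safe no-op when liger-kernel isn't installed.
args = parser.parse_args()
if args.enable_granite_liger_kernel:
    apply_liger_kernel_to_granite()
```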
If we're providing this as an arg through the CLI, we should also expose this in the `TrainingArgs` config so the SDK can be consistent.
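For illustration, a minimal sketch of mirroring the flag in the config, assuming `TrainingArgs` is a pydantic model; the field name and default here are assumptions:

```python
from pydantic import BaseModel


class TrainingArgs(BaseModel):
    # ... existing training fields elided ...

    # Assumed mirror of the --enable-granite-liger-kernel CLI flag. Off by
    # default; only meaningful for granite-3.y (granite-7b is a Llama 2 arch).
    enable_granite_liger_kernel: bool = False
```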
++ agreed, will do once we've got the Granite numbers from PSAP
Support for Granite was added in Liger Kernel (LK) v0.5.4, so we can add it as an additional performance option.