[JAX] Decouple Recipe and ScalingMode #1728

jberchtold-nvidia · 2025-04-29T01:01:59Z

Description

Currently the recipe and scaling mode are coupled in the TE/JAX extension. This is okay for the current recipes, such as delayed scaling, current scaling, and MXFP8 block scaling, as there is only a single scaling mode used per recipe. However, for the DeepSeek recipe this assumption no longer holds. For the DeepSeek recipe we will need 1x128 1D block scaling for inputs and 128x128 2D block scaling for weights. As a result, we need to decouple the two concepts of recipe and scaling mode.

This PR only decouples the recipe and scaling mode; it does not implement the DeepSeek recipe.
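To make the difference between the two block layouts mentioned above concrete, here is a minimal sketch of how many scale factors each layout would produce. It assumes one scale per block and ceil-division for partial blocks; the helper name `scale_shape` and the tensor sizes are hypothetical and are not taken from the TE/JAX code.

```python
def scale_shape(tensor_shape, block_shape):
    """Return the number of blocks (and so scale factors) along each dimension.

    Assumes one scale per block and rounds partial blocks up (ceil division).
    """
    return tuple(-(-dim // blk) for dim, blk in zip(tensor_shape, block_shape))


# Hypothetical sizes, chosen only for illustration.
x_shape = (4096, 1024)       # inputs: (tokens, hidden)
kernel_shape = (1024, 2048)  # weights: (hidden, out)

x_scales = scale_shape(x_shape, (1, 128))              # 1x128 1D block scaling
kernel_scales = scale_shape(kernel_shape, (128, 128))  # 128x128 2D block scaling

print(x_scales)       # (4096, 8)
print(kernel_scales)  # (8, 16)
```

The two operands of the same matmul end up with differently shaped scale tensors, which is why a single scaling mode per recipe is no longer enough.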

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Define UsageContext, which describes the context in which a quantizer will be used (e.g. whether the quantizer is used for x, kernel, or grad)
Add RecipeManager classes that provide recipe-specific functionality for quantization
Replace QuantizeConfig.SCALING_MODE with QuantizeConfig.RECIPE_MANAGER and update QuantizerFactory to use the recipe manager instead
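The changes listed above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the code in this PR: `UsageContext`, `UsageType`, `QuantizerParams`, `RecipeManager`, and `get_quantizer_params` are names that appear in the diff and review, while the `ScalingMode` members, the `KERNEL` and `GRAD` member names, and both concrete manager classes are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ScalingMode(Enum):
    """Hypothetical stand-in for the extension's scaling-mode enum."""
    DELAYED_TENSOR_SCALING = auto()
    BLOCK_1D_1X128 = auto()
    BLOCK_2D_128X128 = auto()


class UsageType(Enum):
    """Where a quantizer is used; member names other than X are assumed."""
    X = auto()
    KERNEL = auto()
    GRAD = auto()


@dataclass(frozen=True)
class UsageContext:
    usage_type: UsageType


@dataclass(frozen=True)
class QuantizerParams:
    scaling_mode: ScalingMode


class RecipeManager:
    """Base interface: map a usage context to quantizer parameters."""

    def get_quantizer_params(self, context: UsageContext) -> QuantizerParams:
        raise NotImplementedError


class SingleModeRecipeManager(RecipeManager):
    """Hypothetical manager for recipes that use one scaling mode everywhere."""

    def __init__(self, scaling_mode: ScalingMode):
        self._scaling_mode = scaling_mode

    def get_quantizer_params(self, context: UsageContext) -> QuantizerParams:
        return QuantizerParams(self._scaling_mode)


class MixedBlockRecipeManager(RecipeManager):
    """Hypothetical manager whose scaling mode depends on the usage type."""

    _MODES = {
        UsageType.X: ScalingMode.BLOCK_1D_1X128,
        UsageType.KERNEL: ScalingMode.BLOCK_2D_128X128,
        UsageType.GRAD: ScalingMode.BLOCK_1D_1X128,  # arbitrary choice for this sketch
    }

    def get_quantizer_params(self, context: UsageContext) -> QuantizerParams:
        return QuantizerParams(self._MODES[context.usage_type])


delayed = SingleModeRecipeManager(ScalingMode.DELAYED_TENSOR_SCALING)
mixed = MixedBlockRecipeManager()
for usage in UsageType:
    ctx = UsageContext(usage)
    print(
        usage.name,
        delayed.get_quantizer_params(ctx).scaling_mode.name,
        mixed.get_quantizer_params(ctx).scaling_mode.name,
    )
```

With this shape, callers ask the manager for parameters per usage context instead of reading a single global scaling mode.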

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Jeremy Berchtold <[email protected]>

jberchtold-nvidia · 2025-04-29T01:19:02Z

/te-ci L0

jberchtold-nvidia · 2025-04-29T17:30:43Z

/te-ci L0

phu0ngng · 2025-05-02T14:28:53Z

transformer_engine/jax/quantize/helper.py

+class UsageContext:
+    """Context of where a particular quantizer will be used which is needed by some recipes."""
+
+    usage_type: UsageType


Hi, why do we need a new class just to wrap around an enum?

phu0ngng · 2025-05-02T14:40:33Z

transformer_engine/jax/quantize/helper.py

+
+
+@dataclass
+class QuantizerParams:


I had a look at QuantizerParams and how it is used, and I would prefer not to have this class, for the following reasons:

1. Whenever we need to query the scaling_mode, q_dtype, or q_layout info, instead of

QuantizeConfig.RECIPE_MANAGER.get_quantizer_params(UsageContext(UsageType.X)).scaling_mode

we could simply do

QuantizeConfig.RECIPE_MANAGER.get_scaling_mode(UsageType.X)

This is much simpler for other people to follow and contribute to later.

2. QuantizerParams adds one more level of object inside the Quantizer, which does not give any benefit. To create the Quantizer, we could do

q_x = QuantizerFactory.create(RECIPE_MANAGER.get_scaling_mode(UsageType.X), RECIPE_MANAGER.get_quantize_dtype(UsageType.X), RECIPE_MANAGER.get_quantize_layout(UsageType.X), **args_x)
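For illustration, the flat-accessor style suggested here might look like the sketch below. The getter names `get_scaling_mode`, `get_quantize_dtype`, and `get_quantize_layout` come from the snippets above; the class `FlatRecipeManager`, the function `create_quantizer` (a stand-in for `QuantizerFactory.create`), the `KERNEL` member, and all string values are hypothetical placeholders.

```python
from enum import Enum, auto


class UsageType(Enum):
    X = auto()
    KERNEL = auto()  # assumed member name


class FlatRecipeManager:
    """Hypothetical manager exposing one getter per quantizer attribute."""

    def __init__(self, scaling_modes, q_dtypes, q_layouts):
        self._scaling_modes = scaling_modes
        self._q_dtypes = q_dtypes
        self._q_layouts = q_layouts

    def get_scaling_mode(self, usage_type):
        return self._scaling_modes[usage_type]

    def get_quantize_dtype(self, usage_type):
        return self._q_dtypes[usage_type]

    def get_quantize_layout(self, usage_type):
        return self._q_layouts[usage_type]


def create_quantizer(scaling_mode, q_dtype, q_layout, **kwargs):
    """Stand-in for a factory call; returns a plain dict for inspection."""
    return {"scaling_mode": scaling_mode, "q_dtype": q_dtype, "q_layout": q_layout, **kwargs}


# Placeholder string values, not real TE/JAX enum members.
manager = FlatRecipeManager(
    scaling_modes={UsageType.X: "block_1x128", UsageType.KERNEL: "block_128x128"},
    q_dtypes={UsageType.X: "fp8_e4m3", UsageType.KERNEL: "fp8_e4m3"},
    q_layouts={UsageType.X: "rowwise", UsageType.KERNEL: "rowwise"},
)

q_x = create_quantizer(
    manager.get_scaling_mode(UsageType.X),
    manager.get_quantize_dtype(UsageType.X),
    manager.get_quantize_layout(UsageType.X),
)
print(q_x["scaling_mode"])  # block_1x128
```

The trade-off is that each new per-usage attribute needs its own getter, whereas a params object groups them behind one call.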

phu0ngng · 2025-05-02T14:44:36Z

transformer_engine/jax/quantize/helper.py

        cls.INITIALIZED = True
        cls.MARGIN = fp8_recipe.margin if "margin" in dir(fp8_recipe) else 0.0
        cls.FP8_FORMAT = fp8_recipe.fp8_format
        cls.FWD_DTYPE, cls.BWD_DTYPE = _format2dtypes(cls.FP8_FORMAT)
-        cls.SCALING_MODE = _get_scaling_mode(fp8_recipe)
+        cls.RECIPE_MANAGER = recipe_manager


Why don't we merge QuantizeConfig and RecipeManager into a single class?
I don't see a clear need for them to exist separately.

Decouple recipe and scaling mode

575f4c4

Signed-off-by: Jeremy Berchtold <[email protected]>

jberchtold-nvidia force-pushed the dev/jberchtold/jax-scaling-mode-and-recipe-decoupling branch from aa85930 to 575f4c4 Compare April 29, 2025 01:12

jberchtold-nvidia added 2 commits April 29, 2025 17:17

Update grad quantizer q_layout to default layout when is_2x2x is False instead of no quantization

123c688

Signed-off-by: Jeremy Berchtold <[email protected]>

Lint

bb8fde7

Signed-off-by: Jeremy Berchtold <[email protected]>

jberchtold-nvidia requested a review from phu0ngng May 1, 2025 20:18

phu0ngng reviewed May 2, 2025
