Jetstream by default #118

Merged
tengomucho merged 11 commits into main from jetstream-by-default on Nov 27, 2024
Conversation

tengomucho (Collaborator)

What does this PR do?

This makes all the changes needed to have the Jetstream PyTorch engine be the default backend for TGI on TPUs.
This backend is reliable and performant and gives the best throughput on TGI.

The implementation is slightly different, so a separate test is added.
Most tests work for both backends, except for the continuous batching one.
This allows removing the old GPT-2 based tests, which are quite slow and
do not use any sharding or KV cache, so they might not really be
representative of the most relevant models on TGI.
There are now equivalent tests on the TinyLlama model, which run faster
and use the KV cache and sharding.
The only test that does not have an equivalent is the continuous
batching one, but that test was not working for most other models, so I
prefer to remove it anyway; having it pass was not representative of the
current state anyway.
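
For context on how the default is selected, the tests rely on a jetstream_pt_available() helper defined in optimum/tpu/jetstream_pt_support.py (see the review threads below). A minimal sketch of what such an availability check could look like is shown here; the JETSTREAM_PT_DISABLE handling and the jetstream_pt import name are assumptions for illustration, not the actual implementation:

# Sketch only, not the actual content of optimum/tpu/jetstream_pt_support.py.
import os

def jetstream_pt_available() -> bool:
    """Return True when the Jetstream PyTorch engine can be used as the backend."""
    # Assumed opt-out: the review below mentions JETSTREAM_PT_DISABLE=1 to force
    # the torch xla backend instead.
    if os.environ.get("JETSTREAM_PT_DISABLE", "0") == "1":
        return False
    try:
        # Assumed import name for the Jetstream PyTorch engine.
        import jetstream_pt  # noqa: F401
    except ImportError:
        return False
    return True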
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Now that the Jetstream engine is stable and tested, it is set as the
default one for TGI.
tengomucho marked this pull request as ready for review on November 22, 2024 at 15:39
@dacorvo left a comment

I think I got lost in your changes: can you summarize how tests are now supposed to work?

    ids=["spaces", "chinese-utf8", "emojis"],
)
def test_decode_streaming_jetstream(tokenizer, input_text, generated_text):
    if not jetstream_pt_available():
dacorvo:
Note that you could have created a decorator.
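
For illustration, the suggested decorator could look roughly like the sketch below; the requires_jetstream name and the import path are assumptions, and the PR ultimately moved to custom pytest markers instead (see the later commit):

import functools

import pytest

# Assumed import path, based on the module referenced in the review threads.
from optimum.tpu.jetstream_pt_support import jetstream_pt_available

def requires_jetstream(test_func):
    """Skip the decorated test when the Jetstream PyTorch engine is unavailable."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        if not jetstream_pt_available():
            pytest.skip("Jetstream PyTorch is not available")
        return test_func(*args, **kwargs)
    return wrapper

@requires_jetstream
def test_decode_streaming_jetstream(tokenizer, input_text, generated_text):
    ...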

tengomucho (Author):
I refactored the test to avoid repetitions.

    assert generations[0].tokens.texts == [" the"]


def test_prefill_truncate_jetstream():
dacorvo:
I fail to see the difference between the two tests: I don't think it was required to add the 'jetstream' one

tengomucho (Author):
The two tests are identical in behaviour, but if Jetstream is loaded the other test will fail to run correctly because of dependency incompatibilities when using some PyTorch features (e.g. multiprocessing).
So I just have two identical tests: one is run when Jetstream is enabled and the other is skipped, and when Jetstream is disabled it is the other way around.

dacorvo:
They are not only identical in behaviour: this is the same test with two different names... What am I missing?

    _test_continuous_batching_two_requests(model_path)


"""NOTE: This test does not work on PyTorch/XLA, because of the way
dacorvo:
You should adapt the test to make it actually useful for the XLA configuration.

tengomucho (Author):
In my tests, with BF16 and KV cache, I was not able to get this test working. I think there might be an issue in the way the KV cache is implemented, because this test succeeds on the Jetstream backend with BF16 on the same hardware. This is why I left the test there, as a reminder that this should be done later on, but for now it does not really work as expected.

So far filtering was done using the name of the test. Now the selection
is done using a custom marker, which allows for clearer filtering.
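
As a rough sketch of how marker-based selection can be wired in conftest.py (the jetstream and torch_xla marker names come from the diff excerpt further down; the hook choice and registration details are assumptions, not the exact PR code):

import pytest

# Assumed import path, based on the module referenced in the review threads.
from optimum.tpu.jetstream_pt_support import jetstream_pt_available

def pytest_configure(config):
    # Register the custom markers so pytest does not warn about unknown marks.
    config.addinivalue_line("markers", "jetstream: test requires the Jetstream PT engine")
    config.addinivalue_line("markers", "torch_xla: test targets the plain torch xla backend")

def pytest_runtest_setup(item):
    jetstream_pt_enabled = jetstream_pt_available()
    marker_names = {marker.name for marker in item.iter_markers()}
    # Skip Jetstream-only tests when the engine is not enabled.
    if "jetstream" in marker_names and not jetstream_pt_enabled:
        pytest.skip("Jetstream PyTorch is not enabled")
    # Skip torch xla-only tests when Jetstream is loaded, as they would fail.
    if "torch_xla" in marker_names and "jetstream" not in marker_names and jetstream_pt_enabled:
        pytest.skip("Jetstream PyTorch must be disabled")

With markers like these, a subset can also be selected explicitly, e.g. pytest -m jetstream or pytest -m "not jetstream".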
@dacorvo left a comment

Nice! Thank you for this pull request.

# Skip tests that require torch xla but not jetstream
if "torch_xla" in marker_names and "jetstream" not in marker_names:
    if jetstream_pt_enabled:
        pytest.skip("Jetstream PyTorch must be disabled")
dacorvo:
nit: I would find it clearer to say something like "Jetstream is enabled: xla test will fail".

tengomucho (Author):

> I think I got lost in your changes: can you summarize how tests are now supposed to work?

@dacorvo as discussed offline, the idea is to change the default backend of TPU TGI from torch xla to jetstream.
I just updated the tests so they use clearer markers to check that the backend they run on is correctly selected.

For some reason the env var was not carried over (though Jetstream was
disabled anyway). Moving the variable to the command line invocation
removes a warning in the logs.
@baptistecolle (Contributor) left a comment

JETSTREAM_PT_DISABLE=1

Resolved review threads:
optimum/tpu/jetstream_pt_support.py
Makefile (outdated)
text-generation-inference/tests/test_prefill_truncate.py (outdated)
.github/workflows/test-pytorch-xla-tpu.yml (outdated)
Some test results change when operations are done in a slightly
different way. This has now happened with the torch xla tests, resulting
in different results on the CI.
To avoid this, the tests now check that the obtained token and text
differ from those obtained when running with greedy search.
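
A sketch of the kind of check this describes; the helper name and the example values are illustrative only, not the PR's actual test code:

# Illustrative only: instead of pinning exact expected strings (which can drift
# when operations are reordered), the sampled output is checked to differ from
# the greedy output obtained for the same prompt.
def assert_differs_from_greedy(sampled, greedy):
    sampled_token, sampled_text = sampled
    greedy_token, greedy_text = greedy
    assert sampled_token != greedy_token
    assert sampled_text != greedy_text

# Hypothetical values for illustration.
assert_differs_from_greedy(sampled=(4321, " blue"), greedy=(1234, " the"))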
tengomucho merged commit 8c2c199 into main on Nov 27, 2024
5 checks passed
tengomucho deleted the jetstream-by-default branch on November 27, 2024 at 10:12