Properly support batched/non-batched with vllm/llama.cpp #77
Conversation
and other streamlining
src/instructlab/sdg/llmblock.py
parsed_outputs = self._parse(output)
# pylint: disable=consider-using-generator
max_length = max([len(value) for value in parsed_outputs.values()])
GitHub lint is suggesting to use max(len(value) for value in parsed_outputs.values()) instead.
This logic still needs to be cleaned up anyhow, I think; it's not doing what it was intended to.
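For reference, a minimal sketch of the generator form pylint is asking for (the parsed_outputs dict below is made up for illustration):

# Stand-in data: parsed_outputs maps output column names to lists of parsed values.
parsed_outputs = {"question": ["q1", "q2"], "answer": ["a1"]}

# Generator expression: no intermediate list is built just to take max().
# Note that max() raises ValueError on an empty dict; max(..., default=0) avoids that.
max_length = max(len(value) for value in parsed_outputs.values())
print(max_length)  # 2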
def validate(self, prompt_template: str, input_dict: Dict[str, Any]) -> bool:
    if isinstance(prompt_template, dict):
        prompt_template = prompt_template[input_dict[self.selector_column_name]]
    return super()._validate(prompt_template, input_dict)

def server_supports_batched(client, model_id: str) -> bool:
This might be a nitpick, but can we use server_supports_batching instead of server_supports_batched?
I think batched is better, since it's referring to the inputs. Even without batched inputs, the server might do batching internally.
for prompt in prompts:
    for _ in range(n):
        response = self.client.completions.create(
            prompt=prompt, **generate_args
        )
We could rewrite this as

responses = [
    self.client.completions.create(prompt=prompt, **generate_args)
    for prompt in prompts
    for _ in range(n)
]

wdyt?
Yes, but then we would require an additional loop anyhow.
Yeah
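For context, a rough sketch of what the comprehension plus the extra collection loop might look like (this is illustrative, not the PR's code, and it assumes an OpenAI-style completions client):

def generate_sequentially(client, prompts, n, generate_args):
    """Sketch only: one request per (prompt, repetition), then a second pass
    to pull the generated text out of each response."""
    responses = [
        client.completions.create(prompt=prompt, **generate_args)
        for prompt in prompts
        for _ in range(n)
    ]
    # The additional loop mentioned above: flatten the choices into plain strings.
    return [
        choice.text.strip()
        for response in responses
        for choice in response.choices
    ]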
Thanks @npalaska, I addressed most of those comments.
Cool, looks like a great direction. At least resolve the "#TODO remove sample from samples" thing.
}

# Whether the LLM server supports a list of input prompts
# and supports the n parameter to generate n outputs per input
self.server_supports_batched = server_supports_batched(client, model_id)
The FlowParams in #64 would give us a place to do this once rather than for every LLMBlock, but that can be fixed up later.
See PipelineContext in #86 now.
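To illustrate the idea (not the actual #64/#86 code, and the names here are assumptions), a shared context object could run the probe once and hand the result to every block:

from dataclasses import dataclass, field

@dataclass
class PipelineContext:
    """Hypothetical sketch: per-pipeline state, created once per run."""

    client: object
    model_id: str
    batched: bool = field(init=False)

    def __post_init__(self):
        # server_supports_batched() is the helper added in this PR.
        self.batched = server_supports_batched(self.client, self.model_id)

# Each LLMBlock would then read ctx.batched instead of probing the server itself.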
@@ -45,8 +46,13 @@ def __init__(
    "model": self.model,
    "temperature": 0,
    "max_tokens": 12000,
    #"seed": 12345, TBD
Delete? Or add an explanation to the comment
)
return [choice.text.strip() for choice in response.choices]

n = gen_kwargs.get("n", 1)
I would have imagined doing this in reverse: including "num_instructions_to_generate" in the block config and adding 'n' to gen_kwargs if batching is supported. No biggie, though.
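A rough sketch of that reversal, with illustrative names (block_config and the helper below are not the PR's code):

def build_gen_kwargs(block_config, base_gen_kwargs, server_supports_batched):
    """Sketch: 'n' is derived from the block config and only passed on
    when the server can honour it."""
    num_samples = block_config.get("num_instructions_to_generate", 1)
    gen_kwargs = dict(base_gen_kwargs)
    if server_supports_batched:
        # One request per prompt; the server generates num_samples completions each.
        gen_kwargs["n"] = num_samples
    # If batching isn't supported, the caller loops num_samples times per prompt instead.
    return gen_kwargs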
@@ -113,21 +132,30 @@ def generate(self, samples, **gen_kwargs) -> Dataset:
         # validate each sample
         for sample in samples:
             if not self._validate(self.prompt_template, sample):
-                return None
+                logger.warning("Sample failed validation")  #TODO add details
+                #TODO remove sample from samples
Hmm, should this be in a separate PR? If it stays in this PR, the TODO should be resolved.
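One way the TODO could be resolved (sketched here, not taken from the PR) is to drop the failing samples and keep going:

import logging

logger = logging.getLogger(__name__)

def drop_invalid_samples(samples, validate):
    """Sketch: keep only samples that pass validation, logging each failure
    instead of aborting the whole batch."""
    valid = []
    for sample in samples:
        if validate(sample):
            valid.append(sample)
        else:
            logger.warning("Sample failed validation: %s", sample)
    return valid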
outputs = self._generate(samples, **gen_kwargs)
logger.debug("Generated outputs: %s", outputs)

num_parallel_samples = gen_kwargs.get("n", 1)
Hmm, and here's a reason to make num_parallel_samples part of the block config ... and add 'n' to gen_kwargs based on that
    supported = len(response.choices) == 6
except openai.InternalServerError:
    supported = False
client.server_supports_batched = supported
Hmm, I understand that you want to cache this... but I don't like setting a new attribute on a class we don't own. I guess this could be removed with a move to FlowParams.
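As an alternative to setting an attribute on the client, a module-level cache keyed on the client instance would avoid touching a class we don't own. This is only a sketch of the idea (cached_server_supports_batched is a made-up name):

import weakref

# Entries disappear automatically when the client object is garbage-collected.
_batched_support = weakref.WeakKeyDictionary()

def cached_server_supports_batched(client, model_id):
    """Illustrative memoization per client instance, not the PR's code."""
    if client not in _batched_support:
        # server_supports_batched() is the helper added in this PR.
        _batched_support[client] = server_supports_batched(client, model_id)
    return _batched_support[client]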
Also, please squash those fixup commits as per instructlab/dev-docs#110
This pull request has merge conflicts that must be resolved before it can be merged.
As per instructlab/sdg#77

Signed-off-by: Mark McLoughlin <[email protected]>
Closing in favor of #105
This is based on @npalaska's PR #58. With these changes we auto-detect whether the server supports batched inputs and, if not, send the prompts sequentially.
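For readers skimming the thread, the overall shape of the change is roughly this. It is a sketch assembled from the fragments above; helper names such as self._format_prompt and self.defaults are assumptions, not the merged code:

def _generate(self, samples, **gen_kwargs):
    generate_args = {**self.defaults, **gen_kwargs}
    prompts = [self._format_prompt(sample) for sample in samples]

    if self.server_supports_batched:
        # One call with the full list of prompts; the server returns
        # n completions per prompt together in response.choices.
        response = self.client.completions.create(prompt=prompts, **generate_args)
        return [choice.text.strip() for choice in response.choices]

    # Sequential fallback: drop 'n' from the request and loop ourselves.
    n = generate_args.pop("n", 1)
    results = []
    for prompt in prompts:
        for _ in range(n):
            response = self.client.completions.create(prompt=prompt, **generate_args)
            results.append(response.choices[0].text.strip())
    return results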