
[WIP] Add text tagging by prompt mapper op #408

Open

wants to merge 1 commit into base: main
Conversation

garyzhang99 (Collaborator) commented Aug 30, 2024

As the title says: use an LLM to perform arbitrary tagging and classification driven by a user-supplied prompt (a rough sketch of the idea follows below).

  • Also adding a filter op, analogous to video_tagging_from_frames_filter
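A minimal sketch of what such a prompt-based tagging mapper might look like, assuming data-juicer's usual Mapper/get_model pattern; the class name, constructor parameter, and prompt handling here are illustrative assumptions, not the PR's actual code:

```python
from data_juicer.ops.base_op import Mapper
from data_juicer.utils.constant import Fields
from data_juicer.utils.model_utils import get_model


class TextTaggingByPromptMapper(Mapper):
    """Illustrative sketch: tag each sample by querying an LLM
    with a user-supplied prompt template."""

    def __init__(self, prompt_template: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # e.g. 'Classify the following text: {text}'
        self.prompt_template = prompt_template

    def process(self, sample, rank=None):
        model, processor = get_model(self.model_key, rank=rank)
        prompt = self.prompt_template.format(text=sample[self.text_key])
        inputs = processor(prompt, return_tensors='pt').to(model.device)
        response = model.generate(**inputs, max_new_tokens=256)
        output = processor.decode(response[0], skip_special_tokens=True)
        sample[Fields.text_tags] = [output.strip()]
        return sample
```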

@garyzhang99 garyzhang99 added the dj:op issues/PRs about some specific OPs label Aug 30, 2024
@garyzhang99 garyzhang99 self-assigned this Aug 30, 2024
@garyzhang99 garyzhang99 requested a review from yxdyc August 30, 2024 03:39
skip_special_tokens=True)

text_tags = []
text_tags.append(output)
Collaborator commented:
It is better to strip the output.
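A minimal sketch of the suggested fix, assuming output is the decoded model string from the excerpt above:

```python
text_tags = []
# Strip surrounding whitespace/newlines from the decoded output
# before storing it as a tag.
text_tags.append(output.strip())
```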

""" # noqa

super().__init__(*args, **kwargs)
self.num_proc = 1
Collaborator commented:
If enable_vllm is False, num_proc=1 will still force this OP to run in a single process/GPU. Is this the desired behavior?
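One possible resolution (a sketch only, assuming the op exposes an enable_vllm flag as other data-juicer ops do):

```python
super().__init__(*args, **kwargs)
# Only pin to a single process when vLLM manages the GPUs itself;
# otherwise let the executor's num_proc setting apply.
if enable_vllm:
    self.num_proc = 1
```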


text_tags = []
text_tags.append(output)
sample[Fields.text_tags] = text_tags
Collaborator commented:
Please refer to #423 for adding a user-specified tag field name.
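A hypothetical sketch of a user-specified field name; the parameter name tag_field_name is illustrative, and #423 should be followed for the actual convention:

```python
def __init__(self, *args, tag_field_name=Fields.text_tags, **kwargs):
    super().__init__(*args, **kwargs)
    # Hypothetical: let users choose where tags are written,
    # instead of hard-coding Fields.text_tags.
    self.tag_field_name = tag_field_name
```

The assignment in process would then become sample[self.tag_field_name] = text_tags.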

if not string_list:
    assert False, "The input list must not be empty"

for string in string_list:
@drcege (Collaborator) commented Sep 12, 2024:
Why not directly check output in string_list?
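A sketch of the suggested simplification; the surrounding logic is not shown in the excerpt, so the exact handling of a match is assumed:

```python
assert string_list, "The input list must not be empty"
# A direct membership test replaces the explicit loop over candidates.
if output in string_list:
    text_tags.append(output)
```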

max_model_len=1024,
max_num_seqs=16,
sampling_params={'temperature': 0.1, 'top_p': 0.95, 'max_tokens': 256})

@drcege (Collaborator) commented Sep 12, 2024:
Can this OP run in multiple processes, especially without vLLM? Please add more tests.
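A sketch of what such a test might look like, assuming a unittest-style op test as elsewhere in data-juicer; the op name TextTaggingByPromptMapper, its arguments, and self.samples are assumptions, not the PR's actual test code:

```python
def test_multiprocess_without_vllm(self):
    # Hypothetical: confirm the OP works with num_proc > 1 when
    # vLLM is disabled and a model is loaded in each process.
    op = TextTaggingByPromptMapper(enable_vllm=False)
    dataset = Dataset.from_list(self.samples)
    dataset = dataset.map(op.process, num_proc=2)
    for sample in dataset:
        self.assertIn(Fields.text_tags, sample)
```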

max_num_seqs=16,
sampling_params={'temperature': 0.1, 'top_p': 0.95, 'max_tokens': 256})


@drcege (Collaborator) commented Sep 12, 2024:
It would be better to add tests for tensor_parallel_size.
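Similarly, a hedged sketch of such a test; the 2-GPU setting, the constructor arguments, and the _run_op helper are all assumptions:

```python
def test_vllm_tensor_parallel(self):
    # Hypothetical: exercise vLLM tensor parallelism across 2 GPUs.
    op = TextTaggingByPromptMapper(
        enable_vllm=True,
        tensor_parallel_size=2,
        sampling_params={'temperature': 0.1, 'top_p': 0.95,
                         'max_tokens': 256})
    self._run_op(op)  # hypothetical shared test helper
```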

github-actions bot commented Oct 3, 2024

This PR is marked as stale because there has been no activity for 21 days. Remove the stale label or add new comments, or this PR will be closed in 3 days.

github-actions bot commented Oct 7, 2024

Close this stale PR.

Labels
dj:op issues/PRs about some specific OPs
Projects
None yet

3 participants