This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[feat] support video evaluation for qwen2-vl and add mix-evals-video2text (#275)

* feat: add new output_path saving logic and add an evaluation tracker to manage the sample-saving process
* add: regression test
* add: regression test
* clean: remove unused code
* 🚫 Remove unused import for cleaner code
  Removed the commented-out import statement for WandbLogger to keep the code readable and avoid confusion over unused code. No functional changes.
* [task] add mix_evals for video evaluation
* Merge branch 'origin/main'
* ✨ Improve model name sanitization for Hugging Face formats
* 🧹 Refactor settings for Llava OneVision model
* ✨ Enhance video and image processing capabilities
  - Integrated vision processing for videos and images, improving context handling within the model.
  - Added error logging for missing utility dependencies so users know what to install.
  - Updated the YAML configuration to standardize prompt handling across video tasks.
  - Bumped the version number to indicate ongoing development.
* 🎉 Enhance W&B logging and video playback
  - Added automatic naming for W&B runs when no name is specified.
  - Lowered the video frame rate from 1.0 to 0.5 to reduce resource use during visual content processing.
  - Removed redundant W&B logging code.
* ✨ Refine conversation logic and adjust token limits
  - Updated the chat template logic so user and assistant roles are formatted consistently in responses.
  - Reduced the maximum number of new tokens in several evaluation files to produce more concise outputs.
  - Labeled question and answer roles explicitly in generated text for few-shot tasks.
  - Simplified logging of context and target information during evaluation.
* feat: change Qwen2-VL video reading to 0.25 fps to avoid OOM
* 🎥 Update video message structure in Qwen2_VL
* Update qwen2_vl.py
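The OOM fix rests on simple arithmetic: the number of frames sampled from a clip grows linearly with the sampling rate, so dropping from 1.0 to 0.25 fps cuts the frames (and therefore the visual input) for a given clip by about 4x. The sketch below illustrates this with a hypothetical `sample_timestamps` helper that samples frames at uniform intervals; it is not the frame loader used in `qwen2_vl.py`.

```python
import math


def sample_timestamps(duration_s: float, fps: float) -> list[float]:
    """Return evenly spaced timestamps (seconds) for sampling frames at `fps`.

    Hypothetical helper for illustration only; not the Qwen2-VL video loader.
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    # Number of frames scales linearly with fps; always keep at least one frame.
    n_frames = max(1, math.floor(duration_s * fps))
    step = 1.0 / fps
    return [i * step for i in range(n_frames)]


# For a 2-minute clip: 120 frames at 1.0 fps, 60 at 0.5 fps, 30 at 0.25 fps.
for rate in (1.0, 0.5, 0.25):
    print(rate, len(sample_timestamps(120.0, rate)))
```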
Showing 11 changed files with 458 additions and 46 deletions.
@@ -0,0 +1,16 @@ (new file)
dataset_kwargs:
  cache_dir: mix_evals_video2text
  token: true
  video: true
dataset_path: lmms-lab/MixEvals_Video2Text
lmms_eval_specific_kwargs:
  default:
    post_prompt: ""
    pre_prompt: ""
  gpt4v:
    post_prompt: ""
    pre_prompt: These are frames from a video. Please answer the following questions about the video.
metadata:
  gpt_eval_model_name: gpt-4o-mini
  modality: video
  version: 0
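The template defines per-model prompt blocks under `lmms_eval_specific_kwargs` (`default` and `gpt4v`). The sketch below shows one way such blocks could be turned into a prompt, assuming the evaluator picks the block keyed by the model name and falls back to `default`. The `build_prompt` function and its newline-joining rule are hypothetical; the actual `utils.mix_evals_video2text_doc_to_text` may assemble the text differently.

```python
def build_prompt(question: str, specific_kwargs: dict, model_name: str) -> str:
    """Wrap `question` with the pre/post prompts selected for `model_name`.

    Hypothetical helper: assumes a model-specific block wins over "default".
    """
    kwargs = specific_kwargs.get(model_name, specific_kwargs["default"])
    parts = [kwargs.get("pre_prompt", ""), question, kwargs.get("post_prompt", "")]
    # Drop empty segments so an empty pre/post prompt adds no stray newline.
    return "\n".join(part for part in parts if part)


# Values copied from the template's lmms_eval_specific_kwargs.
TEMPLATE_KWARGS = {
    "default": {"pre_prompt": "", "post_prompt": ""},
    "gpt4v": {
        "pre_prompt": "These are frames from a video. Please answer the following questions about the video.",
        "post_prompt": "",
    },
}

print(build_prompt("What color is the car?", TEMPLATE_KWARGS, "qwen2_vl"))
print(build_prompt("What color is the car?", TEMPLATE_KWARGS, "gpt4v"))
```

With these template values, any model other than `gpt4v` receives the bare question, while `gpt4v` gets the "These are frames from a video..." preamble on the line before it.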
@@ -0,0 +1,5 @@ (new file)
group: mix_evals_video2text
task:
  # - mix_evals_video2text_openconv
  - mix_evals_video2text_mc
  - mix_evals_video2text_freeform
lmms_eval/tasks/mix_evals/mix_evals_video2text_freeform.yaml (25 additions, 0 deletions)
@@ -0,0 +1,25 @@ (new file)
dataset_name: "video2text_closeended_free-form"
task: "mix_evals_video2text_freeform"
test_split: test
output_type: generate_until
doc_to_visual: !function utils.mix_evals_video2text_doc_to_visual
doc_to_text: !function utils.mix_evals_video2text_doc_to_text
doc_to_target: "{{target}}"
process_results: !function utils.mix_evals_video2text_process_results_freeform
metric_list:
  - metric: gpt_eval
    aggregation: !function utils.mix_evals_video2text_gpt_eval
    higher_is_better: true

generation_kwargs:
  max_new_tokens: 16

include: _default_template_yaml

lmms_eval_specific_kwargs:
  default:
    pre_prompt: "These are frames from a video. Please answer the following questions about the video."
    post_prompt: "Answer the question using a single word or phrase."
  gpt4v:
    pre_prompt: "These are frames from a video. Please answer the following questions about the video with a short phrase."
    post_prompt: ""
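The free-form task pulls shared settings from `_default_template_yaml` through its `include:` key. Below is a minimal sketch of how such an include could resolve, assuming a shallow merge in which top-level keys in the task file replace the template's keys wholesale; the actual lmms-eval loader may merge differently. The `resolve_include` function is hypothetical, and the config dicts are abridged copies of the YAML. Under this assumption, the task's `lmms_eval_specific_kwargs` block replaces the template's entirely instead of being merged key by key, while keys such as `dataset_path` are inherited unchanged.

```python
from typing import Any


def resolve_include(task_cfg: dict[str, Any], templates: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Resolve an `include:` key by shallow-merging the task over its template.

    Hypothetical sketch. Assumption: top-level task keys override the
    template's keys; nested mappings are NOT merged key by key.
    """
    cfg = dict(task_cfg)
    template_name = cfg.pop("include", None)
    if template_name is None:
        return cfg
    # A template may itself include another template, so resolve recursively.
    merged = resolve_include(templates[template_name], templates)
    merged.update(cfg)  # task-file keys win
    return merged


# Abridged copies of the template and the free-form task config.
TEMPLATES = {
    "_default_template_yaml": {
        "dataset_path": "lmms-lab/MixEvals_Video2Text",
        "lmms_eval_specific_kwargs": {"default": {"pre_prompt": "", "post_prompt": ""}},
    }
}
FREEFORM = {
    "include": "_default_template_yaml",
    "task": "mix_evals_video2text_freeform",
    "generation_kwargs": {"max_new_tokens": 16},
    "lmms_eval_specific_kwargs": {
        "default": {
            "pre_prompt": "These are frames from a video. Please answer the following questions about the video.",
            "post_prompt": "Answer the question using a single word or phrase.",
        }
    },
}

merged = resolve_include(FREEFORM, TEMPLATES)
print(merged["dataset_path"], merged["generation_kwargs"])
```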
@@ -0,0 +1,34 @@ (new file)
include: _default_template_yaml
dataset_name: "video2text_closeended_multiple-choice"
task: "mix_evals_video2text_mc"
test_split: test
output_type: generate_until
doc_to_visual: !function utils.mix_evals_video2text_doc_to_visual
doc_to_text: !function utils.mix_evals_video2text_doc_to_text
doc_to_target: "{{target}}"

generation_kwargs:
  max_new_tokens: 5

metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true

filter_list:
  - name: "flexible-extract"
    filter:
      - function: !function utils.MultiChoiceRegexFilter
        group_select: 0
        ignore_case: true
        ignore_punctuation: true

lmms_eval_specific_kwargs:
  default:
    pre_prompt: "These are frames from a video. Please answer the following questions about the video."
    post_prompt: "Answer with the option's letter from the given choices directly."
  gpt4v:
    pre_prompt: "These are frames from a video. Please answer the following questions about the video."
    post_prompt: "Answer with the option's letter from the given choices directly."
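The multiple-choice task caps generation at five new tokens, passes the output through a "flexible-extract" filter (`utils.MultiChoiceRegexFilter`, keeping the first match via `group_select: 0`), and then scores it with `exact_match`, ignoring case and punctuation. The sketch below mimics that pipeline under stated assumptions: the regex (first standalone capital letter A to E, optionally in parentheses) and the helper names are hypothetical, not the filter's actual implementation. Matching capitals only is a deliberate choice so the lowercase article "a" is not mistaken for an option.

```python
import re
import string

# Hypothetical pattern: first standalone capital A-E, optionally in parentheses.
_CHOICE_RE = re.compile(r"\(?\b([A-E])\b\)?")


def extract_choice(response: str) -> str:
    """Return the first option letter in `response`, or the stripped response."""
    match = _CHOICE_RE.search(response)
    return match.group(1) if match else response.strip()


def normalize(text: str) -> str:
    # Mirrors ignore_case / ignore_punctuation: lowercase and drop punctuation.
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()


def exact_match(prediction: str, target: str) -> float:
    """Score 1.0 if the extracted choice equals the target after normalization."""
    return float(normalize(extract_choice(prediction)) == normalize(target))


print(exact_match("(B) A red car", "B"))
print(exact_match("The answer is C.", "D"))
```

Averaging these per-sample scores corresponds to the config's `aggregation: mean`.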
lmms_eval/tasks/mix_evals/mix_evals_video2text_openended.yaml (22 additions, 0 deletions)
@@ -0,0 +1,22 @@ (new file)
include: _default_template_yaml
dataset_name: "video2text_openended"
task: "mix_evals_video2text_openconv"
test_split: test
output_type: generate_until
doc_to_visual: !function utils.mix_evals_video2text_doc_to_visual
doc_to_text: !function utils.mix_evals_video2text_doc_to_text_open_convs
doc_to_target: ""
process_results: !function utils.mix_evals_video2text_process_results_open_convs

metric_list:
  - metric: submission
    aggregation: !function utils.mix_evals_video2text_aggregate_gen
    higher_is_better: true

lmms_eval_specific_kwargs:
  default:
    pre_prompt: "These are frames from a video. Please answer the following questions about the video."
    post_prompt: ""
  gpt4v:
    pre_prompt: "These are frames from a video. Please answer the following questions about the video."
    post_prompt: ""
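The open-ended task has an empty `doc_to_target` and reports a `submission` metric, which suggests its predictions are collected for offline scoring rather than graded during the run. The sketch below shows that pattern under assumptions: `process_result_open` and `aggregate_submission` are hypothetical stand-ins for `utils.mix_evals_video2text_process_results_open_convs` and `utils.mix_evals_video2text_aggregate_gen`, and the `id` field and output filename are assumed, not taken from the dataset or the repository.

```python
import json
import os
import tempfile


def process_result_open(doc: dict, results: list[str]) -> dict:
    """Package one model response under the `submission` metric key.

    Hypothetical stand-in; the `id` field is an assumed dataset column.
    """
    return {"submission": {"id": doc["id"], "prediction": results[0].strip()}}


def aggregate_submission(items: list[dict], out_dir: str) -> str:
    """Write every collected prediction to one JSON file and return its path."""
    # Hypothetical filename; the real aggregator may name and place it differently.
    path = os.path.join(out_dir, "mix_evals_video2text_openconv_submission.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, ensure_ascii=False, indent=2)
    return path


if __name__ == "__main__":
    docs = [{"id": 0}, {"id": 1}]
    responses = [["A person slices vegetables. "], ["Two dogs play fetch."]]
    items = [process_result_open(d, r)["submission"] for d, r in zip(docs, responses)]
    with tempfile.TemporaryDirectory() as out_dir:
        path = aggregate_submission(items, out_dir)
        with open(path, encoding="utf-8") as f:
            print(json.load(f))
```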