Pull requests: modelscope/data-juicer
Add humanvbench operators
#553 opened Jan 17, 2025 by SYSUzhouting
Labels: dj:multimodal (issues/PRs about multimodal data processing), dj:op (issues/PRs about some specific OPs), good first issue (Good for newcomers)
Resplit input dataset in ray mode
#549 opened Jan 15, 2025 by chenyushuo
Add unittest for ray text dedup
#540 opened Jan 10, 2025 by chenyushuo
[WIP] refactor of dataset builder and executor
#537 opened Jan 9, 2025 by cyruszhang
Add minhash deduplicator based on RAY and Redis
#489 opened Nov 15, 2024 by pan-x-c
Labels: dj:dist (issues/PRs about distributed data processing), dj:efficiency (regarding to efficiency issues and enhancements), dj:op (issues/PRs about some specific OPs)
Automatically split input dataset in ray mode
#415 opened Sep 4, 2024 by pan-x-c
[WIP] Add text tagging by prompt mapper op
#408 opened Aug 30, 2024 by garyzhang99 (1 task)
Labels: dj:op (issues/PRs about some specific OPs)
Add GPT-4V as evaluator
#276 opened Mar 22, 2024 by drcege (draft; DJ-SORA)
Labels: dj:multimodal (issues/PRs about multimodal data processing), enhancement (New feature or request), stale-pr