
Issues: modelscope/data-juicer

Issues list

Simplifying Open Source Contributions Through Operator Tiering from Dev aspect (labels: dj:op, enhancement, good first issue)
#510 opened Dec 11, 2024 by yxdyc
2 tasks done
How to use Data-Juicer to process Chinese documents (labels: question)
#509 opened Dec 11, 2024 by aruig666
3 tasks done
Can the cleaning statistics be viewed after creating the config file and performing the cleaning? (labels: question)
#499 opened Nov 27, 2024 by Tendo33
3 tasks done
Guidance on Monitoring Task Execution with Ray Executor in Data Juicer (labels: dj:dist, question)
#496 opened Nov 24, 2024 by Fatima-0SA
3 tasks done
AttributeError: 'FusedFilter' object has no attribute '_name' (labels: bug, dj:op)
#495 opened Nov 24, 2024 by xunmenglt
Merge local and API LLM calling (labels: enhancement)
#490 opened Nov 15, 2024 by BeachWang
2 tasks done
sharegpt format support (labels: dj:dataset, dj:multimodal, priority:high, question)
#488 opened Nov 14, 2024 by IvanDeng0
3 tasks done
Checkpointer support for Ray-Mode (labels: enhancement)
#487 opened Nov 12, 2024 by yxdyc
2 tasks done
Distributed processing
Error when compiling and installing from source (labels: question)
#486 opened Nov 12, 2024 by charonkk
3 tasks done
Anyone tried DJ on multimodal datasets of more than 20M samples? (labels: question)
#482 opened Nov 11, 2024 by serser
3 tasks done
Windows support (labels: question)
#477 opened Nov 6, 2024 by zytcharming
3 tasks done
Update of Jupyter Notebooks (labels: bug, documentation)
#476 opened Nov 6, 2024 by HYLcool
[Bug]: perplexity_filter operator runs out of memory (OOM) (labels: bug)
#474 opened Nov 5, 2024 by weiaicunzai
3 tasks done
How to calculate the image_text_similarity scores for both Chinese and English? (labels: dj:multimodal, dj:op, question)
#473 opened Nov 5, 2024 by weiaicunzai
A try_num parameter is needed when generating data with an LLM (labels: enhancement)
#470 opened Nov 4, 2024 by BeachWang
2 tasks done
[Feat]: Unified LLM Calling Management (labels: enhancement)
#451 opened Oct 16, 2024 by drcege
2 tasks done
[Feat]: Automatic Version Matching During Installation (labels: enhancement)
#450 opened Oct 16, 2024 by drcege
2 tasks done
[Bug]: KeyError: 'resource' (labels: bug)
#440 opened Sep 29, 2024 by luckystar1992
3 tasks done
Require fps filter and mapper for videos (labels: dj:op, enhancement)
#433 opened Sep 23, 2024 by BeachWang
Guidance for OP with multiple data fields to be processed (labels: enhancement)
#411 opened Sep 2, 2024 by yxdyc
2 tasks done
[Feat]: Add Ray actor support (labels: dj:dist, enhancement, stale-issue)
#371 opened Jul 29, 2024 by drcege
support panda's student captioner model in our captioning mapper (labels: dj:multimodal, dj:op, enhancement, stale-issue)
#251 opened Mar 14, 2024 by yxdyc