update link (#228)
zhijianma authored Mar 7, 2024
1 parent 2720113 commit 156ed20
Showing 4 changed files with 38 additions and 38 deletions.
34 changes: 17 additions & 17 deletions README.md
@@ -1,4 +1,4 @@
-[[中文主页]](README_ZH.md) | [[Docs]](README.md#documentation-index--文档索引-a-namedocumentationindex) | [[API]](https://alibaba.github.io/data-juicer) | [[*DJ-SORA*]](docs/DJ_SORA.md)
+[[中文主页]](README_ZH.md) | [[Docs]](#documents) | [[API]](https://alibaba.github.io/data-juicer) | [[*DJ-SORA*]](docs/DJ_SORA.md)

# Data-Juicer: A One-Stop Data Processing System for Large Language Models

@@ -16,8 +16,8 @@



-[![Document_List](https://img.shields.io/badge/Docs-English-blue?logo=Markdown)](README.md#documentation-index--文档索引-a-namedocumentationindex)
-[![文档列表](https://img.shields.io/badge/文档-中文-blue?logo=Markdown)](README_ZH.md#documentation-index--文档索引-a-namedocumentationindex)
+[![Document_List](https://img.shields.io/badge/Docs-English-blue?logo=Markdown)](#documents)
+[![文档列表](https://img.shields.io/badge/文档-中文-blue?logo=Markdown)](README_ZH.md#documents)
[![API Reference](https://img.shields.io/badge/Docs-API_Reference-blue?logo=Markdown)](https://alibaba.github.io/data-juicer/)
[![Paper](http://img.shields.io/badge/cs.LG-arXiv%3A2309.02033-B31B1B?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2309.02033)

@@ -45,7 +45,7 @@ In this new version, we support more features for **multimodal data (including v
- ![new](https://img.alicdn.com/imgextra/i4/O1CN01kUiDtl1HVxN6G56vN_!!6000000000764-2-tps-43-19.png) [2024-02-05] Our paper has been accepted by SIGMOD'24 industrial track!
- [2024-01-10] Discover new horizons in "Data Mixture"—Our second data-centric LLM competition has kicked off! Please visit the competition's [official website](https://tianchi.aliyun.com/competition/entrance/532174) for more information.
- [2024-01-05] We release **Data-Juicer v0.1.3** now!
-In this new version, we support **more Python versions** (3.7-3.10), and support **multimodal** dataset [converting](tools/multimodal/README.md)/[processing](docs/Operators.md) (Including texts, images, and audios. More modalities will be supported in the future).
+In this new version, we support **more Python versions** (3.8-3.10), and support **multimodal** dataset [converting](tools/multimodal/README.md)/[processing](docs/Operators.md) (Including texts, images, and audios. More modalities will be supported in the future).
Besides, our paper is also updated to [v3](https://arxiv.org/abs/2309.02033).

- [2023-10-13] Our first data-centric LLM competition begins! Please
@@ -59,7 +59,7 @@ Table of Contents
* [Data-Juicer: A One-Stop Data Processing System for Large Language Models](#data-juicer-a-one-stop-data-processing-system-for-large-language-models)
* [Table of Contents](#table-of-contents)
* [Features](#features)
-* [Documentation Index | 文档索引](#documentation-index--文档索引-a-namedocumentationindex)
+* [Documentation Index](#documents)
* [Demos](#demos)
* [Prerequisites](#prerequisites)
* [Installation](#installation)
@@ -111,19 +111,19 @@ Table of Contents



-## Documentation Index | 文档索引 <a name="documentationindex"/>
+## Documentation Index <a name="documents"/>

-- [Overview](README.md) | [概览](README_ZH.md)
-- [Operator Zoo](docs/Operators.md) | [算子库](docs/Operators_ZH.md)
-- [Configs](configs/README.md) | [配置系统](configs/README_ZH.md)
-- [Developer Guide](docs/DeveloperGuide.md) | [开发者指南](docs/DeveloperGuide_ZH.md)
-- ["Bad" Data Exhibition](docs/BadDataExhibition.md) | [“坏”数据展览](docs/BadDataExhibition_ZH.md)
-- Dedicated Toolkits | 专用工具箱
-- [Quality Classifier](tools/quality_classifier/README.md) | [质量分类器](tools/quality_classifier/README_ZH.md)
-- [Auto Evaluation](tools/evaluator/README.md) | [自动评测](tools/evaluator/README_ZH.md)
-- [Preprocess](tools/preprocess/README.md) | [前处理](tools/preprocess/README_ZH.md)
-- [Postprocess](tools/postprocess/README.md) | [后处理](tools/postprocess/README_ZH.md)
-- [Third-parties (LLM Ecosystems)](thirdparty/README.md) | [第三方库(大语言模型生态)](thirdparty/README_ZH.md)
+- [Overview](README.md)
+- [Operator Zoo](docs/Operators.md)
+- [Configs](configs/README.md)
+- [Developer Guide](docs/DeveloperGuide.md)
+- ["Bad" Data Exhibition](docs/BadDataExhibition.md)
+- Dedicated Toolkits
+- [Quality Classifier](tools/quality_classifier/README.md)
+- [Auto Evaluation](tools/evaluator/README.md)
+- [Preprocess](tools/preprocess/README.md)
+- [Postprocess](tools/postprocess/README.md)
+- [Third-parties (LLM Ecosystems)](thirdparty/README.md)
- [API references](https://alibaba.github.io/data-juicer/)
- [Awesome LLM-Data](docs/awesome_llm_data.md)
- [DJ-SORA](docs/DJ_SORA.md)
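
The old links in this file point at `#documentation-index--文档索引-a-namedocumentationindex`, which looks like the anchor GitHub generates from the heading text itself, with the inline `<a name=...>` tag included. A rough Python approximation of that slug rule (lowercase, drop punctuation, turn each space into a hyphen) reproduces it. `github_style_slug` is a hypothetical helper written for illustration, not GitHub's actual implementation, and it ignores details such as de-duplicating repeated headings.

```python
import re


def github_style_slug(heading: str) -> str:
    """Approximate a GitHub-style heading anchor (illustrative assumption only)."""
    text = heading.strip().lower()
    # Keep Unicode word characters, hyphens and spaces; drop all other punctuation.
    text = re.sub(r"[^\w\- ]", "", text)
    # Each remaining space becomes one hyphen, so doubled spaces give "--".
    return text.replace(" ", "-")


old_heading = 'Documentation Index | 文档索引 <a name="documentationindex"/>'
print(github_style_slug(old_heading))
# documentation-index--文档索引-a-namedocumentationindex
```

The new `#documents` links instead target the explicit `<a name="documents"/>` anchor, so they no longer depend on the wording of the heading.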
38 changes: 19 additions & 19 deletions README_ZH.md
@@ -1,4 +1,4 @@
-[[English Page]](README.md) | [[文档]](README_ZH.md#documentation-index--文档索引-a-namedocumentationindex) | [[API]](https://alibaba.github.io/data-juicer) | [[*DJ-SORA*]](docs/DJ_SORA_ZH.md)
+[[English Page]](README.md) | [[文档]](#documents) | [[API]](https://alibaba.github.io/data-juicer) | [[*DJ-SORA*]](docs/DJ_SORA_ZH.md)

# Data-Juicer: 为大语言模型提供更高质量、更丰富、更易“消化”的数据

@@ -14,8 +14,8 @@
[![ModelScope- Demos](https://img.shields.io/badge/ModelScope-Demos-4e29ff.svg?logo=data:image/svg+xml;base64,PHN2ZyB2aWV3Qm94PSIwIDAgMjI0IDEyMS4zMyIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCTxwYXRoIGQ9Im0wIDQ3Ljg0aDI1LjY1djI1LjY1aC0yNS42NXoiIGZpbGw9IiM2MjRhZmYiIC8+Cgk8cGF0aCBkPSJtOTkuMTQgNzMuNDloMjUuNjV2MjUuNjVoLTI1LjY1eiIgZmlsbD0iIzYyNGFmZiIgLz4KCTxwYXRoIGQ9Im0xNzYuMDkgOTkuMTRoLTI1LjY1djIyLjE5aDQ3Ljg0di00Ny44NGgtMjIuMTl6IiBmaWxsPSIjNjI0YWZmIiAvPgoJPHBhdGggZD0ibTEyNC43OSA0Ny44NGgyNS42NXYyNS42NWgtMjUuNjV6IiBmaWxsPSIjMzZjZmQxIiAvPgoJPHBhdGggZD0ibTAgMjIuMTloMjUuNjV2MjUuNjVoLTI1LjY1eiIgZmlsbD0iIzM2Y2ZkMSIgLz4KCTxwYXRoIGQ9Im0xOTguMjggNDcuODRoMjUuNjV2MjUuNjVoLTI1LjY1eiIgZmlsbD0iIzYyNGFmZiIgLz4KCTxwYXRoIGQ9Im0xOTguMjggMjIuMTloMjUuNjV2MjUuNjVoLTI1LjY1eiIgZmlsbD0iIzM2Y2ZkMSIgLz4KCTxwYXRoIGQ9Im0xNTAuNDQgMHYyMi4xOWgyNS42NXYyNS42NWgyMi4xOXYtNDcuODR6IiBmaWxsPSIjNjI0YWZmIiAvPgoJPHBhdGggZD0ibTczLjQ5IDQ3Ljg0aDI1LjY1djI1LjY1aC0yNS42NXoiIGZpbGw9IiMzNmNmZDEiIC8+Cgk8cGF0aCBkPSJtNDcuODQgMjIuMTloMjUuNjV2LTIyLjE5aC00Ny44NHY0Ny44NGgyMi4xOXoiIGZpbGw9IiM2MjRhZmYiIC8+Cgk8cGF0aCBkPSJtNDcuODQgNzMuNDloLTIyLjE5djQ3Ljg0aDQ3Ljg0di0yMi4xOWgtMjUuNjV6IiBmaWxsPSIjNjI0YWZmIiAvPgo8L3N2Zz4K)](https://modelscope.cn/studios?name=Data-Jiucer&page=1&sort=latest&type=1)
[![HuggingFace- Demos](https://img.shields.io/badge/🤗HuggingFace-Demos-4e29ff.svg)](https://huggingface.co/spaces?&search=datajuicer)

-[![Document_List](https://img.shields.io/badge/Docs-English-blue?logo=Markdown)](README.md#documentation-index--文档索引-a-namedocumentationindex)
-[![文档列表](https://img.shields.io/badge/文档-中文-blue?logo=Markdown)](README_ZH.md#documentation-index--文档索引-a-namedocumentationindex)
+[![Document_List](https://img.shields.io/badge/Docs-English-blue?logo=Markdown)](README.md#documents)
+[![文档列表](https://img.shields.io/badge/文档-中文-blue?logo=Markdown)](#documents)
[![API Reference](https://img.shields.io/badge/Docs-API_Reference-blue?logo=Markdown)](https://alibaba.github.io/data-juicer/)
[![Paper](http://img.shields.io/badge/cs.LG-arXiv%3A2309.02033-B31B1B?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2309.02033)

@@ -40,7 +40,7 @@ Data-Juicer(包含[DJ-SORA](docs/DJ_SORA_ZH.md))正在积极更新和维护
- [2024-01-10] 开启“数据混合”新视界——第二届Data-Juicer大模型数据挑战赛已经正式启动!立即访问[竞赛官网](https://tianchi.aliyun.com/competition/entrance/532174),了解赛事详情。

-[2024-01-05] 现在,我们发布了 **Data-Juicer v0.1.3** 版本!
-在这个新版本中,我们支持了**更多Python版本**(3.7-3.10),同时支持了**多模态**数据集的[转换](tools/multimodal/README_ZH.md)[处理](docs/Operators_ZH.md)(包括文本、图像和音频。更多模态也将会在之后支持)。
+在这个新版本中,我们支持了**更多Python版本**(3.8-3.10),同时支持了**多模态**数据集的[转换](tools/multimodal/README_ZH.md)[处理](docs/Operators_ZH.md)(包括文本、图像和音频。更多模态也将会在之后支持)。
此外,我们的论文也更新到了[第三版](https://arxiv.org/abs/2309.02033)

- [2023-10-13] 我们的第一届以数据为中心的 LLM 竞赛开始了!
@@ -53,7 +53,7 @@ Data-Juicer(包含[DJ-SORA](docs/DJ_SORA_ZH.md))正在积极更新和维护
* [Data-Juicer: 为大语言模型提供更高质量、更丰富、更易“消化”的数据](#data-juicer-为大语言模型提供更高质量更丰富更易消化的数据)
* [目录](#目录)
* [特点](#特点)
-* [Documentation Index | 文档索引](#documentation-index--文档索引-a-namedocumentationindex)
+* [文档索引](#documents)
* [演示样例](#演示样例)
* [前置条件](#前置条件)
* [安装](#安装)
@@ -93,20 +93,20 @@ Data-Juicer(包含[DJ-SORA](docs/DJ_SORA_ZH.md))正在积极更新和维护
* **灵活 & 易扩展**:支持大多数数据格式(如jsonl、parquet、csv等),并允许灵活组合算子。支持[自定义算子](docs/DeveloperGuide_ZH.md#构建自己的算子),以执行定制化的数据处理。


-## Documentation Index | 文档索引 <a name="documentationindex"/>
-
-* [Overview](README.md) | [概览](README_ZH.md)
-* [Operator Zoo](docs/Operators.md) | [算子库](docs/Operators_ZH.md)
-* [Configs](configs/README.md) | [配置系统](configs/README_ZH.md)
-* [Developer Guide](docs/DeveloperGuide.md) | [开发者指南](docs/DeveloperGuide_ZH.md)
-* ["Bad" Data Exhibition](docs/BadDataExhibition.md) | [“坏”数据展览](docs/BadDataExhibition_ZH.md)
-* Dedicated Toolkits | 专用工具箱
-* [Quality Classifier](tools/quality_classifier/README.md) | [质量分类器](tools/quality_classifier/README_ZH.md)
-* [Auto Evaluation](tools/evaluator/README.md) | [自动评测](tools/evaluator/README_ZH.md)
-* [Preprocess](tools/preprocess/README.md) | [前处理](tools/preprocess/README_ZH.md)
-* [Postprocess](tools/postprocess/README.md) | [后处理](tools/postprocess/README_ZH.md)
-* [Third-parties (LLM Ecosystems)](thirdparty/README.md) | [第三方库(大语言模型生态)](thirdparty/README_ZH.md)
-* [API references](https://alibaba.github.io/data-juicer/)
+## 文档索引 <a name="documents"/>
+
+* [概览](README_ZH.md)
+* [算子库](docs/Operators_ZH.md)
+* [配置系统](configs/README_ZH.md)
+* [开发者指南](docs/DeveloperGuide_ZH.md)
+* [“坏”数据展览](docs/BadDataExhibition_ZH.md)
+* 专用工具箱
+* [质量分类器](tools/quality_classifier/README_ZH.md)
+* [自动评测](tools/evaluator/README_ZH.md)
+* [前处理](tools/preprocess/README_ZH.md)
+* [后处理](tools/postprocess/README_ZH.md)
+* [第三方库(大语言模型生态)](thirdparty/README_ZH.md)
+* [API 参考](https://alibaba.github.io/data-juicer/)
* [Awesome LLM-Data](docs/awesome_llm_data.md)
* [DJ-SORA](docs/DJ_SORA_ZH.md)

2 changes: 1 addition & 1 deletion data_juicer/__init__.py
@@ -1,4 +1,4 @@
-__version__ = '0.1.3'
+__version__ = '0.2.0'

import os
import subprocess
2 changes: 1 addition & 1 deletion setup.py
@@ -55,7 +55,7 @@ def get_install_requirements(require_f_paths, env_dir='environments'):
name='py-data-juicer',
version=version,
url='https://github.com/alibaba/data-juicer',
-author='SysML team of Alibaba DAMO Academy',
+author='SysML Team of Alibaba Tongyi Lab',
description='A One-Stop Data Processing System for Large Language '
'Models.',
long_description=readme_md,
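
The `setup.py` hunk passes `version=version`, but the diff does not show where that value comes from. One common pattern, sketched here purely as an assumption about how it could be wired, is to parse the `__version__` assignment out of the package's `__init__.py` text rather than importing the package. `read_version` is a hypothetical helper and does not appear in this commit.

```python
import re


def read_version(init_source: str) -> str:
    """Extract the value of a top-level __version__ assignment from source text.

    Hypothetical helper for illustration; not taken from this repository.
    """
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]",
        init_source,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# Sample text mirroring the first lines of data_juicer/__init__.py after this commit.
sample = "__version__ = '0.2.0'\n\nimport os\nimport subprocess\n"
print(read_version(sample))  # 0.2.0
```

Parsing the text avoids executing the package's imports at build time, which matters when `__init__.py` pulls in dependencies that are not installed yet.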