Front-end and Back-end Interface Details

MongoDB Database Structure

Database name: annotation_data

annotation_raw_data
{
    uuid: a unique uuid for a sentence,
    text: the original text of the sentence,
    labeled: true or false (default false); indicates whether the sentence has been labeled by a user,
    dataset_uuid: the dataset that the sentence belongs to
}

annotation_data
{
    uuid: a unique uuid for a sentence,
    text: the original text of the sentence,
    label: the classification label assigned to the sentence by the user,
    dataset_uuid: the dataset that the sentence belongs to,
    user_uuid: the uuid of the user who labeled it,
    timestamp: the time at which the sentence was labeled
}

user
{
    uuid: user id,
    user_name: user name
}

dataset
{
    uuid,
    dataset_name,
    user_uuid,
    creat_timestamp
}

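train_status
{
    user_uuid,  // user id
    dataset_uuid,  // dataset id
    model_name,  // model name; currently only one model name can be stored
    model_type,  // ner, classify
    model_version,  // timestamp
    is_full_train,  // whether the training is full or incremental
    status,  // model status: training in progress or already complete
    start_timestamp,
    end_timestamp
}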
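To make the relationship between the two sentence collections concrete, here is a minimal Python sketch that builds an annotation_raw_data document and the annotation_data document produced once a user labels it. The helper names are hypothetical, and two points are assumptions rather than facts from this page: that the labeled record reuses the sentence's uuid, and that timestamps are Unix seconds.

```python
import time
import uuid


def make_raw_record(text, dataset_uuid):
    """Build an annotation_raw_data document for one unlabeled sentence."""
    return {
        "uuid": str(uuid.uuid1()),
        "text": text,
        "labeled": False,  # default value given in the schema
        "dataset_uuid": dataset_uuid,
    }


def make_labeled_record(raw, label, user_uuid, now=None):
    """Build the annotation_data document for a sentence a user has labeled."""
    return {
        "uuid": raw["uuid"],  # assumption: the sentence uuid is reused
        "text": raw["text"],
        "label": label,
        "dataset_uuid": raw["dataset_uuid"],
        "user_uuid": user_uuid,
        # assumption: Unix time in seconds
        "timestamp": int(now if now is not None else time.time()),
    }
```

The resulting dictionaries could be passed to a MongoDB driver's insert call; the driver itself is left out so the sketch stays independent of any particular client version.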

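Interfaces

WebUI Interface

Upload a Data File Remotely

Request URL: /upload_remote_file
Request type: HTTP
Request method: POST
Response type: JSON
Function: stores the content of the file in the request into MongoDB
Parameter example: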

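{
    "file": upload_file  // the file to upload
}

Response parameters:

{
    "data" : 0,  // status data, success or failed
    "code" : 200,  // status code (int): 200 means the submission succeeded, any other code means it failed
    "message" : "Load SUCCESS"  // detailed message
}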

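Get Data Not Yet Labeled by the User

Request URL: /load_single_unlabeled
Request type: HTTP
Request method: GET
Response type: JSON
Function: gets one record that has not yet been labeled by the user
Parameters: none
Response parameters: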

{
    'message': '', 
    'data': 
        {
            'dataset_id': 0, 
            'text': '贵公司负责人：你好！ 本公司(祥泰实业有限公司）具有良 好有的进口来源，有剩余的发票及广泛的网 络可为贵公司谋利获得双嬴，本公司原则是 满意付款。有诚意来电洽商。 电  话：013631690076 邮  箱：[email protected] 联系人：郭 生', 
            'uuid': '5dfbe77a-dd93-11e7-a8b2-408d5cb30446'
        },
    'code': 200
}
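A client typically needs only the uuid and text from this response. The following Python sketch shows one way to pull them out once the body has been decoded into a dictionary. The function name is hypothetical, and treating any code other than 200 as a failure is an assumption carried over from the upload endpoint's description.

```python
def extract_unlabeled(response):
    """Return (uuid, text) from a decoded /load_single_unlabeled response.

    Returns None when the call failed or the payload is incomplete.
    """
    if response.get("code") != 200:  # assumption: 200 is the only success code
        return None
    data = response.get("data") or {}
    if "uuid" not in data or "text" not in data:
        return None
    return data["uuid"], data["text"]
```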

Submit Labeled Data

TODO

Get Historical Labeled Data

TODO

TaskCenter Interface

Full Training

Request URL: /full_train
Request type: HTTP
Request method: POST
Response type: JSON
Function: runs full training on the labeled data

Parameter example:

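{
    "dataset_id": 1234,  // dataset id (int)
    "user_id": 1234,  // user id (int)
    "task_type": "ner",  // task type (classify, ner) (string)
    "max_train_num": -1,  // maximum number of samples for the task (optional; default -1, meaning no limit) (int)
    "min_train_num": 4  // minimum number of samples for the task (optional; default 4) (int)
}

Response parameters:

{
    "ret_code" : 0,  // status code (int): 0 means the submission succeeded, 1 means it failed
    "msg" : "abc"  // detailed message (string), e.g. the user id does not exist, the dataset does not exist, the dataset's labeled data does not satisfy the constraints, or the task was submitted twice
}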
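As an illustration of how a client might assemble this call, the sketch below builds (but does not send) a POST request for /full_train with Python's standard library. The base URL is a placeholder, the function name is hypothetical, and sending the parameters as a JSON body is an assumption, since this page only states that the response is JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # hypothetical host and port


def build_full_train_request(dataset_id, user_id, task_type,
                             max_train_num=-1, min_train_num=4):
    """Build a POST request for /full_train using the documented defaults."""
    if task_type not in ("classify", "ner"):
        raise ValueError("task_type must be 'classify' or 'ner'")
    payload = {
        "dataset_id": dataset_id,
        "user_id": user_id,
        "task_type": task_type,
        "max_train_num": max_train_num,  # -1 means no limit
        "min_train_num": min_train_num,
    }
    return urllib.request.Request(
        BASE_URL + "/full_train",
        data=json.dumps(payload).encode("utf-8"),  # assumption: JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the task; a `ret_code` of 0 in the decoded reply then indicates that the submission was accepted.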

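Incremental Training (Read from Database)

Request URL: /batch_db_train
Request type: HTTP
Request method: POST
Response type: JSON
Function: runs batch training on the labeled data
Parameter example: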

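{
    "dataset_id": 1234,  // dataset id (int)
    "user_id": 1234,  // user id (int)
    "task_type": "ner",  // task type (classify, ner) (string)
    "started_timestamp": 123213213,  // timestamp at which the first record of this batch was labeled
    "batch_number": 4  // number of records to train on in this batch
}

Response parameters:

{
    "ret_code" : 0,  // status code (int): 0 means the submission succeeded, 1 means it failed
    "msg" : "abc"  // detailed message (string), e.g. the user id does not exist, the dataset does not exist, the dataset's labeled data does not satisfy the constraints, or the task was submitted twice
}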
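The two batch fields can be derived from the labeled records themselves. The sketch below is one possible way to do that in Python, under the assumption that "the first record of this batch" means the record with the earliest labeling timestamp. The function name is hypothetical, and the records are assumed to carry the `timestamp` field from the annotation_data collection.

```python
def batch_db_train_payload(records, dataset_id, user_id, task_type):
    """Build a /batch_db_train payload from one batch of labeled records."""
    if not records:
        raise ValueError("a batch needs at least one labeled record")
    return {
        "dataset_id": dataset_id,
        "user_id": user_id,
        "task_type": task_type,
        # assumption: the batch starts at its earliest labeling time
        "started_timestamp": min(r["timestamp"] for r in records),
        "batch_number": len(records),
    }
```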

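Incremental Training (Sample Data Sent Directly)

Request URL: /batch_ds_train
Request type: HTTP
Request method: POST
Response type: JSON
Function: runs batch training on the labeled data sent in the request
Parameter example (classify):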

{
    "dataset_id": 1234,  // dataset id (int)
    "user_id": 1234,  // user id (int)
    "task_type": "classify",  // task type (classify, ner) (string)
    "data": [{
       "text": "明天天气怎么样？",  // sentence (string)
       "label": "ask_weather"  // sentence class
    }]  // array
}

Parameter example (ner):

{
    "dataset_id": 1234,  // dataset id (int)
    "user_id": 1234,  // user id (int)
    "task_type": "ner",  // task type (classify, ner) (string)
    "data": [{
              "text": "刘德华的老婆朱丽倩",
              "ne_list": [{
                 "offset": 0,  // start index of the named entity (int)
                 "length": 3,  // length of the named entity (int)
                 "type": "person"  // named entity type (string)
               },
               {
                 "offset": 6,  // start index of the named entity (int)
                 "length": 3,  // length of the named entity (int)
                 "type": "person"  // named entity type (string)
               }]  // array
    }]  // array
}
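Because each named entity is given only as an offset and a length, it is worth checking the spans against the text before submitting them. The Python sketch below recovers the entity strings from one NER sample, assuming that `offset` and `length` count characters and that offsets start at 0. The function name is hypothetical.

```python
def entity_spans(sample):
    """Return [(entity_text, type), ...] for one NER sample.

    Raises ValueError if a span falls outside the sentence.
    """
    text = sample["text"]
    spans = []
    for ne in sample["ne_list"]:
        start = ne["offset"]
        end = start + ne["length"]
        if start < 0 or ne["length"] <= 0 or end > len(text):
            raise ValueError("entity span %d..%d is outside the text" % (start, end))
        spans.append((text[start:end], ne["type"]))
    return spans


sample = {
    "text": "刘德华的老婆朱丽倩",
    "ne_list": [
        {"offset": 0, "length": 3, "type": "person"},
        {"offset": 6, "length": 3, "type": "person"},
    ],
}
print(entity_spans(sample))  # [('刘德华', 'person'), ('朱丽倩', 'person')]
```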

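Response parameters:

{
    "ret_code" : 0,  // status code (int): 0 means the submission succeeded, 1 means it failed
    "msg" : "abc"  // detailed message (string), e.g. the user id does not exist, the dataset does not exist, the dataset's labeled data does not satisfy the constraints, or the task was submitted twice
}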

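Prediction (Data Sent Directly)

Request URL: /predict_ds
Request type: HTTP
Request method: POST
Response type: JSON
Function: runs prediction on the data sent in the request
Parameter example (classify/ner):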

{
    "dataset_id": 1234,  // dataset id (int)
    "user_id": 1234,  // user id (int)
    "task_type": "classify",  // task type (classify, ner) (string)
    "data": [{
       "text": "明天天气怎么样？"  // sentence (string)
    }]  // array
}

Response parameters (classify):

{
    "ret_code": 0,  // status code
    "msg": "abc",  // feedback message, e.g. whether the prediction succeeded
    "task_type": "classify",  // task type (classify, ner) (string)
    "data": [{
       "text": "明天天气怎么样？",  // sentence (string)
       "results": [
          {"label": "ask_weather", "prob": 0.98},
          {"label": "ask_time", "prob": 0.02}
        ]  // array
    }]  // array
}
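Each sentence in a classify response comes with a list of candidate labels and probabilities. The Python sketch below picks the most probable label per sentence. The function name is hypothetical, and treating a `ret_code` of 0 as success is an assumption borrowed from the training endpoints, since this page does not spell out the codes for prediction.

```python
def top_labels(response):
    """Return [(text, best_label, prob), ...] from a decoded classify response."""
    if response.get("ret_code") != 0:  # assumption: 0 means success
        raise RuntimeError("prediction failed: %s" % response.get("msg"))
    picked = []
    for item in response["data"]:
        best = max(item["results"], key=lambda r: r["prob"])
        picked.append((item["text"], best["label"], best["prob"]))
    return picked
```

The function does not assume that `results` is sorted by probability, so it works regardless of the order in which the service returns the candidates.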

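Response parameters (ner):

{
    "ret_code": 0,  // status code
    "msg": "abc",  // feedback message, e.g. whether the prediction succeeded
    "task_type": "ner",  // task type (classify, ner) (string)
    "data": [{
       "text": "刘德华的老婆朱丽倩",  // sentence (string)
       "prob": 0.98,  // if a CRF is used, only a sentence-level probability is output
       "results": [{
            "offset": 0,  // start index of the named entity (int)
            "length": 3,  // length of the named entity (int)
            "type": "person"  // named entity type (string)
        },
        {
            "offset": 6,  // start index of the named entity (int)
            "length": 3,  // length of the named entity (int)
            "type": "person"  // named entity type (string)
        }]  // array
    }]
}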

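Prediction (Read from Database)

Request URL: /predict_db
Request type: HTTP
Request method: POST
Response type: JSON
Function: runs prediction on data read from the database
Parameter example (classify/ner):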

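{
    "trained_dataset_id": 1234,  // id of the dataset the model was trained on (int)
    "predict_dataset_id": 1234,  // id of the dataset to predict on (int)
    "user_id": 1234,  // user id (int)
    "task_type": "classify",  // task type (classify, ner) (string)
    "row_idx": 123,  // first database row to predict
    "batch_num": 10  // number of records to predict
}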
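Since each call covers `batch_num` rows starting at `row_idx`, predicting over a whole dataset means issuing a sequence of requests. The Python sketch below generates those payloads. It assumes that rows are addressed contiguously and that the caller already knows how many rows to cover; the function name and the `total_rows` and `start_row` parameters are hypothetical, and whether `row_idx` is 0-based is not stated on this page.

```python
def predict_db_payloads(trained_dataset_id, predict_dataset_id, user_id,
                        task_type, total_rows, batch_num=10, start_row=0):
    """Yield one /predict_db payload per batch of consecutive rows."""
    end_row = start_row + total_rows
    for row_idx in range(start_row, end_row, batch_num):
        yield {
            "trained_dataset_id": trained_dataset_id,
            "predict_dataset_id": predict_dataset_id,
            "user_id": user_id,
            "task_type": task_type,
            "row_idx": row_idx,
            # the last batch may be shorter than batch_num
            "batch_num": min(batch_num, end_row - row_idx),
        }
```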

Response parameters (classify):

{
    "ret_code": 0,  // status code
    "msg": "abc",  // feedback message, e.g. whether the prediction succeeded
    "task_type": "classify",  // task type (classify, ner) (string)
    "data": [{
       "text": "明天天气怎么样？",  // sentence (string)
       "results": [
          {"label": "ask_weather", "prob": 0.98},
          {"label": "ask_time", "prob": 0.02}
        ]  // array
    }]  // array
}

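Response parameters (ner):

{
    "ret_code": 0,  // status code
    "msg": "abc",  // feedback message, e.g. whether the prediction succeeded
    "task_type": "ner",  // task type (classify, ner) (string)
    "data": [{
       "text": "刘德华的老婆朱丽倩",  // sentence (string)
       "prob": 0.98,  // if a CRF is used, only a sentence-level probability is output
       "results": [{
            "offset": 0,  // start index of the named entity (int)
            "length": 3,  // length of the named entity (int)
            "type": "person"  // named entity type (string)
        },
        {
            "offset": 6,  // start index of the named entity (int)
            "length": 3,  // length of the named entity (int)
            "type": "person"  // named entity type (string)
        }]  // array
    }]
}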

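Status Query

Request URL: /status
Request type: HTTP
Request method: POST
Response type: JSON
Function: queries the training status of a model
Parameter example (classify/ner):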

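{
    "dataset_id": 1234,  // dataset id (int)
    "user_id": 1234,  // user id (int)
    "task_type": "classify"
}

Response parameters (classify/ner):

{
    "ret_code": 0,  // status: training in progress (trainning); task submitted but not yet in the training queue (pending); training complete (complete); error during training (error)
    "msg": "正在训练中"  // "training in progress"
}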
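A client waiting for training to finish would usually call /status repeatedly. This page does not specify how the four states map onto `ret_code` values, so the Python sketch below leaves that decision to the caller: it takes a hypothetical `fetch` callable that performs one status request and a hypothetical `is_finished` predicate that inspects the decoded reply.

```python
import time


def poll_status(fetch, is_finished, interval=5.0, max_tries=60, sleep=time.sleep):
    """Call fetch() until is_finished(response) is true.

    Waits `interval` seconds between attempts and raises TimeoutError
    after `max_tries` unsuccessful attempts.
    """
    for attempt in range(max_tries):
        response = fetch()
        if is_finished(response):
            return response
        if attempt < max_tries - 1:
            sleep(interval)
    raise TimeoutError("training did not finish within the polling budget")
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` applies.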
