Front-End / Back-End Interface Details

MongoDB Database Structure

Dataset name: annotation_data

annotation_data
{
    uuid: a unique UUID identifying the sentence,
    text: the original text of the sentence,
    label: the classification label the user assigned to the sentence,
}
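
A minimal sketch of building one document in this shape. The helper name `make_annotation_doc` is hypothetical, and two details are assumptions the schema above does not state: the UUID is a random UUID4 stored as a string, and an unlabeled sentence carries `None` as its label.

```python
import uuid


def make_annotation_doc(text, label=None):
    """Build a dict shaped like one annotation_data document."""
    return {
        "uuid": str(uuid.uuid4()),  # assumption: random UUID4 stored as a string
        "text": text,
        "label": label,  # assumption: None until a user labels the sentence
    }


doc = make_annotation_doc("明天天气怎么样？")
print(sorted(doc))  # ['label', 'text', 'uuid']
```

A dict like this could then be passed to whatever MongoDB client the back end uses.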

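Interfaces

Data Import Interface

Request URL: /upload_remote_file
Request type: HTTP
Request method: POST
Data type: multipart/form-data
Response type: JSON
Function: stores the contents of the uploaded file in MongoDB
Request parameters: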

Parameter	Required	Type	Description
file	true	file	Data file to import into the database

Response data: no specific data is returned
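
As an illustration of the request format, the sketch below encodes a single file as a multipart/form-data body using only the standard library. The helper name `build_multipart`, the file name `sentences.txt`, and the `application/octet-stream` part type are assumptions; only the field name `file` comes from the table above.

```python
import uuid


def build_multipart(field_name, filename, content):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


body, content_type = build_multipart(
    "file", "sentences.txt", "明天天气怎么样？\n".encode("utf-8")
)
# The pair could then be sent with, for example:
#   urllib.request.Request(base_url + "/upload_remote_file", data=body,
#                          headers={"Content-Type": content_type}, method="POST")
# where base_url is the server address, which this page does not specify.
```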

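Interface for Loading Unlabeled Data

Request URL: /load_single_unlabeled
Request type: HTTP
Request method: GET
Data type: (not specified)
Response type: JSON
Function: returns a single record that has not been labeled yet
Request parameters: none
Response data:

Parameter	Required	Type
message	false	string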

TaskCenter Interface

Full Training

Request URL: /full_train
Request type: HTTP
Request method: POST
Response type: JSON
Function: runs full training on the labeled data

Example parameters:

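{
    "dataset_id": 1234, // dataset ID (int)
    "user_id": 1234, // user ID (int)
    "task_type": "ner", // task type: classify or ner (string)
    "max_train_num": -1, // maximum number of samples for the task (optional; default -1, no limit) (int)
    "min_train_num": 4 // minimum number of samples for the task (optional; default 4) (int)
}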

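Response parameters:

{
    "ret_code": 0, // status code (int): 0 means the task was submitted, 1 means submission failed
    "msg": "abc" // details (string), e.g. the user ID does not exist, the dataset does not exist, the dataset's labeled data does not satisfy the constraints, or the task is a duplicate submission
}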
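
The request and response shapes above can be sketched as two small helpers. The function names are hypothetical; the field names, the defaults (-1 and 4), and the meaning of `ret_code` are taken from the examples above.

```python
import json

VALID_TASK_TYPES = ("classify", "ner")


def build_full_train_request(dataset_id, user_id, task_type,
                             max_train_num=-1, min_train_num=4):
    """Return the JSON body for /full_train, applying the documented defaults."""
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(f"task_type must be one of {VALID_TASK_TYPES}")
    return json.dumps({
        "dataset_id": int(dataset_id),
        "user_id": int(user_id),
        "task_type": task_type,
        "max_train_num": int(max_train_num),  # -1 means no limit
        "min_train_num": int(min_train_num),
    })


def submission_ok(response_text):
    """True when ret_code is 0, i.e. the task was submitted successfully."""
    return json.loads(response_text)["ret_code"] == 0


print(build_full_train_request(1234, 1234, "ner"))
print(submission_ok('{"ret_code": 1, "msg": "dataset does not exist"}'))  # False
```

Note that the body produced here is strict JSON, so it carries no `//` comments.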

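Incremental Training (Read from Database)

Request URL: /batch_db_train
Request type: HTTP
Request method: POST
Response type: JSON
Function: runs incremental training on a batch of labeled data read from the database
Example parameters: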

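{
    "dataset_id": 1234, // dataset ID (int)
    "user_id": 1234, // user ID (int)
    "task_type": "ner", // task type: classify or ner (string)
    "started_timestamp": 123213213, // timestamp at which the first record in this batch was labeled
    "batch_number": 4 // number of records to train on in this batch
}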

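Response parameters:

{
    "ret_code": 0, // status code (int): 0 means the task was submitted, 1 means submission failed
    "msg": "abc" // details (string), e.g. the user ID does not exist, the dataset does not exist, the dataset's labeled data does not satisfy the constraints, or the task is a duplicate submission
}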

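Incremental Training (Direct Sample Transfer)

Request URL: /batch_ds_train
Request type: HTTP
Request method: POST
Response type: JSON
Function: runs incremental training on the labeled samples sent in the request
Example parameters (classify):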

{
    "dataset_id": 1234, // dataset ID (int)
    "user_id": 1234, // user ID (int)
    "task_type": "classify", // task type: classify or ner (string)
    "data": [{
        "text": "明天天气怎么样？", // sentence (string)
        "label": "ask_weather" // sentence class
    }] // array
}

Example parameters (ner):

{
    "dataset_id": 1234, // dataset ID (int)
    "user_id": 1234, // user ID (int)
    "task_type": "ner", // task type: classify or ner (string)
    "data": [{
        "text": "刘德华的老婆朱丽倩",
        "ne_list": [{
            "offset": 0, // start index of the named entity (int)
            "length": 3, // length of the named entity (int)
            "type": "person" // named-entity type (string)
        },
        {
            "offset": 6, // start index of the named entity (int)
            "length": 3, // length of the named entity (int)
            "type": "person" // named-entity type (string)
        }] // array
    }] // array
}

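Response parameters:

{
    "ret_code": 0, // status code (int): 0 means the task was submitted, 1 means submission failed
    "msg": "abc" // details (string), e.g. the user ID does not exist, the dataset does not exist, the dataset's labeled data does not satisfy the constraints, or the task is a duplicate submission
}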
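
Hand-counted offsets in `ne_list` are easy to get wrong, so one option is to derive them from the entity strings. The sketch below assumes `offset` is a zero-based character index into `text` and `length` is a character count; the helper name `build_ne_list` is hypothetical.

```python
def build_ne_list(text, entities):
    """Build ne_list entries from (surface_string, type) pairs.

    Each entity is searched for after the end of the previous match, so
    entities must be given in the order they appear in the sentence.
    """
    ne_list, cursor = [], 0
    for surface, ne_type in entities:
        offset = text.find(surface, cursor)
        if offset < 0:
            raise ValueError(f"{surface!r} not found in text")
        ne_list.append({"offset": offset, "length": len(surface), "type": ne_type})
        cursor = offset + len(surface)
    return ne_list


text = "刘德华的老婆朱丽倩"
for ne in build_ne_list(text, [("刘德华", "person"), ("朱丽倩", "person")]):
    # Slice the sentence back out to confirm the span covers the entity.
    print(ne, text[ne["offset"]:ne["offset"] + ne["length"]])
```

Slicing the sentence with each computed span, as the loop does, is a quick check that the offsets point at the intended names.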

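Prediction (Direct Data Transfer)

Request URL: /predict_ds
Request type: HTTP
Request method: POST
Response type: JSON
Function: predicts on the data sent in the request
Example parameters (classify/ner):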

{
    "dataset_id": 1234, // dataset ID (int)
    "user_id": 1234, // user ID (int)
    "task_type": "classify", // task type: classify or ner (string)
    "data": [{
        "text": "明天天气怎么样？" // sentence (string)
    }] // array
}

Response parameters (classify):

{
    "ret_code": 0, // status code
    "msg": "abc", // feedback message, e.g. whether the prediction succeeded
    "task_type": "classify", // task type: classify or ner (string)
    "data": [{
        "text": "明天天气怎么样？", // sentence (string)
        "result": [
            {"label": "ask_weather", "prob": 0.98},
            {"label": "ask_time", "prob": 0.02}
        ] // array
    }] // array
}
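
A caller usually wants only the most probable label for each sentence. The sketch below reduces a parsed classify response to that; the helper name `top_labels` is hypothetical, and treating any non-zero `ret_code` as a failure is an assumption based on the training endpoints, since this endpoint's status codes are not spelled out.

```python
def top_labels(response):
    """Map each sentence in a classify response to its most probable label."""
    if response["ret_code"] != 0:  # assumption: 0 means success here as well
        raise RuntimeError(response.get("msg", "prediction failed"))
    return {
        item["text"]: max(item["result"], key=lambda r: r["prob"])["label"]
        for item in response["data"]
    }


response = {
    "ret_code": 0,
    "msg": "abc",
    "task_type": "classify",
    "data": [{
        "text": "明天天气怎么样？",
        "result": [
            {"label": "ask_weather", "prob": 0.98},
            {"label": "ask_time", "prob": 0.02},
        ],
    }],
}
print(top_labels(response))  # {'明天天气怎么样？': 'ask_weather'}
```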

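Response parameters (ner):

{
    "ret_code": 0, // status code
    "msg": "abc", // feedback message, e.g. whether the prediction succeeded
    "task_type": "ner", // task type: classify or ner (string)
    "data": [{
        "text": "刘德华的老婆朱丽倩", // sentence (string)
        "result": [{
            "text": "刘德华的老婆朱丽倩",
            "prob": 0.98, // with a CRF, only a sentence-level probability is output
            "ne_list": [{
                "offset": 0, // start index of the named entity (int)
                "length": 3, // length of the named entity (int)
                "type": "person" // named-entity type (string)
            },
            {
                "offset": 6, // start index of the named entity (int)
                "length": 3, // length of the named entity (int)
                "type": "person" // named-entity type (string)
            }] // array
        }] // array
    }] // array
}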

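Prediction (Read from Database)

Request URL: /predict_db
Request type: HTTP
Request method: POST
Response type: JSON
Function: predicts on data read from the database
Example parameters (classify/ner):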

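{
    "trained_dataset_id": 1234, // ID of the dataset the model was trained on (int)
    "predict_dataset_id": 1234, // ID of the dataset to predict on (int)
    "user_id": 1234, // user ID (int)
    "task_type": "classify", // task type: classify or ner (string)
    "row_idx": 123, // first database row to predict
    "batch_num": 10 // number of records to predict
}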

Response parameters (classify):

{
    "ret_code": 0, // status code
    "msg": "abc", // feedback message, e.g. whether the prediction succeeded
    "task_type": "classify", // task type: classify or ner (string)
    "data": [{
        "text": "明天天气怎么样？", // sentence (string)
        "result": [
            {"label": "ask_weather", "prob": 0.98},
            {"label": "ask_time", "prob": 0.02}
        ] // array
    }] // array
}

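Response parameters (ner):

{
    "ret_code": 0, // status code
    "msg": "abc", // feedback message, e.g. whether the prediction succeeded
    "task_type": "ner", // task type: classify or ner (string)
    "data": [{
        "text": "刘德华的老婆朱丽倩", // sentence (string)
        "result": [{
            "text": "刘德华的老婆朱丽倩",
            "prob": 0.98, // with a CRF, only a sentence-level probability is output
            "ne_list": [{
                "offset": 0, // start index of the named entity (int)
                "length": 3, // length of the named entity (int)
                "type": "person" // named-entity type (string)
            },
            {
                "offset": 6, // start index of the named entity (int)
                "length": 3, // length of the named entity (int)
                "type": "person" // named-entity type (string)
            }] // array
        }] // array
    }] // array
}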
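
To predict over a whole dataset, a client would presumably page through it with `row_idx` and `batch_num`. The sketch below generates one request body per window. It assumes `row_idx` is the index of the first row in the window and that the caller already knows the total row count; neither is stated on this page, and the function name is hypothetical.

```python
def predict_db_payloads(trained_dataset_id, predict_dataset_id, user_id,
                        task_type, total_rows, batch_num=10, start_row=0):
    """Yield one /predict_db request body per window of batch_num rows."""
    end_row = start_row + total_rows
    for row_idx in range(start_row, end_row, batch_num):
        yield {
            "trained_dataset_id": trained_dataset_id,
            "predict_dataset_id": predict_dataset_id,
            "user_id": user_id,
            "task_type": task_type,
            "row_idx": row_idx,
            # The last window may be shorter than batch_num.
            "batch_num": min(batch_num, end_row - row_idx),
        }


for payload in predict_db_payloads(1234, 1234, 1234, "classify", total_rows=25):
    print(payload["row_idx"], payload["batch_num"])
```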

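Status Query

Request URL: /status
Request type: HTTP
Request method: POST
Response type: JSON
Function: queries the status of a training task
Example parameters (classify/ner):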

{
    "dataset_id": 1234, // dataset ID (int)
    "user_id": 1234, // user ID (int)
    "task_type": "classify"
}

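Response parameters (classify/ner):

{
    "ret_code": 0, // status: training in progress (trainning); task submitted but not yet in the training queue (pending); training finished (complete); error during training (error)
    "msg": "training in progress"
}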
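
Because training runs asynchronously, a client would typically poll this endpoint until the task reaches a final state. The sketch below shows one such loop. How the state names map onto `ret_code` or `msg` is not specified here, so the loop takes a caller-supplied `fetch_state` function (hypothetical) that queries /status and returns one of the state names above. The polling interval and poll limit are arbitrary.

```python
import time

TERMINAL_STATES = {"complete", "error"}


def wait_for_training(fetch_state, interval_s=5.0, max_polls=120, sleep=time.sleep):
    """Call fetch_state() until it returns a terminal state or polls run out."""
    state = None
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"still {state!r} after {max_polls} polls")


# Demo with canned states instead of a live server; the sleep is stubbed out.
states = iter(["pending", "trainning", "trainning", "complete"])
print(wait_for_training(lambda: next(states), sleep=lambda s: None))  # complete
```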
