
Named Entity Recognition Baseline (Using LTP)

memeda edited this page Aug 30, 2016 · 1 revision

Corpus Description

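Training data: the January 1998 People's Daily corpus, with its last 10% held out as the development set. Test data: the first 10,000 sentences of the June 1998 People's Daily corpus.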

Corpus        Instances (lines)   Entities
pku-train     34,426              40,922
pku-holdout   3,000               4,269
pku-test      10,000              11,340

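The corpus uses 13 tags in total. O marks tokens outside any entity. The remaining 12 tags are {S-, B-, I-, E-} × {Nh, Ns, Ni}: the prefixes S-, B-, I- and E- mark a single-token entity and the beginning, inside and end of a multi-token entity, and Nh, Ns and Ni mark person names, place names and organization names respectively.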
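As an illustration of the tag scheme, the sketch below enumerates the 13-tag inventory and decodes a tag sequence into typed entity spans. It is a hypothetical helper, not part of LTP: the names `TAGS` and `decode_spans` are illustrative, and the policy of silently dropping ill-formed fragments (such as an `I-` or `E-` with no matching `B-`) is an assumption, not necessarily what LTP does.

```python
from typing import List, Tuple

# S = single-token entity; B/I/E = beginning/inside/end of a multi-token entity.
PREFIXES = ["S", "B", "I", "E"]
TYPES = ["Nh", "Ns", "Ni"]  # person, place, organization

# O plus the 4 x 3 prefix/type combinations gives the 13-tag inventory.
TAGS = ["O"] + [f"{p}-{t}" for p in PREFIXES for t in TYPES]


def decode_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Turn a tag sequence into (start, end_exclusive, type) spans.

    Hypothetical policy: ill-formed fragments (an I-/E- without a
    matching B- of the same type) are dropped rather than repaired.
    """
    spans = []
    start, cur_type = None, None  # currently open multi-token entity, if any
    for i, tag in enumerate(tags):
        if tag == "O":
            start, cur_type = None, None
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "S":
            spans.append((i, i + 1, etype))
            start, cur_type = None, None
        elif prefix == "B":
            start, cur_type = i, etype  # open a new entity
        elif prefix == "I":
            if cur_type != etype:  # inside tag with no matching open entity
                start, cur_type = None, None
        elif prefix == "E":
            if cur_type == etype and start is not None:
                spans.append((start, i + 1, etype))
            start, cur_type = None, None
    return spans


print(len(TAGS))
print(decode_spans(["B-Nh", "E-Nh", "O", "S-Ns", "B-Ni", "I-Ni", "E-Ni"]))
```

The second call yields one person span over tokens 0 to 1, one single-token place at 3, and one organization over tokens 4 to 6.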

Evaluation

Corpus        P        R        F1
pku-train     99.45%   99.74%   99.59%
pku-holdout   91.66%   91.10%   91.38%
pku-test      93.18%   94.39%   93.78%
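F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal sketch that recomputes the F1 column from the P and R values in the table above (the function name `f1` is just illustrative):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall from the evaluation table, in percent.
RESULTS = {
    "pku-train": (99.45, 99.74),
    "pku-holdout": (91.66, 91.10),
    "pku-test": (93.18, 94.39),
}

for corpus, (p, r) in RESULTS.items():
    print(f"{corpus:12s} F1 = {f1(p, r):.2f}%")
```

Rounded to two decimals, the recomputed values match the reported F1 column for all three corpora.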
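Per-type results (conlleval-style output; the trailing number on each line is the count of entities of that type predicted by the tagger):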
PKU-TRAIN
Nh: precision:  99.83%; recall:  99.95%; FB1:  99.89  13150
Ni: precision:  99.10%; recall:  99.71%; FB1:  99.40  8978
Ns: precision:  99.36%; recall:  99.60%; FB1:  99.48  18912

PKU-HOLDOUT
Nh: precision:  94.31%; recall:  92.58%; FB1:  93.44  1336
Ni: precision:  85.71%; recall:  83.92%; FB1:  84.81  749
Ns: precision:  92.08%; recall:  92.72%; FB1:  92.40  2158

PKU-TEST
Nh: precision:  97.48%; recall:  97.87%; FB1:  97.67  3249
Ni: precision:  87.11%; recall:  87.74%; FB1:  87.42  2653
Ns: precision:  93.56%; recall:  95.54%; FB1:  94.54  5586
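The per-type lines follow the conlleval format, which scores entities by exact match: a predicted entity counts as correct only if its boundaries and type both agree with the gold annotation. The sketch below is a hypothetical re-implementation of that scoring over sets of spans, not the script actually used here. It assumes each entity is represented as `(sentence_id, start, end_exclusive, type)`; the name `score_by_type` is illustrative.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# (sentence_id, start, end_exclusive, type); an assumed representation.
Span = Tuple[int, int, int, str]


def score_by_type(gold: Set[Span], pred: Set[Span]) -> Dict[str, Tuple[float, float, float, int]]:
    """Exact-match entity scoring per type: (P%, R%, F1, #predicted)."""
    correct = Counter(s[3] for s in gold & pred)  # same boundaries and type
    n_gold = Counter(s[3] for s in gold)
    n_pred = Counter(s[3] for s in pred)
    results = {}
    for etype in sorted(set(n_gold) | set(n_pred)):
        p = 100.0 * correct[etype] / n_pred[etype] if n_pred[etype] else 0.0
        r = 100.0 * correct[etype] / n_gold[etype] if n_gold[etype] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        results[etype] = (p, r, f, n_pred[etype])
    return results


gold = {(0, 0, 2, "Nh"), (0, 3, 4, "Ns"), (1, 0, 3, "Ni")}
pred = {(0, 0, 2, "Nh"), (0, 3, 4, "Ns"), (1, 0, 2, "Ni"), (1, 5, 6, "Ns")}
for etype, (p, r, f, n) in score_by_type(gold, pred).items():
    print(f"{etype}: precision: {p:6.2f}%; recall: {r:6.2f}%; FB1: {f:6.2f}  {n}")
```

In the toy example the organization span has a wrong right boundary, so it earns no credit even though it overlaps the gold entity. Reading the trailing number on each per-type line as the predicted count, the pku-test figures are self-consistent: 3,249 + 2,653 + 5,586 = 11,488 predicted entities at 93.18% precision is about 10,704 correct, and 10,704 / 11,340 gold entities ≈ 94.39% recall.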

Speed

Elapsed time:

PKU-TRAIN:    18.065 s
PKU-HOLDOUT:   1.718 s
PKU-TEST:      5.655 s
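To compare the runs on a common scale, the elapsed times can be converted into throughput. This is a sketch under an assumption the page does not state: each time is the wall-clock time to tag every line of that corpus, using the instance counts from the corpus table. Hardware and settings are not given, so the figures only compare these three runs with each other.

```python
# Line counts from the corpus table; elapsed times (s) from the speed section.
RUNS = {
    "pku-train": (34_426, 18.065),
    "pku-holdout": (3_000, 1.718),
    "pku-test": (10_000, 5.655),
}


def throughput(lines: int, seconds: float) -> float:
    """Lines (sentences) processed per second."""
    return lines / seconds


for corpus, (lines, seconds) in RUNS.items():
    print(f"{corpus:12s} {throughput(lines, seconds):7.1f} lines/s")
```

Under that assumption, all three runs come out at roughly 1,750 to 1,900 lines per second.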