Does sentences use any particular algorithm or semantic analysis? #102

Open
gMan1990 opened this issue Mar 5, 2019 · 3 comments

Comments

@gMan1990

gMan1990 commented Mar 5, 2019

I'd appreciate a brief explanation of the sentence-splitting logic.

@muzier

muzier commented Jun 26, 2019

Same question. So far I haven't seen any particularly good approach to Chinese sentence splitting, and I wonder how @isnowfy handled it.

@gMan1990
Author

gMan1990 commented Jul 6, 2019

@muzier
http://www.davismol.net/2015/02/03/java-how-to-split-a-string-into-fixed-length-rows-without-breaking-the-words/

// 句子最大长度 = maximum sentence length
Pattern.compile(
        String.format("\\W?(.{1,%1$d}\\b|.{1,%1$d})\\W?", 句子最大长度 - 2),
        Pattern.UNICODE_CHARACTER_CLASS)
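
For readers more at home in Python, the same idea can be sketched roughly as follows. This is only an illustration under assumptions: `wrap_chunks` and `max_len` are hypothetical names, and it assumes that Python's default Unicode handling of `\W` and `\b` for `str` patterns behaves closely enough to Java's `UNICODE_CHARACTER_CLASS` for this purpose. Each match takes an optional leading non-word character, then up to `max_len - 2` characters ending at a word boundary if one exists (otherwise a hard cut), then an optional trailing non-word character.

```python
import re


def wrap_chunks(text, max_len):
    """Cut text into chunks of at most max_len characters,
    preferring to end each chunk at a word boundary."""
    if max_len < 3:
        raise ValueError("max_len must be at least 3")
    n = max_len - 2  # leave room for the optional \W on each side
    pattern = re.compile(r'\W?(?:.{1,%d}\b|.{1,%d})\W?' % (n, n))
    return [m.group(0) for m in pattern.finditer(text)]


# Punctuation gives the regex a boundary to break on.
print(wrap_chunks("今天天气很好,我们去公园吧。", 8))
# ['今天天气很好,', '我们去公园吧。']

# With no boundary inside the window, the chunk is cut at the length limit.
print(wrap_chunks("今天天气很好", 6))
# ['今天天气', '很好']
```

Because a run of Chinese characters contains no word boundaries, this approach only limits chunk length and breaks at punctuation when it can; it does not find sentence boundaries by meaning.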

@jeremy-feng

The source code is here:

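import re


def get_sentences(doc):
    line_break = re.compile('[\r\n]')
    # Full-width Chinese punctuation: comma, period, question mark,
    # exclamation mark, semicolon
    delimiter = re.compile('[,。?!;]')
    sentences = []
    # Split the document into lines first
    for line in line_break.split(doc):
        line = line.strip()
        if not line:
            continue
        # Then split each line on the punctuation marks, dropping empty pieces
        for sent in delimiter.split(line):
            sent = sent.strip()
            if not sent:
                continue
            sentences.append(sent)
    return sentences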

So it looks like it simply splits on these punctuation marks: ,。?!;
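
To make that behavior concrete, here is a minimal self-contained sketch of the same splitting rule with a sample input. The name `split_sentences` and the sample text are made up for illustration and are not part of the library. It shows that the split is purely punctuation-based, with no word segmentation or semantic analysis involved, and that the delimiters themselves are dropped from the result.

```python
import re

LINE_BREAK = re.compile(r'[\r\n]')
DELIMITER = re.compile('[,。?!;]')  # full-width Chinese punctuation


def split_sentences(doc):
    """Split on line breaks, then on punctuation; drop empty pieces."""
    sentences = []
    for line in LINE_BREAK.split(doc):
        for sent in DELIMITER.split(line):
            sent = sent.strip()
            if sent:
                sentences.append(sent)
    return sentences


text = "今天天气很好,我们去公园吧。你觉得呢?\n好的!"
print(split_sentences(text))
# ['今天天气很好', '我们去公园吧', '你觉得呢', '好的']
```

Note that a comma ends a "sentence" here just as a period does, so the pieces are closer to clauses than to full sentences.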
