Extract sentence splitting in SemanticChunker into a private method
This change lets users override how SemanticChunker splits text into
sentences, so they can plug in their own sentence-splitting algorithm.
levara committed Dec 22, 2024
1 parent dce0640 commit 752d54f
Showing 1 changed file with 4 additions and 1 deletion.
libs/experimental/langchain_experimental/text_splitter.py (4 additions, 1 deletion)
```diff
--- a/libs/experimental/langchain_experimental/text_splitter.py
+++ b/libs/experimental/langchain_experimental/text_splitter.py
@@ -208,12 +208,15 @@ def _calculate_sentence_distances(
 
         return calculate_cosine_distances(sentences)
 
+    def _get_single_sentences_list(self, text: str) -> List[str]:
+        return re.split(self.sentence_split_regex, text)
+
     def split_text(
         self,
         text: str,
     ) -> List[str]:
         # Splitting the essay (by default on '.', '?', and '!')
-        single_sentences_list = re.split(self.sentence_split_regex, text)
+        single_sentences_list = self._get_single_sentences_list(text)
 
         # having len(single_sentences_list) == 1 would cause the following
         # np.percentile to fail.
```
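With the split isolated in `_get_single_sentences_list`, custom splitting becomes a matter of subclassing and overriding that one method. The sketch below is self-contained and hypothetical: `ToyChunker` stands in for `SemanticChunker` (no embeddings, no breakpoint detection), and `AbbreviationAwareChunker` and its abbreviation list are invented for illustration. Against the real class you would subclass `SemanticChunker` from `langchain_experimental.text_splitter` and override the same method; because the name is underscore-prefixed, it is not guaranteed to stay stable across releases.

```python
import re
from typing import List


class ToyChunker:
    """Hypothetical, heavily simplified stand-in for SemanticChunker.

    It keeps only the shape of the change: split_text() delegates sentence
    splitting to an overridable hook instead of calling re.split inline.
    """

    # Assumed default pattern (split on whitespace after '.', '?' or '!').
    def __init__(self, sentence_split_regex: str = r"(?<=[.?!])\s+") -> None:
        self.sentence_split_regex = sentence_split_regex

    def _get_single_sentences_list(self, text: str) -> List[str]:
        # Default behaviour: the same re.split call as in the commit.
        return re.split(self.sentence_split_regex, text)

    def split_text(self, text: str) -> List[str]:
        single_sentences_list = self._get_single_sentences_list(text)
        # The real method would embed sentences and merge them into chunks;
        # this toy simply returns the sentences.
        return single_sentences_list


# Hypothetical abbreviation list used only by this example.
ABBREVIATIONS = ("Dr.", "Mr.", "Mrs.", "a.m.", "p.m.")


class AbbreviationAwareChunker(ToyChunker):
    """Overrides only the hook; split_text() is inherited unchanged."""

    def _get_single_sentences_list(self, text: str) -> List[str]:
        sentences: List[str] = []
        for piece in super()._get_single_sentences_list(text):
            # If the previous piece ends with a known abbreviation, it was
            # not a real sentence end: glue this piece back onto it.
            if sentences and sentences[-1].endswith(ABBREVIATIONS):
                sentences[-1] = f"{sentences[-1]} {piece}"
            else:
                sentences.append(piece)
        return sentences


if __name__ == "__main__":
    text = "Dr. Smith arrived at 5 p.m. today. Was he late? No!"
    print(AbbreviationAwareChunker().split_text(text))
```

Merging on a fixed abbreviation list is deliberately crude: a sentence that genuinely ends in "p.m." gets glued to the next one, and rejoined pieces use a single space regardless of the original whitespace. A tokenizer-based splitter (for example NLTK's `sent_tokenize` or a spaCy pipeline) could be dropped into the same hook instead.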
