Fix possible mismatch when reconstructing the training data
lfoppiano committed Aug 23, 2023
1 parent fc6a6a5 commit 0745539
Showing 1 changed file with 2 additions and 2 deletions.
@@ -125,7 +125,7 @@ protected Element getParentElement(Element body, String previousParagraphId, Str
} else {
parent = previousParent;
}


return parent;
}
@@ -164,7 +164,7 @@ protected Element trainingExtraction(List<Span> spanList, List<LayoutToken> toke
p.appendChild(entityElement);

// We stop the process if something doesn't match
- int accumulatedOffset = startPosition + length(contentBefore) + length(name);
+ int accumulatedOffset = startPosition + length(contentBefore) + LayoutTokensUtil.toText(superconductor.getLayoutTokens()).stripTrailing().length();
if (end != accumulatedOffset) {
throw new RuntimeException("Wrong synchronisation between entities and layout tokens. End entity offset: " + end
+ " different from the expected offset: " + accumulatedOffset);
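The changed line can be illustrated with a minimal, self-contained sketch. It assumes that the entity's layout tokens may end with a whitespace token, and that the end offset should be measured on the token text without that trailing whitespace. The class `OffsetCheck` and its `toText` helper are hypothetical stand-ins for illustration only; they are not the project's `LayoutTokensUtil`.

```java
import java.util.List;

public class OffsetCheck {

    // Hypothetical stand-in for LayoutTokensUtil.toText(...):
    // concatenates the token texts exactly as they are.
    static String toText(List<String> tokenTexts) {
        return String.join("", tokenTexts);
    }

    // Mirrors the shape of the new computation: start position, plus the length
    // of the content before the entity, plus the length of the entity's token
    // text with trailing whitespace removed (String.stripTrailing, Java 11+).
    static int expectedEnd(int startPosition, String contentBefore, List<String> entityTokens) {
        return startPosition + contentBefore.length() + toText(entityTokens).stripTrailing().length();
    }

    public static void main(String[] args) {
        // The last token is a trailing space that must not count towards the offset.
        List<String> tokens = List.of("MgB", "2", " ");
        System.out.println(expectedEnd(10, "The compound ", tokens));
    }
}
```

Measuring the length on the text rebuilt from the tokens, rather than on a separately stored name, keeps both sides of the comparison derived from the same source, which is presumably what the commit is after.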