When using the 'web' data load, the chunking will sometimes split text in the middle of a sentence. I'd like an option to pass in a CSS selector that pulls out a list of paragraphs from the source page, so that each complete paragraph becomes its own chunk.
See an example here: https://github.com/JohnGUnderwood/atlas-news-search/blob/main/backend/packages/main.py#L177-L198
In the screenshots below, the source article content is an array of paragraphs extracted from the HTML using selectors. Each item is then turned into its own document with an embedding.
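To make the request concrete, here is a rough sketch of the behavior I have in mind. It is not the loader's actual API: `ParagraphExtractor` and `chunk_by_selector` are hypothetical names, and to keep it standard-library only it handles just `tag` and `tag.class` selectors. A real implementation would more likely hand the selector to a full CSS engine such as BeautifulSoup's `select`.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of each element matching a `tag` or `tag.class` selector.

    Hypothetical helper for illustration; not part of any existing loader.
    """

    def __init__(self, selector: str):
        super().__init__()
        tag, _, css_class = selector.partition(".")
        self.tag = tag
        self.css_class = css_class or None
        self.depth = 0        # > 0 while inside a matching element
        self.buffer = []      # text fragments of the current match
        self.paragraphs = []  # one string per matched element

    def _matches(self, tag, attrs):
        if tag != self.tag:
            return False
        if self.css_class is None:
            return True
        classes = (dict(attrs).get("class") or "").split()
        return self.css_class in classes

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Track nested elements of the same tag so the match closes correctly.
            if tag == self.tag:
                self.depth += 1
        elif self._matches(tag, attrs):
            self.depth = 1
            self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Collapse internal whitespace and skip empty elements.
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.paragraphs.append(text)


def chunk_by_selector(html: str, selector: str) -> list[str]:
    """Return one chunk per element matched by the selector."""
    parser = ParagraphExtractor(selector)
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Each returned string would then be embedded as its own document, so a chunk never ends mid-sentence. The sketch assumes paragraphs have explicit closing tags.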