ERROR IN split_and_upload(): Traceback: [<FrameSummary file /app/ai_ta_backend/vector_database.py, line 780 in split_and_upload>, <FrameSummary file /opt/venv/lib/python3.8/site-packages/langchain/text_splitter.py, line 136 in create_documents>, <FrameSummary file /opt/venv/lib/python3.8/site-packages/langchain/text_splitter.py, line 687 in split_text>, <FrameSummary file /opt/venv/lib/python3.8/site-packages/langchain/text_splitter.py, line 669 in _split_text>, <FrameSummary file /opt/venv/lib/python3.8/site-packages/langchain/text_splitter.py, line 250 in _tiktoken_encoder>, <FrameSummary file /opt/venv/lib/python3.8/site-packages/tiktoken/core.py, line 117 in encode>, <FrameSummary file /opt/venv/lib/python3.8/site-packages/tiktoken/core.py, line 351 in raise_disallowed_special_token>]
❌❌ Error in split_and_upload:Encountered text corresponding to disallowed special token '<|endoftext|>'.
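This is a particular problem when scraping GitHub repos related to AI, since their files often contain literal special tokens like '<|endoftext|>'. For now we need a try/except around the split; a better fix can come later, probably by sanitizing special tokens somehow. That's the new SQL injection.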
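A rough sketch of the stopgap described above, not the actual fix in `split_and_upload`. The names `sanitize_special_tokens` and `safe_split` are hypothetical, and the assumption is that special tokens always look like `<|name|>`. tiktoken raises a `ValueError` for disallowed special tokens, so the wrapper catches that and retries on sanitized text.

```python
import re

# Assumption: special tokens follow the <|name|> pattern, e.g. <|endoftext|>.
SPECIAL_TOKEN_RE = re.compile(r"<\|([A-Za-z0-9_]+)\|>")


def sanitize_special_tokens(text: str) -> str:
    """Insert spaces inside <|...|> so a tokenizer treats it as plain text."""
    return SPECIAL_TOKEN_RE.sub(r"<| \1 |>", text)


def safe_split(text, split_fn):
    """Hypothetical wrapper: call split_fn, retrying on sanitized text
    if the tokenizer rejects a special token with ValueError."""
    try:
        return split_fn(text)
    except ValueError:
        return split_fn(sanitize_special_tokens(text))
```

Here `split_fn` stands in for whatever callable does the splitting (for example a text splitter's `split_text` method). An alternative that leaves the text untouched is to pass `disallowed_special=()` to tiktoken's `encode`, which encodes special-token strings as ordinary text.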