-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optional crash dumping on ConnectionError #1077
Conversation
python/langsmith/client.py
Outdated
@@ -742,6 +746,14 @@ def _content_above_size(self, content_length: Optional[int]) -> Optional[str]: | |||
if size_limit is None: | |||
return None | |||
if content_length > size_limit: | |||
should_debug_crash_dump = os.getenv("LANGSMITH_DEBUG_CRASH_DUMP") in [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use get_env_var
in utils to cache this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdyt of having the value be a file path?
LANGSMITH_DEBUG_CRASH_DUMP=my_file.jsonl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I like that!
python/langsmith/client.py
Outdated
"true", | ||
] | ||
if should_debug_crash_dump and request is not None: | ||
with open("content_size_limit_crash_dump.jsonl", "a") as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to specify 'utf-8'
as the encoding.
Also, let's use gzip - by definition these will get pretty big pretty fast
python/langsmith/client.py
Outdated
"1", | ||
"true", | ||
] | ||
if should_debug_crash_dump and request is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't belong in a function that checks content length - it should go outside as the next step.
Also I think we would benefit from seeing other things in the dump even if the payload size doesn't surpass (since these are also of interest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should always dump? Had intended this to only dump if we exceeded the content length limit (and also potentially dump here if other errors occur)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only on error, but this is intended for debugging connection errors, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes it's a connection error but specifically the one triggered by the content length limit.
Failed to batch ingest runs: langsmith.utils.LangSmithConnectionError: Connection error caused failure to POST https://api.smith.langchain.com/runs/batch in LangSmith API. The content length of 270646872 bytes exceeds the maximum size limit of 20971520 bytes.
Their input and output tokens and traces seem to be pretty small, so we wanted to see what else was in their payload which was causing them to exceed 20mb.
We can also log more generically on ConnectionError though if you think that's better! It does make sense to log for other types of connection errors too
Co-authored-by: William FH <[email protected]>
python/langsmith/client.py
Outdated
try: | ||
request_data: ( | ||
requests.Request | requests.PreparedRequest | ||
) = copy.deepcopy(e.request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid deep copies. Especially just so we can put a masked API key in the headers
) | ||
if debug_crash_dump_file is not None and e.request is not None: | ||
try: | ||
headers = e.request.headers.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hinthornw Let me know if you agree with this approach.
Made a shallow copy of the headers so we could get rid of the x-api-key, but log the rest of the headers.
Manually added all of the fields which I think are relevant here. This is to prevent us accidentally logging other sensitive fields (if any).
else None | ||
), | ||
} | ||
with gzip.open( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you easily open a gzip file that has been appended to multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just looked into it a bit, it is a bit of a pain to read from multiple streams.
Do you think an acceptable alternative is to write to a file with a certain timestamp appended to the file path?
We can probably turn the env variable back to a boolean, and then use a fixed static name like debug_crash_dump_{timestamp}
. wdyt? @nfcampos
@@ -931,6 +932,43 @@ def request_with_retries( | |||
filler = "*" * (max(0, len(api_key) - 7)) | |||
masked_api_key = f"{prefix}{filler}{suffix}" | |||
|
|||
debug_crash_dump_file = ls_utils.get_env_var( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we read env var once in global scope and then just use that here (reading env vars is expensive)
No longer necessary due to multi-part endpoint |
Crash dump on ConnectionError if a global variable is provided.
The goal is to see inside of a request without logging sensitive fields