Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional crash dumping on ConnectionError #1077

Closed
wants to merge 22 commits into from
Closed

Conversation

nhuang-lc
Copy link
Contributor

@nhuang-lc nhuang-lc commented Oct 8, 2024

Crash dump on ConnectionError if a global variable is provided.

The goal is to see inside of a request without logging sensitive fields

@nhuang-lc nhuang-lc changed the title Add crush dumping if content size limit exceeded Add crash dumping if content size limit exceeded Oct 8, 2024
@nhuang-lc nhuang-lc requested a review from hinthornw October 8, 2024 20:58
@@ -742,6 +746,14 @@ def _content_above_size(self, content_length: Optional[int]) -> Optional[str]:
if size_limit is None:
return None
if content_length > size_limit:
should_debug_crash_dump = os.getenv("LANGSMITH_DEBUG_CRASH_DUMP") in [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use get_env_var in utils to cache this

Copy link
Collaborator

@hinthornw hinthornw Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt of having the value be a file path?
LANGSMITH_DEBUG_CRASH_DUMP=my_file.jsonl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I like that!

"true",
]
if should_debug_crash_dump and request is not None:
with open("content_size_limit_crash_dump.jsonl", "a") as f:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to specify 'utf-8' as the encoding.
Also, let's use gzip - by definition these will get pretty big pretty fast

"1",
"true",
]
if should_debug_crash_dump and request is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't belong in a function that checks content length - it should go outside as the next step.

Also I think we would benefit from seeing other things in the dump even if the payload size doesn't surpass (since these are also of interest)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should always dump? Had intended this to only dump if we exceeded the content length limit (and also potentially dump here if other errors occur)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only on error, but this is intended for debugging connection errors, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes it's a connection error but specifically the one triggered by the content length limit.

Failed to batch ingest runs: langsmith.utils.LangSmithConnectionError: Connection error caused failure to POST https://api.smith.langchain.com/runs/batch in LangSmith API. The content length of 270646872 bytes exceeds the maximum size limit of 20971520 bytes.

Their input and output tokens and traces seem to be pretty small, so we wanted to see what else was in their payload which was causing them to exceed 20mb.

We can also log more generically on ConnectionError though if you think that's better! It does make sense to log for other types of connection errors too

@nhuang-lc nhuang-lc requested a review from hinthornw October 9, 2024 16:58
try:
request_data: (
requests.Request | requests.PreparedRequest
) = copy.deepcopy(e.request)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid deep copies. Especially just so we can put a masked API key in the headers

)
if debug_crash_dump_file is not None and e.request is not None:
try:
headers = e.request.headers.copy()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hinthornw Let me know if you agree with this approach.

Made a shallow copy of the headers so we could get rid of the x-api-key, but log the rest of the headers.

Manually added all of the fields which I think are relevant here. This is to prevent us accidentally logging other sensitive fields (if any).

@nhuang-lc nhuang-lc requested a review from hinthornw October 10, 2024 02:38
@nhuang-lc nhuang-lc changed the title Add crash dumping if content size limit exceeded Add optional crash dumping on ConnectionError Oct 14, 2024
else None
),
}
with gzip.open(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you easily open a gzip file that has been appended to multiple times?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looked into it a bit, it is a bit of a pain to read from multiple streams.

Do you think an acceptable alternative is to write to a file with a certain timestamp appended to the file path?

We can probably turn the env variable back to a boolean, and then use a fixed static name like debug_crash_dump_{timestamp}. wdyt? @nfcampos

@@ -931,6 +932,43 @@ def request_with_retries(
filler = "*" * (max(0, len(api_key) - 7))
masked_api_key = f"{prefix}{filler}{suffix}"

debug_crash_dump_file = ls_utils.get_env_var(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we read env var once in global scope and then just use that here (reading env vars is expensive)

@nhuang-lc
Copy link
Contributor Author

No longer necessary due to multi-part endpoint

@nhuang-lc nhuang-lc closed this Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants