Python: Enhance prompt to emphasise that it is a script rather than a notebook #1218

Merged: 4 commits, Feb 5, 2025
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@
 - OpenAI: Map some additional 400 status codes to `content_filter` stop reason.
 - Anthropic: Handle 413 status code (Payload Too Large) and map to `model_length` StopReason.
 - Tasks: Log sample with error prior to raising task-ending exception.
+- Python: Enhance prompt to emphasise that it is a script rather than a notebook.
 - Computer: Various improvements to image including desktop, python, and VS Code configuration.
 - Bugfix: Don't download full log from S3 for header_only reads.

35 changes: 33 additions & 2 deletions src/inspect_ai/tool/_tools/_execute.py
@@ -74,8 +74,39 @@ async def execute(code: str) -> str:
 """
 Use the python function to execute Python code.

-The python function will only return you the stdout of the script,
-so make sure to use print to see the output.
+The Python tool executes single-run Python scripts. Important notes:
+1. Each execution is independent - no state is preserved between runs
+2. You must explicitly use print() statements to see any output
+3. Simply writing expressions (like in notebooks) will not display results
+4. The script cannot accept interactive input during execution
+5. Return statements alone won't produce visible output
+6. All variables and imports are cleared between executions
+7. Standard output (via print()) is the only way to see results
+
+Examples:
+INCORRECT (notebook style):
+x = 5
+x * 2 # Won't show anything
+return x * 2 # Won't show anything
+[1, 2, 3] # Won't show anything
+
+CORRECT:
+x = 5
+print(x * 2) # Will show: 10
+result = x * 2
+print(result) # Will show: 10
+print([1, 2, 3]) # Will show: [1, 2, 3]
+
+INCORRECT (assuming previous imports persist):
+# First run:
+import numpy as np
+# Second run:
+arr = np.array([1, 2, 3]) # This will fail - numpy not imported in this run
+
+CORRECT (each run is self-contained):
+import numpy as np
+arr = np.array([1, 2, 3])
+print(arr) # Will show: [1 2 3]

 Args:
 code (str): The python code to execute.
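The script-versus-notebook behaviour that the new docstring describes can be reproduced outside the tool. The following is a minimal sketch, assuming each tool call behaves like a fresh `python -c` process whose stdout is captured. `run_script` is a hypothetical helper written for this illustration; it is not part of inspect_ai and is not the tool's actual execution path.

```python
import subprocess
import sys


def run_script(code: str) -> str:
    """Run `code` as a standalone script in a fresh interpreter; return its stdout.

    Hypothetical helper for illustration only (not an inspect_ai API).
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout


# A bare expression is evaluated and discarded: nothing reaches stdout.
notebook_style = run_script("x = 5\nx * 2")

# print() writes to stdout, so the result is visible.
script_style = run_script("x = 5\nprint(x * 2)")

# State does not carry over: the second run has never seen `x`,
# so it raises NameError (reported on stderr) and stdout stays empty.
run_script("x = 5")
second_run = run_script("print(x)")

# A top-level `return` is a SyntaxError in a script, so that run fails outright
# and likewise produces no stdout.
top_level_return = run_script("x = 5\nreturn x * 2")

print(repr(notebook_style))    # ''
print(repr(script_style))      # '10\n'
print(repr(second_run))        # ''
print(repr(top_level_return))  # ''
```

Only the `print()` variant yields output. The other three runs return an empty string, which is what a model would see if it wrote notebook-style code or relied on state from an earlier call.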