Commit

Python: Enhance prompt to emphasise that it is a script rather than a notebook (#1218)

* Python: Enhance prompt to emphasise that it is a script rather than a notebook

* more python prompt improvements
jjallaire authored Feb 5, 2025
1 parent 160e5b1 commit d17645c
Showing 2 changed files with 34 additions and 2 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@
- OpenAI: Map some additional 400 status codes to `content_filter` stop reason.
- Anthropic: Handle 413 status code (Payload Too Large) and map to `model_length` StopReason.
- Tasks: Log sample with error prior to raising task-ending exception.
- Python: Enhance prompt to emphasise that it is a script rather than a notebook.
- Computer: Various improvements to image including desktop, python, and VS Code configuration.
- Bugfix: Don't download full log from S3 for header_only reads.

35 changes: 33 additions & 2 deletions src/inspect_ai/tool/_tools/_execute.py
@@ -74,8 +74,39 @@ async def execute(code: str) -> str:
"""
Use the python function to execute Python code.
The python function will only return you the stdout of the script,
so make sure to use print to see the output.
The Python tool executes single-run Python scripts. Important notes:
1. Each execution is independent - no state is preserved between runs
2. You must explicitly use print() statements to see any output
3. Simply writing expressions (like in notebooks) will not display results
4. The script cannot accept interactive input during execution
5. Return statements alone won't produce visible output
6. All variables and imports are cleared between executions
7. Standard output (via print()) is the only way to see results
Examples:
INCORRECT (notebook style):
x = 5
x * 2 # Won't show anything
return x * 2 # Won't show anything
[1, 2, 3] # Won't show anything
CORRECT:
x = 5
print(x * 2) # Will show: 10
result = x * 2
print(result) # Will show: 10
print([1, 2, 3]) # Will show: [1, 2, 3]
INCORRECT (assuming previous imports persist):
# First run:
import numpy as np
# Second run:
arr = np.array([1, 2, 3]) # This will fail - numpy not imported in this run
CORRECT (each run is self-contained):
import numpy as np
arr = np.array([1, 2, 3])
print(arr) # Will show: [1 2 3]
Args:
code (str): The python code to execute.
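
The behaviour the new prompt describes can be reproduced outside the tool. The sketch below assumes each call simply starts a fresh interpreter via `subprocess`; `run_script` is a hypothetical helper written for illustration and is not how `inspect_ai` runs code.

```python
import subprocess
import sys


def run_script(code: str) -> tuple[str, int]:
    """Hypothetical helper: run `code` in a fresh interpreter.

    Returns (stdout, exit code). Every call starts a new process,
    so nothing is shared between calls.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return proc.stdout, proc.returncode


# Notebook style: the bare expression is evaluated but never displayed.
silent_out, silent_rc = run_script("x = 5\nx * 2")

# Script style: print() writes to stdout, the only output the caller sees.
printed_out, printed_rc = run_script("x = 5\nprint(x * 2)")

# A top-level `return` is a SyntaxError in a plain script, so the run
# fails before anything is written to stdout.
return_out, return_rc = run_script("x = 5\nreturn x * 2")

# No state is preserved: `x` from the first run is undefined in the second,
# so the second run exits with a NameError.
run_script("x = 5")
stale_out, stale_rc = run_script("print(x)")
```

Under these assumptions `silent_out` is empty even though the script succeeds, `printed_out` contains `10`, and both the top-level `return` and the stale-variable run exit with a non-zero status.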