Add option to reduce context window #5193

Open
drbarq wants to merge 2 commits into main
Conversation

@drbarq commented Nov 22, 2024

This PR adds functionality that allows users to set a context window smaller than their LLM's maximum context window size. This is useful when the model's performance degrades significantly with larger context windows, letting users optimize the tradeoff between context length and model performance.

Background

With PR #4977 merged in version 0.14, we now support dynamic context window sizes. This PR builds on that by allowing users to manually set a context window smaller than their LLM's maximum, which can be beneficial when:

  • The model shows performance degradation with very large contexts
  • Users want to optimize for speed vs context length
  • Memory or computational resources need to be conserved

Changes

  • Add token count checking before adding new events to prevent exceeding the window
  • Implement truncation logic when token count would exceed the configured limit
  • Improve handling of first user message preservation to maintain conversation coherence
  • Add comprehensive test case for context window parameter truncation

Configuration

Users can set max_input_tokens in two ways:

  1. Through config.toml:

     [llm]
     max_input_tokens = 20000

  2. Through environment variables:

     export LLM_MAX_INPUT_TOKENS=20000
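
For reference, here is a rough Python sketch of how these two sources could be resolved into a single value. The precedence shown (environment variable over config.toml) is an assumption for illustration; the actual OpenHands configuration loader is more involved.

    import os
    import tomllib  # stdlib TOML parser, Python 3.11+

    # Illustrative sketch only, not the OpenHands config loader.
    with open("config.toml", "rb") as f:
        toml_cfg = tomllib.load(f)

    # Assumption: the environment variable, if set, overrides config.toml.
    raw = os.environ.get("LLM_MAX_INPUT_TOKENS")
    if raw is None:
        raw = toml_cfg.get("llm", {}).get("max_input_tokens")
    max_input_tokens = int(raw) if raw is not None else None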

Implementation Details

  • Token count is checked before adding each new event
  • If adding an event would exceed the context window:
    1. The history is truncated using _apply_conversation_window
    2. Action-observation pairs are kept together
    3. The first user message is always preserved
  • The truncation is done in a way that maintains conversation coherence (a sketch of this flow is shown below)
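
A minimal sketch of that flow, assuming a hypothetical add_event hook and a get_token_count helper; only _apply_conversation_window and max_input_tokens are names taken from this PR.

    # Sketch only; apart from _apply_conversation_window and max_input_tokens,
    # the names below are illustrative, not the actual OpenHands API.
    def add_event(self, event):
        candidate = self.state.history + [event]
        token_count = self.llm.get_token_count(candidate)  # hypothetical helper

        limit = self.llm.config.max_input_tokens
        if limit is not None and token_count > limit:
            # Truncate older events, keeping action-observation pairs together
            # and always preserving the first user message.
            self.state.history = self._apply_conversation_window(self.state.history)

        self.state.history.append(event)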

Testing

Added a new test case, test_context_window_parameter_truncation, which verifies the following (a rough sketch of its shape follows the list):

  • Token count checking works correctly
  • Truncation occurs when limit would be exceeded
  • First user message is preserved
  • Action-observation pairs stay together
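
For illustration, a hedged sketch of the shape such a test could take; the factories and event constructors below are placeholders rather than the fixtures actually used in the PR.

    # Placeholder sketch; the real test uses the project's own fixtures.
    def test_context_window_parameter_truncation():
        llm = make_llm(LLMConfig(max_input_tokens=20000))   # placeholder factory
        controller = make_controller(llm)                    # placeholder factory
        first_message = make_user_message("initial task")    # placeholder event
        controller.add_event(first_message)

        # Add enough action/observation pairs to push past the configured limit.
        for i in range(200):
            controller.add_event(make_action(f"step {i}"))
            controller.add_event(make_observation(f"result {i}"))

        history = controller.state.history
        assert llm.get_token_count(history) <= 20000   # truncation occurred
        assert history[0] is first_message             # first user message preserved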

This implements and enhances the changes from PR #5079.

- Add max_input_tokens parameter to AgentController to allow setting a lower context window
- Add token count checking and truncation when adding new events
- Improve handling of first user message preservation
- Add test case for context window parameter truncation
@enyst (Collaborator) commented Nov 22, 2024

Thank you for taking this up, @drbarq! For the record, the reason for this PR "switch" is here.

- Remove max_input_tokens parameter from AgentController constructor
- Use LLM configuration system to set max_input_tokens through config.toml or environment variables
- Update test to set max_input_tokens directly in LLM config

Users can now set max_input_tokens in two ways:
1. Through config.toml:
   [llm]
   max_input_tokens = 20000

2. Through environment variables:
   export LLM_MAX_INPUT_TOKENS=20000