Skip to content

Latest commit

 

History

History
25 lines (18 loc) · 998 Bytes

README.md

File metadata and controls

25 lines (18 loc) · 998 Bytes

VW in a langchain chain

Install requirements.txt

VowpalWabbit

There is an example notebook (rl_chain.ipynb) with basic usage of the chain.

TLDR:

  • Chain is initialized and creates a Vowpal Wabbit instance - only Contextual Bandits and Slates are supported for now
  • You can change the arguments at chain creation time
  • There is a default prompt but it can be changed
  • There is a default reward function that gets triggered and triggers learn automatically
    • This can be turned off and score can be spcified explicitly

Flow:

  • Developer: creates chain
  • Developer: sets actions
  • Developer: calls chain with context and other prompt inputs
  • Chain: calls VW with the context and selects an action
  • Chain: action (and other vars) are passed to the LLM with the prompt
  • Chain: if default reward set, the LLM is called to judge and give a reward score of the response based on the context
  • Chain: VW learn is triggered with that score