
Add condition to halt_after / halt_before #436

Open · gamarin2 opened this issue Nov 26, 2024 · 10 comments

gamarin2 commented Nov 26, 2024

Is your feature request related to a problem? Please describe.
Currently, it is not possible to halt the app after a given action if and only if a condition is met. Here is a motivating example of when it might be needed:

Say you have a ChatGPT-like app with various "chats", where each chat is a distinct Burr app. When a user starts a new chat, you'd like a quick routing step that detects, based on the first prompt, whether the new chat is actually the continuation of a previous chat. If so, we want to halt the current app and restart the relevant app. If not, the app can continue its normal lifecycle.

Describe the solution you'd like
Instead of providing a list of actions to halt_after (or halt_before), provide a list of tuples (action, condition). The application will halt after a given action if and only if the condition is met. If no condition is provided, the app will always halt after the action is processed (current behavior).
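
For concreteness, here is a rough sketch of what the proposed call site could look like. The tuple form is the hypothetical part; when() is Burr's existing condition helper:

from burr.core import when

# hypothetical API: plain strings keep today's behavior,
# (action, condition) tuples halt only when the condition holds
final_action, result, final_state = app.run(
    halt_after=[
        ("routing_action", when(is_continuation=True)),  # conditional halt
        "terminal_action",                               # unconditional halt
    ],
    inputs={"prompt": user_input},
)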

Describe alternatives you've considered
Right now, the alternative is to have a dummy action to halt on and a conditional transition to it.

with_transitions(
    # note: `continue` is a reserved keyword in Python, so the state key
    # is renamed to `should_continue` here
    ("routing_action", "continue_action", when(should_continue=True)),
    ("routing_action", "halt_action", when(should_continue=False)),
)

And then when calling app.run:

final_action, result, final_state = app.run(
    halt_after=["halt_action"],
    inputs={"prompt": user_input},
)

This is less clean than the suggested API.

Another possible way to address this would be to enable halting the application from within an action, although it may be harder to implement.
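
For illustration only, halting from within an action could look something like the sketch below. Nothing like this exists in Burr today; HaltRequested and the custom loop are entirely hypothetical:

# hypothetical: a sentinel exception an action raises to request a halt
class HaltRequested(Exception):
    pass

def run_until_halt(app, inputs=None):
    # drive the app with .step() and treat HaltRequested as a clean stop
    while True:
        try:
            step_result = app.step(inputs=inputs)
        except HaltRequested:
            return app.state
        if step_result is None:  # no next action to run
            return app.state
        inputs = None  # inputs only apply to the first step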


gamarin2 commented Nov 26, 2024

Tangentially, I'm curious how this would impact best practices when it comes to handling human input. For example, if I want to build a basic chat app, I would currently do as suggested in the cookbooks:

from typing import Tuple
from burr.core import State, action

@action(reads=[], writes=["prompt", "chat_history"])
def human_input(state: State, prompt: str) -> Tuple[dict, State]:
    chat_item = {
        "content": prompt,
        "role": "user"
    }
    # return the prompt as the result,
    # put the prompt in state, and update the chat_history
    return (
        {"prompt": prompt},
        state.update(prompt=prompt).append(chat_history=chat_item)
    )

def create_chat_app():
    return (
        ApplicationBuilder().with_actions(
            human_input=human_input,
            ai_response=ai_response
        ).with_transitions(
            ("human_input", "ai_response"),
            ("ai_response", "human_input")
        ).with_state(chat_history=[])
        .with_entrypoint("human_input")
        .build()
    )

def chat():
    app = create_chat_app()
    print("Hi, how can I help you? \n")
    while True:
        user_input = input()

        if user_input.lower() == "q":
            break

        final_action, result, final_state = app.run(
            halt_after=["ai_response"],
            inputs={"prompt" : user_input}
        )

        print("\n" + final_state['response'] + "\n")

If it were possible to halt_after a given action based on a condition, we could do the following instead (I'm omitting ai_response, where you would just add a print statement for the response):

@action(reads=[], writes=["prompt", "chat_history", "halt"])
def human_input(state: State) -> Tuple[dict, State]:
    prompt = input()

    if prompt.lower() == "q":
        return (
            {"prompt": prompt},
            state.update(prompt=prompt, halt=True)
        )

    chat_item = {
        "content": prompt,
        "role": "user"
    }
    # return the prompt as the result,
    # put the prompt in state, and update the chat_history
    return (
        {"prompt": prompt},
        state.update(prompt=prompt, halt=False).append(chat_history=chat_item)
    )

def create_chat_app():
    return (
        ApplicationBuilder().with_actions(
            human_input=human_input,
            ai_response=ai_response
        ).with_transitions(
            ("human_input", "ai_response"),
            ("ai_response", "human_input")
        ).with_state(chat_history=[])
        .with_entrypoint("human_input")
        .build()
    )

def chat():
    app = create_chat_app()
    print("Hi, how can I help you? \n")

    final_action, result, final_state = app.run(
        halt_after=[("human_input", when(halt=True))],
    )

    print("\n" + "Goodbye" + "\n")

Essentially, in that second scenario, we'd only halt the app when the user actively asks for it (or quits the website, or switches apps). I'm not sure of all the implications since I'm just starting out with Burr, but it makes sense intuitively for my use case at least.


skrawcz commented Nov 26, 2024

@gamarin2 yeah, so input() works if you're doing a command-line application. If you're running Burr in a web service, that pattern wouldn't work.

halt_after=[("human_input", when(halt=True))]

This is interesting. We'd want to inject this as an implicit edge in the graph if you think about visualizing it. It would reduce the need for a "terminal and/or input" action that is just a placeholder. This might come with edge cases with respect to restarting, e.g. we'd need to double-check what happens if you resume an application after halting (I think it might just work, but then again it might not without augmenting state).

Say you have a ChatGPT-like app with various "chats", where each chat is a distinct Burr app. When a user starts a new chat, you'd like a quick routing step that detects, based on the first prompt, whether the new chat is actually the continuation of a previous chat. If so, we want to halt the current app and restart the relevant app. If not, the app can continue its normal lifecycle.

Can you expand more on how you would determine this? We've built the Burr data model around enabling someone to build the "left hand side" of the ChatGPT interface. E.g. you either click new, or you click into an old conversation. With Burr, "new" would be a new app_id, while the other way would be to list_app_ids() for a user (you can also pull the state outside of the app to introspect) and then pick an app_id to use.
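
For reference, a minimal sketch of that flow with a persister (assuming the SQLite persister and a chat graph built elsewhere; exact flags may vary by version):

from burr.core import ApplicationBuilder
from burr.core.persistence import SQLLitePersister

persister = SQLLitePersister(db_path="chats.db", table_name="burr_state")
persister.initialize()

# list this user's existing conversations, then resume one (or start fresh)
app_ids = persister.list_app_ids(partition_key="user-123")
app = (
    ApplicationBuilder()
    .with_graph(chat_graph)  # assumed: the chat graph built elsewhere
    .initialize_from(
        persister,
        resume_at_next_action=True,
        default_state={"chat_history": []},
        default_entrypoint="human_input",
    )
    .with_identifiers(app_id=app_ids[0], partition_key="user-123")
    .build()
)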


gamarin2 commented Nov 26, 2024

This is interesting. We'd want to inject this as an implicit edge in the graph if you think about visualizing it. It would reduce the need for a "terminal and/or input" action that is just a placeholder.

Yes, that's the idea!

Can you expand more on how you would determine this? We've built the Burr data model around enabling someone to build the "left hand side" of the ChatGPT interface. E.g. you either click new, or you click into an old conversation. With Burr, "new" would be a new app_id, while the other way would be to list_app_ids() for a user (you can also pull the state outside of the app to introspect) and then pick an app_id to use.

When the user opens chatgpt, they are presented with a chatbox for a new chat. They can click on that and start typing or they can select a previous chat to continue it. The issue is that when the user has many chats, they may not want to scroll or bother finding the one relevant to their current prompt, or they might forget they have started one already, and so they would just start typing immediately. Hence the need for routing.

One way to implement it is by making a quick call to a small model like gpt-4o-mini with the user prompt and a summary of previous chats, asking it to decide (Yes/No) whether the current prompt is a logical continuation of an existing chat. If so, you prompt the user to confirm ("It looks like this request belongs to this chat. Do you want to switch?"). If not, you just start the new app.

It may be overkill for chatgpt, but for my use case it makes a lot of sense.
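
A hedged sketch of such a router (the prompt, model choice, and helper name are illustrative; assumes the OpenAI Python client):

from openai import OpenAI

client = OpenAI()

def is_continuation(prompt: str, chat_summaries: list[str]) -> bool:
    # ask a small model for a Yes/No routing decision
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer Yes or No only."},
            {"role": "user", "content": (
                "Existing chat summaries:\n" + "\n".join(chat_summaries)
                + f"\n\nIs this prompt a continuation of one of them?\n{prompt}"
            )},
        ],
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")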


skrawcz commented Nov 26, 2024

One way to implement it is by making a quick call to a small model like gpt-4o-mini with the user prompt and a summary of previous chats, asking it to decide (Yes/No) whether the current prompt is a logical continuation of an existing chat. If so, you prompt the user to confirm ("It looks like this request belongs to this chat. Do you want to switch?"). If not, you just start the new app.

Oh, that's interesting. You'd want to switch the app_id in that case while the graph is running? I think you could wholly replace the current state, but tracking the lineage of the conversation would be difficult given the swapping of IDs, unless you always create a new app_id and all you're doing is augmenting the conversation chat history (but that would require post-processing to generate summaries, since there would be duplication) -- which I think would work the way you're thinking about it.

Otherwise, splitting it into two stages doesn't sound unreasonable to me (a sketch follows the list):

  1. Decide what to do. This could be a burr graph, but that might be overkill from what you described.
  2. Run the conversation graph with the right app_id or start a new one.
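
A minimal sketch of that two-stage shape, with hypothetical helpers route_to_existing_chat() and build_chat_app():

def handle_first_prompt(user_id: str, prompt: str):
    # stage 1: plain code (or a small Burr graph) picks the conversation
    app_id = route_to_existing_chat(user_id, prompt)  # returns None for "new chat"

    # stage 2: run the conversation graph under the chosen app_id
    app = build_chat_app(user_id=user_id, app_id=app_id)
    return app.run(halt_after=["ai_response"], inputs={"prompt": prompt})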


zilto commented Nov 26, 2024

I've built several human-in-the-loop apps and faced similar challenges. I generally prefer a third pattern that more explicitly defines the I/O of the app.

The input is always a function parameter (user_input here). This value is automatically tracked in the UI since it's an input. I use a separate user_turn() action when I need more involved input sanitization, validation, etc.

@action(reads=["chat_history"], writes=["chat_history"])
def bot_turn(state: State, user_input: str) -> State:
    system_prompt = ...

    # if there's no existing history, start a new history with the system prompt
    chat_history = state.get(
        "chat_history",
        [{"role": "system", "content": system_prompt}]
    )

    user_message = {"role": "user", "content": user_input}

    response = client.chat.completions.create(
        model=...,
        messages=chat_history + [user_message]
    )
    assistant_message = {
        "role": "assistant",
        "content": response.choices[0].message.content
    }

    # append the user message and LLM message to history
    return state.append(chat_history=user_message).append(chat_history=assistant_message)

def create_chat_app():
    return (
        ApplicationBuilder()
        .with_actions(bot_turn)
        .with_transitions(("bot_turn", "bot_turn"))
        .with_entrypoint("bot_turn")
        .build()
    )

def chat_in_terminal():
    app = create_chat_app()
    print("Hi, how can I help you? \n")
    while True:
        user_input = input()

        if user_input.lower() == "q":
            break

        final_action, result, final_state = app.run(
            halt_before=["bot_turn"],
            inputs={"user_input": user_input}
        )

        # bot_turn writes chat_history (not "response"): print the last message
        print("\n" + final_state["chat_history"][-1]["content"] + "\n")

Other notes:

  • you need to halt_before to ensure you get the updated inputs
  • doing state.get("chat_history", []) with a default value of an empty list avoids a faulty default state (e.g., a missing chat_history=[])
  • using input() within an action limits the portability of the code (e.g., behind a Gradio UI or a FastAPI server). If you use this approach, you can always have two implementations, human_input__terminal() and human_input__service(), and assign the right one depending on context in ApplicationBuilder().with_actions(human_input=...) (see the sketch below)
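
A small sketch of that last point, choosing the implementation at build time (the human_input__terminal / human_input__service bodies are assumed to exist):

import sys

# pick the input implementation that fits the runtime context
interactive = sys.stdin.isatty()
app = (
    ApplicationBuilder()
    .with_actions(
        human_input=human_input__terminal if interactive else human_input__service,
        ai_response=ai_response,
    )
    .with_transitions(("human_input", "ai_response"), ("ai_response", "human_input"))
    .with_entrypoint("human_input")
    .build()
)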

Conditional halting

I like the proposed syntax for app.run(halt_after=[("human_input", when(halt=True))]) that uses a tuple, since it's reminiscent of the conditional transition API. I think the broader realization here is that we should be able to halt based on State rather than on "what the current action is".

Alternative solution

Currently, this is not possible through .run(), but .run() is just a wrapper around .iterate(), which itself wraps .step(). Here's the gist of .run():

def run(self, ...):
    gen = self.iterate(halt_before=halt_before, halt_after=halt_after, inputs=inputs)
    while True:
        try:
            next(gen)
        except StopIteration as e:
            result = e.value
            return result

So it seems reasonable to create custom "run methods" manually:

def run_app(app: Application, halt_before=None, halt_after=None, inputs=None):
    gen = app.iterate(halt_before=halt_before, halt_after=halt_after, inputs=inputs)
    while True:
        try:
            action_name, result, state = next(gen)
            if state.get("halt") is True:
                break  # return something
        except StopIteration as e:
            result = e.value
            return result
    return

Looking directly at the .iterate() API could be helpful for your use case. It calls .step() repeatedly (i.e., executes actions) until you hit the halting action, and then you can inspect state:

for action, result, state in app.iterate(halt_after=["human_input"]):
    if state.get("halt") is True:
        break

Virtual edges

This is interesting. We'd want to inject this as an implicit edge in the graph if you think about visualizing it

I don't think there's a reason to modify the graph here. The condition is not on an edge, given there's currently no implicit "terminal state action" AFAIK. It simply exhausts the generator by hitting has_next_action() == False.

elijahbenizzy commented

One way to implement it is by making a quick call to a small model like gpt-4o-mini with the user prompt and a summary of previous chats, asking it to decide (Yes/No) whether the current prompt is a logical continuation of an existing chat. If so, you prompt the user to confirm ("It looks like this request belongs to this chat. Do you want to switch?"). If not, you just start the new app.

Oh, that's interesting. You'd want to switch the app_id in that case while the graph is running? I think you could wholly replace the current state, but tracking the lineage of the conversation would be difficult given the swapping of IDs, unless you always create a new app_id and all you're doing is augmenting the conversation chat history (but that would require post-processing to generate summaries, since there would be duplication) -- which I think would work the way you're thinking about it.

Otherwise, splitting it into two stages doesn't sound unreasonable to me:

  1. Decide what to do. This could be a burr graph, but that might be overkill from what you described.
  2. Run the conversation graph with the right app_id or start a new one.

Agreed with @skrawcz -- there's the solution of halting when a certain condition is met (which I think is, on its own, useful), but to me this is really a question of how you get the data for the prior apps. Something like:

  1. Load up every app into a vector db (either a persister/hook) -- use this to query
  2. Have a burr app (or just a piece of code) that processes the query, and selects the top few (or top one) -- if nothing is within the threshold then return None
  3. Then load that up from where it left off

So it's an indexing layer on top of Burr -- e.g. through a custom persister or some other tool. Then the proposed solution here would help -- e.g. deciding whether to halt, but you don't have to use Burr for that (it's kind of a nice-to-have).
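
One way to picture that indexing layer is a lifecycle hook that pushes finished turns into the index (a sketch: index_conversation() is hypothetical, and only the hook arguments actually used are shown):

from burr.lifecycle import PostRunStepHook

class VectorIndexHook(PostRunStepHook):
    """After each step, push the chat history into a vector index."""

    def __init__(self, app_id: str):
        self.app_id = app_id

    def post_run_step(self, *, state, action, **future_kwargs):
        if action.name == "ai_response":  # only index completed turns
            # hypothetical helper: embeds and upserts into the vector store
            index_conversation(
                app_id=self.app_id,
                chat_history=state.get("chat_history", []),
            )

# attached at build time, e.g.:
# ApplicationBuilder().with_hooks(VectorIndexHook(app_id))...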

What @zilto suggested could work as well -- there are quite a few different constructs that could make it possible (halt_before is often a good workaround).

Regarding the syntax -- I like it quite a bit. There's a nice internal construct we have that's hidden -- the prior action is actually stored in state, and there's no reason we couldn't store the next action. Then everything becomes a condition of state -- e.g. halt when state.__prior_action == "human_input" AND state.halt == True, or something like that. It could simplify implementation and increase visibility...
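
As pure illustration of that framing, a custom loop could already express "halt when the prior action was human_input and state says halt". Note that __PRIOR_STEP is Burr's internal bookkeeping, not a public API, so treat this as a sketch only:

inputs = {"prompt": user_input}  # first-step inputs (illustrative)
while True:
    step_result = app.step(inputs=inputs)
    if step_result is None:  # nothing left to run
        break
    action_obj, result, state = step_result
    inputs = None
    # everything becomes a condition of state, including "what just ran"
    if state.get("__PRIOR_STEP") == "human_input" and state.get("halt") is True:
        break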


zilto commented Nov 26, 2024

Agreed with @skrawcz -- there's the solution of halting when a certain condition is met (which I think is, on its own, useful), but to me this is really a question of how you get the data for the prior apps. Something like:
Load up every app into a vector db (either a persister/hook) -- use this to query
Have a burr app (or just a piece of code) that processes the query, and selects the top few (or top one) -- if nothing is within the threshold then return None
Then load that up from where it left off

One thing I sketched up and could explore further:

  • persisted state as Pydantic models
  • embed Pydantic objects into LanceDB (easy via their integrations); specify the fields to embed
  • user-facing API to query these persisted results

This eventually leads to the idea of a "memory" layer. It could be vector DB, full-text search, image comparison, custom rankers, etc. for retrieval
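
A rough sketch of the LanceDB side of that idea (the embedding dimension and the embed() helper are assumptions):

import lancedb
from lancedb.pydantic import LanceModel, Vector

class ChatSummary(LanceModel):
    app_id: str
    summary: str
    vector: Vector(1536)  # dimension depends on the embedding model

db = lancedb.connect("./memory.lancedb")
table = db.create_table("chat_summaries", schema=ChatSummary, exist_ok=True)

# index a persisted state; embed() is a hypothetical embedding call
table.add([{"app_id": "chat-1", "summary": "...", "vector": embed("...")}])

# later: retrieve candidate chats for a new prompt
hits = table.search(embed("new user prompt")).limit(3).to_list()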

elijahbenizzy commented

Agreed with @skrawcz -- there's the solution of halting when a certain condition is met (which I think is, on its own, useful), but to me this is really a question of how you get the data for the prior apps. Something like:
Load up every app into a vector db (either a persister/hook) -- use this to query
Have a burr app (or just a piece of code) that processes the query, and selects the top few (or top one) -- if nothing is within the threshold then return None
Then load that up from where it left off

One thing I sketched up and could explore further:

  • persisted state as Pydantic models
  • embed Pydantic objects into LanceDB (easy via their integrations); specify the fields to embed
  • user-facing API to query these persisted results

This eventually leads to the idea of a "memory" layer. It could be vector DB, full-text search, image comparison, custom rankers, etc. for retrieval

Yep, something like this could be a nice solution -- curious whether it would be a separate layer, an implementation of the persister, or just a hook/stuff in the action...


zilto commented Nov 26, 2024

Yep, something like this could be a nice solution -- curious whether it would be a separate layer, an implementation of the persister, or just a hook/stuff in the action...

The main design question is: when and how do you call memory?

  • in this case, it seems to be a regular "search" feature that's decoupled from the app. As said in this thread, it could be modelled as a function, a Burr action, or an entire separate Burr Application
  • the more "coupled" case is when you want actions to be able to access previous states.

If I were building for a production use case, I would have two Burr Application services:

  • the main application: handles the core logic; its persister calls the memory layer, so these operations are non-blocking
  • the memory layer: close to storage; can batch indexing and handle row-level security for data from multiple users

I think this approach is promising given how it aligns with Anthropic's MCP.

Implementation:

  • a Burr hook could send state to the memory layer under specific conditions (it doesn't need to be on every action); maybe the persister is better here, but I don't know the implementation
  • it brings back the broader question of "how to deal with State that becomes large in size". We need an index/pointer-based system such that State only holds references that are sufficient to query, evaluate conditions, etc.

gamarin2 commented

Oh, that's interesting. You'd want to switch the app_id in that case while the graph is running? I think you could wholly replace the current state, but tracking the lineage of the conversation would be difficult given the swapping of IDs, unless you always create a new app_id and all you're doing is augmenting the conversation chat history (but that would require post-processing to generate summaries, since there would be duplication) -- which I think would work the way you're thinking about it.

Yes, the idea is to always create an app ID. Then, depending on the result of the router, either continue or halt and ditch the app for the more relevant one.

Otherwise, splitting it into two stages doesn't sound unreasonable to me:

I guess so. You do lose some of the benefits of graph visualization and tracing, though (the first LLM call isn't part of your state machine in that scenario).

Agreed with @skrawcz -- there's the solution of halting when a certain condition is met (which I think is, on its own, useful), but to me this is really a question of how you get the data for the prior apps.

It depends on the use case, but as far as my app is concerned, my (naive) solution would be to ask gpt-4o-mini to produce a summary of the chat at various logical halting points and store that in my User class (outside the state). Then I would just pass these summaries as inputs whenever I restart the app.
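
For what it's worth, a sketch of that naive summarizer (client is the OpenAI client from the router sketch above; user.summaries is a hypothetical store outside Burr state):

def summarize_chat(chat_history: list[dict]) -> str:
    # condense the conversation so it can be passed as an input on restart
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in chat_history)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize this chat in 2-3 sentences."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

# at a logical halting point:
# user.summaries[app_id] = summarize_chat(final_state["chat_history"])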

If we're talking about indexing state for the same app at various points in the lifecycle and making that accessible, then that's a whole separate issue.
