Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement the support of configurable yaml #68

Merged
merged 17 commits into from
Jul 5, 2024

Conversation

fengsh27
Copy link
Collaborator

@fengsh27 fengsh27 commented Jun 24, 2024

Abstract

In this submission, we introduced the support of configurable yaml production.yml. Here is a sample yaml file:

KnowledgeGraph:
  enabled: true             # enable/disable KnowledgeGraph agent
  servers:
  - server: pole-kg         # server name, used for display, required
    address: biocypher      # server address, host name or ip address, required
    port: 7687              # server port, optional, default value "7687"
  - server: genomicKB
    address: xxx.xxx.xxx
    port: 47687

VectorStore:
  enabled: true               # enable/disable VectorStore agent
  servers:
  - server: local             # server name, used for display, required
    address: standalone       # server address, host name or ip address, required
    port: 19530               # server port, optional, default value "19530"
  - server: remote
    address: xxx.xxx.xxx
    port: 19530

Text:
  Welcome:
    Title: Cancer Geneticist Assistant
    Disclaimer: "This is a use case demonstration, not a final product. The data and information provided here are synthetic (in the case of patient data) or limited by demo access (in the case of the OncoKB API)."
    About:
      Title: About
      Citation: {Citation}
      ListTitle: {List Title}
      ListItems:
        - {List Item 1}
        - {List Item 2}
        - {List Item 3}
      Heading2: {Heading2}
      Models: {Models Info}
  Masks:
    - name: Cancer Genetics Assistant
      avatar: "🧑‍🔬"
      context:
        - id: "cancer-genetics-assistant-1"
          role: "system"
          content: "You are an assistant to a cancer geneticist."
          date: ""
        - id: "cancer-genetics-assistant-2"
          role: "system"
          content: "Your role is to assist the user in the task of analysing patient data and prioritising treatments."
          date: ""
        - id: "cancer-genetics-assistant-3"
          role: "system"
          content: "You can ask the user to provide explanations and more background at any time."
          date: ""
      modelConfig:
        model: gpt-3.5-turbo
        temperature: 0
        max_tokens: 2000
        presence_penalty: 0
        frequency_penalty: 0
        sendMemory: true
        historyMessageCount: 4
        compressMessageLengthThreshold: 2000
      lang: en
      builtin: true
      createdAt: 1697222692762
...

If there are multiple servers in this yaml file, users can select from them:
image

And, we can also override the contents in Help page and the persona with this configurable YAML file (see Text>Welcome and Text>Mask parts in the above sample YAML file).

To customize Biochatter-Next with this YAML file, we need to specify the location of our YAML file in docker-compose.yml:

...
services:
  biochatter-next:
    image: biocypher/biochatter-next:0.1.1
    ports:
      - 3000:3000
    env_file:
      - .env
    volumes:
      - ./tmp:/tmp
      - ./app-config:/app-config            # mount the folder containing our YAML file (production.yml) to the container
    environment:
      - BASE_URL=http://flask-app:5001
      - CUSTOM_BIOCHATTER_NEXT_FILE=/app-config/production.yml  # tell BioChatter-Next where the YAML file is
    depends_on:
      - flask-app
 ...

@fengsh27 fengsh27 requested a review from slobentanzer June 24, 2024 18:00
@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jun 24, 2024

@slobentanzer , I don't include prompts and welcome text into the yaml file. I think they have already been configurable in app/locales/, app/masks/ and public/prompts.json.

Copy link
Contributor

@slobentanzer slobentanzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fengsh27 this looks great already. However, with regard to the yaml config: this feature is aimed at the greatest user-friendliness, so I think adding them to the config is the best choice. I would not expect this type of user to easily find several places in the existing app to configure the text fragments. I think the important ones we should include (also for the demonstration in the paper use case) are:

  • Welcome text (so the end user of the app knows what it does)
  • System prompt (so the developer is able to tune the model according to the use case; this also means deactivation of the builtin prompt templates, so the end user does not change it to something unrelated)
  • maybe a configuration of the tools the RAG agent should be able to use (this is a bit dependent on the implementation of the API agent, so let's hold off on this for now)

Could you please add that? In terms of behaviour, I would probably prefer setting the app up to show a reduced interface (no selection of personas) if this setting is present in the YAML configuration.

@fengsh27
Copy link
Collaborator Author

@slobentanzer Sure, I'll add welcome text and system prompts to configurable yaml file. And regarding RAG agent, we can modify the configuration after API agent is integrated into BioChatter.

@fengsh27 fengsh27 requested a review from slobentanzer June 25, 2024 21:33
@slobentanzer
Copy link
Contributor

@fengsh27 thanks! Would it be possible to have the configuration more flexible? I updated the config for the use case (see recent commit) and didn't have any use for these "What" and "How" messages, but I think removing them leads to an error. If we need to predefine fields, maybe it would be good to keep the end-user configuration simple; I don't think the use cases will have much use for things like the "What/How" messages. Is this possible?

@fengsh27
Copy link
Collaborator Author

No problem, let me fix it

@slobentanzer
Copy link
Contributor

Works now, thanks! Next request: can the components that we do not define be removed?

Currently, the app shows me the what/how messages, which probably isn't desired in the case of deploying a custom model. Can we organise it like this:

  • there is an environment variable (CUSTOM_BIOCHATTER_NEXT=true/false), passed to the docker compose, which determines whether we launch the default biochatter-next or the custom one based on the definition
  • if CUSTOM_BIOCHATTER_NEXT=false, we simply display our BioChatter default, which explains what it is and how to develop your own adaptation of if (the state before this PR)
  • if CUSTOM_BIOCHATTER_NEXT=true, we display only the content of the configuration (could be one or two paragraphs with headers, the notification box if desired, a citation), removing all default components that are not in the config

Sounds reasonable?

@fengsh27
Copy link
Collaborator Author

@slobentanzer , I've submitted a change to support CUSTOM_BIOCHATTER_NEXT for controlling welcome display, please have a look to ensure this meets your expectation.

@slobentanzer
Copy link
Contributor

@fengsh27 thanks. :)

The default welcome message is still overridden by the content of the YAML if I set CUSTOM_BIOCHATTER_NEXT to false. It should revert to the standard message we had before for the whole welcome page if it's set to false, ignoring the customisation (which includes restricting the masks, setting the database IPs, etc). If set to false, everything should be as it was before the PR.

@fengsh27
Copy link
Collaborator Author

@fengsh27 thanks. :)

The default welcome message is still overridden by the content of the YAML if I set CUSTOM_BIOCHATTER_NEXT to false. It should revert to the standard message we had before for the whole welcome page if it's set to false, ignoring the customisation (which includes restricting the masks, setting the database IPs, etc). If set to false, everything should be as it was before the PR.

make sense

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jun 27, 2024

@fengsh27 thanks. :)
The default welcome message is still overridden by the content of the YAML if I set CUSTOM_BIOCHATTER_NEXT to false. It should revert to the standard message we had before for the whole welcome page if it's set to false, ignoring the customisation (which includes restricting the masks, setting the database IPs, etc). If set to false, everything should be as it was before the PR.

make sense

@slobentanzer , I have a concern that, if we set CUSTOM_BIOCHATTER_NEXT to false, should we use kg servers info and vector store servers info in yaml file, or the Mask in yaml file?
I'm wondering if this CUSTOM_BIOCHATTER_NEXT is necessary, if users don't like to customize items (like About, Disclaimer, servers info in Welcome), they can comment out or remove them.
I think this way would be clearer to users, that is, configurable yaml has higher priority. If item is not specified in yaml file, it will use default value. If Mask is not specified in yaml file, it would use empty mask.

What's your opinion on this issue?

@slobentanzer
Copy link
Contributor

The two use cases for me are:

  • standard deployment, as it is before this pull request. This is 'our' version of BioChatter Next, as a demo for the framework's capabilities (we can only show one very generic use case, because we don't want to assume too much about the individual user).
  • custom deployment based on user preferences: this should be easily configurable via the YAML.

Why I suggested the env variable to distinguish these two cases: If we didn't have that, we would have to include all personas in the YAML configuration, right? Otherwise, the standard deployment would just have one persona. We do want to include the YAML file with the repository to make it easy for the user to adjust the configuration. However, if we then want to deploy the standard deployment on our server as it is now, we'd have to account for all those settings in that yaml as an application (including these "What/How" messages that are not really relevant for individual custom deployments, all of the masks we want to display, etc). How would you suggest to handle these two cases?

Also, thinking further towards having a BioCypher repository such as the decider-genetics use case, how do we implement the configuration in a Docker compose workflow that pulls the BioChatter Next image from the Dockerhub? The YAML would have to be located in the decider-genetics repository, correct? Should we just have an environment variable for defining the location of the config YAML in the docker compose, and if it is not set, we display the default BioChatter Next?

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jun 27, 2024

Good proposal to specify a env for defining the location of the config YAML in the docker compose. But, I don't think docker compose has the ability to copy file into container. It's an issue how to implement it.

Currently, the only way for users to customize is to sync the BioChatter-Next repo and modify YAML file, then run docker compose build.

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jun 27, 2024

Or in docker-compose.yml, we combine volume and env:

volumes:
  - /user/yaml/path:/configuration
environments:
  - CUSTOM_FILE: /configuration/app-config.yml

So, if CUSTOM_FILE is set, we will display customized content.
Do you have any idea?

@slobentanzer
Copy link
Contributor

And if we define the custom configuration in an env file instead of a YAML? It is only a couple of string values, after all. Then all the values could be passed to the container via the environment. That might get messy though, no dictionary structure. Could we have a script that parses the YAML into environment variables?

@slobentanzer
Copy link
Contributor

I just now understood your proposal. You mean that we would preconfigure the /configuration/ folder in the docker, so the only change the user needs to make is to specify the config YAML? That probably works.

@slobentanzer
Copy link
Contributor

BioCypher repositories have a config directory already, anyways. We could put it there.

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jun 27, 2024

The two use cases for me are:

  • standard deployment, as it is before this pull request. This is 'our' version of BioChatter Next, as a demo for the framework's capabilities (we can only show one very generic use case, because we don't want to assume too much about the individual user).
  • custom deployment based on user preferences: this should be easily configurable via the YAML.

Why I suggested the env variable to distinguish these two cases: If we didn't have that, we would have to include all personas in the YAML configuration, right? Otherwise, the standard deployment would just have one persona. We do want to include the YAML file with the repository to make it easy for the user to adjust the configuration. However, if we then want to deploy the standard deployment on our server as it is now, we'd have to account for all those settings in that yaml as an application (including these "What/How" messages that are not really relevant for individual custom deployments, all of the masks we want to display, etc). How would you suggest to handle these two cases?

Also, thinking further towards having a BioCypher repository such as the decider-genetics use case, how do we implement the configuration in a Docker compose workflow that pulls the BioChatter Next image from the Dockerhub? The YAML would have to be located in the decider-genetics repository, correct? Should we just have an environment variable for defining the location of the config YAML in the docker compose, and if it is not set, we display the default BioChatter Next?

Regarding the two use cases you mentioned:
1). standard deployment, we can provide empty YAML file or not provide YAML. In this case, we will go with standard deployment.

2). customized deployment:
In app-config.yml, we can override standard deployment like this:

KnowledgeGraph:
  enabled: true
  servers:
    - server: pole-kg
      address: biocypher
      port: 7687

VectorStore:
  enabled: true
  servers:
    - server: local
      address: standalone
      port: 19530

Text:
  Welcome:
    Title: balahbalah
    Disclaimer: balahbalah
    About:
      Title: balahbalah
      Citation: balahbalah
      ListTitle: balahbalah
      ListItems:
        - balahbalah
        - balahbalah
      Heading2: balahbalah
      Models: balahbalah
    How: # Here we override with empty content, that is, it won't be displayed
    What: # Here we override with empty content, that is, it won't be displayed
  Mask:
    balahbalah

Is this feasible?

@slobentanzer
Copy link
Contributor

I think empty YAML is problematic for the default; we should instead show default if the environment variable CUSTOM_FILE is not provided.

Otherwise you run into the issue that you cannot distinguish between showing the default and removing the element (removing all elements in the case of an empty YAML), what we encountered above when I wanted to remove the "What/How" messages.

@fengsh27
Copy link
Collaborator Author

I think empty YAML is problematic for the default; we should instead show default if the environment variable CUSTOM_FILE is not provided.

Otherwise you run into the issue that you cannot distinguish between showing the default and removing the element (removing all elements in the case of an empty YAML), what we encountered above when I wanted to remove the "What/How" messages.

Make sense. So, let's implement it in this way. I'll let you know if I have any questions

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jul 1, 2024

@slobentanzer I've made a code change to support to set custom YAML file via docker-compose.yml. Please have a look.

Copy link
Contributor

@slobentanzer slobentanzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fengsh27 I tried customising the yaml file and it seems it is hardcoded in several locations, this one and maybe in line 50 of constants.ts (although I don't know if that one relates to the directory or is just a name). I would prefer to have config/ as the directory name to align with biocypher project structure, see my commit.

next.config.mjs Outdated
config.plugins.push(
new CopyPlugin({
patterns: [
{ from: 'app-config/production.yml', to: 'app-config/production.yml' },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hardcoded file location (user states yaml name in env variable)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hardcoded file location is not used anymore, in app/config/route.ts, it will read env CUSTOM_BIOCHATTER_NEXT_FILE to get custom file location.

I'll remove this line later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docker compose build fails for me due to that line:

 > [biochatter-next builder 5/5] RUN yarn build:
0.154 yarn run v1.22.19
0.171 $ cross-env BUILD_MODE=standalone next build
0.427 [Next] build mode standalone
0.428 [Next] build with chunk:  true
0.435 - warn You have enabled experimental feature (forceSwcTransforms) in next.config.mjs.
0.435 - warn Experimental features are not covered by semver, and may cause unexpected or broken application behavior. Use at your own risk.
0.435
0.441 Attention: Next.js now collects completely anonymous telemetry regarding usage.
0.441 This information is used to shape Next.js' roadmap and prioritize features.
0.441 You can learn more, including how to opt-out if you'd not like to participate in this anonymous program, by visiting the following URL:
0.441 https://nextjs.org/telemetry
0.441
0.483 - info Creating an optimized production build...
1.691 Browserslist: caniuse-lite is outdated. Please run:
1.691   npx browserslist@latest --update-db
1.691   Why you should do it regularly: https://github.com/browserslist/browserslist#browsers-data-updating
2.449 Warning: For production Image Optimization with Next.js, the optional 'sharp' package is strongly recommended. Run 'yarn add sharp', and Next.js will use it automatically for Image Optimization.
2.449 Read more: https://nextjs.org/docs/messages/sharp-missing-in-production
24.49 Failed to compile.
24.49
24.49 unable to locate '/app/app-config/production.yml' glob
24.49
24.50
24.50 > Build failed because of webpack errors
24.55 error Command failed with exit code 1.
24.55 info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
------
failed to solve: process "/bin/sh -c yarn build" did not complete successfully: exit code: 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be aware that I changed the location and file name of the config (and specified in .env)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me remove the line

BTW, to make customization work in docker compose workflow, you don't need to modify .env.

To customize Biochatter-Next with this YAML file, we need to specify the location of our YAML file in docker-compose.yml:

...
services:
biochatter-next:
build:
context: ./
dockerfile: Dockerfile
ports:
- 3000:3000
env_file:
- .env
volumes:
- ./tmp:/tmp
- ./app-config:/app-config # mount the folder containing our YAML file (production.yml) to the container
environment:
- BASE_URL=http://flask-app:5001
- CUSTOM_BIOCHATTER_NEXT_FILE=/app-config/production.yml # tell BioChatter-Next where the YAML file is
depends_on:
- flask-app
...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I see the distinction. The contents of .env file (passed into the env_file field) should lead to the same result as the same variable(s) in the environment field, no? Should we just remove the .env file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this - ./app-config:/app-config # mount the folder containing our YAML file (production.yml) to the container in my docker-compose.yaml. Can you push all your changes to the repo?

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jul 2, 2024

@fengsh27 I tried customising the yaml file and it seems it is hardcoded in several locations, this one and maybe in line 50 of constants.ts (although I don't know if that one relates to the directory or is just a name). I would prefer to have config/ as the directory name to align with biocypher project structure, see my commit.

app-config in contants.ts is the entry name of Local Storage that stores the application configuration. Our folder 'app-config' coincides with this entry name. I'm considering to rename it to "customization", as we already have folder app/config, I prefer not to use config.

@slobentanzer
Copy link
Contributor

Agree that app/config probably is confusing. However, BioCypher projects have a config directory with YAML files, which is why I think that using that for the BioChatter Next configuration would be an obvious choice. Can we just copy it into the container in /config (in root) as opposed to /app/config, and read it from there? In the biochatter-next repo, we also have the config directory in root.

@slobentanzer
Copy link
Contributor

The main reason is being able to execute the custom BioChatter next from a BioCypher pipeline repository such as decider-genetics for the use case.

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jul 2, 2024

As docker-compose can't copy file into container, I think this is the only we can make it able to execute in BioCypher pipeline, like this:

services:
  biochatter-next:
    image: biocypher/biochatter-next:0.1.1
    ports:
      - 3000:3000
    env_file:
      - .env
    volumes:
      - ./tmp:/tmp
      - ./app-config:/app-config            # mount the folder containing our YAML file (production.yml) to the container
    environment:
      - BASE_URL=http://flask-app:5001
      - CUSTOM_BIOCHATTER_NEXT_FILE=/app-config/production.yml  # tell BioChatter-Next where the YAML file is
    depends_on:
      - flask-app

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jul 2, 2024

As docker-compose can't copy file into container, I think this is the only we can make it able to execute in BioCypher pipeline, like this:

services:
  biochatter-next:
    image: biocypher/biochatter-next:0.1.1
    ports:
      - 3000:3000
    env_file:
      - .env
    volumes:
      - ./tmp:/tmp
      - ./app-config:/app-config            # mount the folder containing our YAML file (production.yml) to the container
    environment:
      - BASE_URL=http://flask-app:5001
      - CUSTOM_BIOCHATTER_NEXT_FILE=/app-config/production.yml  # tell BioChatter-Next where the YAML file is
    depends_on:
      - flask-app

That is, it does not matter where the configuration file production.yml is. It's just a sample configuration file, which can't be used directly by BioCypher pipeline.

@slobentanzer
Copy link
Contributor

OK, and why can't we do that with config instead of app-config? As well as any arbitrary name?

See the current state of this PR (config/decider.yaml). Or maybe we should go with biochatter-next.yaml to identify what the config is for.

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jul 2, 2024

np, it's OK to me to name it to "config/biochatter-next.yaml".

@slobentanzer
Copy link
Contributor

Great; please include all changes to the docker compose as well, I think it probably failed for me because that line is not in there.

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jul 2, 2024

Sure

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jul 2, 2024

@slobentanzer I made a submission according to our discussion, please check if it works as your expectation. If you have any question, please let me know.

@slobentanzer
Copy link
Contributor

@fengsh27 great, works as expected now! :)

a minor issue: on the persona splash screen (if I click the X to exit the welcome splash screen), the user can still select from all the builtin masks we have as default. Is there an easy way to get rid of the splash screen in the case when we have a custom YAML configuration?

@slobentanzer
Copy link
Contributor

In fact, any time the user clicks on the Help button and then closes with the X button, the persona splash screen appears. Would be good to be able to turn that off (i.e., the user closes and then is back in the custom conversation with the correct persona).

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jul 2, 2024

@slobentanzer fixed, please give it a try

@slobentanzer
Copy link
Contributor

Great job, that is exactly how I envisioned it. I have one remaining question about the standard behaviour of the conversations: can we change the display / mask of the default conversation? The default conversation has the name New Conversation (i.e., the conversation that appears on first visit, or when the user closes all conversations in the conversation history). Since the last changes, we already have the correct prompt in the new conversation; could we also show it with the name of the mask (Cancer Genetics Assistant in this case) instead of New Conversation?

Other than that, from my end this is ready to merge. :)

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jul 3, 2024

Great job, that is exactly how I envisioned it. I have one remaining question about the standard behaviour of the conversations: can we change the display / mask of the default conversation? The default conversation has the name New Conversation (i.e., the conversation that appears on first visit, or when the user closes all conversations in the conversation history). Since the last changes, we already have the correct prompt in the new conversation; could we also show it with the name of the mask (Cancer Genetics Assistant in this case) instead of New Conversation?

Other than that, from my end this is ready to merge. :)

Make sense, I've fix the issue, and also fixed the typo in constant.ts.

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jul 4, 2024

@slobentanzer Please have a look at the submission at your convenience.

@fengsh27
Copy link
Collaborator Author

fengsh27 commented Jul 5, 2024

@slobentanzer I'll merge this submission first. If you have any questions or concerns later on, I'll create a new PR for them.

@fengsh27 fengsh27 merged commit 3d67f04 into main Jul 5, 2024
@fengsh27 fengsh27 deleted the wip/fengsh/configurable-yaml branch July 5, 2024 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants