Generative AI Infrastructure v1.1 Release Notes

ftian1 released this 26 Nov 00:46

OPEA Release Notes v1.1

We are pleased to announce the release of OPEA version 1.1, which includes significant contributions from the open-source community. This release addresses over 470 pull requests.

More information about how to get started with OPEA v1.1 can be found on the Getting Started page. All project source code is maintained in the repository. To pull Docker images, please visit Docker Hub. For instructions on deploying Helm Charts, please refer to the guide.

What's New in OPEA v1.1

This release introduces more generally available scenarios, including:

Highlights

New GenAI Examples

  • AvatarChatbot: a chatbot combined with a virtual "avatar", which can run on either Intel Gaudi 2 AI Accelerators or Intel Xeon Scalable Processors.
  • DBQnA: an example that seamlessly translates natural language queries into SQL and delivers real-time database results.
  • EdgeCraftRAG: a customizable and tunable RAG example for edge solutions on Intel® Arc™ GPUs.
  • GraphRAG: a Graph RAG-based approach to summarization.
  • Text2Image: an application that generates images based on text prompts.
  • WorkflowExecAgent: a workflow executor example that handles data/AI workflow operations, using LangChain agents to execute custom-defined, workflow-based tools.

Enhanced GenAI Examples

New GenAI Components

Enhanced GenAI Components

GenAIStudio

GenAI Studio, a new project of OPEA, streamlines the creation of enterprise Generative AI applications by providing an alternative, UI-based process for creating end-to-end solutions. It supports GenAI application definition, evaluation, performance benchmarking, and deployment. GenAI Studio lets developers build, test, and optimize their LLM solutions and create a deployment package. Its intuitive no-code/low-code interface accelerates innovation, enabling rapid development and deployment of AI applications.

Enhanced Observability

Observability offers real-time insights into component performance and system resource utilization. We enhanced this capability by monitoring key system metrics, including CPU, host memory, storage, network, and accelerators (such as Intel Gaudi), as well as tracking OPEA application scaling.

Helm Charts Support

OPEA examples and microservices support Helm Charts as the packaging format on Kubernetes (k8s). The newly supported examples include AgentQnA, AudioQnA, FaqGen, and VisualQnA. The newly supported microservices include chathistory, mongodb, prompt, and Milvus for data-prep and retriever. Helm Charts now also have an option to collect Prometheus metrics from the applications.
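Prometheus collects metrics by scraping an HTTP endpoint (typically `/metrics`) that serves its plain-text exposition format. As a minimal sketch for inspecting such output by hand, the Python below parses simple exposition lines into `(name, labels, value)` tuples. It is not part of OPEA or its Helm charts: `parse_prometheus_text` is a hypothetical helper, the metric names in the sample are illustrative only, and the parser skips `# HELP`/`# TYPE` metadata and does not unescape label values.

```python
import re

# metric_name{label="value",...} value [timestamp]
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
# label="value" pairs; escaped characters are kept as-is (not unescaped)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(text):
    """Return a list of (metric_name, labels_dict, float_value) samples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if not match:
            continue  # ignore lines this simple parser does not understand
        name, label_str, value = match.groups()
        labels = dict(_LABEL.findall(label_str or ""))
        # float() also accepts the special values NaN, +Inf and -Inf.
        samples.append((name, labels, float(value)))
    return samples


sample = """# HELP http_requests_total Total requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027 1395066363000
process_cpu_seconds_total 12.5
"""
for name, labels, value in parse_prometheus_text(sample):
    print(name, labels, value)
```

For anything beyond quick inspection, the Prometheus server itself (or an official client library's parser) is the better tool; this sketch only shows the shape of the data being scraped.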

Long-context Benchmark Support

We added the following two benchmark kits in response to the community's requirements for long-context language models.

  • HELMET: a comprehensive benchmark for long-context language models covering seven diverse categories of tasks. The datasets are application-centric and are designed to evaluate models at different lengths and levels of complexity.
  • LongBench: a benchmark tool for bilingual, multitask, and comprehensive assessment of the long-context understanding capabilities of large language models.

Newly Supported Models

  • llama-3.2 (1B/3B/11B/90B)
  • glm-4-9b-chat
  • Qwen2/2.5 (7B/32B/72B)

Newly Supported Hardware

Notable Changes

GenAIExamples
  • Functionalities

    • New GenAI Examples
      • [AvatarChatbot] Initiate "AvatarChatbot" (audio) example (cfffb4c, 960805a)
      • [DBQnA] Adding DBQnA example in GenAIExamples (c0643b7, 6b9a27d)
      • [EdgeCraftRag] Add EdgeCraftRag as a GenAIExample (c9088eb, 7949045, 096a37a)
      • [GraphRAG] Add GraphRAG example a65640b
      • [Text2Image] Add example for text2image 085d859
      • [WorkflowExecAgent] Add Workflow Executor Example bf5c391
    • Enhanced GenAI Examples
      • [AudioQnA] Add multi-language AudioQnA on Xeon 658867f
      • [AgentQnA] Update AgentQnA example for v1.1 release 5eb3d28
      • [ChatQnA] Enable vLLM Profiling for ChatQnA (00d9bb6, 7adbba6)
      • [ChatQnA] Add Terraform and Ansible Modules information 7c9ed04
      • [ChatQnA] Add chatqna wrapper for multiple model selection fb514bb
      • [DocSum] Supported multimedia and added new GUI powered by gradio (eb91d1f, 0cdeb94)
      • [DocSum] Support Chinese for Docsum b0f7c9c
      • [DocIndexRetriever] Update DocIndexRetriever Example to allow user passing in retriever/reranker params 62e06a0
      • [MultimodalQnA] Image and Audio Support Phase 1 bbc95bb
      • [Text2Image] Add Text2Image UI, UI tests, Readme, and Docker support c6fc92d
      • update examples accuracy 088ab98
      • Add one-button benchmark launcher (5720cd4, ced68e1)
    • Removed GenAI Pipelines
      • [ChatQnA] remove ChatQnA vllm-on-ray 40386d9
    • Changed Defaults
      • [ChatQnA] Set no wrapper ChatQnA as default 619d941
      • [Codegen] Replace codegen default Model to Qwen/Qwen2.5-Coder-7B-Instruct. 2332d22
      • [CodeTrans] update codetrans default model to Mistral-7B-Instruct-v0.3 a2afce1
  • Enhanced Security

  • New Hardware Support

    • [ChatQnA] Add compose example for ChatQnA AMD ROCm deployment 6d3a017
    • [CodeGen] Adding files to deploy CodeGen application on AMD GPU 83172e9
    • [CodeTrans] Adding files to deploy CodeTrans application on AMD GPU 7e62175
    • [DocSum] Add compose example for DocSum amd rocm deployment b1bb6db
    • [FaqGen] Add compose example for FaqGen AMD ROCm 5648839
  • Dependency Versioning

    • [gradio] Bump gradio from 4.44.0 to 5.0.0 in /MultimodalQnA/ui/gradio f2f6c09
    • [TGI-CPU] Update TGI CPU image to latest official release 2.4.0-intel-cpu 0306c62
    • [TGI-Gaudi] Upgrade TGI Gaudi version to v2.0.6 1ff85f6a
    • [TEI-Gaudi] Use fixed version (1.5.0) of TEI Gaudi for stability 9ff7df9
    • [vLLM-Gaudi] align vllm hpu version to latest vllm-fork e9b1645
  • Deployment

    • [ChatQnA] Add instructions of modifying reranking docker image for NVGPU 2587179
    • [ChatQnA] setup ollama service in aipc docker compose def39cf
    • [ChatQnA] Make rerank run on gaudi for hpu docker compose 3c164f3
    • [ChatQnA] Added the k8s yaml for vLLM support e2f9037
    • [ChatQnA] manage your own ChatQnA pipelines. d16c80e
    • [ChatQnA] docker install instruction for csp 75df2c9
    • [ChatQnA] ChatQnA with Remote Inference Endpoints (Kubernetes) 56f770c
    • [ProductivitySuite] Simplify the deployment ProductivitySuite on kubernetes afc39fa
  • Fixed Issues

    • [AvatarChatbot] Fix left issue of tgi version update 393367e
    • [ChatQnA] Fix the service connection issue on GPU and modify the emb backend 944ae47
    • [ChatQnA] Fix AIPC docker container network issue 95b58b5
    • [ChatQnA] Fix top_n rerank docs 4a265ab
    • [ChatQnA] fix chatqna accuracy issue with incorrect penalty b0487fe
    • [ChatQnA] Fix AIPC retriever and UI error 773c32b
    • [DocSum] Fix docSum ui error in accessing parsed files 3744bb8
    • image build bug fix 82801d0
  • Documentation

    • [AudioQnA] Update AudioQnA README.md for its workflow 63bad29
    • [AudioQnA] Update AudioQnA README to add a couple usage details 184e9a4
    • [AgentQnA] Update Agent README.md for workflow 23b820e
    • [AgentQnA] Update README.md for usage experience a8f4245
    • [ChatQnA] Add steps to deploy opea services using minikube 6263b51
    • [ChatQnA] Update ChatQnA Readme for LLM Endpoint aa314f6
    • [ChatQnA] Update ChatQnA AIPC README b056ce6
    • [CodeGen] Update CodeGen README for its workflow 12469c9
    • [DocSum] Update DocSum README.md for its workflow fbde15b
    • [FaqGen] Update FaqGen README.md for its workflow 0c6b044
    • [InstructionTuning] instruction finetune README improvement 644c3a6
    • [MultiModalQnA] Update MultiModal README.md for workflow 40800b0
    • [ProductivitySuite] Update Productivity README.md for workflow 0edff26
    • [DocIndexRetriever] Update DocIndexRetriever README.md for workflow a3f9811
    • [SearchQnA] Update SearchQnA README.md for its workflow bf28c7f
    • [Translation] Update Translation README.md for workflow 35a4fef
    • [VideoQnA] Update VideoQnA README.md for workflow 1929dfd
  • CI/CD/UT

    • Add nightly image build and publish action 78331ee
    • optimize hardware list for test 3b1a9fe
    • open manifest test in CI when dockerfile changed 620ef76
    • Optimize path and link validity check. 7dec001
GenAIComps
  • Functionalities
    • New microservices:
      • Add stable diffusion microservice 5d0c4367
      • Add image2video microservice (Stable Video Diffusion) a03e7a55
      • Text to SQL microservice 827e3d40
      • Add GPT-SoVITS microservice 6da7db9e
      • Add image2image microservice 52c1826f
      • Initiate "animation" component c26d37e7
      • GraphRAG with llama-index 19330ea2
    • Enhanced microservices:
      • Add DPO support in finetuning microservice 37f35140
      • Support Chinese for Docsum 9a00a3ea
      • Support file upload summary for DocSum microservice fa2ea642
      • Add support for Audio and Video summarization to Docsum baafa402
      • vLLM support for FAQGen f5c60f10
      • vLLM support for DocSum 550325d8
      • vLLM support for Codegen 24b9f03f
      • Enable vllm for Agent 4638c1d4
      • Multiple models and remote service support for langchain vLLM text-generation e3812a74
      • Set a higher default value (1.2) for repetition_penalty in the codegen example to reduce repetition 5ed428f4
      • MultimodalQnA Image and Audio Support Phase 1 29ef6426
      • refine codetrans prompt, support parameter input 0bb019f8
      • add dynamic batching embedding/reranking 518cdfb6
      • Embedding compatible with OpenAI API 7bf1953c
      • Update RAGAgentLlama and ReActLlama c8e36390
      • [Agent] support custom prompt 3473bfb3
      • agent short & long term memory with langgraph. e39b08f3
      • support faqgen upload file in UI 453ff726
      • Add E2E Prometheus metrics to applications a6998a1d
      • Multiple models support for LLM TGI e879366c
      • Add RAG agent and ReAct agent implementation for llama3.1 served by TGI-gaudi e7fdf537
      • Support Llama3.2 vision and vision guard model 534c227a
      • Add Intel/toxic-prompt-roberta to toxicity detection microservice f6f620a2
      • Refactor milvus dataprep and retriever 84374a57
    • Removed microservices
    • Async support for microservices
      • Support async for embedding microservice 28672956
      • TEI rerank microservice async support 9df4b3c0
      • Async support for some microservices f3746dc8
  • Performance
    • Fix vllm microservice performance issue. 2159f9ad
    • [Dataprep] Reduce Upload File Time Consumption 71348998
  • New Hardware Support
    • Add vLLM ARC support with OpenVINO backend a2b9d95f
  • Enhanced Security
    • Prediction Guard Guardrails components 4bbc7a2f
    • Add WildGuard Guardrail Microservice 5bb4046bF
    • upgrade setuptools version to fix CVE-2024-6345 6518c0f0
    • Remote TGI/TGI services with OAuth Client Credentials authentication 74df6bb7
  • Validation
    • Combine CI/CD docker compose. 23c99c11
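The codegen entry above raises the default `repetition_penalty` to 1.2. To illustrate what that parameter does, here is a minimal sketch of the common CTRL-style formulation, the one used by Hugging Face Transformers' `RepetitionPenaltyLogitsProcessor`: the logit of every token already present in the sequence is divided by the penalty when positive and multiplied by it when negative, so those tokens always become less likely. This operates on plain Python lists and is not OPEA's or TGI's code; `apply_repetition_penalty` is a hypothetical name.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.2):
    """Return a copy of `logits` with a CTRL-style repetition penalty applied.

    logits: one score per vocabulary id (list of floats).
    seen_token_ids: token ids already present in the prompt/generated text.
    penalty: > 1.0 discourages repetition; 1.0 leaves logits unchanged.
    """
    out = list(logits)
    for tok in set(seen_token_ids):  # each seen token is penalized once
        score = out[tok]
        # Dividing a positive logit or multiplying a negative one by a
        # penalty > 1 both lower the score of an already-seen token.
        out[tok] = score / penalty if score > 0 else score * penalty
    return out


vocab_logits = [2.4, -1.0, 0.5, 3.0]
# Tokens 0 and 1 were already generated; tokens 2 and 3 are untouched.
print(apply_repetition_penalty(vocab_logits, seen_token_ids=[0, 1, 0], penalty=1.2))
```

Penalizing each distinct token once (rather than once per occurrence) matches the usual formulation; frequency-based penalties are a separate mechanism.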
GenAIEvals
  • New Benchmark
  • Performance
    • Add new constant loader & Fix poisson loader issue e11588c
    • Support Poisson distributed requests for benchmark 7305ea3
    • Support customized prompts and max new tokens in chatqna e2e test 79a4ad3
    • Add namespace support for k8s performance test 70697d1
    • Support sharegpt dataset in chatqna e2e test 028bf63
    • [Benchmark] Get benchmark reports. 946c439
  • Accuracy
    • Control the concurrent number of requests in codegen acc test. 84e077e
    • integrate deepeval metric with remote endpoint, like tgi server. ffa65dc
    • Ragaaf - adding new metric 'context recall' cc7cebd
    • Ragaaf - adding new metric 'context relevance' f995c9c
    • Ragaaf (RAG assessment annotation free) 2413e70
    • Adding new metrics to ragas offering d1c1337
    • add crud ragas evaluation. f2bff45
    • Minimize requirements for user data for OPEA ragas f1593ea
  • Monitoring
    • Add node metrics Grafana dashboard a19f42e
    • Add CPU Grafana dashboard 38e69eb
    • add the grafana dashboard json file for Gaudi metrics 6c9ae91
    • Enhance the Grafana JSON file 8653efb
  • Fixed Issues
    • [ChatQnA Benchmark] Fixed the output token in chatqnafixed.py 2c8ca26
    • Fix test duration time inaccurate issue 9d76832
    • Fix llm output token length issue 99ef325
    • Fix llm serving benchmark issue d6bafbd
    • Fix input token size (1024) 30adcbe
    • Ragas fix for use of metrics argument 0cf3631
    • Fixed the number of output tokens & fixed top_k=1 4af0a62
    • Fix JSON Return Format in getReqData Function a4be366
  • Documentation
    • Add setup guide of gaudi prometheus exporter e9b8637
    • Add README for running OPEA ragas using HF endpoint on Gaudi 0dff0d3
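The "constant loader" and "Poisson distributed requests" entries above concern how a benchmark spaces its requests in time. As a hedged sketch (not the GenAIEval implementation; both function names are hypothetical), a Poisson arrival process at a given rate can be generated by accumulating exponentially distributed gaps with mean 1/rate, while a constant loader uses a fixed gap of 1/rate:

```python
import random


def poisson_arrival_times(rate_per_s, num_requests, seed=None):
    """Send times (seconds from start) for a Poisson process at `rate_per_s`.

    Inter-arrival gaps of a Poisson process are exponentially distributed
    with mean 1/rate, so the schedule is a running sum of such gaps.
    """
    rng = random.Random(seed)  # seeded generator for reproducible schedules
    t = 0.0
    times = []
    for _ in range(num_requests):
        t += rng.expovariate(rate_per_s)  # exponential gap, mean 1/rate
        times.append(t)
    return times


def constant_arrival_times(rate_per_s, num_requests):
    """Evenly spaced send times: one request every 1/rate seconds, from t=0."""
    interval = 1.0 / rate_per_s
    return [i * interval for i in range(num_requests)]


schedule = poisson_arrival_times(rate_per_s=5.0, num_requests=10, seed=42)
print([round(t, 3) for t in schedule])
print(constant_arrival_times(rate_per_s=4.0, num_requests=5))
```

Both schedules have the same average rate, but the Poisson one produces bursts and lulls, which is why it is often used to approximate independent users hitting a serving endpoint.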
GenAIInfra
  • GMC

    • Add manifests for new components e51fd62
  • HelmChart

    • [AgentQnA] Helm Chart for AgentQnA 66de41c
    • [AudioQnA] helm: Add audioQnA e2e helm chart 9efacee
    • [AudioQnA] helm-charts: Add gpt-sovits support 1f55e1a
    • [ChatQnA] Implement the nowrapper version chatqna 71c81d0
    • [FaqGen] Add FaqGen helm chart f847e05
    • [FaqGen] helm: Add llm-faqgen-tgi support 325126e
    • [HPA] helm/manifest: Sync HPA related k8s probe settings c399578
    • [VisualQnA] Add helm chart for VisualQnA example b077d44
    • [UI] support variants for multiple examples 96af2ad
    • [Nginx] helm-chart: Make nginx service type configurable a5c96ab
    • [Milvus] Add milvus support for data-prep and retriever-usvc d289b4e
    • Add helm chart for 3 components 881e2b5
    • Also accelerate teirerank with Gaudi 620963f
  • CSP

    • terraform: add AWS/EKS deployment for ChatQnA bdb9af9
  • Monitoring

    • Add Grafana dashboard for monitoring OPEA application scaling in k8s 691bbc5
    • Add ServiceMonitors for rest of OPEA applications fc6235a
    • Add monitoring option to (ChatQnA) Helm charts dbd607e
    • Support alternative metrics on accelerated TGI / TEI instances cdd3585
    • Expose options such as collector.interval of memory bandwidth exporter in k8s manifests and docker for user configuration. 2517e79
  • Dependency Versioning

    • [TEI-Gaudi] Upgrade tei-gaudi version to 1.5.0 c6a9c90
    • [TGI-CPU] Update tgi cpu image version to 2.4.0-intel-cpu f6c180e
    • [TGI-Gaudi] Upgrade tgi-gaudi to version 2.0.6 915baa0
    • Update the image version for ChatQnA examples 593458c
  • Changed Defaults

    • Change default model of codegen and codetrans 74476b7
  • Documentation

    • Update observability README + fix typos 1d77b81
    • Monitoring, Observability and HPA doc improvements 14198fe
    • Update GMC manifest changes and misc fixes 87dc673
    • Improve Helm charts README 7b8c510
    • Create troubleshooting.md d55ded4
    • Enhance helm chart repo usage in README 0de5535
  • CI/CD/UT

    • Refactor CI scripts to support more components e09270a
    • Add github workflows to release helm chart 3910e3b
    • Fix link check failure (#481) fc87ef3
    • Fix CI failures (#477) 7e7b8ab
    • Optimize path and link validity check. 91bd163
    • Enable image build process for memory-bandwidth-exporter ddeac46
    • Add hyperlinks and paths validation. d8cd3a1

Full Changelogs

Contributors

This release would not have been possible without the contributions of the following organizations and individuals.

Contributing Organizations

  • AMD: AMD CPU/GPU support.
  • Capital One: Contributions to CI/CD process.
  • China Unicom: Contributions to the deployment of GenAI examples.
  • Huawei: Contributions to OPEA services deployment.
  • Intel: Development and improvements to GenAI examples, components, infrastructure, evaluation and studio.
  • Nascenia Ltd.: Contributions to documentation.
  • National Chiao Tung University: Contributions to documentation.
  • Princeton University: Integration of HELMET.

Individual Contributors

For a comprehensive list of individual contributors, please refer to the "Full Changelogs" section.