[Ready To Merge][AQUA] Mutli-Model Deployment #1061

Merged
197 commits merged into main on Mar 28, 2025

mrDzurb (Member) commented Feb 5, 2025

Description

Enhance AQUA SDK & CLI to Support Multi-Model Deployment

This PR introduces multi-model deployment support in the AQUA SDK and CLI, enabling users to deploy multiple models within a single model deployment. The enhancement includes updates to the SDK and CLI logic, allowing users to specify multiple models for deployment while ensuring proper validation and GPU resource allocation.

Key Changes

  • Updated SDK AQUA Model Deployment API to support multiple models in a single deployment.
  • Modified CLI commands to allow users to provide multiple model OCIDs when creating a deployment.
  • Refactored model validation logic to ensure that:
    • Only LLMs compatible with the vLLM container can be deployed.
    • GPU allocation is properly handled across all selected models.
  • Implemented validation enhancements to check model compatibility with the selected shape and GPU allocation.
  • Updated SDK handlers to support retrieving and registering grouped models.
  • Improved the metadata storage mechanism for tracking multiple models in a single deployment.
  • Added unit tests to validate multi-model functionality in the SDK and CLI.
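
The GPU allocation check described above can be pictured with a minimal sketch. This is not the code from this PR: the `ModelRequest` class, the `validate_gpu_allocation` function, and the `SHAPE_GPU_COUNT` table are hypothetical names introduced for illustration, and the assumption that `BM.GPU.A10.4` exposes 4 GPUs is hard-coded here rather than looked up from the service.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical table of GPUs per shape (assumption for illustration only).
SHAPE_GPU_COUNT = {"BM.GPU.A10.4": 4}


@dataclass(frozen=True)
class ModelRequest:
    """One entry of the --models list: a model OCID and its GPU share."""
    model_id: str
    gpu_count: int


def validate_gpu_allocation(shape: str, models: list[ModelRequest]) -> None:
    """Raise ValueError if the requested GPUs cannot fit on the shape."""
    if shape not in SHAPE_GPU_COUNT:
        raise ValueError(f"Unknown shape: {shape}")
    if not models:
        raise ValueError("At least one model is required")
    for model in models:
        if model.gpu_count < 1:
            raise ValueError(f"{model.model_id}: gpu_count must be >= 1")

    requested = sum(model.gpu_count for model in models)
    available = SHAPE_GPU_COUNT[shape]
    if requested > available:
        raise ValueError(
            f"Requested {requested} GPUs but {shape} provides {available}"
        )
```

With this sketch, two models requesting 2 GPUs each pass on `BM.GPU.A10.4`, while a 3 + 2 split is rejected because it exceeds the assumed 4 available GPUs.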

Breaking Changes & Compatibility Considerations

  • Existing single-model deployments are unaffected; the updated logic only adds support for multi-model deployments.

Next Steps & Future Enhancements

  • Enhance UI to reflect new SDK changes, allowing users to configure multi-model deployments via AQUA UI.
  • Improve documentation to provide clear guidance on deploying multiple models in a single deployment.

Testing & Validation

  • Unit tests updated to cover new multi-model deployment logic.
  • Integration tests performed to validate CLI interactions with multi-model deployment APIs.
  • Manual testing conducted to verify proper GPU allocation and deployment success.
    • The manual tests are covered in the [AQUA][MMD] E-2-E Tests Confluence page.

CLI Test

List Available Shapes

ads aqua deployment list_shapes

Retrieve Deployment Configuration

ads aqua deployment get_multimodel_deployment_config --model_ids '["ocid1.datasciencemodel.oc1.iad.<OCID>"]'

Create a Multi-Model Deployment

ads aqua deployment create \
  --container_image_uri "dsmc://odsc-vllm-serving:0.6.4.post1.2" \
  --display_name "multi-model-deployment-test-completion" \
  --instance_shape "BM.GPU.A10.4" \
  --models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.<OCID>", "gpu_count":2},{"model_id":"ocid1.datasciencemodel.oc1.iad.<OCID>", "gpu_count":2}]' \
  --log_group_id "ocid1.loggroup.oc1.iad.<OCID>" \
  --access_log_id "ocid1.log.oc1.iad.<OCID>" \
  --predict_log_id "ocid1.log.oc1.iad.<OCID>"
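
Because `--models` takes a JSON list, hand-quoting it in a shell is error-prone. The following is a rough sketch of assembling the same command from Python, using only the flags shown in the command above. The helper name `build_create_command` is hypothetical and is not part of the SDK; the `<OCID>` placeholders are kept as in the original command.

```python
import json
import shlex


def build_create_command(
    container_image_uri: str,
    display_name: str,
    instance_shape: str,
    models: list,
    log_group_id: str,
    access_log_id: str,
    predict_log_id: str,
) -> str:
    """Return a shell-safe `ads aqua deployment create` command string."""
    args = [
        "ads", "aqua", "deployment", "create",
        "--container_image_uri", container_image_uri,
        "--display_name", display_name,
        "--instance_shape", instance_shape,
        # Serialize the model list once so the JSON is always well-formed.
        "--models", json.dumps(models),
        "--log_group_id", log_group_id,
        "--access_log_id", access_log_id,
        "--predict_log_id", predict_log_id,
    ]
    # shlex.join quotes every argument, including the JSON payload.
    return shlex.join(args)


models = [
    {"model_id": "ocid1.datasciencemodel.oc1.iad.<OCID>", "gpu_count": 2},
    {"model_id": "ocid1.datasciencemodel.oc1.iad.<OCID>", "gpu_count": 2},
]
command = build_create_command(
    container_image_uri="dsmc://odsc-vllm-serving:0.6.4.post1.2",
    display_name="multi-model-deployment-test",
    instance_shape="BM.GPU.A10.4",
    models=models,
    log_group_id="ocid1.loggroup.oc1.iad.<OCID>",
    access_log_id="ocid1.log.oc1.iad.<OCID>",
    predict_log_id="ocid1.log.oc1.iad.<OCID>",
)
print(command)
```

Splitting the resulting string with `shlex.split` and parsing the `--models` value with `json.loads` recovers the original list, which is a quick way to confirm the quoting survived.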

Get Details for Model Deployment

ads aqua deployment get --model_deployment_id "ocid1.datasciencemodeldeployment.oc1.iad.<OCID>"

@mrDzurb added the enhancement (New feature or request), do not merge (for any issue that isn't ready for merging yet), and AQUA labels Feb 5, 2025
@oracle-contributor-agreement bot added the OCA Verified (All contributors have signed the Oracle Contributor Agreement.) label Feb 5, 2025

github-actions bot commented Feb 5, 2025

📌 Cov diff with main:

Coverage-95%

📌 Overall coverage:

Coverage-56.69%

📌 Cov diff with main:

Coverage-0%

📌 Overall coverage:

Coverage-19.29%

📌 Cov diff with main:

Coverage-84%

📌 Overall coverage:

Coverage-58.80%

📌 Cov diff with main:

Coverage-84%

📌 Overall coverage:

Coverage-58.81%

@mrDzurb requested a review from lu-ohai March 28, 2025 16:42
lu-ohai previously approved these changes Mar 28, 2025
@elizjo self-requested a review March 28, 2025 16:44
@mrDzurb changed the title from "[AQUA] Mutli-Model Deployment" to "[Ready To Merge][AQUA] Mutli-Model Deployment" Mar 28, 2025
VipulMascarenhas (Member) left a comment

added minor comments, changes look good.

@mrDzurb dismissed stale reviews from VipulMascarenhas and lu-ohai via 5176050 March 28, 2025 20:19
@mrDzurb requested a review from dipatidar March 28, 2025 23:39
@mrDzurb merged commit c45c38a into main Mar 28, 2025
24 checks passed
Labels
AQUA; enhancement (New feature or request); OCA Verified (All contributors have signed the Oracle Contributor Agreement.)
5 participants