feat: Install Nvidia DOCA on the servers post provisioning #2219
base: devel
Updated from d4ab28d to 508e321.
```yaml
# Absolute path to local copy of .tgz file containing DOCA package.
# The package can be downloaded from https://developer.nvidia.com/networking/doca/
# Optional variable.
nvidia_doca_offline_path: ""
```
Question: during testing I see a mix of `nvidia_doca_path` and `nvidia_doca_offline_path`. This needs to be reviewed in more detail.
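One possible way to resolve the mix, sketched under assumptions: a single internal variable, `doca_package_path` (a hypothetical name, not part of this PR), is set from `nvidia_doca_offline_path` when that is non-empty and otherwise falls back to `nvidia_doca_path`.

```yaml
# Sketch only: doca_package_path is a hypothetical internal name.
# default(..., true) also treats an empty string as "not set",
# so an empty nvidia_doca_offline_path falls back to nvidia_doca_path.
- name: Resolve DOCA package path from either input variable
  ansible.builtin.set_fact:
    doca_package_path: "{{ nvidia_doca_offline_path | default(nvidia_doca_path | default('', true), true) }}"
```

The rest of the role would then read only `doca_package_path`, so there is just one name to keep consistent.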
```yaml
# Usage: configure_doca.yml
doca_tmp_path: /tmp/doca
doca_core_path: /install/doca/x86_64/doca-core
doca_deps_path: /install/doca/x86_64/doca-deps
```
This section needs review; there are too many parameters.
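A sketch of how the parameter count could shrink, assuming the core and deps directories always share one base path and architecture. `doca_install_base` and `doca_arch` are hypothetical names; the derived paths resolve to the same values as the vars in this hunk.

```yaml
# Sketch only: doca_install_base and doca_arch are hypothetical variables.
# Derived paths resolve to the same values as the original vars file.
doca_install_base: /install/doca
doca_arch: x86_64
doca_tmp_path: /tmp/doca
doca_core_path: "{{ doca_install_base }}/{{ doca_arch }}/doca-core"
doca_deps_path: "{{ doca_install_base }}/{{ doca_arch }}/doca-deps"
```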
```yaml
# limitations under the License.
---

- name: Delete doca repo folders
```
This entire file needs review. It looks like it was copied from the CUDA role and needs more attention.
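For comparison, a DOCA-specific cleanup task can stay small. This sketch assumes the folders to delete are the ones defined in the role vars (`doca_tmp_path`, `doca_core_path`, `doca_deps_path`); whether all three should really be removed is an assumption to confirm.

```yaml
# Sketch only: assumes these three vars name the DOCA repo folders to clean up.
- name: Delete doca repo folders
  ansible.builtin.file:
    path: "{{ item }}"
    state: absent   # removes the directory recursively if it exists
  loop:
    - "{{ doca_tmp_path }}"
    - "{{ doca_core_path }}"
    - "{{ doca_deps_path }}"
```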
```yaml
ansible.builtin.include_role:
  name: nvidia_doca
  tasks_from: validations.yml

- name: Check nodes having Infiniband Support
  hosts: all
  tasks:
```
Q: this file is missing the code that actually starts the DOCA installation from the `nvidia_doca` role.
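A rough sketch of what the missing piece might look like: detect NVIDIA/Mellanox PCI devices (PCI vendor ID 15b3, which covers ConnectX adapters and BlueField DPUs) and only then include the install tasks of the `nvidia_doca` role. The `install.yml` task file name is hypothetical, and the check assumes `lspci` (pciutils) is present on the nodes.

```yaml
# Sketch only: install.yml is a hypothetical task file in the nvidia_doca role.
- name: Check nodes having Infiniband Support
  hosts: all
  tasks:
    - name: List Mellanox/NVIDIA PCI devices (vendor ID 15b3)
      ansible.builtin.command: lspci -d 15b3:
      register: mlnx_pci_devices
      changed_when: false   # read-only query, never reports a change

    - name: Install DOCA on nodes that have such devices
      ansible.builtin.include_role:
        name: nvidia_doca
        tasks_from: install.yml
      when: mlnx_pci_devices.stdout | length > 0
```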
```yaml
block:
  - name: Install packages from doca rpm file
    ansible.builtin.yum:
      name: "{{ doca_filepath }}"
```
Q: I need to understand the NFS usage and `nvidia_doca_path` vs. `doca_filepath`.
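To make the NFS dependency explicit, the install could fail early with a clear message when `doca_filepath` is not visible on the node. This sketch assumes `doca_filepath` resolves to a path on the node itself (for example, on the NFS share derived from `nvidia_doca_path`); `doca_rpm_stat` is a hypothetical register name.

```yaml
# Sketch only: assumes doca_filepath is a path readable on the target node.
- name: Check that the DOCA RPM is present on the node
  ansible.builtin.stat:
    path: "{{ doca_filepath }}"
  register: doca_rpm_stat

- name: Fail early if the RPM is missing (e.g. NFS share not mounted)
  ansible.builtin.assert:
    that:
      - doca_rpm_stat.stat.exists
    fail_msg: "DOCA RPM not found at {{ doca_filepath }}; check the NFS mount and nvidia_doca_path"

- name: Install packages from doca rpm file
  ansible.builtin.yum:
    name: "{{ doca_filepath }}"
    state: present
```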
Signed-off-by: Boris Glimcher <[email protected]>
```yaml
- name: Include vars file of inventory role
  ansible.builtin.include_vars: "{{ role_path }}/../../../input/network_config.yml"

# - name: Check status of doca installation
```
Do we need this, or can it be removed?
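If a status check is kept, it could be done without shelling out by using `ansible.builtin.package_facts`. In this sketch, `doca_check_package` is a hypothetical variable and `doca-ofed` only a placeholder package name; the package to check depends on which DOCA packages the role actually installs.

```yaml
# Sketch only: doca_check_package is hypothetical and "doca-ofed" a placeholder;
# adjust to the DOCA packages actually installed by the role.
- name: Gather installed package facts
  ansible.builtin.package_facts:
    manager: auto

- name: Report whether DOCA appears to be installed
  ansible.builtin.debug:
    msg: "DOCA installed: {{ doca_check_package in ansible_facts.packages }}"
  vars:
    doca_check_package: doca-ofed
```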
```yaml
os_supported_rocky: "rocky"
os_supported_rhel: "redhat"

doca_repo_url: "https://linux.mellanox.com/public/repo/doca/{{ nvidia_doca_version }}/rhel/{{ compute_os_version }}/x86_64"
```
A correct URL example is https://linux.mellanox.com/public/repo/doca/2.5.0/rhel8.0/x86_64/. Please replace `rhel` with a variable so this can be used with other distros.
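A sketch of the requested change, assuming a new variable that holds the distro directory name exactly as it appears in the repo layout (`rhel8.0` in the example URL). `doca_distro` and its default value are hypothetical.

```yaml
# Sketch only: doca_distro is a hypothetical variable holding the
# distro directory name used by the DOCA repo, e.g. "rhel8.0".
doca_distro: "rhel8.0"
doca_repo_url: "https://linux.mellanox.com/public/repo/doca/{{ nvidia_doca_version }}/{{ doca_distro }}/x86_64"
```

With `nvidia_doca_version` set to `2.5.0`, this renders to the example URL (without the trailing slash); other distros would only need a different `doca_distro` value.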
```yaml
when: nvidia_doca_path | default("", true) | length > 0

# - name: Validate nvidia_doca_version
#   ansible.builtin.assert:
```
Do we need this code, or can it be removed?
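If the validation is kept rather than removed, an active version might look like this. The sketch assumes versions follow the `X.Y.Z` form of the example repo URL (`2.5.0`) and are passed as strings; both assumptions should be checked against the versions the role is meant to support.

```yaml
# Sketch only: assumes nvidia_doca_version is a string like "2.5.0".
# assert stops at the first failing condition, so the regex check
# only runs once the variable is known to be non-empty.
- name: Validate nvidia_doca_version
  ansible.builtin.assert:
    that:
      - nvidia_doca_version | default('', true) | length > 0
      - nvidia_doca_version is match('^[0-9]+[.][0-9]+[.][0-9]+$')
    fail_msg: "nvidia_doca_version must look like X.Y.Z (for example 2.5.0)"
  when: nvidia_doca_path | default("", true) | length > 0
```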
Issues Resolved by this Pull Request
Fixes #
Description of the Solution
If `nvidia_doca_path` is provided in `input/provision_config.yml` and Nvidia DPUs are available on the target nodes, DOCA packages will be deployed post provisioning without user intervention.
`network.yml` after provisioning the servers (assuming the provisioning tool did not install the DOCA packages).
From Nvidia documentation:
Suggested Reviewers
@sujit-jadhav