Enabling eth cores on BH and initializing FW on active eriscs #18130

Open
wants to merge 1 commit into
base: main

abhullar-tt
Contributor

@abhullar-tt abhullar-tt commented Feb 21, 2025

Ticket

#18038

What's changed

Last set of changes that enable loading FW onto active erisc cores.

This PR points to a UMD branch with some fixes for detecting BH multi-chip connectivity. The UMD branch will be mainlined before this PR is merged.

Includes

  • Updating Metal Cluster to pull ClusterDescriptor from UMD rather than creating a separate Metal version
  • Updating some N300 gtests to run on 2 chip machines

Eth links on BH are pretty flaky (https://tenstorrent.atlassian.net/browse/BH-84)

Note: Fabric support needs some additional fixes that are on a separate branch (abhullar/bh-multichip)

Checklist

@abhullar-tt abhullar-tt force-pushed the abhullar/bh-p150 branch 3 times, most recently from 5d35a12 to 3064977 Compare February 22, 2025 17:41
@abhullar-tt abhullar-tt changed the title Abhullar/bh p150 Enabling eth cores on BH and initializing FW on active eriscs Feb 22, 2025
@abhullar-tt abhullar-tt force-pushed the abhullar/bh-p150 branch 2 times, most recently from 7152db8 to 50c4f05 Compare February 26, 2025 06:01
@@ -80,7 +80,7 @@ dram_view_size:

eth:
[
#1-1, 2-1, 3-1, 4-1, 5-1, 6-1, 7-1, 10-1, 11-1, 12-1, 13-1, 14-1, 15-1, 16-1,
1-1, 16-1, 2-1, 15-1, 3-1, 14-1, 4-1, 13-1, 5-1, 12-1, 6-1, 11-1, 7-1, 10-1,
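The two lists above contain the same 14 coordinates; the second appears to alternate between the two ends of the first (1, 16, 2, 15, and so on). A minimal sketch of that reordering on the x values follows. The helper name `interleave_from_ends` is hypothetical and not part of the PR, and the PR does not state the reason for the new ordering.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): alternates elements taken from the
// front and the back of `ids`, e.g. {1, 2, 3, 4} -> {1, 4, 2, 3}.
std::vector<int> interleave_from_ends(const std::vector<int>& ids) {
    std::vector<int> out;
    out.reserve(ids.size());
    std::size_t lo = 0;           // next index to take from the front
    std::size_t hi = ids.size();  // one past the next index to take from the back
    while (lo < hi) {
        out.push_back(ids[lo++]);
        if (lo < hi) {
            out.push_back(ids[--hi]);
        }
    }
    return out;
}
```

Applied to the x values of the first list (1 to 7, then 10 to 16), this yields the x values in the order of the second list.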
Contributor Author


@@ -20,6 +20,36 @@ class MultiDeviceFixture : public DispatchFixture {
void SetUp() override { this->arch_ = tt::get_arch_from_string(tt::test_utils::get_umd_arch_name()); }
};

class TwoDeviceFixture : public MultiDeviceFixture {
Contributor


Is the only difference between this and the N300 fixture below that we are dropping the tt::ARCH::WORMHOLE_B0 assertion? Do we intend to run this test on both BH and WH?

Contributor Author


Yeah, this should run on both BH and WH. There are some other N300-specific tests, which is why I didn't remove the N300 fixture.
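The distinction discussed in this thread can be sketched as two gating predicates. This is only an illustration under stated assumptions: `Arch`, `two_device_test_should_run`, and `n300_test_should_run` are hypothetical names, and requiring exactly two devices is an assumption based on the PR description ("run on 2 chip machines"), not on the fixture code itself.

```cpp
#include <cstddef>

// Hypothetical stand-in for tt::ARCH; only the values needed for this sketch.
enum class Arch { WormholeB0, Blackhole };

// Arch-agnostic gate (the TwoDeviceFixture idea): only the device count matters.
// Assumption: the fixture requires exactly two devices.
bool two_device_test_should_run(std::size_t num_devices) {
    return num_devices == 2;
}

// N300-style gate: keeps the Wormhole B0 requirement on top of the device count.
bool n300_test_should_run(Arch arch, std::size_t num_devices) {
    return arch == Arch::WormholeB0 && num_devices == 2;
}
```

Under these assumptions, a two-chip BH machine passes the first gate but not the second, which matches the intent of keeping the N300 fixture for N300-specific tests.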

@@ -159,6 +159,7 @@ class Hal {
bool coordinate_virtualization_enabled_;
uint32_t virtual_worker_start_x_;
uint32_t virtual_worker_start_y_;
bool eth_fw_is_cooperative_; // set when eth riscs have to context switch
Contributor


@@ -302,7 +305,7 @@ class Cluster {
// Need to hold reference to cluster descriptor to detect total number of devices available in cluster
// UMD static APIs `detect_available_device_ids` and `detect_number_of_chips` only returns number of MMIO mapped
// devices
std::unique_ptr<tt_ClusterDescriptor> cluster_desc_;
tt_ClusterDescriptor* cluster_desc_;
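One plausible reading of the `cluster_desc_` lines above, consistent with the PR description's note that Metal now pulls the ClusterDescriptor from UMD, is that ownership of the descriptor moves to the UMD side and Metal keeps only a non-owning pointer. The sketch below illustrates that pattern; `ClusterDescriptorSketch`, `DriverSketch`, and `ClusterSketch` are hypothetical stand-ins, not the real UMD or Metal types or APIs.

```cpp
#include <memory>

// Hypothetical stand-in for tt_ClusterDescriptor.
struct ClusterDescriptorSketch {
    int num_chips = 0;
};

// Owner: plays the role of the UMD side, which creates and owns the descriptor.
class DriverSketch {
public:
    explicit DriverSketch(int num_chips)
        : desc_(std::make_unique<ClusterDescriptorSketch>()) {
        desc_->num_chips = num_chips;
    }
    // Hands out a non-owning pointer; the driver keeps ownership.
    ClusterDescriptorSketch* descriptor() const { return desc_.get(); }

private:
    std::unique_ptr<ClusterDescriptorSketch> desc_;
};

// Borrower: plays the role of Metal's Cluster, which only observes the descriptor.
class ClusterSketch {
public:
    explicit ClusterSketch(const DriverSketch& driver)
        : cluster_desc_(driver.descriptor()) {}
    int number_of_devices() const { return cluster_desc_->num_chips; }

private:
    ClusterDescriptorSketch* cluster_desc_;  // not owned; must not outlive the driver
};
```

The trade-off of a raw, non-owning pointer is that there is a single source of truth for the descriptor, at the cost of a lifetime requirement: the owner has to outlive every borrower.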
Contributor


@abhullar-tt abhullar-tt force-pushed the abhullar/bh-p150 branch 3 times, most recently from 7409fd8 to bc07969 Compare February 27, 2025 02:09
3 participants