[QST] What is the canonical method to remap the coordinate in CuTe? #1277
Comments
@ccecka Any suggestion? |
I'm not sure what exactly you're trying to accomplish here. We don't have a need for "coordinate remapping" outside of predication applications: That said, the mapper you're interested in can be written as
auto inner = make_layout(make_shape(_2{}, _3{}));
auto tiler = make_layout(make_shape(_3{}, _4{}));
auto tiled = blocked_product(inner, tiler); // (_x, _y) -> linear_idx
auto naive = make_layout(make_shape(_6{}, _12{})); // (.x, .y) -> linear_idx, naively linearize blockIdx
auto coord = make_identity_layout(shape(naive)); // (.x, .y) -> (.x, .y)
auto cmap = coord.compose(right_inverse(tiled)).compose(naive); // (.x, .y) -> linear_idx -> (_x, _y)
std::cout << cmap << std::endl;
for (int c = 0; c < size<1>(cmap); ++c) {
  for (int r = 0; r < size<0>(cmap); ++r) {
    std::cout << naive(r,c) << "\t(.x,.y)=(" << r << "," << c << ")\t(_x,_y)=" << cmap(r,c) << std::endl;
    // 0 (.x,.y)=(0,0) (_x,_y)=(0,0)
    // 1 (.x,.y)=(1,0) (_x,_y)=(1,0)
    // 2 (.x,.y)=(2,0) (_x,_y)=(0,1)
    // 3 (.x,.y)=(3,0) (_x,_y)=(1,1)
    // 4 (.x,.y)=(4,0) (_x,_y)=(0,2)
    // ...
  }
}
where you may need to reference |
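For readers who want to check the mapping without CuTe, here is a minimal plain C++ sketch of the same "linearize, then un-linearize" idea for this specific 2x3 inner tile inside a 3x4 grid of tiles. The helper name `remap_blocked` and the hard-coded sizes are assumptions made for illustration; this is not part of CuTe and not necessarily how the layout algebra evaluates it internally.

```cpp
#include <array>

// Hypothetical helper (not a CuTe API): remap a naive block coordinate
// (bx, by) in a 6x12 grid so that consecutive linear indices first walk a
// 2x3 inner tile and then step through the 3x4 grid of tiles.
inline std::array<int, 2> remap_blocked(int bx, int by) {
    const int inner_m = 2, inner_n = 3;    // inner tile shape
    const int tiles_m = 3;                 // tiles along x (4 along y is implied by the 12 columns)
    const int grid_m = inner_m * tiles_m;  // 6 rows in the naive grid

    int k = bx + grid_m * by;              // naive column-major linear index

    // Peel the linear index apart in the order: inner x, inner y, tile x, tile y.
    int i0 = k % inner_m; k /= inner_m;
    int i1 = k % inner_n; k /= inner_n;
    int t0 = k % tiles_m; k /= tiles_m;
    int t1 = k;

    // Recombine into a flat 1D coordinate per mode.
    return {i0 + inner_m * t0, i1 + inner_n * t1};
}
```

With these sizes, `remap_blocked(2, 0)` gives `(0, 1)` and `remap_blocked(4, 0)` gives `(0, 2)`, matching the commented output in the snippet above. Note that the divisions here are by compile-time constants; with dynamic shapes each of them becomes a real divmod.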
In the case that you're actually interested in transforming Classic "Stream-K" tile scheduling These typically don't use CuTe-Layout transforms for a few technical reasons -- the dynamic shapes often cause lots of divmods that need to be carefully optimized, CuTe's admissibility of |
Thanks, I actually came up with the coord, but used |
Sometimes it is not that easy, because the tensors don't have the same modes or use an indirect mapping (as in paged attention). In this case you want to recover the mapped coords, manually slice the portion of the data that you (the thread or some other logical unit) care about, and restart from there. |
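To make the "recover the coords, slice, and restart" idea concrete, here is a rough plain C++ sketch of a paged lookup. Everything in it is an assumption for illustration: the `PageSlice` struct, the `slice_page` helper, and the pool layout (pages stored back to back, row-major within a page) are hypothetical and are not taken from CuTe or from any particular paged-attention implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical view of the contiguous rows that remain in one physical page.
struct PageSlice {
    const float* data;       // first element of the requested row
    std::size_t rows;        // rows left in this page, starting at `data`
    std::size_t row_stride;  // elements between consecutive rows
};

// Hypothetical helper: page_table[logical_page] is the physical page index
// inside `pool`. Within a page the data is a regular row-major tile, so after
// the lookup an ordinary strided layout can take over.
inline PageSlice slice_page(const std::vector<float>& pool,
                            const std::vector<int>& page_table,
                            std::size_t page_rows, std::size_t row_width,
                            std::size_t logical_row) {
    std::size_t logical_page = logical_row / page_rows;  // recover the page coord
    std::size_t row_in_page  = logical_row % page_rows;  // recover the in-page coord
    std::size_t physical_page = static_cast<std::size_t>(page_table[logical_page]);
    const float* base =
        pool.data() + (physical_page * page_rows + row_in_page) * row_width;
    return {base, page_rows - row_in_page, row_width};
}
```

The indirection through `page_table` is the part a single strided layout cannot express; the returned slice is regular again, which is where one would "restart" with ordinary layouts.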
To expand on the "indirect mapping": CuTe somewhat reminds me of the Taichi [1] paper. In Taichi, an array can be hierarchical but the array indexing is flattened (much like CuTe's hierarchical coords); you simply index to the data lying at the very bottom. For example, it can create an array of pointers to arrays, where the pointed-to arrays are arrays of vec3s. You can index through the hierarchy and access a vec3 directly. It would be very interesting to see CuTe support this type of "pointer delimited" tensor. [1] Yuanming Hu, Tzu-Mao Li, Luke Anderson, Jonathan Ragan-Kelley, and Frédo Durand. 2019. Taichi: a language for high-performance computation on spatially sparse data structures. ACM Trans. Graph. 38, 6, Article 201 (December 2019), 16 pages. https://doi.org/10.1145/3355089.3356506 |
Say I want to tile something: thread tiling, warp tiling, and maybe CTA tiling for L2, you name it. Then it comes to coordinate remapping. The only way (see code) I can come up with is to map to a linear index and then unmap from it.
It worked, but it seems very fragile because of the hierarchical coordinates involved, since sometimes I may want a 1D coord for each mode. Is there a canonical way to achieve it?