problem with distribution in memory tile #1999

Open · antonio-fc opened this issue Dec 19, 2024 · 6 comments

@antonio-fc

Hi team,

I'm trying to implement an application that uses the distribute design pattern with the Memory Tile. The input stream passes correctly through the memory tile to the output. However, when I try to distribute the stream evenly over three object FIFOs (as in the example https://github.com/Xilinx/mlir-aie/blob/main/programming_guide/section-2/section-2e/04_distribute_L2/distribute_L2.py), the distribution does not match what is expected. The application can be found at: https://github.com/antonio-fc/my_mlir-aie/tree/testing/apps/app6.

The application runs with 'make run_py', which prints the input to be distributed and the resulting stream from one of the object FIFOs, called 'of_out'. The expected output is printed below it for reference. Let me know if this is a mistake on my part or if something else is happening. Thanks in advance.
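
(For context, here is a minimal sketch of the distribute pattern being described, modeled loosely on the linked distribute_L2.py example. The device name, tile coordinates, FIFO names, data types, and sizes below are illustrative assumptions, not the actual app6 code.)

    import numpy as np
    from aie.dialects.aie import *          # device, tile, object_fifo, object_fifo_link, ...
    from aie.extras.context import mlir_mod_ctx

    data_size = 3072               # illustrative total size per iteration
    tile_size = data_size // 3     # each of the three output FIFOs gets one third
    buffer_depth = 2

    with mlir_mod_ctx() as ctx:

        @device(AIEDevice.npu1_1col)
        def device_body():
            data_ty = np.ndarray[(data_size,), np.dtype[np.int32]]
            tile_ty = np.ndarray[(tile_size,), np.dtype[np.int32]]

            # Tile declarations: shim tile -> memory tile -> three compute tiles
            ShimTile = tile(0, 0)
            MemTile = tile(0, 1)
            ComputeTiles = [tile(0, 2 + i) for i in range(3)]

            # One input object FIFO into the memory tile, three outputs from it
            of_in = object_fifo("in", ShimTile, MemTile, buffer_depth, data_ty)
            of_outs = [
                object_fifo(f"out{i}", MemTile, ComputeTiles[i], buffer_depth, tile_ty)
                for i in range(3)
            ]

            # Distribute: split each object of of_in evenly across the three outputs
            object_fifo_link(of_in, of_outs)

        print(ctx.module)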

@jgmelber
Collaborator

I just ran this test from your repo using a fresh install of mlir-aie and the result shows "PASS!". I ran make clean && make before make run_py.

@jgmelber
Collaborator

Output from distributed input 1/3:
[1.000e+00 2.000e+00 3.000e+00 ... 1.022e+03 1.023e+03 1.024e+03]
[1025. 1026. 1027. ... 2046. 2047. 2048.]
[2049. 2050. 2051. ... 3070. 3071. 3072.]
[3073. 3074. 3075. ... 4094. 4095. 4096.]
[1025. 1026. 1027. ... 2046. 2047. 2048.]
[2049. 2050. 2051. ... 3070. 3071. 3072.]
[3073. 3074. 3075. ... 4094. 4095. 4096.]
[4097. 4098. 4099. ... 5118. 5119. 5120.]
[5121. 5122. 5123. ... 6142. 6143. 6144.]

Expected output from distributed input 1/3
[1.000e+00 2.000e+00 3.000e+00 ... 1.022e+03 1.023e+03 1.024e+03]
[1025. 1026. 1027. ... 2046. 2047. 2048.]
[2049. 2050. 2051. ... 3070. 3071. 3072.]
[3073. 3074. 3075. ... 4094. 4095. 4096.]
[4097. 4098. 4099. ... 5118. 5119. 5120.]
[5121. 5122. 5123. ... 6142. 6143. 6144.]
[6145. 6146. 6147. ... 7166. 7167. 7168.]
[7169. 7170. 7171. ... 8190. 8191. 8192.]
[8193. 8194. 8195. ... 9214. 9215. 9216.]

@antonio-fc
Author

Yes, those two outputs are supposed to be equal, but they aren't. The PASS message can be ignored since it isn't checking anything; I forgot to remove it.
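
(As an aside, a minimal sketch of a host-side check that would make the PASS message meaningful; the array names below are hypothetical placeholders for the buffers the test already prints, not the actual app6 variables.)

    import numpy as np

    # 'distributed_out' is the stream read back from 'of_out';
    # 'expected_out' is the corresponding slice of the input stream.
    if np.array_equal(distributed_out, expected_out):
        print("PASS!")
    else:
        print("FAIL: 'of_out' does not match the expected distributed slice")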

@jgmelber
Collaborator

I am somewhat confused by the complexity of the design; do you have a minimal test case for your issue? From what I can tell, two of the objectFIFOs in the distribute pattern have no consumers, so they might be exerting back pressure. Due to the current implementation of the distribute, the locks may then get out of sync, allowing the design to complete but sending most of the data to the "out" object FIFO rather than the others.

@jgmelber
Collaborator

I added:

        @core(ct[2][0])
        def core_body():
            # Effective while(1)
            for _ in range_(ITER_KERNEL):
                of_hh.acquire(ObjectFifoPort.Consume, 1)
                of_hh.release(ObjectFifoPort.Consume, 1)

        @core(ct[3][0])
        def core_body():
            # Effective while(1)
            for _ in range_(ITER_KERNEL):
                of_kk.acquire(ObjectFifoPort.Consume, 1)
                of_kk.release(ObjectFifoPort.Consume, 1)

Now the output is:

Output from distributed input 1/3:
[1.000e+00 2.000e+00 3.000e+00 ... 1.022e+03 1.023e+03 1.024e+03]
[1025. 1026. 1027. ... 2046. 2047. 2048.]
[2049. 2050. 2051. ... 3070. 3071. 3072.]
[3073. 3074. 3075. ... 4094. 4095. 4096.]
[4097. 4098. 4099. ... 5118. 5119. 5120.]
[5121. 5122. 5123. ... 6142. 6143. 6144.]
[6145. 6146. 6147. ... 7166. 7167. 7168.]
[7169. 7170. 7171. ... 8190. 8191. 8192.]
[8193. 8194. 8195. ... 9214. 9215. 9216.]

Expected output from distributed input 1/3
[1.000e+00 2.000e+00 3.000e+00 ... 1.022e+03 1.023e+03 1.024e+03]
[1025. 1026. 1027. ... 2046. 2047. 2048.]
[2049. 2050. 2051. ... 3070. 3071. 3072.]
[3073. 3074. 3075. ... 4094. 4095. 4096.]
[4097. 4098. 4099. ... 5118. 5119. 5120.]
[5121. 5122. 5123. ... 6142. 6143. 6144.]
[6145. 6146. 6147. ... 7166. 7167. 7168.]
[7169. 7170. 7171. ... 8190. 8191. 8192.]
[8193. 8194. 8195. ... 9214. 9215. 9216.]

@jgmelber
Collaborator

@AndraBisca this is related to the race conditions we have seen on distributes/joins for links with a single pair of semaphore locks.
