
Tracy captures produced by the benchmark pipeline aren't grouping CPU codegen or GPU zones #7219

Closed
ScottTodd opened this issue Sep 30, 2021 · 9 comments
Labels
  • bug 🐞: Something isn't working
  • infrastructure/benchmark: Relating to benchmarking infrastructure
  • infrastructure: Relating to build systems, CI, or testing
  • performance ⚡: Performance/optimization related work across the compiler and runtime

Comments

@ScottTodd
Member

ScottTodd commented Sep 30, 2021

Some discussion on IREE's Discord here.

Example traces can be downloaded from the artifacts tab on https://buildkite.com/iree/iree-benchmark/builds/1107#d697f531-34bf-4372-9942-6d373b8ece5f

Ungrouped CPU zone statistics:
[screenshot]

Ungrouped GPU child zone statistics:
[screenshot]

https://github.com/google/iree/blob/main/build_tools/benchmarks/run_benchmarks_on_android.py is the main script for running those benchmarks and collecting traces from them.

I tried to reproduce this on my Windows development machine, using both Tracy's GUI and CLI capture tools with an unrooted Samsung Galaxy S10, and saw correctly grouped zones with both the CPU and GPU targets / HAL drivers:
[screenshot]
[screenshot]

@ScottTodd ScottTodd added bug 🐞 Something isn't working performance ⚡ Performance/optimization related work across the compiler and runtime infrastructure/benchmark Relating to benchmarking infrastructure labels Sep 30, 2021
@ScottTodd
Member Author

Things left to check to get a repro:

  • the capture tool as built and used for the pipeline on Linux
  • the lab phones
  • the Python scripts used for the pipeline

@antiagainst
Contributor

I tried running the run_benchmarks_on_android.py script to benchmark on a local Android phone from an x86 Linux host, and everything worked fine. Then I tried the same on an aarch64 host (Raspberry Pi 4), and it did not work. It's the same script, the same artifacts (benchmark suites, iree-benchmark-module), and the same phone. The only difference is the Tracy capture tool: in the second case I used a Tracy capture tool compiled for aarch64. In the lab we also use RPi 4s to drive the phones.

So right now I suspect there are issues with the Tracy capture tool when compiled for aarch64. Or maybe the problem is that we capture with a tool compiled for aarch64 and view the capture with a Tracy GUI compiled for x86? Could that be problematic?

I also tried Ben's wolfpld/tracy#262. That did not help either.

@antiagainst
Contributor

@benvanik: I don't know much about Tracy's internals, so the above is mostly a guess. Does it make sense?

@benvanik
Collaborator

benvanik commented Oct 6, 2021

That's useful information! I was only trying to capture from an x86 host. It would be useful to find the right place to put some printfs that we can read back from the logs on the RPi.

@GMNGeoffrey GMNGeoffrey added the infrastructure Relating to build systems, CI, or testing label Dec 2, 2021
@GMNGeoffrey GMNGeoffrey added this to IREE Jun 28, 2022
@antiagainst
Contributor

> So right now I suspect there are issues with Tracy capture tool compiled towards aarch64. Or maybe it's due to that we capture with a capture tool compiled for aarch64 and view the capture with a Tracy GUI tool compiled for x86? That's problematic?

On my Apple M1 MacBook, I have both the capture tool and the profiler UI compiled for aarch64, and that works fine. But I still cannot use the profiler UI to open the captures generated on those RPi devices (which are also aarch64). So I guess it might have something to do with the libraries Tracy depends on under Ubuntu?

@benvanik
Collaborator

I still have issues loading those Android traces. I think I tracked it down to something that looked like undefined behavior in either the recording or the parsing of string tables, but I wasn't able to figure it out.

@antiagainst
Copy link
Contributor

Yeah, me too. This bug is really wild. Time to update Tracy though! It has been almost a quarter. :)

@github-project-automation github-project-automation bot moved this to Not Started in (Deprecated) IREE Feb 21, 2023
@allieculp allieculp moved this from Not Started to Backlog in (Deprecated) IREE May 19, 2023
@ScottTodd
Member Author

I wonder if this reproduces on the latest Tracy / newer phones. This issue is quite old 🤔

@ScottTodd
Member Author

We switched from Buildkite to GitHub Actions and later dropped Tracy support from the benchmark pipelines. Closing this old issue.

@ScottTodd ScottTodd closed this as not planned Aug 13, 2024
@github-project-automation github-project-automation bot moved this from Backlog to Done in (Deprecated) IREE Aug 13, 2024