ARM github workflow #66
@danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.
Thank you so much. Also wanted to ask: is pursuing Arm Neon in matmul and other arithmetic operations a good idea? I was hoping to add it to the optimization passes for Triton CPU.
On this, I believe one of my colleagues has actually been looking into this with ARM. He offered to sync with you on it, so stand by.
First, ARM-hosted LLVM: please take a look at triton/.github/workflows/llvm-build.yml at main · openai/triton. If you add ARM64 for Ubuntu to the builds there, then a small change to the default setup.py script should get things working. Do you mind giving that a try? I talked to Phil earlier today and he was on board with adding ARM support if we can make it work. Second, regarding ARM64 runners: I'm working on getting some ARM64 VMs set up under my team's Azure account and will make them available to this triton-shared GitHub project when they're ready. Not sure how long this will take, but hopefully not more than a few days.
Will give it a shot and submit a PR; changing the LLVM build workflow to add ARM64 should be pretty easy, and the setup.py change should be even easier. Will have the PR done pretty quickly. Thanks, looking forward to those ARM workflow runners.
Would love to sync up on this and hear more from your colleagues.
Also wanted to ask if cross-compiling arm64 for the LLVM workflow is a good idea, or if the workflow runner should just run on ubuntu arm64 to avoid any cross-compiling issues?

```yaml
matrix:
  config:
    - {runner: 'Ubuntu 20.04', runs_on: 'ubuntu-20.04', target-os: 'ubuntu', arch: 'x64'}
    - {runner: 'Ubuntu 20.04', runs_on: 'ubuntu-20.04', target-os: 'ubuntu', arch: 'arm64'} # should work
    - {runner: 'CentOS 7', runs_on: ['self-hosted', 'CPU'], target-os: 'centos', arch: 'x64'}
    - {runner: 'MacOS X64', runs_on: 'macos-12', target-os: 'macos', arch: 'x64'}
    - {runner: 'MacOS ARM64', runs_on: 'macos-12', target-os: 'macos', arch: 'arm64'}
```
I believe cross-compiling is the way to go, as that's what we're doing on macOS from what I could figure out.
Also, I think you'll need changes below, as the following steps are conditional and I believe ubuntu-arm64 won't match any of them as is.
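To illustrate the kind of change being discussed, here's a minimal sketch; the step names, conditions, and CMake flags below are assumptions for illustration, not the actual workflow contents:

```yaml
# Existing steps gate on target-os only, e.g.:
- name: Configure, Build, and Install LLVM (Ubuntu)
  if: matrix.config.target-os == 'ubuntu' && matrix.config.arch == 'x64'
  # ...existing x64 build commands...

# A cross-compiled arm64 variant would need its own branch, e.g.:
- name: Configure, Build, and Install LLVM (Ubuntu arm64)
  if: matrix.config.target-os == 'ubuntu' && matrix.config.arch == 'arm64'
  run: |
    # hypothetical cross-compile configuration
    cmake -S llvm -B build -G Ninja \
      -DCMAKE_SYSTEM_NAME=Linux \
      -DCMAKE_SYSTEM_PROCESSOR=aarch64 \
      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
      -DLLVM_HOST_TRIPLE=aarch64-linux-gnu
```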
Hi @danikhan632, this is great to see! Please take a look at #71 and see if you can get your pipelines working on that pool. Once you do, feel free to close that PR (@manbearian will have to manually delete the workflow it created, I believe, but for now it's worth keeping up for testing).
Just committed again with the 1ES ARM workflow runner.
Hey, hope you had a good weekend. Just wanted to ask if there's a way to trigger the workflows manually; it's a bit tedious otherwise. Also, I have some very rudimentary code for lowering linalg.matmul to the ArmSVE/Arm dialects and wanted to get in contact with that colleague you mentioned.
Hi @danikhan632. First, there's a convergence of two things impacting the MS team's bandwidth right now: December vacations plus a big internal presentation this week. So please bear with us. Second, I'm sorry about the annoyance around running the workflow. Since the PR is updating the workflow itself, extra permissions are required to run it after each change. How much are you changing the workflow with each submission? Can we get a version checked in that you can use to test?
No worries, everyone is out for the holidays, so that's understandable. Right now the workflow kept failing because the self-hosted runner that was set up doesn't have pip installed, and I created a bit of an issue. Not sure about the version check-in part; it's up to date with the current build.
.github/workflows/test-plugin.yml (outdated)
```yaml
      run: |
        rm -rf ~/.triton

    - name: install pip
```
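For reference, a complete version of such a step might look like the following (a sketch, assuming an apt-based Ubuntu image; the actual step body isn't shown in the diff):

```yaml
- name: install pip
  run: |
    # the Marketplace image lacks pip, so install it on every (stateless) run
    sudo apt-get update
    sudo apt-get install -y python3-pip
```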
I presume this is an ARM thing... I'll follow up to see why this is missing.
@NathanielMcVicar following up :)
How does this look to you?
I don't think it's an ARM thing; it's that the self-hosted runners come with a bunch of software pre-installed that's not in the base image, so having to install things as part of the run when using the Marketplace image is expected. We could also use Docker if the install time got out of hand, but for now I think just installing it in the yml (or a setup .sh script) is the best approach.
Since the runner is self-hosted, I think pip may not have been installed, so I added that step; I was probably going to delete it after the workflow ran with it once.
These runners aren't self-hosted in the sense you mean here. They are hosted by the Microsoft 1ES infrastructure (see https://github.com/apps/1es-resource-management) and work mostly just like the GitHub-hosted runners. In particular, they are stateless, so anything like installing pip has to be done on every run.
https://github.com/microsoft/triton-shared/actions/runs/7067334608/job/19308128182#step:10:91
Looks like it doesn't have a C++ compiler either; is it just build-essential that needs to be installed?
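If so, extending the same setup step would cover it; a minimal sketch, assuming an Ubuntu apt-based image:

```yaml
- name: install build dependencies
  run: |
    sudo apt-get update
    # build-essential pulls in gcc, g++, make, and the libc headers
    sudo apt-get install -y build-essential python3-pip
```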
```diff
@@ -5,11 +5,21 @@ on:
     branches: [ "main" ]
   push:
     branches: [ "main" ]
+  workflow_dispatch:
```
I think the intent is to have test-plugin.yml be the dispatch point... what's this change for?
I think @NathanielMcVicar might have suggested doing that so I'd be able to manually trigger the workflows (#71), but I could have misconstrued that.
That's right, but since it seems like you may not have the permissions, it probably doesn't help much. Regardless, it seems worth allowing for people making changes to these in general (or wanting to test in their own forks, where the permission issues won't exist).
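(For reference: with `workflow_dispatch:` in place, a run can be started from the Actions tab or, in a fork where permissions allow it, with the GitHub CLI via `gh workflow run test-plugin.yml`.)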
My point is that this script just dispatches to the other script, so why not just use that one?
.github/workflows/test-plugin.yml (outdated)
```sh
python3 -m pip install cmake==3.24
python3 -m pip install ninja
python3 -m pip uninstall -y triton
rm setup.py
```
Is the plan to get setup.py updated to handle ARM?
This was a workaround for when my Triton didn't support ARM; it can be removed once this repo is up to date with the latest build of Triton.
I left a lot of comments. I'm mostly just trying to learn things for now.
.github/workflows/test-plugin.yml (outdated)

```yaml
build_and_test_triton_shared_arm:
```
Once you have this nailed down, can we refactor and reuse the same steps as above?
Absolutely. This is a pretty hacky workaround, but I believe with LLVM build 66886578 all of it should be easy to get rid of. Also, I've been working on lowering linalg.matmul to ArmSVE, so I've really been enjoying some of that work.
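One possible shape for that refactor, sketched under the assumption that the x64 and arm64 steps end up identical (the job layout and the arm64 runner label here are illustrative assumptions, not the actual workflow):

```yaml
build_and_test_triton_shared:
  strategy:
    matrix:
      config:
        - { arch: 'x64',   runs_on: 'ubuntu-latest' }
        # hypothetical label; arm64 would map to the 1ES pool from #71
        - { arch: 'arm64', runs_on: 'linux-arm64' }
  runs-on: ${{ matrix.config.runs_on }}
  steps:
    # shared steps run unchanged on both architectures
    - name: Checkout
      uses: actions/checkout@v4
    # ...shared build and test steps...
```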
Great to see some Arm64 builds :) Here's the workflow I'm using right now. On MacOS, I follow the instructions on the Triton GitHub for "Install from source" and comment out the X86, NVPTX, and AMDGPU libs in triton/CMakeLists.txt (triton-lang/triton#2922). For Ubuntu 20.04 and 22.04, I do the same as MacOS and also change triton/python/setup.py to use a native Arm64 build of LLVM (triton-lang/triton#2921). @danikhan632 shoot me an email aasmith@microsoft to talk more about Arm.
Trying to do some cleanup of this PR and run it :)
I have been working on triton-shared using ubuntu-arm64, so I wanted to add a workflow to the repo that runs the ARM variant, since potential optimizations can be made there.
A few issues with this current PR that would have to be addressed:
Where this will run is going to be an issue:
```yaml
build_and_test_triton_shared_arm:
  runs-on: [self-hosted]  # going to need a self-hosted ubuntu-arm64 action runner
  steps:
    - name: Force Failure
      # ...
```
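As a note on the runner question: once the 1ES ARM64 pool from #71 is available, the job could target it instead of a generic self-hosted label. A sketch, with the pool name left as a placeholder (1ES-managed runners are typically selected via a pool label alongside `self-hosted`):

```yaml
build_and_test_triton_shared_arm:
  # placeholder pool label; the real one comes from the pool set up in #71
  runs-on: [self-hosted, "1ES.Pool=<arm64-pool-name>"]
```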
Working on utilizing the Arm MLIR dialects for faster GEMMs, so it would be appreciated if this could be integrated in some form.