
ARM github workflow #66

Closed
wants to merge 0 commits into from

Conversation

danikhan632
Contributor

I have been working on triton-shared on ubuntu-arm64, so I wanted to add a workflow to the repo that runs the ARM variant, since potential ARM-specific optimizations can be made.

A few issues with this current PR that would have to be addressed:

  1. GitHub doesn't have an arm64 workflow runner, so that's an issue. In the integration test I'm self-hosting, but it would be better if this was hosted.
     This is going to be an issue for where this job will run:

     build_and_test_triton_shared_arm:
       runs-on: [self-hosted] # going to need a self-hosted ubuntu-arm64 action runner

       steps:
       - name: Force Failure
       ...
  2. I'm currently hosting the llvm tar bundle on my own GCP bucket. It's probably a good idea that I don't self-host the llvm tarball; it should probably be hosted by someone else.

     rev = "49af6502"
     name = f"llvm-{rev}-{system_suffix}"
     if system_suffix == 'ubuntu-arm64':  # this is probably not a good idea to merge, but it does work
         url = "https://storage.googleapis.com/compiled-blob/llvm-49af6502-ubuntu-arm64.tar.gz"
     else:
         url = f"https://tritonlang.blob.core.windows.net/llvm-builds/{name}.tar.gz"
     return Package("llvm", name, url, "LLVM_INCLUDE_DIRS", "LLVM_LIBRARY_DIR", "LLVM_SYSPATH")
  3. I had to do a bit of an odd workaround to change the setup.py for Triton. This is really hacky on my part, but it seemed like a temporary solution for the runner until the LLVM binaries are built and integrated into setup.py.

     - name: Build/Install Triton
       run: |
         export TRITON_CODEGEN_TRITON_SHARED=1
         cd triton/python
         python3 -m pip install --upgrade pip
         python3 -m pip install cmake==3.24
         python3 -m pip install ninja
         python3 -m pip uninstall -y triton
         rm setup.py
         mv ../third_party/triton_shared/.github/arm-setup.py ./setup.py
         python3 setup.py build

I'm working on utilizing Arm MLIR dialects for faster GEMMs, so it would be appreciated if this could be integrated in some form.

@manbearian
Collaborator

@danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.

@danikhan632
Contributor Author

> @danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.

Thank you so much. Also wanted to ask whether pursuing Arm Neon in matmul and other arithmetic operations is a good idea; I was hoping to add this in for the optimization passes for Triton CPU.

@manbearian
Collaborator

> > @danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.
>
> Thank you so much. Also wanted to ask whether pursuing Arm Neon in matmul and other arithmetic operations is a good idea; I was hoping to add this in for the optimization passes for Triton CPU.

On this, I believe one of my colleagues has actually been looking into this with ARM. He offered to sync with you on this, so stand by.

@manbearian
Collaborator

manbearian commented Nov 29, 2023

> @danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.

@danikhan632 ,

First, ARM-hosted LLVM:

Please take a look at triton/.github/workflows/llvm-build.yml at main · openai/triton. If you add ARM64 for Ubuntu to the builds here, then a small change to the default setup.py script should get things working. Do you mind giving that a try? I talked to Phil earlier today and he was on board with adding ARM support if we can make it work.

Second, regarding ARM64 runners:

I'm working on getting some ARM64 VMs set up under my team's Azure account and will make them available to this triton-shared GitHub project when they're ready. Not sure how long this will take, but hopefully not more than a few days.

@danikhan632
Contributor Author

> > @danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.
>
> @danikhan632 ,
>
> First, ARM-hosted LLVM:
>
> Please take a look at triton/.github/workflows/llvm-build.yml at main · openai/triton. If you add ARM64 for Ubuntu to the builds here, then a small change to the default setup.py script should get things working. Do you mind giving that a try? I talked to Phil earlier today and he was on board with adding ARM support if we can make it work.
>
> Second, regarding ARM64 runners:
>
> I'm working on getting some ARM64 VMs set up under my team's Azure account and will make them available to this triton-shared GitHub project when they're ready. Not sure how long this will take, but hopefully not more than a few days.

Will give it a shot and submit a PR; changing the LLVM build workflow to add ARM64 should be pretty easy. The setup.py change should be even easier. Will have this PR done pretty quickly. Thanks for the workflow runners; looking forward to those.

@danikhan632
Contributor Author

> > > @danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.
> >
> > Thank you so much. Also wanted to ask whether pursuing Arm Neon in matmul and other arithmetic operations is a good idea; I was hoping to add this in for the optimization passes for Triton CPU.
>
> On this, I believe one of my colleagues has actually been looking into this with ARM. He offered to sync with you on this, so stand by.

Would love to sync up on this and hear more from your colleagues.

@danikhan632
Contributor Author


> > > @danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.
> >
> > @danikhan632 ,
> > First, ARM-hosted LLVM:
> > Please take a look at triton/.github/workflows/llvm-build.yml at main · openai/triton. If you add ARM64 for Ubuntu to the builds here, then a small change to the default setup.py script should get things working. Do you mind giving that a try? I talked to Phil earlier today and he was on board with adding ARM support if we can make it work.
> > Second, regarding ARM64 runners:
> > I'm working on getting some ARM64 VMs set up under my team's Azure account and will make them available to this triton-shared GitHub project when they're ready. Not sure how long this will take, but hopefully not more than a few days.
>
> Will give it a shot and submit a PR; changing the LLVM build workflow to add ARM64 should be pretty easy. The setup.py change should be even easier. Will have this PR done pretty quickly. Thanks for the workflow runners; looking forward to those.

Also wanted to ask: is cross-compiling arm64 for the LLVM workflow a good idea, or should the workflow runner just run on Ubuntu arm64 to avoid any cross-compiling issues?

      matrix:
        config:
        - {runner: 'Ubuntu 20.04', runs_on: 'ubuntu-20.04', target-os: 'ubuntu', arch: 'x64'}
        - {runner: 'Ubuntu 20.04', runs_on: 'ubuntu-20.04', target-os: 'ubuntu', arch: 'arm64'} #should work
        - {runner: 'CentOS 7', runs_on: ['self-hosted', 'CPU'], target-os: 'centos', arch: 'x64'}
        - {runner: 'MacOS X64', runs_on: 'macos-12', target-os: 'macos', arch: 'x64'}
        - {runner: 'MacOS ARM64', runs_on: 'macos-12', target-os: 'macos', arch: 'arm64'}

@manbearian
Collaborator

> > > > @danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.
> > >
> > > @danikhan632 ,
> > > First, ARM-hosted LLVM:
> > > Please take a look at triton/.github/workflows/llvm-build.yml at main · openai/triton. If you add ARM64 for Ubuntu to the builds here, then a small change to the default setup.py script should get things working. Do you mind giving that a try? I talked to Phil earlier today and he was on board with adding ARM support if we can make it work.
> > > Second, regarding ARM64 runners:
> > > I'm working on getting some ARM64 VMs set up under my team's Azure account and will make them available to this triton-shared GitHub project when they're ready. Not sure how long this will take, but hopefully not more than a few days.
> >
> > Will give it a shot and submit a PR; changing the LLVM build workflow to add ARM64 should be pretty easy. The setup.py change should be even easier. Will have this PR done pretty quickly. Thanks for the workflow runners; looking forward to those.
>
> Also wanted to ask: is cross-compiling arm64 for the LLVM workflow a good idea, or should the workflow runner just run on Ubuntu arm64 to avoid any cross-compiling issues?
>
>       matrix:
>         config:
>         - {runner: 'Ubuntu 20.04', runs_on: 'ubuntu-20.04', target-os: 'ubuntu', arch: 'x64'}
>         - {runner: 'Ubuntu 20.04', runs_on: 'ubuntu-20.04', target-os: 'ubuntu', arch: 'arm64'} # should work
>         - {runner: 'CentOS 7', runs_on: ['self-hosted', 'CPU'], target-os: 'centos', arch: 'x64'}
>         - {runner: 'MacOS X64', runs_on: 'macos-12', target-os: 'macos', arch: 'x64'}
>         - {runner: 'MacOS ARM64', runs_on: 'macos-12', target-os: 'macos', arch: 'arm64'}

I believe cross-compiling is the way to go, as that's what we're doing on macOS from what I could figure out.
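
For reference, a rough sketch of what a cross-compiled arm64 Ubuntu build step could look like; the step name, matrix fields, package names, and CMake flags here are illustrative assumptions rather than the actual llvm-build.yml contents:

      # Illustrative only: cross-compiling LLVM for arm64 on an x64 Ubuntu runner.
      - name: Configure and build LLVM (Ubuntu arm64 cross-compile)
        if: matrix.config.target-os == 'ubuntu' && matrix.config.arch == 'arm64'
        run: |
          sudo apt-get update
          sudo apt-get install -y gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
          cmake -G Ninja -S llvm-project/llvm -B build \
            -DCMAKE_BUILD_TYPE=Release \
            -DCMAKE_SYSTEM_NAME=Linux \
            -DCMAKE_SYSTEM_PROCESSOR=aarch64 \
            -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
            -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
            -DLLVM_HOST_TRIPLE=aarch64-linux-gnu \
            -DLLVM_USE_HOST_TOOLS=ON \
            -DLLVM_ENABLE_PROJECTS="mlir" \
            -DLLVM_TARGETS_TO_BUILD="AArch64"
          ninja -C build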

@manbearian
Collaborator

> I believe cross-compiling is the way to go, as that's what we're doing on macOS from what I could figure out.

Also, I think you'll need changes below, as the following steps are conditional and I believe ubuntu-arm64 won't match any of them as-is.
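
For example (illustrative only; the real step names and conditions in llvm-build.yml may differ), the existing Ubuntu steps are gated on conditions that an arm64 entry would fall through, so a widened or additional condition would be needed:

      # Illustrative only: a condition keyed to x64 won't match an arm64 matrix entry.
      - name: Configure, build, and install LLVM (Ubuntu x64)
        if: matrix.config.target-os == 'ubuntu' && matrix.config.arch == 'x64'
        run: echo "existing x64 build steps"

      # A new (or widened) condition would be needed for the arm64 case:
      - name: Configure, build, and install LLVM (Ubuntu arm64)
        if: matrix.config.target-os == 'ubuntu' && matrix.config.arch == 'arm64'
        run: echo "arm64 build steps (see the cross-compile sketch above)"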

@danikhan632
Contributor Author

> > I believe cross-compiling is the way to go, as that's what we're doing on macOS from what I could figure out.
>
> Also, I think you'll need changes below, as the following steps are conditional and I believe ubuntu-arm64 won't match any of them as-is.

triton-lang/triton#2726

@NathanielMcVicar

Hi @danikhan632 , this is great to see! Please take a look at #71 and see if you can get your pipelines working on that pool. Once you do, feel free to close that PR (@manbearian will have to manually delete the workflow it created I believe, but for now it's worth keeping up for testing).

@danikhan632
Contributor Author

danikhan632 commented Dec 1, 2023

> Hi @danikhan632 , this is great to see! Please take a look at #71 and see if you can get your pipelines working on that pool. Once you do, feel free to close that PR (@manbearian will have to manually delete the workflow it created I believe, but for now it's worth keeping up for testing).

Just committed again with the 1ES ARM workflow runner.
(EDIT) Also just fixed that mistake in the YAML, so it might need a re-run.

@danikhan632
Contributor Author

> > > @danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.
> >
> > Thank you so much. Also wanted to ask whether pursuing Arm Neon in matmul and other arithmetic operations is a good idea; I was hoping to add this in for the optimization passes for Triton CPU.
>
> On this, I believe one of my colleagues has actually been looking into this with ARM. He offered to sync with you on this, so stand by.

Hey, hope you had a good weekend. Just wanted to ask if there is a way to trigger the workflows manually; it's just a bit tedious. Also, I have some very rudimentary code for lowering linalg.matmul to ArmSVE/Arm dialects and wanted to get in contact with that colleague you mentioned.

@manbearian
Collaborator

> Hey, hope you had a good weekend. Just wanted to ask if there is a way to trigger the workflows manually; it's just a bit tedious. Also, I have some very rudimentary code for lowering linalg.matmul to ArmSVE/Arm dialects and wanted to get in contact with that colleague you mentioned.

Hi @danikhan632 ,

First, there's a convergence of two things impacting the MS team's bandwidth right now: December vacations plus a big internal presentation this week. So please bear with us.

Second, I'm sorry about the annoyance around running the workflow. Since the PR is updating the workflow, extra permissions are required to run it after you update it. How much are you changing the workflow with each submission? Can we get a version checked in that you can use to test?

@danikhan632
Contributor Author

> > Hey, hope you had a good weekend. Just wanted to ask if there is a way to trigger the workflows manually; it's just a bit tedious. Also, I have some very rudimentary code for lowering linalg.matmul to ArmSVE/Arm dialects and wanted to get in contact with that colleague you mentioned.
>
> Hi @danikhan632 ,
>
> First, there's a convergence of two things impacting the MS team's bandwidth right now: December vacations plus a big internal presentation this week. So please bear with us.
>
> Second, I'm sorry about the annoyance around running the workflow. Since the PR is updating the workflow, extra permissions are required to run it after you update it. How much are you changing the workflow with each submission? Can we get a version checked in that you can use to test?

No worries, everyone is out for the holidays, so it's understandable. Right now the workflow kept failing because the self-hosted runner that was set up doesn't have pip installed, and I created a bit of an issue. Not sure about the version check-in part; it's up to date with the current build.

  run: |
    rm -rf ~/.triton

- name: install pip
Collaborator

I presume this is an ARM thing... I'll follow up to see why this is missing.

Collaborator

@NathanielMcVicar following up :)
How does this look to you?


I don't think it's an ARM thing; it's that the self-hosted runners come with a bunch of software pre-installed that's not in the base image, so having to install things as part of the run when using the Marketplace image is expected. We could also use Docker if the install time got out of hand, but for now I think just installing it in the yml (or a setup .sh script) is the best approach.

Contributor Author

I think since the runner is self-hosted, pip perhaps wasn't installed, so I added that step; I was probably going to delete it after it ran once.


These runners aren't self-hosted in the sense you mean here. They are hosted by the Microsoft 1ES infrastructure (see https://github.com/apps/1es-resource-management) and work mostly just like the GitHub-hosted runners. In particular, they are stateless, so anything like installing pip you will have to do on every run.

Contributor Author
https://github.com/microsoft/triton-shared/actions/runs/7067334608/job/19308128182#step:10:91

Looks like it doesn't have a C++ compiler either; is it just build-essential that needs to be installed?
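
A minimal sketch of the kind of setup step this would need on a stateless runner image; the step name and package list are assumptions based on the missing pip and C++ compiler noted above:

      # Illustrative setup step; packages assumed from the failures noted above.
      - name: Install build prerequisites
        run: |
          sudo apt-get update
          sudo apt-get install -y python3-pip build-essential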


@@ -5,11 +5,21 @@ on:
    branches: [ "main" ]
  push:
    branches: [ "main" ]
  workflow_dispatch:
Collaborator

I think the intent is to have test-plugin.yml be the dispatch point... what's this change for?

Contributor Author

I think @NathanielMcVicar might have suggested doing that so I could manually trigger the workflows (#71), but I could have misconstrued that.


That's right, but since it seems like you may not have the permissions, it probably doesn't help much. Regardless, it seems worth allowing for people making changes to these in general (or wanting to test in their own forks, where the permission issues won't exist).

Collaborator

My point is that this script just dispatches to the other script, so why not just use that one?

python3 -m pip install cmake==3.24
python3 -m pip install ninja
python3 -m pip uninstall -y triton
rm setup.py
Collaborator

Is the plan to get setup.py updated to handle ARM?

Contributor Author

This was a workaround for when my Triton didn't support ARM; it can be removed once this repo is up to date with the latest build of Triton.

Collaborator

manbearian left a comment

I left a lot of comments. I'm mostly just trying to learn things for now.




build_and_test_triton_shared_arm:
Collaborator

Once you have this nailed down, can we refactor and reuse the same steps as above?

Contributor Author

Absolutely, this is a pretty hacky workaround, but I believe with LLVM build 66886578 all of this should be easy to get rid of. Also, I've been working on lowering linalg.matmul to ArmSVE, so I've really been enjoying some of that work.

@aaronsm
Contributor

aaronsm commented Jan 12, 2024

Great to see some Arm64 builds :) Here's the workflow I'm using right now.

On macOS, I follow the instructions on the Triton GitHub for "Install from source" and comment out the X86, NVPTX, and AMDGPU libs in triton/CMakeLists.txt (triton-lang/triton#2922).

For Ubuntu 20.04 and 22.04, I do the same as on macOS and also change triton/python/setup.py to use a native Arm64 build of LLVM (triton-lang/triton#2921).

@danikhan632 shoot me an email aasmith@microsoft to talk more about Arm.

Contributor

aaronsm left a comment

Trying to do some cleanup on this PR and run it :)
