Plan file support #205

Merged on May 25, 2024 (114 commits)
Commits
5c4f9be
init version
chhwang Jan 16, 2024
442966e
wip
chhwang Jan 19, 2024
53202bb
Split ark.h
chhwang Mar 21, 2024
d00fa8c
Move graph definition
chhwang Mar 21, 2024
3adc23b
Expose the whole OpGraph structure
chhwang Mar 22, 2024
838be3f
Expose the entire Op structure
chhwang Mar 22, 2024
2c59b46
lint
chhwang Mar 22, 2024
c856e4e
fix
chhwang Mar 22, 2024
20840bb
Use ark abbreviation
chhwang Mar 22, 2024
a5608fc
wip
chhwang Apr 26, 2024
f7e50f7
wip
chhwang Apr 28, 2024
03b5cf1
wip
chhwang Apr 28, 2024
1675091
reshape
chhwang Apr 28, 2024
19879ea
add cast
chhwang Apr 29, 2024
8402b44
matmul wip
chhwang Apr 29, 2024
a8e6def
matmul done
chhwang Apr 30, 2024
d2026e3
deprecate GpuContext
chhwang Apr 30, 2024
2059d26
transpose
chhwang Apr 30, 2024
482d796
reduce
chhwang Apr 30, 2024
4d1c308
embedding
chhwang Apr 30, 2024
599cd51
clean up interface wip
chhwang May 1, 2024
2b183b1
tensor interface & code cleanup
chhwang May 1, 2024
dbda660
cleanup headers
chhwang May 1, 2024
fd7a29a
Rename Dims::size into Dims::nelems
chhwang May 1, 2024
bb36944
update data type interface
chhwang May 1, 2024
786ccc8
python binding wip & minor changes
chhwang May 2, 2024
4c981cd
lint
chhwang May 2, 2024
9f2e864
quickstart tutorial works again
chhwang May 2, 2024
fae45a6
rename is_none to is_null
chhwang May 2, 2024
eb8fda1
Enable all ops & support scalar arithmetic
chhwang May 2, 2024
45b1e8d
lint
chhwang May 6, 2024
5ca6d8b
lint
chhwang May 6, 2024
45b7fdc
lint
chhwang May 6, 2024
f16c87e
communication works
chhwang May 7, 2024
e61d327
delete old file
chhwang May 7, 2024
e867f54
minor updates
chhwang May 7, 2024
b40ad8f
update
chhwang May 7, 2024
af78558
update
chhwang May 7, 2024
2e7f711
update
chhwang May 7, 2024
e5f7350
Add json submodule
chhwang May 7, 2024
f361ca5
fix
chhwang May 7, 2024
c9f2f60
lint
chhwang May 7, 2024
c831542
embed json
chhwang May 7, 2024
2e0857b
separate buffer header from tensor header
chhwang May 7, 2024
8d2e62e
embed ordered_json
chhwang May 7, 2024
f5011cc
update ci
chhwang May 7, 2024
d58fc6e
lint
chhwang May 7, 2024
fbfa962
update codecov
chhwang May 7, 2024
d50a38d
updates
chhwang May 7, 2024
bd2086e
fix broadcast
chhwang May 8, 2024
eb34fee
pylint
chhwang May 8, 2024
5e43538
fix
chhwang May 8, 2024
33a380f
fix pg sync
chhwang May 9, 2024
f973a4e
lint
chhwang May 9, 2024
1dc331e
add rope
chhwang May 9, 2024
a07d419
rename json
chhwang May 10, 2024
5f3cfa6
minor updates
chhwang May 10, 2024
efa76e8
clean up
chhwang May 10, 2024
ef6db8c
Update arch types & some fixes
chhwang May 10, 2024
a3a763e
updates
chhwang May 10, 2024
5f87d27
add div error bound & improve coverage
chhwang May 10, 2024
d62ee3e
updates
chhwang May 13, 2024
6d1d6ba
A few changes
chhwang May 13, 2024
799e5f3
Drop ipc_*
chhwang May 13, 2024
01d079b
coverage
chhwang May 13, 2024
87d4a06
beautify
chhwang May 14, 2024
394abc9
minor updates
chhwang May 16, 2024
0167e2c
minor fixes
chhwang May 16, 2024
cd15cc1
tutorial fixed & python ut wip
chhwang May 16, 2024
23c88c0
model docs wip
chhwang May 17, 2024
ae8b1ad
update json files
chhwang May 17, 2024
cc0b2ed
fixes
chhwang May 17, 2024
b1d73e5
minor updates
chhwang May 17, 2024
d0c09ef
more docs
chhwang May 17, 2024
5333da2
more docs
chhwang May 17, 2024
3f48c6e
more docs
chhwang May 17, 2024
8596dbf
updates
chhwang May 17, 2024
2daa05c
a few fixes & improvements
chhwang May 18, 2024
169c127
support padding
chhwang May 19, 2024
b4d3070
remove calculable fields
chhwang May 19, 2024
c0a7012
update docs
chhwang May 19, 2024
ce00e81
minor updates
chhwang May 20, 2024
6bd147a
Denser format
chhwang May 21, 2024
269ba60
log plan & model files
chhwang May 21, 2024
e9230b1
minor fixes
chhwang May 21, 2024
dab984c
rule install & direct copy
chhwang May 23, 2024
c1e0055
tile copy
chhwang May 23, 2024
fff9b5a
plan rule install example
chhwang May 23, 2024
e212d83
minor updates & lint
chhwang May 23, 2024
25efaef
cleanup & improve coverage
chhwang May 23, 2024
1ca5c60
CI update
chhwang May 23, 2024
1fbf38b
tackle some comments
chhwang May 23, 2024
650f15f
more comments
chhwang May 23, 2024
f1d68a8
clear warnings & minor fix
chhwang May 24, 2024
21a3310
fixes
chhwang May 24, 2024
b379b70
Minor updates & use pytest
chhwang May 24, 2024
ecd5afd
Erase comments
chhwang May 24, 2024
30f85c4
lint
chhwang May 24, 2024
3bb2fa2
fix typo
chhwang May 25, 2024
ce5d87a
more verification
chhwang May 25, 2024
920d77e
update codecov
chhwang May 25, 2024
13551e0
remove unused code
chhwang May 25, 2024
5d00c48
minor fix
chhwang May 25, 2024
576dfb0
codeql on self-hosted
chhwang May 25, 2024
fe90b9b
Update codeql.yml
chhwang May 25, 2024
4090037
Update codeql.yml
chhwang May 25, 2024
c1bb305
Update codeql.yml
chhwang May 25, 2024
e422559
Update codeql.yml
chhwang May 25, 2024
eebb2dc
Update codeql.yml
chhwang May 25, 2024
59a4ef7
Update codeql.yml
chhwang May 25, 2024
8835c2e
Update codeql.yml
chhwang May 25, 2024
f8a4896
Update codeql.yml
chhwang May 25, 2024
dd8db38
Update codeql.yml
chhwang May 25, 2024
2e45981
Update codeql.yml
chhwang May 25, 2024
14 changes: 13 additions & 1 deletion .codecov.yml
@@ -6,10 +6,22 @@ flag_management:
     carryforward: true
     paths:
       - ark/
+      - python/ark/
 
 coverage:
   status:
     project:
       default:
-        target: 80%
+        target: 85%
         threshold: 1%
+
+ignore:
+  - "/usr/*"
+  - "/tmp/*"
+  - "*/build/*"
+  - "*/dist-packages/*"
+  - "*/third_party/*"
+  - "*/ark/*_test.*"
+  - "*/examples/*"
+  - "*/python/unittest/*"
+  - "*/ark/unittest/*"
13 changes: 7 additions & 6 deletions .github/workflows/codeql.yml
@@ -11,14 +11,14 @@ on:
     - cron: '42 20 * * 4'
 
 jobs:
-  analyze:
-    name: Analyze
+  analyze-cuda:
+    name: Analyze (CUDA)
     strategy:
       fail-fast: false
       matrix:
         language: [ 'cpp' ]
     concurrency:
-      group: ${{ github.workflow }}-${{ github.ref }}
+      group: ${{ github.workflow }}-cuda-${{ github.ref }}
       cancel-in-progress: true
     runs-on: ubuntu-latest
     container:
@@ -38,7 +38,7 @@ jobs:
 
     # Initializes the CodeQL tools for scanning.
     - name: Initialize CodeQL
-      uses: github/codeql-action/init@v2
+      uses: github/codeql-action/init@v3
       with:
         languages: ${{ matrix.language }}
 
@@ -48,10 +48,11 @@ jobs:
 
     - name: Build
       run: |
-        cmake -DBYPASS_GPU_CHECK=ON -DUSE_CUDA=ON .
+        mkdir build && cd build
+        cmake -DBYPASS_GPU_CHECK=ON -DUSE_CUDA=ON ..
         make -j build
 
     - name: Perform CodeQL Analysis
-      uses: github/codeql-action/analyze@v2
+      uses: github/codeql-action/analyze@v3
       with:
         category: "/language:${{matrix.language}}"
52 changes: 33 additions & 19 deletions .github/workflows/ut-cuda.yml
@@ -36,41 +36,55 @@ jobs:
sudo nvidia-smi -ac $(nvidia-smi --query-gpu=clocks.max.memory,clocks.max.sm --format=csv,noheader,nounits -i $i | sed 's/\ //') -i $i
done

- name: UpdateSubmodules
- name: Dubious ownership exception
run: |
git config --global --add safe.directory /__w/ark/ark
git submodule foreach --recursive git reset --hard
git submodule foreach --recursive git clean -fdx
git submodule foreach git fetch
git submodule update --init --recursive

- name: Build
run: |
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug ..
make -j ut
cmake -DCMAKE_BUILD_TYPE=Debug -DBUILD_PYTHON=ON ..
make -j ut ark_py

- name: RunUT
run: |
cd build && ARK_ROOT=$PWD ARK_IGNORE_BINARY_CACHE=1 ctest --stop-on-failure --verbose --schedule-random

- name: ReportCoverage
- name: Run C++ UT
run: |
cd build
lcov --capture --directory . --output-file coverage.info
lcov --remove coverage.info \
ARK_ROOT=$PWD ctest --stop-on-failure --verbose --schedule-random
lcov --capture --directory . --output-file cpp_coverage.info
lcov --remove cpp_coverage.info \
'/usr/*' \
'/tmp/*' \
'*/build/*' \
'*/third_party/*' \
'*/ark/*_test.*' \
'*/examples/*' \
'*/python/*' \
'*/ark/unittest/unittest_utils.cc' \
--output-file coverage.info
lcov --list coverage.info
bash <(curl -s https://codecov.io/bash) -f coverage.info || echo "Codecov did not collect coverage reports"
'*/ark/unittest/unittest_utils.cpp' \
--output-file cpp_coverage.info
lcov --list cpp_coverage.info

- name: BuildPython
- name: Install Python Dependencies
run: |
python3 -m pip install -r requirements.txt

- name: Run Python UT
run: |
cd build
ARK_ROOT=$PWD pytest --cov --verbose ../python/unittest/test.py

- name: Report Coverage
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
run: |
cd build
bash <(curl -s https://codecov.io/bash) -f cpp_coverage.info || echo "Codecov did not collect C++ coverage reports"
bash <(curl -s https://codecov.io/bash) -f .coverage || echo "Codecov did not collect Python coverage reports"

- name: Install Python
run: |
python3 -m pip install .

- name: Run Tutorials
run: |
python3 ./examples/tutorial/quickstart_tutorial.py
python3 ./examples/tutorial/plan_tutorial.py
6 changes: 1 addition & 5 deletions .github/workflows/ut-rocm.yml
@@ -28,13 +28,9 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
 
-      - name: UpdateSubmodules
+      - name: Dubious ownership exception
         run: |
           git config --global --add safe.directory /__w/ark/ark
-          git submodule foreach --recursive git reset --hard
-          git submodule foreach --recursive git clean -fdx
-          git submodule foreach git fetch
-          git submodule update --init --recursive
 
       - name: Build
         run: |
4 changes: 4 additions & 0 deletions .gitmodules
@@ -13,3 +13,7 @@
 [submodule "third_party/mscclpp"]
 	path = third_party/mscclpp
 	url = https://github.com/microsoft/mscclpp
+
+[submodule "third_party/json"]
+	path = third_party/json
+	url = https://github.com/nlohmann/json
14 changes: 7 additions & 7 deletions ark/CMakeLists.txt
@@ -1,23 +1,23 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
 
-file(GLOB_RECURSE SOURCES CONFIGURE_DEPENDS *.cc)
-file(GLOB_RECURSE UT_SOURCES CONFIGURE_DEPENDS *_test.cc *_test.cu)
-file(GLOB_RECURSE UT_COMMON_SOURCES CONFIGURE_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/unittest/*.cc)
+file(GLOB_RECURSE SOURCES CONFIGURE_DEPENDS *.cpp)
+file(GLOB_RECURSE UT_SOURCES CONFIGURE_DEPENDS *_test.cpp)
+file(GLOB_RECURSE UT_COMMON_SOURCES CONFIGURE_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/unittest/*.cpp)
 list(REMOVE_ITEM SOURCES ${UT_SOURCES} ${UT_COMMON_SOURCES})
 
 if(USE_ROCM)
     file(GLOB_RECURSE CU_SOURCES CONFIGURE_DEPENDS *.cu)
     set_source_files_properties(${CU_SOURCES} PROPERTIES LANGUAGE CXX)
 endif()
 
-set(COMMON_LIBS ARK::numa ARK::ibverbs mscclpp mscclpp_static pthread rt)
+set(COMMON_LIBS ARK::numa ARK::ibverbs pthread rt)
 
 # ARK object
 target_include_directories(ark_obj PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include)
 target_include_directories(ark_obj PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
 target_include_directories(ark_obj SYSTEM PRIVATE
-    ${PROJECT_SOURCE_DIR}/third_party/json
+    ${JSON_INCLUDE_DIRS}
     ${MSCCLPP_INCLUDE_DIRS}
     ${IBVERBS_INCLUDE_DIRS}
     ${NUMA_INCLUDE_DIRS}
@@ -42,7 +42,7 @@ if(USE_ROCM)
 endif()
 
 target_sources(ark_obj PRIVATE ${SOURCES})
-target_link_libraries(ark_obj PRIVATE ${COMMON_LIBS})
+target_link_libraries(ark_obj PUBLIC mscclpp_static PRIVATE ${COMMON_LIBS})
 
 # ARK unit tests
 foreach(ut_source IN ITEMS ${UT_SOURCES})
@@ -52,7 +52,7 @@ foreach(ut_source IN ITEMS ${UT_SOURCES})
     set_target_properties(${exe_name} PROPERTIES EXCLUDE_FROM_ALL TRUE)
     target_include_directories(${exe_name} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
     target_include_directories(${exe_name} SYSTEM PRIVATE
-        ${PROJECT_SOURCE_DIR}/third_party/json
+        ${JSON_INCLUDE_DIRS}
         ${IBVERBS_INCLUDE_DIRS}
         ${NUMA_INCLUDE_DIRS}
     )
62 changes: 62 additions & 0 deletions ark/api/data_type.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT license.

#include "ark/data_type.hpp"

#include <map>

#include "bfloat16.h"
#include "half.h"
#include "logging.h"
#include "model/model_data_type.hpp"

namespace ark {

///
/// NOTE: how to add a new data type
/// 1. Add an instance using `DATA_TYPE_INSTANCE()` macro.
/// 2. Add a registration using `DATA_TYPE_REGISTER()` macro.
/// 3. Expose the symbol in `include/ark/data_type.hpp`.
///

#define DATA_TYPE_INSTANCE(_name, _type) \
extern const DataType _name( \
std::make_shared<ModelDataT>(#_name, #_type, sizeof(_type)));

#define DATA_TYPE_REGISTER(_name) instances[#_name] = &_name;

extern const DataType NONE(std::make_shared<ModelDataT>("NONE", "void", 0));
DATA_TYPE_INSTANCE(FP32, float);
DATA_TYPE_INSTANCE(FP16, fp16);
DATA_TYPE_INSTANCE(BF16, bf16);
DATA_TYPE_INSTANCE(INT32, int32_t);
DATA_TYPE_INSTANCE(UINT32, uint32_t);
DATA_TYPE_INSTANCE(INT8, int8_t);
DATA_TYPE_INSTANCE(UINT8, uint8_t);
DATA_TYPE_INSTANCE(BYTE, char);

const DataType &DataType::from_name(const std::string &type_name) {
static std::map<std::string, const DataType *> instances;
if (instances.empty()) {
DATA_TYPE_REGISTER(NONE);
DATA_TYPE_REGISTER(FP32);
DATA_TYPE_REGISTER(FP16);
DATA_TYPE_REGISTER(BF16);
DATA_TYPE_REGISTER(INT32);
DATA_TYPE_REGISTER(UINT32);
DATA_TYPE_REGISTER(INT8);
DATA_TYPE_REGISTER(UINT8);
DATA_TYPE_REGISTER(BYTE);
}
auto it = instances.find(type_name);
if (it == instances.end()) {
ERR(InvalidUsageError, "Unknown data type: ", type_name);

Check warning on line 53 in ark/api/data_type.cpp

View check run for this annotation

Codecov / codecov/patch

ark/api/data_type.cpp#L53

Added line #L53 was not covered by tests
}
return *(it->second);
}

size_t DataType::bytes() const { return ref_->bytes(); }

const std::string &DataType::name() const { return ref_->type_name(); }

Check warning on line 60 in ark/api/data_type.cpp

View check run for this annotation

Codecov / codecov/patch

ark/api/data_type.cpp#L60

Added line #L60 was not covered by tests

} // namespace ark
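
For readers following the `DATA_TYPE_INSTANCE()` / `DATA_TYPE_REGISTER()` steps in the file above, here is a minimal standalone sketch of the same name-to-instance registry idea. It is not ARK's code: the `sketch` namespace, `TypeInfo` (a stand-in for `ModelDataT`), and the `SKETCH_*` macros are hypothetical, `std::invalid_argument` is swapped in for the `ERR(InvalidUsageError, ...)` macro, and `fp16`/`bf16` are left out because they need ARK's own headers. It also assumes a single translation unit, so the instances are plain `const` globals rather than `extern` definitions.

```cpp
// Hypothetical, simplified sketch of a name -> instance data type registry.
// None of these names are ARK's real classes or macros.
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Stand-in for the model-side descriptor (ModelDataT in the diff above).
struct TypeInfo {
    std::string name;    // registry key, e.g. "FP32"
    std::string c_type;  // spelled-out C++ type, e.g. "float"
    std::size_t bytes;   // element size in bytes, 0 for NONE
};

class DataType {
   public:
    explicit DataType(std::shared_ptr<TypeInfo> ref) : ref_(std::move(ref)) {}
    std::size_t bytes() const { return ref_->bytes; }
    const std::string &name() const { return ref_->name; }
    // Looks up a registered instance by name; throws if the name is unknown.
    static const DataType &from_name(const std::string &type_name);

   private:
    std::shared_ptr<TypeInfo> ref_;
};

// Step 1: define one global instance per data type.
#define SKETCH_TYPE_INSTANCE(_name, _type)           \
    const DataType _name(std::make_shared<TypeInfo>( \
        TypeInfo{#_name, #_type, sizeof(_type)}))

const DataType NONE(std::make_shared<TypeInfo>(TypeInfo{"NONE", "void", 0}));
SKETCH_TYPE_INSTANCE(FP32, float);
SKETCH_TYPE_INSTANCE(INT32, std::int32_t);
SKETCH_TYPE_INSTANCE(UINT8, std::uint8_t);

// Step 2: register each instance under its stringified name.
#define SKETCH_TYPE_REGISTER(_name) {#_name, &_name}

const DataType &DataType::from_name(const std::string &type_name) {
    // Built once on first call; an initializer list replaces the
    // `if (instances.empty())` fill used in the diff above.
    static const std::map<std::string, const DataType *> instances = {
        SKETCH_TYPE_REGISTER(NONE), SKETCH_TYPE_REGISTER(FP32),
        SKETCH_TYPE_REGISTER(INT32), SKETCH_TYPE_REGISTER(UINT8)};
    auto it = instances.find(type_name);
    if (it == instances.end()) {
        // Assumption: a standard exception instead of ARK's ERR() macro.
        throw std::invalid_argument("Unknown data type: " + type_name);
    }
    return *(it->second);
}

}  // namespace sketch
```

With this sketch, `sketch::DataType::from_name("FP32")` returns a reference to the single `sketch::FP32` instance, so callers can compare data types by address, and an unregistered name such as `"FP64"` throws. Initializing the map in one expression is a design choice of the sketch only; the diff's fill-on-first-call version behaves the same for lookups.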