diff --git a/demos/iiswc-20/diagram.png b/demos/iiswc-20/diagram.png
new file mode 120000
index 0000000..cba57f3
--- /dev/null
+++ b/demos/iiswc-20/diagram.png
@@ -0,0 +1 @@
+../../example/model/diagram.png
\ No newline at end of file
diff --git a/demos/iiswc-20/tutorial.ipynb b/demos/iiswc-20/tutorial.ipynb
new file mode 100644
index 0000000..3379d2c
--- /dev/null
+++ b/demos/iiswc-20/tutorial.ipynb
@@ -0,0 +1,1300 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "skip"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "env: PATH=/home/subh/anaconda3/bin:/home/subh/tools/llvm-10/build/bin:/home/subh/tools/llvm-10/build/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "%set_env PATH=/home/subh/anaconda3/bin:/home/subh/tools/llvm-10/build/bin:/home/subh/tools/llvm-10/build/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games\n",
+ "%cd ../.."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "# HetSim Demo\n",
+ "We will now demonstrate the use of HetSim for an example scenarios."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "* We will first change an existing target model\n",
+ "* We will then write an application for it and run it through a) detailed simulation, and b) HetSim"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "## Install Dependencies\n",
+ "* Install LLVM (version > 10.0) by following instructions from https://llvm.org/docs/GettingStarted.html\n",
+ "* Install any cross-compilers required for the target\n",
+ " * This example uses an Arm gcc that you can install by running `sudo apt install g++-arm-linux-gnueabihf`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "## Build gem5\n",
+ "We will build gem5 by running a convenience script inside `scripts/`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/subh/research/hetsim-rel/scripts\n",
+ "\u001b[32m\u001b[1m[0]: Starting gem5 build for TimingSimpleCPU\u001b[m\n",
+ "\u001b[32m\u001b[1m[1]: gem5 build succeeded\u001b[m\n",
+ "\u001b[32m\u001b[1m[2]: Compiling m5threads library\u001b[m\n",
+ "Makefile:50: warning: overriding recipe for target 'test_omp'\n",
+ "Makefile:42: warning: ignoring old recipe for target 'test_omp'\n",
+ "make: '../pthread.o' is up to date.\n",
+ "\u001b[32m\u001b[1m[3]: build-gem5.sh successfully exiting\u001b[m\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "%cd scripts\n",
+ "!VERBOSE=0 CC=/usr/bin/gcc CXX=/usr/bin/g++ bash build-gem5.sh\n",
+ "%cd .."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Construct a gem5 Model for the Target\n",
+ "Consider a programmable target composed of two types of PEs - **worker** and **manager**."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "
\n",
+ " \n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "* All PEs share a D-Cache, a DSPM, and the main memory\n",
+ "* Each PE has a private instruction cache\n",
+ "* The manager distributes work to the workers via _FIFO queues_"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Tweak the Existing Model\n",
+ "We will make two changes to the existing model.\n",
+ "* Change number of workers from 4 → 16\n",
+ "* Change the depth of each work queue from 4 → 6"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "#### Step 1: Change Macros and Python Bindings"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/subh/research/hetsim-rel/example/model\n",
+ "swig -python -module params params.h\n",
+ "gcc -c -fpic params_wrap.c -I/usr/include/python2.7 -o params_wrap.o\n",
+ "gcc -shared params_wrap.o -o _params.so\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "%cd example/model\n",
+ "# change the relevant define in params.h\n",
+ "!sed -i 's/#define NUM_WORKER.*/#define NUM_WORKER 8/g' params.h\n",
+ "!sed -i 's/#define WQ_DEPTH.*/#define WQ_DEPTH 6/g' params.h\n",
+ "# generate Python bindings\n",
+ "!make\n",
+ "%cd ../.."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "#### Step 2: Reflect Changes in User Spec and Libraries"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "* Assign PE IDs to the new worker PEs and queue IDs for the queues corresponding the new workers"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# this step should be done manually!\n",
+ "!sed -i 's/\\[1, 2, 3, 4\\]/[1, 2, 3, 4, 5, 6, 7, 8]/g' spec/spec.json"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "* Generate code to connect the queues in the `gem5` model and the emulation and TRE libraries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/subh/research/hetsim-rel/scripts\n",
+ "\u001b[32m\u001b[1m[0]: Starting gem5 build for TimingSimpleCPU\u001b[m\n",
+ "\u001b[32m\u001b[1m[1]: gem5 build succeeded\u001b[m\n",
+ "\u001b[32m\u001b[1m[2]: Compiling m5threads library\u001b[m\n",
+ "Makefile:50: warning: overriding recipe for target 'test_omp'\n",
+ "Makefile:42: warning: ignoring old recipe for target 'test_omp'\n",
+ "make: '../pthread.o' is up to date.\n",
+ "\u001b[32m\u001b[1m[3]: build-gem5.sh successfully exiting\u001b[m\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "%cd scripts\n",
+ "!python2 populate_init_queues.py\n",
+ "!VERBOSE=0 bash build-gem5.sh\n",
+ "%cd .."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "#### Step 3: Regenerate Compiler Plugin\n",
+ "Finally, we regenerate the compiler plugin and build the tracing library."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/subh/research/hetsim-rel/scripts\n",
+ "/home/subh/research/hetsim-rel/tracer\n",
+ "/home/subh/research/hetsim-rel/tracer/build\n",
+ "/usr/bin/ar: creating t.a\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target LLVMHetsim\u001b[0m\n",
+ "[ 20%] \u001b[32mBuilding CXX object compiler-pass/CMakeFiles/LLVMHetsim.dir/hetsim-analysis.cpp.o\u001b[0m\n",
+ "[ 40%] \u001b[32mBuilding CXX object compiler-pass/CMakeFiles/LLVMHetsim.dir/hetsim-codegen.cpp.o\u001b[0m\n",
+ "[ 60%] \u001b[32m\u001b[1mLinking CXX shared module LLVMHetsim.so\u001b[0m\n",
+ "[ 60%] Built target LLVMHetsim\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target hetsim_default_rt\u001b[0m\n",
+ "[ 80%] \u001b[32mBuilding CXX object runtime/default/CMakeFiles/hetsim_default_rt.dir/hetsim_default_rt.cpp.o\u001b[0m\n",
+ "[100%] \u001b[32m\u001b[1mLinking CXX shared library libhetsim_default_rt.so\u001b[0m\n",
+ "[100%] Built target hetsim_default_rt\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "%cd scripts\n",
+ "!python generate_model.py ../spec/spec.json > /dev/null\n",
+ "%cd ../tracer\n",
+ "%mkdir -p build\n",
+ "%cd build\n",
+ "!cmake .. > /dev/null && make\n",
+ "%cd ../.."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Write Application for the Target\n",
+ "We will write a program to do **vector addition** on the target hardware."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Header Files\n",
+ "```Cpp\n",
+ "#include \"params.h\" // import parameters of target hardware as macros\n",
+ "#include \"util.h\" // import primitive definitions\n",
+ "#include \n",
+ "#include \n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Boilerplate Initialization\n",
+ "```Cpp\n",
+ "\n",
+ "void *work(void *arg) { // manager \"spawns\" worker threads with tid=1,2,3...\n",
+ " unsigned tid = *(unsigned *)(arg);\n",
+ " __register_core_id(tid);\n",
+ " ...\n",
+ "}\n",
+ "int main() {\n",
+ " __init_queues(WQ_DEPTH);\n",
+ " __register_core_id(0); // manager is assigned core-id 0\n",
+ " ...\n",
+ " __teardown_queues();\n",
+ " return 0;\n",
+ "}\n",
+ "```\n",
+ "\n",
+ " \n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "#### Note\n",
+ "The `core_id` must be the same across the application, `spec.json` and `target.py`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " \"mgr\": {\r\n",
+ " \"id\": [0],\r\n",
+ " \"__push(unsigned int, unsigned long)\" : {\r\n",
+ "--\r\n",
+ " \"wrkr\": {\r\n",
+ " \"id\": [1, 2, 3, 4, 5, 6, 7, 8],\r\n",
+ " \"__pop(unsigned int)\": {\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "# spec.json\n",
+ "!grep -A1 -B1 \"id\" spec/spec.json"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " system.mgr = TRE(\r\n",
+ " id=0, # This must correspond to the ID assigned in the user spec file.\r\n",
+ " queue_depth=WQ_DEPTH,\r\n",
+ "--\r\n",
+ " wrkr.append(TRE(\r\n",
+ " id=i+1, # This must correspond to the ID assigned in the user spec file.\r\n",
+ " max_outstanding_addrs=MAX_OUTSTANDING_REQS\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "# target.py\n",
+ "!grep -A1 -B1 \"id=\" example/model/target.py"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Manager PE code: `main()`\n",
+ "Memory allocation is done at the beginning part of `main()`.\n",
+ "\n",
+ "```Cpp\n",
+ " // main memory allocation\n",
+ " // in this example, we are working with 3 float arrays each of size N\n",
+ " size_t RAM_SIZE_BYTES = 3 * N * sizeof(float);\n",
+ " char *ram = (char *)mmap((void *)(RAM_BASE_ADDR), RAM_SIZE_BYTES,\n",
+ " PROT_READ | PROT_WRITE | PROT_EXEC, \n",
+ " MAP_ANON | MAP_PRIVATE, 0, 0);\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "```Cpp\n",
+ " // scratchpad memory allocation\n",
+ "#ifdef EMULATION\n",
+ " // for emulation\n",
+ " char *dspm = (char *)mmap((void *)(SPM_BASE_ADDR), SPM_SIZE_BYTES,\n",
+ " PROT_READ | PROT_WRITE | PROT_EXEC, \n",
+ " MAP_ANON | MAP_PRIVATE, 0, 0);\n",
+ "#else // !EMULATION\n",
+ " // the model uses physically-addressed scratchpad that does not need explicit allocation\n",
+ " char *dspm = (char *)SPM_BASE_ADDR;\n",
+ "#endif // EMULATION\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "We \"allocate\" the input and output vectors at the pre-allocated main memory and initialize them.\n",
+ "```Cpp\n",
+ " // allocate the vectors and populate them\n",
+ " float *a = (float *)(ram);\n",
+ " float *b = (float *)(ram + N * sizeof(float));\n",
+ " float *c = (float *)(ram + 2 * N * sizeof(float));\n",
+ " for (int i = 0; i < N; ++i) {\n",
+ " a[i] = float(i + 1);\n",
+ " b[i] = float(i + 1);\n",
+ " c[i] = 0.0;\n",
+ " }\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "For illustration, we allocate a barrier in the shared DSPM.\n",
+ "```Cpp\n",
+ " // allocate barrier object for synchronization\n",
+ " pthread_barrier_t *bar = (pthread_barrier_t *)(dspm);\n",
+ " // initialize barrier with participants = 1 manager + NUM_WORKER workers\n",
+ " __barrier_init(bar, NUM_WORKER + 1); \n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Next, we allocate and spawn the worker threads.\n",
+ "```Cpp\n",
+ " // allocate thread objects for each \"worker\" PE\n",
+ " pthread_t *workers = new pthread_t[NUM_WORKER];\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "```Cpp\n",
+ " // create vector of core IDs to send to each thread\n",
+ " unsigned *tids = new unsigned[NUM_WORKER];\n",
+ " for (int i = 0; i < NUM_WORKER; ++i) {\n",
+ " tids[i] = i + 1;\n",
+ " // spawn worker thread\n",
+ " pthread_create(workers + i, NULL, work, &tids[i]);\n",
+ " }\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "The most important part of the code for the manager -- distribute and push work to the workers!\n",
+ "```Cpp\n",
+ " // partition the work and push work \"packets\"\n",
+ " for (int i = 0; i < NUM_WORKER; ++i) {\n",
+ " // each worker is assigned floor(N / NUM_WORKER) elements\n",
+ " int n = N / NUM_WORKER;\n",
+ " int start_idx = i * n;\n",
+ " int end_idx = (i + 1) * n - 1;\n",
+ "\n",
+ " // handle trailing elements by assigning to final worker\n",
+ " if (i == NUM_WORKER - 1) {\n",
+ " end_idx = N - 1;\n",
+ " }\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "```Cpp\n",
+ " // push through work queues\n",
+ " __push(i + 1, (uintptr_t)(a));\n",
+ " __push(i + 1, (uintptr_t)(b));\n",
+ " __push(i + 1, (uintptr_t)(c));\n",
+ " __push(i + 1, (unsigned)(start_idx));\n",
+ " __push(i + 1, (unsigned)(end_idx));\n",
+ " __push(i + 1, (uintptr_t)(bar));\n",
+ " }\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "```Cpp\n",
+ "// ----- ROI begin -----\n",
+ " __reset_stats(); // begin recording time here\n",
+ " for (int i = 0; i < NUM_WORKER; ++i) {\n",
+ " __push(i + 1, 0); // start signal, value is ignored\n",
+ " }\n",
+ " __barrier_wait(bar); // synchronize with worker threads\n",
+ "\n",
+ " __dump_reset_stats(); // end recording time here\n",
+ "// ----- ROI end -----\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "```Cpp\n",
+ " // join with all threads\n",
+ " for (int tid = 0; tid < NUM_WORKER; ++tid) {\n",
+ " pthread_join(workers[tid], NULL);\n",
+ " }\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "```Cpp\n",
+ " // clean up\n",
+ "#ifdef EMULATION\n",
+ " munmap(dspm, SPM_SIZE_BYTES);\n",
+ "#endif // EMULATION\n",
+ " munmap(ram, RAM_SIZE_BYTES);\n",
+ " delete[] workers;\n",
+ " delete[] tids;\n",
+ " __teardown_queues();\n",
+ "} // end of main()\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Worker PE Code: `work()`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "```Cpp\n",
+ "void *work(void *arg) {\n",
+ " unsigned tid = *(unsigned *)(arg);\n",
+ " __register_core_id(tid);\n",
+ "\n",
+ " // retrieve variables from work queue\n",
+ " volatile float *a = (volatile float *)__pop(0);\n",
+ " volatile float *b = (volatile float *)__pop(0);\n",
+ " volatile float *c = (volatile float *)__pop(0);\n",
+ " int start_idx = (int)__pop(0);\n",
+ " int end_idx = (int)__pop(0);\n",
+ " pthread_barrier_t *bar = (pthread_barrier_t *)__pop(0);\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "```Cpp\n",
+ " // receive start signal\n",
+ " __pop(0);\n",
+ "\n",
+ " // perform actual computation\n",
+ " for (int i = start_idx; i <= end_idx; ++i) {\n",
+ " c[i] += a[i] + b[i];\n",
+ " }\n",
+ "\n",
+ " // synchronize with manager\n",
+ " __barrier_wait(bar);\n",
+ "\n",
+ " return NULL;\n",
+ "} // end of work()\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Verify Functionality of Emulated Code"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/subh/research/hetsim-rel/emu\n",
+ "/home/subh/research/hetsim-rel/emu/build\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target hetsim_prim\u001b[0m\n",
+ "[ 50%] \u001b[32mBuilding CXX object CMakeFiles/hetsim_prim.dir/src/util.cpp.o\u001b[0m\n",
+ "[100%] \u001b[32m\u001b[1mLinking CXX shared library libhetsim_prim.so\u001b[0m\n",
+ "[100%] Built target hetsim_prim\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "# build emulator library\n",
+ "%cd emu\n",
+ "!mkdir -p build\n",
+ "%cd build\n",
+ "!cmake .. > /dev/null && make\n",
+ "%cd ../.."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/subh/research/hetsim-rel/example/app\n",
+ "/home/subh/research/hetsim-rel/example/app/build\n",
+ "MODE set to EMU\n",
+ "Processing application: serial_factorial\n",
+ "Processing application: vector_add\n",
+ "Processing application: workq_mutex\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target workq_mutex\u001b[0m\n",
+ "[ 16%] \u001b[32mBuilding CXX object CMakeFiles/workq_mutex.dir/src/workq_mutex.cpp.o\u001b[0m\n",
+ "[ 33%] \u001b[32m\u001b[1mLinking CXX executable workq_mutex\u001b[0m\n",
+ "[ 33%] Built target workq_mutex\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target serial_factorial\u001b[0m\n",
+ "[ 50%] \u001b[32mBuilding CXX object CMakeFiles/serial_factorial.dir/src/serial_factorial.cpp.o\u001b[0m\n",
+ "[ 66%] \u001b[32m\u001b[1mLinking CXX executable serial_factorial\u001b[0m\n",
+ "[ 66%] Built target serial_factorial\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target vector_add\u001b[0m\n",
+ "[ 83%] \u001b[32mBuilding CXX object CMakeFiles/vector_add.dir/src/vector_add.cpp.o\u001b[0m\n",
+ "[100%] \u001b[32m\u001b[1mLinking CXX executable vector_add\u001b[0m\n",
+ "[100%] Built target vector_add\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "# build application with emulation library\n",
+ "%cd example/app\n",
+ "%rm -rf build\n",
+ "%mkdir -p build\n",
+ "%cd build\n",
+ "!CC=/usr/bin/gcc CXX=/usr/bin/g++ MODE=EMU cmake .. > /dev/null && make\n",
+ "%cd ../../.."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/subh/research/hetsim-rel/example/app/build\n",
+ "== Vector Add Test with N = 100000, NUM_WORKER = 8\n",
+ "== Test Passed ==\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "# run emulated application for functional verification\n",
+ "%cd example/app/build\n",
+ "!./vector_add\n",
+ "%cd ../../.."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Simulation on Detailed Model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "skip"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "env: CMAKE_C_COMPILER=/usr/bin/arm-linux-gnueabihf-gcc\n",
+ "env: CMAKE_CXX_COMPILER=/usr/bin/arm-linux-gnueabihf-g++\n"
+ ]
+ }
+ ],
+ "source": [
+ "%set_env CMAKE_C_COMPILER=/usr/bin/arm-linux-gnueabihf-gcc\n",
+ "%set_env CMAKE_CXX_COMPILER=/usr/bin/arm-linux-gnueabihf-g++"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/subh/research/hetsim-rel/example/app\n",
+ "/home/subh/research/hetsim-rel/example/app/build\n",
+ "MODE set to SIM\n",
+ "Processing application: serial_factorial\n",
+ "Processing application: vector_add\n",
+ "Processing application: workq_mutex\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target m5threads\u001b[0m\n",
+ "[ 12%] \u001b[32mBuilding C object CMakeFiles/m5threads.dir/home/subh/research/hetsim-rel/m5threads/pthread.c.o\u001b[0m\n",
+ "\u001b[01m\u001b[K/home/subh/research/hetsim-rel/m5threads/pthread.c:\u001b[m\u001b[K In function ‘\u001b[01m\u001b[Kpthread_create\u001b[m\u001b[K’:\n",
+ "\u001b[01m\u001b[K/home/subh/research/hetsim-rel/m5threads/pthread.c:256:3:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kimplicit declaration of function ‘\u001b[01m\u001b[Kclone\u001b[m\u001b[K’ [-Wimplicit-function-declaration]\n",
+ " clone(__pthread_trampoline, tcb->stack_start_addr, CLONE_VM|CLONE_FS|CLONE_FI\n",
+ "\u001b[01;32m\u001b[K ^\u001b[m\u001b[K\n",
+ "[ 25%] \u001b[32m\u001b[1mLinking C static library libm5threads.a\u001b[0m\n",
+ "[ 25%] Built target m5threads\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target serial_factorial\u001b[0m\n",
+ "[ 37%] \u001b[32mBuilding CXX object CMakeFiles/serial_factorial.dir/src/serial_factorial.cpp.o\u001b[0m\n",
+ "[ 50%] \u001b[32m\u001b[1mLinking CXX executable serial_factorial\u001b[0m\n",
+ "[ 50%] Built target serial_factorial\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target workq_mutex\u001b[0m\n",
+ "[ 62%] \u001b[32mBuilding CXX object CMakeFiles/workq_mutex.dir/src/workq_mutex.cpp.o\u001b[0m\n",
+ "[ 75%] \u001b[32m\u001b[1mLinking CXX executable workq_mutex\u001b[0m\n",
+ "[ 75%] Built target workq_mutex\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target vector_add\u001b[0m\n",
+ "[ 87%] \u001b[32mBuilding CXX object CMakeFiles/vector_add.dir/src/vector_add.cpp.o\u001b[0m\n",
+ "[100%] \u001b[32m\u001b[1mLinking CXX executable vector_add\u001b[0m\n",
+ "[100%] Built target vector_add\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "# build application with emulation library\n",
+ "%cd example/app\n",
+ "%rm -rf build\n",
+ "%mkdir -p build\n",
+ "%cd build\n",
+ "!MODE=SIM cmake .. > /dev/null && make\n",
+ "%cd ../../../"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/subh/research/hetsim-rel/scripts\n",
+ "...\n",
+ "== Vector Add Test with N = 100000, NUM_WORKER = 8\n",
+ "warn: PowerState: Already in the requested power state, request ignored\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "warn: User mode does not have SPSR\n",
+ "== Test Passed ==\n",
+ "Exiting @ tick 9534907382 because exiting with last active thread context\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "# run gem5 simulation\n",
+ "%cd scripts\n",
+ "!MODE=SIM APP=vector_add bash run-gem5.sh > /dev/null\n",
+ "!echo \"...\" && tail -n20 ../gem5/m5out/run.log\n",
+ "%cd ../\n",
+ "### Gather Runtime\n",
+ "%mkdir -p res\n",
+ "%cp gem5/m5out/stats.txt res/stats.det.txt\n",
+ "!grep sim_ticks res/stats.det.txt | head -n1 | tr -s ' ' | cut -d' ' -f2 > ticks.det.txt"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "skip"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "env: CMAKE_C_COMPILER=/usr/bin/gcc\n",
+ "env: CMAKE_CXX_COMPILER=/usr/bin/g++\n"
+ ]
+ }
+ ],
+ "source": [
+ "%set_env CMAKE_C_COMPILER=/usr/bin/gcc\n",
+ "%set_env CMAKE_CXX_COMPILER=/usr/bin/g++"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Run Trace Generation\n",
+ "We will first make the required changes to enable tracing in the CPP program that we wrote and then run the trace generation step."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "```Cpp\n",
+ "#if defined(AUTO_TRACING) || defined(MANUAL_TRACING)\n",
+ "#include \"hetsim_default_rt.h\"\n",
+ "#endif\n",
+ "\n",
+ "void *work(void *arg) {\n",
+ " unsigned tid = *(unsigned *)(arg);\n",
+ " __register_core_id(tid);\n",
+ "#if defined(AUTO_TRACING) || defined(MANUAL_TRACING)\n",
+ " __open_trace_log(tid);\n",
+ "#endif // AUTO_TRACING || MANUAL_TRACING\n",
+ " // ROI begin\n",
+ " ...\n",
+ " // ROI end\n",
+ "#if defined(AUTO_TRACING) || defined(MANUAL_TRACING)\n",
+ " __close_trace_log(tid);\n",
+ "#endif // AUTO_TRACING || MANUAL_TRACING\n",
+ "}\n",
+ "\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "```Cpp\n",
+ "int main() {\n",
+ " ...\n",
+ "#if defined(AUTO_TRACING) || defined(MANUAL_TRACING)\n",
+ " __open_trace_log(0); // use core-id as argument\n",
+ "#endif\n",
+ " ...\n",
+ " __barrier_init(bar, NUM_WORKER + 1); // set number of participants to 1 manager + NUM_WORKER worker PEs\n",
+ " ... \n",
+ " __dump_reset_stats(); // end recording time here\n",
+ " // ----- ROI end -----\n",
+ "\n",
+ "#if defined(AUTO_TRACING) || defined(MANUAL_TRACING)\n",
+ " __close_trace_log(0);\n",
+ "#endif // AUTO_TRACING || MANUAL_TRACING\n",
+ " ...\n",
+ "} // end of main()\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/subh/research/hetsim-rel/example/app/build\n",
+ "MODE set to EMU_AUTO_TRACE\n",
+ "Processing application: serial_factorial\n",
+ "Processing application: vector_add\n",
+ "Processing application: workq_mutex\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target workq_mutex\u001b[0m\n",
+ "[ 16%] \u001b[32mBuilding CXX object CMakeFiles/workq_mutex.dir/src/workq_mutex.cpp.o\u001b[0m\n",
+ "\u001b[1m/home/subh/research/hetsim-rel/example/app/src/workq_mutex.cpp:36:9: \u001b[0m\u001b[0;1;35mwarning: \u001b[0m\u001b[1mTracing enabled for this run [-W#pragma-messages]\u001b[0m\n",
+ "#pragma message(\"Tracing enabled for this run\")\n",
+ "\u001b[0;1;32m ^\n",
+ "\u001b[0m1 warning generated.\n",
+ "[ 33%] \u001b[32m\u001b[1mLinking CXX executable workq_mutex\u001b[0m\n",
+ "[ 33%] Built target workq_mutex\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target serial_factorial\u001b[0m\n",
+ "[ 50%] \u001b[32mBuilding CXX object CMakeFiles/serial_factorial.dir/src/serial_factorial.cpp.o\u001b[0m\n",
+ "\u001b[1m/home/subh/research/hetsim-rel/example/app/src/serial_factorial.cpp:35:9: \u001b[0m\u001b[0;1;35mwarning: \u001b[0m\u001b[1mTracing enabled for this run [-W#pragma-messages]\u001b[0m\n",
+ "#pragma message(\"Tracing enabled for this run\")\n",
+ "\u001b[0;1;32m ^\n",
+ "\u001b[0m1 warning generated.\n",
+ "[ 66%] \u001b[32m\u001b[1mLinking CXX executable serial_factorial\u001b[0m\n",
+ "[ 66%] Built target serial_factorial\n",
+ "\u001b[35m\u001b[1mScanning dependencies of target vector_add\u001b[0m\n",
+ "[ 83%] \u001b[32mBuilding CXX object CMakeFiles/vector_add.dir/src/vector_add.cpp.o\u001b[0m\n",
+ "[100%] \u001b[32m\u001b[1mLinking CXX executable vector_add\u001b[0m\n",
+ "[100%] Built target vector_add\n",
+ "== Vector Add Test with N = 100000, NUM_WORKER = 8\n",
+ "== Test Passed ==\n"
+ ]
+ }
+ ],
+ "source": [
+ "# run trace generation\n",
+ "%cd example/app/build\n",
+ "!mkdir -p traces\n",
+ "%rm -f CMakeCache.txt\n",
+ "!MODE=EMU_AUTO_TRACE cmake .. > /dev/null && make\n",
+ "!./vector_add"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "==> traces/pe_0.trace <==\n",
+ "BARINIT 0xe0101000 9 10\n",
+ "ST @11 0x1dfdf60 ( )\n",
+ "ST @12 0x1dfdf64 ( )\n",
+ "ST @13 0x1dfdf68 ( )\n",
+ "ST @14 0x1dfdf6c ( )\n",
+ "ST @15 0x1dfdf70 ( )\n",
+ "ST @16 0x1dfdf74 ( )\n",
+ "ST @17 0x1dfdf78 ( )\n",
+ "ST @18 0x1dfdf7c ( )\n",
+ "PUSH 1 1\n",
+ "\n",
+ "==> traces/pe_1.trace <==\n",
+ "POP 0 1\n",
+ "POP 0 1\n",
+ "POP 0 1\n",
+ "POP 0 1\n",
+ "POP 0 1\n",
+ "POP 0 1\n",
+ "POP 0 1\n",
+ "STALL 3 ( )\n",
+ "LD @1 0x40000000 ( )\n",
+ "LD @2 0x40061a80 ( 0x40000000 )\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "### Trace Format\n",
+ "!head -n10 traces/pe_[01].trace \n",
+ "%cd ../../../"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Run Trace Replay\n",
+ "We will now run the generated traces through the TRE-enabled `gem5` model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/subh/research/hetsim-rel/scripts\n",
+ "...\n",
+ "TRE[6]: halted @694256563 after completing 100005 trace entries\n",
+ "Number of TREs IDLE now: 1/9\n",
+ "TRE[8]: halted @694258565 after completing 100005 trace entries\n",
+ "Number of TREs IDLE now: 2/9\n",
+ "TRE[1]: halted @694260567 after completing 100005 trace entries\n",
+ "Number of TREs IDLE now: 3/9\n",
+ "TRE[3]: halted @694262569 after completing 100005 trace entries\n",
+ "Number of TREs IDLE now: 4/9\n",
+ "TRE[2]: halted @694264571 after completing 100005 trace entries\n",
+ "Number of TREs IDLE now: 5/9\n",
+ "TRE[7]: halted @694266573 after completing 100005 trace entries\n",
+ "Number of TREs IDLE now: 6/9\n",
+ "TRE[5]: halted @694268575 after completing 100005 trace entries\n",
+ "Number of TREs IDLE now: 7/9\n",
+ "TRE[4]: halted @694270577 after completing 100005 trace entries\n",
+ "Number of TREs IDLE now: 8/9\n",
+ "TRE[0]: triggered DMPRST\n",
+ "TRE[0]: halted @694273580 after completing 0 trace entries\n",
+ "Number of TREs IDLE now: 9/9\n",
+ "Exiting @ tick 694273580 because all TREs are done\n",
+ "/home/subh/research/hetsim-rel\n"
+ ]
+ }
+ ],
+ "source": [
+ "%cd scripts\n",
+ "!MODE=EMU_TRACE APP=vector_add bash run-gem5.sh > /dev/null\n",
+ "!echo \"...\" && tail -n20 ../gem5/m5out/run.log\n",
+ "%cd .."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "#### Comparison between Detailed and HetSim Runs\n",
+ "As the final step, we will compare the runtime between the detailed `gem5` run and the TRE-enabled `gem5` run."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "752808056\n",
+ "694044351\n"
+ ]
+ }
+ ],
+ "source": [
+ "%cp gem5/m5out/stats.txt res/stats.tre.txt\n",
+ "!grep sim_ticks res/stats.tre.txt | head -n1 | tr -s ' ' | cut -d' ' -f2 > ticks.tre.txt\n",
+ "!cat ticks.det.txt\n",
+ "!cat ticks.tre.txt"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "> This Jupyter Notebook is available in the public repository: https://github.com/umich-cadre/HetSim-gem5/demos/iiswc-20/tutorial.ipynb"
+ ]
+ }
+ ],
+ "metadata": {
+ "celltoolbar": "Slideshow",
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/demos/iiswc-20/tutorial.slides.html b/demos/iiswc-20/tutorial.slides.html
new file mode 100644
index 0000000..59a4e7c
--- /dev/null
+++ b/demos/iiswc-20/tutorial.slides.html
@@ -0,0 +1,14542 @@
+
+
+
+
+
+
+
+
+
+
+
+
+tutorial slides
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
HetSim Demo¶ We will now demonstrate the use of HetSim for an example scenarios.
+
+
+
+
+
+
+
+
+We will first change an existing target model
+We will then write an application for it and run it through a) detailed simulation, and b) HetSim
+
+
+
+
+
+
+
+
+
Install Dependencies¶
+Install LLVM (version > 10.0) by following instructions from https://llvm.org/docs/GettingStarted.html
+Install any cross-compilers required for the target
+This example uses an Arm gcc that you can install by running sudo apt install g++-arm-linux-gnueabihf
+
+
+
+
+
+
+
+
+
+
+
Build gem5¶ We will build gem5 by running a convenience script inside scripts/
.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
/home/subh/research/hetsim-rel/scripts
+[0]: Starting gem5 build for TimingSimpleCPU
+[1]: gem5 build succeeded
+[2]: Compiling m5threads library
+Makefile:50: warning: overriding recipe for target 'test_omp'
+Makefile:42: warning: ignoring old recipe for target 'test_omp'
+make: '../pthread.o' is up to date.
+[3]: build-gem5.sh successfully exiting
+/home/subh/research/hetsim-rel
+
+
+
+
+
+
+
+
+
+
+
+
Construct a gem5 Model for the Target¶ Consider a programmable target composed of two types of PEs - worker and manager .
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+All PEs share a D-Cache, a DSPM, and the main memory
+Each PE has a private instruction cache
+The manager distributes work to the workers via FIFO queues
+
+
+
+
+
+
+
+
+
Tweak the Existing Model¶ We will make two changes to the existing model.
+
+Change number of workers from 4 → 16
+Change the depth of each work queue from 4 → 6
+
+
+
+
+
+
+
+
+
Step 1: Change Macros and Python Bindings¶
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
/home/subh/research/hetsim-rel/example/model
+swig -python -module params params.h
+gcc -c -fpic params_wrap.c -I/usr/include/python2.7 -o params_wrap.o
+gcc -shared params_wrap.o -o _params.so
+/home/subh/research/hetsim-rel
+
+
+
+
+
+
+
+
+
+
+
+
Step 2: Reflect Changes in User Spec and Libraries¶
+
+
+
+
+
+
+
+Assign PE IDs to the new worker PEs and queue IDs for the queues corresponding the new workers
+
+
+
+
+
+
+
+
+
+Generate code to connect the queues in the gem5
model and the emulation and TRE libraries
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
/home/subh/research/hetsim-rel/scripts
+[0]: Starting gem5 build for TimingSimpleCPU
+[1]: gem5 build succeeded
+[2]: Compiling m5threads library
+Makefile:50: warning: overriding recipe for target 'test_omp'
+Makefile:42: warning: ignoring old recipe for target 'test_omp'
+make: '../pthread.o' is up to date.
+[3]: build-gem5.sh successfully exiting
+/home/subh/research/hetsim-rel
+
+
+
+
+
+
+
+
+
+
+
+
Step 3: Regenerate Compiler Plugin¶ Finally, we regenerate the compiler plugin and build the tracing library.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
/home/subh/research/hetsim-rel/scripts
+/home/subh/research/hetsim-rel/tracer
+/home/subh/research/hetsim-rel/tracer/build
+/usr/bin/ar: creating t.a
+Scanning dependencies of target LLVMHetsim
+[ 20%] Building CXX object compiler-pass/CMakeFiles/LLVMHetsim.dir/hetsim-analysis.cpp.o
+[ 40%] Building CXX object compiler-pass/CMakeFiles/LLVMHetsim.dir/hetsim-codegen.cpp.o
+[ 60%] Linking CXX shared module LLVMHetsim.so
+[ 60%] Built target LLVMHetsim
+Scanning dependencies of target hetsim_default_rt
+[ 80%] Building CXX object runtime/default/CMakeFiles/hetsim_default_rt.dir/hetsim_default_rt.cpp.o
+[100%] Linking CXX shared library libhetsim_default_rt.so
+[100%] Built target hetsim_default_rt
+/home/subh/research/hetsim-rel
+
+
+
+
+
+
+
+
+
+
+
+
Write Application for the Target¶ We will write a program to do vector addition on the target hardware.
+
+
+
+
+
+
+
+
#include "params.h" // import parameters of target hardware as macros
+#include "util.h" // import primitive definitions
+#include <pthread.h>
+#include <sys/mman.h>
+
+
+
+
+
+
+
+
+
Boilerplate Initialization¶ void * work ( void * arg ) { // manager "spawns" worker threads with tid=1,2,3...
+ unsigned tid = * ( unsigned * )( arg );
+ __register_core_id ( tid );
+ ...
+}
+int main () {
+ __init_queues ( WQ_DEPTH );
+ __register_core_id ( 0 ); // manager is assigned core-id 0
+ ...
+ __teardown_queues ();
+ return 0 ;
+}
+
+
+
+
+
+
+
+
+
Note¶ The core_id
must be the same across the application, spec.json
and target.py
.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
"mgr": {
+ "id": [0],
+ "__push(unsigned int, unsigned long)" : {
+--
+ "wrkr": {
+ "id": [1, 2, 3, 4, 5, 6, 7, 8],
+ "__pop(unsigned int)": {
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
system.mgr = TRE(
+ id=0, # This must correspond to the ID assigned in the user spec file.
+ queue_depth=WQ_DEPTH,
+--
+ wrkr.append(TRE(
+ id=i+1, # This must correspond to the ID assigned in the user spec file.
+ max_outstanding_addrs=MAX_OUTSTANDING_REQS
+
+
+
+
+
+
+
+
+
+
+
+
Manager PE code: main()
¶ Memory allocation is done at the beginning part of main()
.
+
// main memory allocation
+ // in this example, we are working with 3 float arrays each of size N
+ size_t RAM_SIZE_BYTES = 3 * N * sizeof ( float );
+ char * ram = ( char * ) mmap (( void * )( RAM_BASE_ADDR ), RAM_SIZE_BYTES ,
+ PROT_READ | PROT_WRITE | PROT_EXEC ,
+ MAP_ANON | MAP_PRIVATE , 0 , 0 );
+
+
+
+
+
+
+
+
+
// scratchpad memory allocation
+#ifdef EMULATION
+ // for emulation
+ char * dspm = ( char * ) mmap (( void * )( SPM_BASE_ADDR ), SPM_SIZE_BYTES ,
+ PROT_READ | PROT_WRITE | PROT_EXEC ,
+ MAP_ANON | MAP_PRIVATE , 0 , 0 );
+#else // !EMULATION
+ // the model uses physically-addressed scratchpad that does not need explicit allocation
+ char * dspm = ( char * ) SPM_BASE_ADDR ;
+#endif // EMULATION
+
+
+
+
+
+
+
+
+
We "allocate" the input and output vectors at the pre-allocated main memory and initialize them.
+
// allocate the vectors and populate them
+ float * a = ( float * )( ram );
+ float * b = ( float * )( ram + N * sizeof ( float ));
+ float * c = ( float * )( ram + 2 * N * sizeof ( float ));
+ for ( int i = 0 ; i < N ; ++ i ) {
+ a [ i ] = float ( i + 1 );
+ b [ i ] = float ( i + 1 );
+ c [ i ] = 0.0 ;
+ }
+
+
+
+
+
+
+
+
+
For illustration, we allocate a barrier in the shared DSPM.
+
// allocate barrier object for synchronization
+ pthread_barrier_t * bar = ( pthread_barrier_t * )( dspm );
+ // initialize barrier with participants = 1 manager + NUM_WORKER workers
+ __barrier_init ( bar , NUM_WORKER + 1 );
+
+
+
+
+
+
+
+
+
Next, we allocate and spawn the worker threads.
+
// allocate thread objects for each "worker" PE
+ pthread_t * workers = new pthread_t [ NUM_WORKER ];
+
+
+
+
+
+
+
+
+
// create vector of core IDs to send to each thread
+ unsigned * tids = new unsigned [ NUM_WORKER ];
+ for ( int i = 0 ; i < NUM_WORKER ; ++ i ) {
+ tids [ i ] = i + 1 ;
+ // spawn worker thread
+ pthread_create ( workers + i , NULL , work , & tids [ i ]);
+ }
+
+
+
+
+
+
+
+
+
The most important part of the code for the manager -- distribute and push work to the workers!
+
// partition the work and push work "packets"
+ for ( int i = 0 ; i < NUM_WORKER ; ++ i ) {
+ // each worker is assigned floor(N / NUM_WORKER) elements
+ int n = N / NUM_WORKER ;
+ int start_idx = i * n ;
+ int end_idx = ( i + 1 ) * n - 1 ;
+
+ // handle trailing elements by assigning to final worker
+ if ( i == NUM_WORKER - 1 ) {
+ end_idx = N - 1 ;
+ }
+
+
+
+
+
+
+
+
+
// push through work queues
+ __push ( i + 1 , ( uintptr_t )( a ));
+ __push ( i + 1 , ( uintptr_t )( b ));
+ __push ( i + 1 , ( uintptr_t )( c ));
+ __push ( i + 1 , ( unsigned )( start_idx ));
+ __push ( i + 1 , ( unsigned )( end_idx ));
+ __push ( i + 1 , ( uintptr_t )( bar ));
+ }
+
+
+
+
+
+
+
+
+
// ----- ROI begin -----
+ __reset_stats (); // begin recording time here
+ for ( int i = 0 ; i < NUM_WORKER ; ++ i ) {
+ __push ( i + 1 , 0 ); // start signal, value is ignored
+ }
+ __barrier_wait ( bar ); // synchronize with worker threads
+
+ __dump_reset_stats (); // end recording time here
+// ----- ROI end -----
+
+
+
+
+
+
+
+
+
// join with all threads
+ for ( int tid = 0 ; tid < NUM_WORKER ; ++ tid ) {
+ pthread_join ( workers [ tid ], NULL );
+ }
+
+
+
+
+
+
+
+
+
// clean up
+#ifdef EMULATION
+ munmap ( dspm , SPM_SIZE_BYTES );
+#endif // EMULATION
+ munmap ( ram , RAM_SIZE_BYTES );
+ delete [] workers ;
+ delete [] tids ;
+ __teardown_queues ();
+} // end of main()
+
+
+
+
+
+
+
+
+
Worker PE Code: work()
¶
+
+
+
+
+
+
+
void * work ( void * arg ) {
+ unsigned tid = * ( unsigned * )( arg );
+ __register_core_id ( tid );
+
+ // retrieve variables from work queue
+ volatile float * a = ( volatile float * ) __pop ( 0 );
+ volatile float * b = ( volatile float * ) __pop ( 0 );
+ volatile float * c = ( volatile float * ) __pop ( 0 );
+ int start_idx = ( int ) __pop ( 0 );
+ int end_idx = ( int ) __pop ( 0 );
+ pthread_barrier_t * bar = ( pthread_barrier_t * ) __pop ( 0 );
+
+
+
+
+
+
+
+
+
// receive start signal
+ __pop ( 0 );
+
+ // perform actual computation
+ for ( int i = start_idx ; i <= end_idx ; ++ i ) {
+ c [ i ] += a [ i ] + b [ i ];
+ }
+
+ // synchronize with manager
+ __barrier_wait ( bar );
+
+ return NULL ;
+} // end of work()
+
+
+
+
+
+
+
+
+
Verify Functionality of Emulated Code¶
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
/home/subh/research/hetsim-rel/emu
+/home/subh/research/hetsim-rel/emu/build
+Scanning dependencies of target hetsim_prim
+[ 50%] Building CXX object CMakeFiles/hetsim_prim.dir/src/util.cpp.o
+[100%] Linking CXX shared library libhetsim_prim.so
+[100%] Built target hetsim_prim
+/home/subh/research/hetsim-rel
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
/home/subh/research/hetsim-rel/example/app
+/home/subh/research/hetsim-rel/example/app/build
+MODE set to EMU
+Processing application: serial_factorial
+Processing application: vector_add
+Processing application: workq_mutex
+Scanning dependencies of target workq_mutex
+[ 16%] Building CXX object CMakeFiles/workq_mutex.dir/src/workq_mutex.cpp.o
+[ 33%] Linking CXX executable workq_mutex
+[ 33%] Built target workq_mutex
+Scanning dependencies of target serial_factorial
+[ 50%] Building CXX object CMakeFiles/serial_factorial.dir/src/serial_factorial.cpp.o
+[ 66%] Linking CXX executable serial_factorial
+[ 66%] Built target serial_factorial
+Scanning dependencies of target vector_add
+[ 83%] Building CXX object CMakeFiles/vector_add.dir/src/vector_add.cpp.o
+[100%] Linking CXX executable vector_add
+[100%] Built target vector_add
+/home/subh/research/hetsim-rel
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
/home/subh/research/hetsim-rel/example/app/build
+== Vector Add Test with N = 100000, NUM_WORKER = 8
+== Test Passed ==
+/home/subh/research/hetsim-rel
+
+
+
+
+
+
+
+
+
+
+
+
Simulation on Detailed Model¶
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
/home/subh/research/hetsim-rel/example/app
+/home/subh/research/hetsim-rel/example/app/build
+MODE set to SIM
+Processing application: serial_factorial
+Processing application: vector_add
+Processing application: workq_mutex
+Scanning dependencies of target m5threads
+[ 12%] Building C object CMakeFiles/m5threads.dir/home/subh/research/hetsim-rel/m5threads/pthread.c.o
+/home/subh/research/hetsim-rel/m5threads/pthread.c: In function ‘pthread_create ’:
+/home/subh/research/hetsim-rel/m5threads/pthread.c:256:3: warning: implicit declaration of function ‘clone ’ [-Wimplicit-function-declaration]
+ clone(__pthread_trampoline, tcb->stack_start_addr, CLONE_VM|CLONE_FS|CLONE_FI
+ ^
+[ 25%] Linking C static library libm5threads.a
+[ 25%] Built target m5threads
+Scanning dependencies of target serial_factorial
+[ 37%] Building CXX object CMakeFiles/serial_factorial.dir/src/serial_factorial.cpp.o
+[ 50%] Linking CXX executable serial_factorial
+[ 50%] Built target serial_factorial
+Scanning dependencies of target workq_mutex
+[ 62%] Building CXX object CMakeFiles/workq_mutex.dir/src/workq_mutex.cpp.o
+[ 75%] Linking CXX executable workq_mutex
+[ 75%] Built target workq_mutex
+Scanning dependencies of target vector_add
+[ 87%] Building CXX object CMakeFiles/vector_add.dir/src/vector_add.cpp.o
+[100%] Linking CXX executable vector_add
+[100%] Built target vector_add
+/home/subh/research/hetsim-rel
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
/home/subh/research/hetsim-rel/scripts
+...
+== Vector Add Test with N = 100000, NUM_WORKER = 8
+warn: PowerState: Already in the requested power state, request ignored
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+warn: User mode does not have SPSR
+== Test Passed ==
+Exiting @ tick 9534907382 because exiting with last active thread context
+/home/subh/research/hetsim-rel
+
+
+
+
+
+
+
+
+
+
+
+
Run Trace Generation¶ We will first make the required changes to enable tracing in the CPP program that we wrote and then run the trace generation step.
+
+
+
+
+
+
+
+
#if defined(AUTO_TRACING) || defined(MANUAL_TRACING)
+#include "hetsim_default_rt.h"
+#endif
+
+void * work ( void * arg ) {
+ unsigned tid = * ( unsigned * )( arg );
+ __register_core_id ( tid );
+#if defined(AUTO_TRACING) || defined(MANUAL_TRACING)
+ __open_trace_log ( tid );
+#endif // AUTO_TRACING || MANUAL_TRACING
+ // ROI begin
+ ...
+ // ROI end
+#if defined(AUTO_TRACING) || defined(MANUAL_TRACING)
+ __close_trace_log ( tid );
+#endif // AUTO_TRACING || MANUAL_TRACING
+}
+
+
+
+
+
+
+
+
+
int main () {
+ ...
+#if defined(AUTO_TRACING) || defined(MANUAL_TRACING)
+ __open_trace_log ( 0 ); // use core-id as argument
+#endif
+ ...
+ __barrier_init ( bar , NUM_WORKER + 1 ); // set number of participants to 1 manager + NUM_WORKER worker PEs
+ ...
+ __dump_reset_stats (); // end recording time here
+ // ----- ROI end -----
+
+#if defined(AUTO_TRACING) || defined(MANUAL_TRACING)
+ __close_trace_log ( 0 );
+#endif // AUTO_TRACING || MANUAL_TRACING
+ ...
+} // end of main()
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
/home/subh/research/hetsim-rel/example/app/build
+MODE set to EMU_AUTO_TRACE
+Processing application: serial_factorial
+Processing application: vector_add
+Processing application: workq_mutex
+Scanning dependencies of target workq_mutex
+[ 16%] Building CXX object CMakeFiles/workq_mutex.dir/src/workq_mutex.cpp.o
+/home/subh/research/hetsim-rel/example/app/src/workq_mutex.cpp:36:9: warning: Tracing enabled for this run [-W#pragma-messages]
+#pragma message("Tracing enabled for this run")
+ ^
+ 1 warning generated.
+[ 33%] Linking CXX executable workq_mutex
+[ 33%] Built target workq_mutex
+Scanning dependencies of target serial_factorial
+[ 50%] Building CXX object CMakeFiles/serial_factorial.dir/src/serial_factorial.cpp.o
+/home/subh/research/hetsim-rel/example/app/src/serial_factorial.cpp:35:9: warning: Tracing enabled for this run [-W#pragma-messages]
+#pragma message("Tracing enabled for this run")
+ ^
+ 1 warning generated.
+[ 66%] Linking CXX executable serial_factorial
+[ 66%] Built target serial_factorial
+Scanning dependencies of target vector_add
+[ 83%] Building CXX object CMakeFiles/vector_add.dir/src/vector_add.cpp.o
+[100%] Linking CXX executable vector_add
+[100%] Built target vector_add
+== Vector Add Test with N = 100000, NUM_WORKER = 8
+== Test Passed ==
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
==> traces/pe_0.trace <==
+BARINIT 0xe0101000 9 10
+ST @11 0x1dfdf60 ( )
+ST @12 0x1dfdf64 ( )
+ST @13 0x1dfdf68 ( )
+ST @14 0x1dfdf6c ( )
+ST @15 0x1dfdf70 ( )
+ST @16 0x1dfdf74 ( )
+ST @17 0x1dfdf78 ( )
+ST @18 0x1dfdf7c ( )
+PUSH 1 1
+
+==> traces/pe_1.trace <==
+POP 0 1
+POP 0 1
+POP 0 1
+POP 0 1
+POP 0 1
+POP 0 1
+POP 0 1
+STALL 3 ( )
+LD @1 0x40000000 ( )
+LD @2 0x40061a80 ( 0x40000000 )
+/home/subh/research/hetsim-rel
+
+
+
+
+
+
+
+
+
+
+
+
Run Trace Replay¶ We will now run the generated traces through the TRE-enabled gem5
model.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
/home/subh/research/hetsim-rel/scripts
+...
+TRE[6]: halted @694256563 after completing 100005 trace entries
+Number of TREs IDLE now: 1/9
+TRE[8]: halted @694258565 after completing 100005 trace entries
+Number of TREs IDLE now: 2/9
+TRE[1]: halted @694260567 after completing 100005 trace entries
+Number of TREs IDLE now: 3/9
+TRE[3]: halted @694262569 after completing 100005 trace entries
+Number of TREs IDLE now: 4/9
+TRE[2]: halted @694264571 after completing 100005 trace entries
+Number of TREs IDLE now: 5/9
+TRE[7]: halted @694266573 after completing 100005 trace entries
+Number of TREs IDLE now: 6/9
+TRE[5]: halted @694268575 after completing 100005 trace entries
+Number of TREs IDLE now: 7/9
+TRE[4]: halted @694270577 after completing 100005 trace entries
+Number of TREs IDLE now: 8/9
+TRE[0]: triggered DMPRST
+TRE[0]: halted @694273580 after completing 0 trace entries
+Number of TREs IDLE now: 9/9
+Exiting @ tick 694273580 because all TREs are done
+/home/subh/research/hetsim-rel
+
+
+
+
+
+
+
+
+
+
+
+
Comparison between Detailed and HetSim Runs¶ As the final step, we will compare the runtime between the detailed gem5
run and the TRE-enabled gem5
run.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
752808056
+694044351
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+