This project reads Alibaba's publicly available microservice traces dataset and builds windowed graphs from the data.
Because the processing is heavily parallelized, it can work through terabytes of data on a single personal computer.
The Rust code can be compiled into binaries that are later called from Python, similar to how NumPy and Pandas rely on compiled code internally.
- Python, Pip and Pipenv (virtual environment manager for Python).
- Rust and Cargo.
- Jupyter Notebook (automatically gets installed as a pipenv dependency).
- Clone the repository.
- In the project directory, run pipenv install.
- Get into the virtual environment by running pipenv shell.
- To build the Rust project, run cargo build.
- To build and install the executables for Python, run maturin develop.
This project supports three environment variables.
- FILE_DURATION_IN_SECONDS - The duration of each Alibaba log file in seconds. This is 180 seconds (3 minutes) and is fixed by Alibaba. The default value is set to 180 and should not be changed.
- RAW_TRACE_DIR - The directory where raw trace files downloaded from the Alibaba server are stored. By default this is <project_root>/data/raw, but it can be overridden to any directory in the env files.
- WINDOWS_DIR - The directory where the processed window files are stored. By default this is <project_root>/data/windows, but it can be overridden to any directory in the env files.
To update RAW_TRACE_DIR and WINDOWS_DIR without adding files to git, copy the .env file to a new file named .env.local, then create entries for RAW_TRACE_DIR and/or WINDOWS_DIR in it. The directories in the env files must be absolute paths.
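
The defaults and the absolute-path rule described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual loader: the function name resolve_settings and the idea of passing the environment in as a mapping are assumptions made for the example.

```python
import os
from pathlib import Path


def resolve_settings(project_root, env=None):
    """Resolve the three settings, falling back to the documented defaults.

    Hypothetical helper: the real project reads .env / .env.local itself.
    """
    env = os.environ if env is None else env
    root = Path(project_root)
    settings = {
        # Fixed by Alibaba at 180 seconds (3 minutes) per file.
        "FILE_DURATION_IN_SECONDS": int(env.get("FILE_DURATION_IN_SECONDS", "180")),
        "RAW_TRACE_DIR": Path(env.get("RAW_TRACE_DIR", root / "data" / "raw")),
        "WINDOWS_DIR": Path(env.get("WINDOWS_DIR", root / "data" / "windows")),
    }
    # The env files require absolute directories.
    for key in ("RAW_TRACE_DIR", "WINDOWS_DIR"):
        if not settings[key].is_absolute():
            raise ValueError(f"{key} must be an absolute path, got {settings[key]}")
    return settings
```

Passing an explicit mapping such as `resolve_settings("/tmp/project", env={})` makes the fallback behavior easy to check without touching the real environment.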
This project can run two commands.
load - This command downloads the appropriate trace files from the Alibaba server, windows the data, and stores the windows on disk.
It supports the following options.
- start_time - Start time in seconds to load the trace data (starting from 0).
- end_time - End time in seconds to load the trace data (starting from 0).
If we provide start_time and end_time, then the breakdown time parameters can be ignored.
- start_day - Start day to load the trace data (0 - 12).
- start_hour - Start hour to load the trace data (0 - 23).
- start_minute - Start minute to load the trace data (0 - 59).
- start_second - Start second to load the trace data (0 - 59).
- end_day - End day to load the trace data (0 - 13, exclusive).
- end_hour - End hour to load the trace data (0 - 24, exclusive).
- end_minute - End minute to load the trace data (0 - 60, exclusive).
- end_second - End second to load the trace data (0 - 60, exclusive).
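
As a rough illustration of how the breakdown parameters relate to start_time and end_time, the sketch below converts a day/hour/minute/second breakdown into a second offset and lists the 180-second trace files that a time range would touch. Both helpers (to_seconds, covering_files) are hypothetical, and the assumptions that omitted parts default to 0 and that files are numbered consecutively from 0 are not confirmed by the project.

```python
def to_seconds(day=0, hour=0, minute=0, second=0):
    """Convert a breakdown time into seconds from the start of the trace.

    Assumption: any part that is not provided counts as 0.
    """
    return day * 86400 + hour * 3600 + minute * 60 + second


def covering_files(start_time, end_time, file_duration=180):
    """Return the indexes of the files overlapping [start_time, end_time).

    Assumption: file i holds seconds [i * file_duration, (i + 1) * file_duration).
    """
    if end_time <= start_time:
        raise ValueError("end_time must be greater than start_time")
    first = start_time // file_duration
    last_exclusive = -(-end_time // file_duration)  # ceiling division
    return range(first, last_exclusive)
```

Under these assumptions, start_day=1 with start_hour=9 corresponds to `to_seconds(day=1, hour=9)`, and a range of start_time=180 to end_time=540 touches two files.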
Other parameters are:
- connection_prop - Can be either instance_id or microservice_id. If instance_id, graph edges are connected by individual instance ids; otherwise edges are connected by microservice ids. Default is microservice_id.
- window_indexing_type - Can be from_zero or seq_from_start. Suppose we are importing from start_hour=1 with window_size=60. If window_indexing_type is from_zero, the first window index will be 0. If it is seq_from_start, the first window index will be 60, since the 3600-second start offset divided by the 60-second window size is 60. Default is seq_from_start.
- window_size (required) - The window size in seconds.
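
The two window_indexing_type modes can be illustrated with a small Python function. This is a sketch derived only from the start_hour=1, window_size=60 example above, not the project's Rust implementation; the function name window_index is hypothetical, and it assumes start_time falls on a window boundary.

```python
def window_index(timestamp, start_time, window_size, indexing_type="seq_from_start"):
    """Return the index of the window containing `timestamp` (all values in seconds)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if timestamp < start_time:
        raise ValueError("timestamp is before start_time")
    # Position of the window relative to the first loaded window.
    offset = (timestamp - start_time) // window_size
    if indexing_type == "from_zero":
        return offset
    if indexing_type == "seq_from_start":
        # Continue the numbering as if windows had been counted from second 0.
        return start_time // window_size + offset
    raise ValueError(f"unknown window_indexing_type: {indexing_type}")
```

With start_time=3600 and window_size=60, the first window gets index 0 under from_zero and index 60 under seq_from_start, matching the example in the option description.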
process - This command processes the produced windows and runs an operation on each of them to produce results.
It supports one option.
- op (required) - The operation to run on each window. The list of operations is not exhaustive, because more will keep being added, but the value must be a valid operation key.
cargo run load start_time=180 end_time=540 window_size=60 connection_prop=instance_id window_indexing_type=seq_from_start
cargo run load start_day=1 end_day=2 window_size=30 window_indexing_type=from_zero
cargo run load start_day=1 start_hour=9 end_day=2 end_minute=9 window_size=60
cargo run process op=average_degree
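
The exact definition behind the average_degree op is not spelled out here. As an illustration only, the sketch below uses one common definition for an undirected graph, 2 * |E| / |V|, and represents a window as a list of hypothetical (source, target) pairs; the project's own op may treat direction or duplicate edges differently.

```python
def average_degree(edges):
    """Average degree of an undirected graph given as (source, target) pairs.

    Assumption: repeated calls between the same pair count as a single edge.
    """
    nodes = set()
    unique_edges = set()
    for source, target in edges:
        nodes.update((source, target))
        unique_edges.add(frozenset((source, target)))
    if not nodes:
        return 0.0
    return 2 * len(unique_edges) / len(nodes)
```

Whether the pairs hold instance ids or microservice ids would depend on the connection_prop chosen when the windows were loaded.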
This project has both application and library modes. The commands above run in application mode. The project also has a PyO3 interface that exposes a function named run_op, which takes an op identifier as its argument. After compilation, the compiled module can be used to call this function from Python code.
Maturin is a tool that makes it easy to build PyO3-based Rust code for Python. In this project, Maturin is installed by Pipenv. From the project root directory, we can run maturin develop to build and install this project as a Python dependency. After that, import alibaba_graph_rust works from Python code inside graph_py/.
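
A call from Python might look like the sketch below. It assumes that run_op accepts the op key as a string, which the text above suggests but does not state, and it makes no assumption about what run_op returns. The wrapper run_window_op is hypothetical and only exists so the example degrades gracefully when maturin develop has not been run yet.

```python
import importlib


def run_window_op(op, module_name="alibaba_graph_rust"):
    """Call run_op from the compiled module, or return None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module only exists after `maturin develop` has been run.
        return None
    return module.run_op(op)


if __name__ == "__main__":
    # "average_degree" is the op key used in the command examples.
    print(run_window_op("average_degree"))
```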