GitHub - CESNET/cesnet-tszoo: CESNET Ts-Zoo is a toolkit for working with large time series network traffic datasets.

The goal of the cesnet-tszoo project is to provide time series datasets together with useful tools for preprocessing and reproducibility, such as:

- API for downloading, configuring, and loading the CESNET-TimeSeries24 and CESNET-AGG23 datasets, each with various sources and aggregations.
- Example configuration options:
  - Data can be split into train/val/test sets, either by time series or by time periods.
  - Data can be transformed with built-in or custom scalers.
  - Missing values can be handled with built-in or custom fillers.
- Creation and import of benchmarks, for easy reproducibility of experiments.
- Creation and import of annotations. Annotations can be created for specific time series, for specific times, or for specific times within specific time series.
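
To make the filler and scaler options above more concrete, here is a minimal, library-independent sketch of the usual order of operations: fill missing values first, fit a scaler on the training split only, then reuse it on the other splits. The names `forward_fill` and `MinMaxScaler` are hypothetical and are not the cesnet-tszoo API; the package's built-in fillers and scalers may behave differently.

```python
import math


def forward_fill(values, default=0.0):
    """Replace NaN entries with the most recent valid value.

    Leading NaNs (no earlier value available) become `default`.
    """
    filled, last = [], default
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


class MinMaxScaler:
    """Hypothetical scaler: fitted on training data, reused on other splits."""

    def fit(self, values):
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.hi - self.lo) or 1.0  # avoid dividing by zero on constant data
        return [(v - self.lo) / span for v in values]


nan = float("nan")
train = forward_fill([10.0, nan, 30.0, 50.0])
test = forward_fill([nan, 70.0])

scaler = MinMaxScaler().fit(train)      # statistics come from the train split only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)    # may fall outside [0, 1]
```

Because the scaler only sees training data, values in the other splits can land outside the [0, 1] range, which is expected when no information from validation or test data is allowed to leak into preprocessing.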

Datasets

Name	CESNET-TimeSeries24	CESNET-AGG23
Published in	2025	2023
Collection duration	40 weeks	10 weeks
Collection period	9.10.2023 - 14.7.2024	25.2.2023 - 3.5.2023
Aggregation window	1 day, 1 hour, 10 min	1 min
Sources	CESNET3: Institutions, Institution subnets, IP addresses	CESNET2
Number of time series	Institutions: 849, Institution subnets: 1644, IP addresses: 825372	1
Cite	https://doi.org/10.1038/s41597-025-04603-x	https://doi.org/10.23919/CNSM59352.2023.10327823
Zenodo URL	https://zenodo.org/records/13382427	https://zenodo.org/records/8053021
Related papers
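
As a quick sanity check on the table, the collection period of CESNET-TimeSeries24 can be converted into the number of aggregation windows per time series. This sketch assumes that both the start and end dates are included in the period and that every window in the period is present; neither assumption is stated in the table.

```python
from datetime import date

# Collection period of CESNET-TimeSeries24, taken from the table above.
start, end = date(2023, 10, 9), date(2024, 7, 14)

# Assumption: the period includes both the first and the last day.
n_days = (end - start).days + 1
n_weeks = n_days / 7

# Number of windows per time series for each aggregation.
windows = {
    "1 day": n_days,
    "1 hour": n_days * 24,
    "10 min": n_days * 24 * 6,
}

for name, count in windows.items():
    print(f"{name}: {count} windows")
```

Under these assumptions the period spans exactly 40 weeks, which matches the stated collection duration.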

Installation

Install the package from pip with:

pip install cesnet-tszoo

or, for an editable install, with:

pip install -e git+https://github.com/CESNET/cesnet-tszoo#egg=cesnet-tszoo

Examples

Initialize dataset to create train, validation, and test dataframes

Using `TimeBasedCesnetDataset` dataset

from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType
from cesnet_tszoo.configs import TimeBasedConfig

dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.INSTITUTIONS,
    aggregation=AgreggationType.AGG_1_DAY,
    is_series_based=False)
config = TimeBasedConfig(
    ts_ids=50, # number of randomly selected time series from dataset
    train_time_period=range(0, 100), 
    val_time_period=range(100, 150), 
    test_time_period=range(150, 250), 
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Time-based datasets are configured with TimeBasedConfig.
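
The time periods in the example above are consecutive, non-overlapping index ranges. A small helper can check that property before a configuration is built. This is only a sketch: `periods_are_ordered_and_disjoint` is a hypothetical helper, not part of cesnet-tszoo, and the library itself may accept other layouts of the periods.

```python
def periods_are_ordered_and_disjoint(*periods):
    """Return True if each period ends before the next one starts.

    Every period is expected to be a non-empty `range` of time indices.
    """
    if any(len(p) == 0 for p in periods):
        return False
    for earlier, later in zip(periods, periods[1:]):
        if earlier[-1] >= later[0]:
            return False
    return True


# Same ranges as in the TimeBasedConfig example.
train_period = range(0, 100)
val_period = range(100, 150)
test_period = range(150, 250)

ok = periods_are_ordered_and_disjoint(train_period, val_period, test_period)
sizes = {
    "train": len(train_period),
    "val": len(val_period),
    "test": len(test_period),
}
```

With these ranges, the train, validation, and test sets cover 100, 50, and 100 time indices respectively.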

Using `SeriesBasedCesnetDataset` dataset

from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType
from cesnet_tszoo.configs import SeriesBasedConfig

dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.INSTITUTIONS,
    aggregation=AgreggationType.AGG_1_DAY,
    is_series_based=True)
config = SeriesBasedConfig(
    time_period=range(0, 250), 
    train_ts=100, # number of randomly selected time series from dataset
    val_ts=30, # number of randomly selected time series from dataset
    test_ts=20, # number of randomly selected time series from dataset
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Series-based datasets are configured with SeriesBasedConfig.
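
In a series-based split, whole time series are assigned to the train, validation, or test set. The idea of drawing disjoint random subsets can be sketched in plain Python as follows. This illustrates the concept only and is not how cesnet-tszoo selects series internally; `split_series_ids`, the `seed` parameter, and the integer IDs 0 to 848 are hypothetical (849 is the number of institution time series from the table above).

```python
import random


def split_series_ids(all_ids, n_train, n_val, n_test, seed=0):
    """Draw three disjoint random subsets of series IDs."""
    ids = list(all_ids)
    total = n_train + n_val + n_test
    if total > len(ids):
        raise ValueError("requested more series than are available")
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    chosen = rng.sample(ids, total)    # sampling without replacement
    train = chosen[:n_train]
    val = chosen[n_train:n_train + n_val]
    test = chosen[n_train + n_val:]
    return train, val, test


# Same subset sizes as in the SeriesBasedConfig example.
train_ids, val_ids, test_ids = split_series_ids(range(849), 100, 30, 20)
```

Sampling without replacement guarantees that no series appears in more than one subset, so the evaluation sets contain only series that were not used for training.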

Using `load_benchmark`

from cesnet_tszoo.benchmarks import load_benchmark

benchmark = load_benchmark(identifier="2e92831cb502", data_root="/some_directory/")
dataset = benchmark.get_initialized_dataset()

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Whether the loaded dataset is series-based or time-based depends on the benchmark; it is one of the two dataset types shown in the previous examples.
