From c4d37b2be3da30e2f93a35f17920718f0c85979f Mon Sep 17 00:00:00 2001
From: bingyanglin
Date: Tue, 15 Oct 2019 11:30:49 +0800
Subject: [PATCH 1/9] Initial commit

---
 .../0000-chronicle-module.md | 94 +++++++++++++++++++
 1 file changed, 94 insertions(+)
 create mode 100644 text/0000-chronicle-module/0000-chronicle-module.md

diff --git a/text/0000-chronicle-module/0000-chronicle-module.md b/text/0000-chronicle-module/0000-chronicle-module.md
new file mode 100644
index 00000000..ad28aa40
--- /dev/null
+++ b/text/0000-chronicle-module/0000-chronicle-module.md
@@ -0,0 +1,94 @@
++ Feature name: `chronicle-module`
++ Start date: 2019-10-14
++ RFC PR: [iotaledger/bee-rfcs#xx](https://github.com/iotaledger/bee-rfcs/pull/xx)
++ Bee issue: [iotaledger/bee#64](https://github.com/iotaledger/bee/issues/64)

**To Do**: wrap each line at 120 characters

# Summary

This RFC proposes a `chronicle` module that provides a flexible and powerful framework for storing and accessing historical and newly arriving transactions over a long period of time.

# Motivation

In current IOTA nodes, old transactions are removed after snapshots in order to reduce storage cost. This does not mean, however, that old transactions are valueless in real applications. From a data analytics point of view, it is essential to keep all of the historical data (if the cost is affordable) to ensure no information is missed. It is impossible to know that historical data is useless for a target application before the corresponding research/analysis on that data has been done. Moreover, applying machine learning (including deep learning) to build models on the data requires a huge amount of data for training and testing. Hence the chronicle module is important and should be developed. In the following we list some application examples.
* From the aspect of financial analysis, one may want to build a model to predict future prices and/or transaction volumes, which needs a large amount of historical data for training and testing.

* Developers can further improve/enhance the framework design and structure by identifying its weaknesses.

* A company/organization can collect transaction data and provide customized services.

This module provides a framework to implement chronicle in an efficient and robust way, as well as transaction/bundle-related metrics, dashboards, and analytics examples.

# Detailed design

The chronicle design should leverage the fundamental crates used in IOTA Bee, including the trit/transaction/bundle crates, so as to be consistent with other Bee projects and easy to maintain. This crate mainly focuses on the `runtime` implementation, which makes storing/retrieving/adding/deleting transactions in the database efficient.

For the high-level flow loop of the chronicle event (a future type), we have two options.

Please note: both options are based on a shared-nothing architecture.

**1 -** **channel -> executor -> reactor**, whose specs follow.

* Channel:
  * Unbounded channel per thread.
  * The channel enables the thread to receive events in FIFO fashion.

* Executor
  * Runs one operation to collect the events from the channel.
  * Has a loop through the collected events.
  * The event's data structure enables the executor to fetch, from the tasks_map, the right task that is ready to make progress. An event contains a tuple (`actor_id`, `msg_function`, `msg`).
    * `actor_id` is the task key in the tasks_map; therefore we use the actor_id to fetch the task (actor) from the map.
    * `msg_function` indicates which function in the actor should be executed with the params (`msg`, `actor_state`).
      A toy example of adding `msg` and `actor_state` (the `i64` types are illustrative):
      ```rust
      fn add(msg: i64, actor_state: i64) -> i64 {
          msg + actor_state // e.g. 1 + 9
      }
      ```
      The complexity of the executor in option 1 is ~O(n), where n is the number of actors that are ready to make progress.

* Reactor
  * It should receive the io_events from the tasks somehow. We can have another channel per thread to which the tasks send the io_events, or, if possible, we return the io_events (i.e., socket.write/read, if any) from the poll function. For instance (pseudocode):
    ```rust
    Async::Ready(io_events) -> io_events_list.push(io_events)
    ```
  * The reactor should have access to all the sockets in the thread.

**2 -** **executor -> reactor**, whose specs follow.

Note: each actor has its own mpsc channel.

* Executor
  * Owns a `vector` with all the actors that belong to the executor's thread.
  * Has a loop through all the actors, using SIMD/AVX (if possible) to check which actors are ready to make progress. We can determine whether an actor is ready to make progress by checking if there are any events in its channel, so this eliminates the need for notifications to wake up given actors, because we are already looping through all the actors.
  * There is the possibility of leveraging SIMD/AVX on mutually independent actors.

  The complexity of the executor in option 2 is ~O(n/w), where n is the total number of actors that belong to a given thread and w is the instruction width: in general w = 1, but w = (vector width) in case we want to leverage SIMD/AVX.

* Reactor as above.

The chronicle framework also provides useful metrics for ease of data analytics. Good examples of these metrics are in [tanglescope](https://github.com/iotaledger/entangled/tree/develop/tanglescope) in the IOTA entangled project.

# Drawbacks

- Using this chronicle framework requires cloud resources to store the historical/new-coming transactions, consuming more power and storage than maintaining a node with periodic data deletion in snapshots.
# Rationale and alternatives

- ScyllaDB is currently adopted as the Chronicle database.
- The user can define the deletion period for transactions with specific characteristics. The characteristics contain
  - **To Do**

# Unresolved questions

- [Options] Which option do you prefer: notifications-based with a channel per thread, or iteration-based with a channel per actor?
- [Executor] Should we implement the actor (the top level of a future) using an async block, or by implementing custom futures?
- [Reactor] Which syscalls should we apply?
  - Adopt epoll? (notification based)
  - Adopt io_submit/io_getevents? (batching io_events, where io_submit is blocking because it does the most work)
  - Adopt io_uring?
  - Adopt epoll-like syscalls on top of io_uring?

From 050ca35eeea0bb04f4b642b727c97db0d16d38e2 Mon Sep 17 00:00:00 2001
From: bingyanglin
Date: Tue, 15 Oct 2019 11:45:03 +0800
Subject: [PATCH 2/9] Header names

---
 text/0000-chronicle-module/0000-chronicle-module.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-chronicle-module/0000-chronicle-module.md b/text/0000-chronicle-module/0000-chronicle-module.md
index ad28aa40..5c2cf0f6 100644
--- a/text/0000-chronicle-module/0000-chronicle-module.md
+++ b/text/0000-chronicle-module/0000-chronicle-module.md
@@ -1,6 +1,6 @@
 + Feature name: `chronicle-module`
 + Start date: 2019-10-14
-+ RFC PR: [iotaledger/bee-rfcs#xx](https://github.com/iotaledger/bee-rfcs/pull/xx)
++ RFC PR: [iotaledger/bee-rfcs#22](https://github.com/iotaledger/bee-rfcs/pull/22)
 + Bee issue: [iotaledger/bee#64](https://github.com/iotaledger/bee/issues/64)

 **To Do**: break 120 characters for each line

From 073f5de7d874399fae1551c2232623f074f3342d Mon Sep 17 00:00:00 2001
From: Yu-Wei Wu
Date: Tue, 15 Oct 2019 13:41:44 +0800
Subject: [PATCH 3/9] Word wrapping

---
 .../0000-chronicle-module.md | 98 ++++++++++---------
 1 file changed, 52 insertions(+), 46 deletions(-)

diff --git
a/text/0000-chronicle-module/0000-chronicle-module.md b/text/0000-chronicle-module/0000-chronicle-module.md index 5c2cf0f6..48273c8e 100644 --- a/text/0000-chronicle-module/0000-chronicle-module.md +++ b/text/0000-chronicle-module/0000-chronicle-module.md @@ -1,61 +1,62 @@ -+ Feature name: `chronicle-module` -+ Start date: 2019-10-14 -+ RFC PR: [iotaledger/bee-rfcs#22](https://github.com/iotaledger/bee-rfcs/pull/22) -+ Bee issue: [iotaledger/bee#64](https://github.com/iotaledger/bee/issues/64) ++ Feature name: `chronicle-module` + Start date: 2019-10-14 + RFC PR: +[iotaledger/bee-rfcs#22](https://github.com/iotaledger/bee-rfcs/pull/22) + Bee issue: +[iotaledger/bee#64](https://github.com/iotaledger/bee/issues/64) **To Do**: break 120 characters for each line # Summary -This RFC proposes a `chronicle` module to provide a flexible and powerful framework for storing/accessing historical/new-coming transactions for a long period of time. +This RFC proposes a `chronicle` module to provide a flexible and powerful framework for storing/accessing +historical/new-coming transactions for a long period of time. # Motivation -In current IOTA nodes, old transactions are removed after snapshots, in order to reduce the storage cost. It does not mean, however, the old transactions are valueless in real applications. For data analytics point of view, it is essential to keep all of the historical data (if the cost is affordable) to ensure no information is missed. It is impossible to ensure that the historical data are useless for the target application before the corresponding research/analysis on the historical data is done. Also, to apply machine learning (including deep learning) for model building on the data, a huge amount of data is crucial for training and testing. Hence the chronicle module is important and should be developed. In the following we list some application examples. 
+In current IOTA nodes, old transactions are removed after snapshots, in order to reduce the storage cost. It does not +mean, however, the old transactions are valueless in real applications. For data analytics point of view, it is +essential to keep all of the historical data (if the cost is affordable) to ensure no information is missed. It is +impossible to ensure that the historical data are useless for the target application before the corresponding +research/analysis on the historical data is done. Also, to apply machine learning (including deep learning) for model +building on the data, a huge amount of data is crucial for training and testing. Hence the chronicle module is important +and should be developed. In the following we list some application examples. -* From the aspect of financial analysis, one may want to build a model to predict the future prices and/or transaction volumes, which needs a large amount of historical data for training and testing. +* From the aspect of financial analysis, one may want to build a model to predict the future prices and/or transaction +* volumes, which needs a large amount of historical data for training and testing. * Developers can further improve/enhance the framework design and structure by identifying the weakness of them. * A company/organization can collect transaction data and provide customized services. -This module provides a framework to implement chronicle in an efficient and robust way, as well as transaction/bundle-related metrics, dashboard, and analytics examples. +This module provides a framework to implement chronicle in an efficient and robust way, as well as +transaction/bundle-related metrics, dashboard, and analytics examples. # Detailed design -The chronicle design should leverage the fundamental crates used in IOTA Bee, including trit/transaction/bundle crates, so as to be consistent with other Bee projects and easy to maintain. 
This crate mainly focuses on the `runtime` implementation, which makes the transaction storing/retrieving/adding/deleting in the database efficient. +The chronicle design should leverage the fundamental crates used in IOTA Bee, including trit/transaction/bundle crates, +so as to be consistent with other Bee projects and easy to maintain. This crate mainly focuses on the `runtime` +implementation, which makes the transaction storing/retrieving/adding/deleting in the database efficient. -The high-level of the chronicle event (future type) -we have two options for the flow loop: +The high-level of the chronicle event (future type) we have two options for the flow loop: Please note: both options are based on shared-nothing-archiecture: **1 -** **channel -> executor -> reactor**, whose specs follow. -* Channel: - * Unbounded channel per thread. - * Channel enables the thread to receive events in FIFO fashion. - -* Executor - * Runs one operation to collect the events from the channel. - * Has a loop through the collected events. - * The event’s data-structure enables the executor to fetch the right task which is ready to do progress from the tasks_map. An event contains a tuple (`actor_id`, `msg_function`, `msg`). - * `actor_id` is the task_key in the tasks_map, therefore we will use the actor_id to fetch the task(actor) from the map. - * `msg_function` indicates which function in the actor should be executed with the params (`msg`, `actor_state`). A toy example of adding msg and actor_state: - ```rust - fn add(msg ,actor_state) -> updated_actor_state{ - msg + actor_state // 1 + 9 - } - ``` - the complexity of the executor in option#1 is ~O(n) where n is the number of actors that are ready to do progress. +* Channel: Unbounded channel per thread. Channel enables the thread to receive events in FIFO fashion. + +* Executor Runs one operation to collect the events from the channel. Has a loop through the collected events. 
The +* event’s data-structure enables the executor to fetch the right task which is ready to do progress from the tasks_map. +* An event contains a tuple (`actor_id`, `msg_function`, `msg`). `actor_id` is the task_key in the tasks_map, therefore +* we will use the actor_id to fetch the task(actor) from the map. `msg_function` indicates which function in the actor +* should be executed with the params (`msg`, `actor_state`). A toy example of adding msg and actor_state: + ```rust fn add(msg ,actor_state) -> updated_actor_state{ msg + actor_state // 1 + 9 } ``` the complexity of the + executor in option#1 is ~O(n) where n is the number of actors that are ready to do progress. -* Reactor - * It should receive the io_events from the tasks somehow, we can have another channel per thread where the tasks send the io_events to it or if possible we return the io_events (i.e., socket.write/read, if any) from the poll function. For instance: - ```rust - Async::ready(io_events) -> io_events_list.push(io_events) - ``` - * The reactor should have access to all the sockets in the thread +* Reactor It should receive the io_events from the tasks somehow, we can have another channel per thread where the tasks +* send the io_events to it or if possible we return the io_events (i.e., socket.write/read, if any) from the poll +* function. 
For instance: + ```rust Async::ready(io_events) -> io_events_list.push(io_events) ``` + * The reactor should have access to all the sockets in the thread @@ -63,32 +64,37 @@ Please note: both options are based on shared-nothing-archiecture: Note: each actor has its own mpsc channel -* Executor - * Owns a `vector` with all the actors that belong to the executor's thread - * Has a loop through all the actors using SIMD/AVX(if possible) to check on which actors are ready to do progress, we can determine if the actor is ready to do progress by checking if there are any events in its channel, so this elimiante the need for notifications to wake up given actors, because we are already looping through all the actors. - * The possibility of leverging SIMD/AVX on mutual indepentent actors. +* Executor Owns a `vector` with all the actors that belong to the executor's thread Has a loop through all the +* actors using SIMD/AVX(if possible) to check on which actors are ready to do progress, we can determine if the actor is +* ready to do progress by checking if there are any events in its channel, so this elimiante the need for notifications +* to wake up given actors, because we are already looping through all the actors. The possibility of leverging SIMD/AVX +* on mutual indepentent actors. - the complexity of the executor in option#2 is ~O(n/w) where n is the total number of all the actors that belong to a given thread. and w is the instruction weight, in general w=1 , but w=width in case we want to leverage SIMD/AVX. + the complexity of the executor in option#2 is ~O(n/w) where n is the total number of all the actors that belong to a + given thread. and w is the instruction weight, in general w=1 , but w=width in case we want to leverage SIMD/AVX. * Reactor as above. -The chronicle framework also provides useful metrics for ease of data analytics. 
Good examples of these metrices are in [tanglescope](https://github.com/iotaledger/entangled/tree/develop/tanglescope) in IOTA entangled project. +The chronicle framework also provides useful metrics for ease of data analytics. Good examples of these metrices are in +[tanglescope](https://github.com/iotaledger/entangled/tree/develop/tanglescope) in IOTA entangled project. # Drawbacks -- Using this chronicle framework needs cloud to store the historical/new-coming transactions, consuming more power and storage than to maintain a node w/ periodic data deletion in snapshots. +- Using this chronicle framework needs cloud to store the historical/new-coming transactions, consuming more power and + storage than to maintain a node w/ periodic data deletion in snapshots. # Rationale and alternatives - SyllaDB is adopted as the Chronicle database currently - The use can define the deletion period of the transactions with specific characteristics. The characteristics contain - - **To Do** + - **To Do** # Unresolved questions -- [Options] Which option do you prefer, notifications-based with channel per thread or iteration-like with channel per actor? +- [Options] Which option do you prefer, notifications-based with channel per thread or iteration-like with channel per + actor? - [Executor] Should we implement the actor (top level of a future) using async block, or implementing a custom futures? - [Reactor] What syscalls we should apply? - - Adopt epoll? (notification based) - - Adopt io_submit/io_getevents? (Batching io_events, where io_submit is blocking because it does the most work) - - Adopt Io_uring? - - Adopt epoll-like syscalls on top of io_uring? + - Adopt epoll? (notification based) + - Adopt io_submit/io_getevents? (Batching io_events, where io_submit is blocking because it does the most work) + - Adopt Io_uring? + - Adopt epoll-like syscalls on top of io_uring? 
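[Editor's note between patches: the option 2 executor described in the RFC text, which loops over all actors and detects readiness by checking each actor's own channel instead of waiting for wake-up notifications, can be sketched as follows. This is a minimal single-threaded sketch using `std::sync::mpsc`; the `Actor` struct, its fields, and the integer message type are illustrative assumptions, not the RFC's actual runtime.]

```rust
use std::sync::mpsc::{channel, Receiver};

// Illustrative actor for the option 2 design: each actor owns its state
// and its own mpsc receiver (the RFC's "channel per actor").
struct Actor {
    state: i64,
    inbox: Receiver<i64>,
}

impl Actor {
    // Drain any pending events; returns true if the actor made progress.
    // Readiness is detected simply by checking the channel, so no wake-up
    // notification mechanism is needed.
    fn poll(&mut self) -> bool {
        let mut progressed = false;
        while let Ok(msg) = self.inbox.try_recv() {
            self.state += msg;
            progressed = true;
        }
        progressed
    }
}

fn main() {
    // The executor owns a vector of all actors on its thread and simply
    // loops over them (the RFC's ~O(n/w) scan; plain scalar code here, so w = 1).
    let (tx0, rx0) = channel();
    let (tx1, rx1) = channel();
    let mut actors = vec![Actor { state: 9, inbox: rx0 }, Actor { state: 0, inbox: rx1 }];

    tx0.send(1).unwrap(); // only actor 0 has a pending event
    let _keep_alive = tx1; // actor 1's sender stays open but idle

    for (i, actor) in actors.iter_mut().enumerate() {
        if actor.poll() {
            println!("actor {} progressed, state = {}", i, actor.state);
        }
    }
    // prints: actor 0 progressed, state = 10
}
```

A SIMD/AVX variant would replace the scalar readiness check with a vectorized scan over a readiness bitmap, but the channel-draining logic would remain per actor.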
From 726e6acd2269857d9d0357ac8af5bd3a6b0dc00f Mon Sep 17 00:00:00 2001 From: bingyanglin Date: Tue, 15 Oct 2019 22:15:39 +0800 Subject: [PATCH 4/9] Corrrect format --- .../0000-chronicle-module.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/text/0000-chronicle-module/0000-chronicle-module.md b/text/0000-chronicle-module/0000-chronicle-module.md index 48273c8e..0d3c2193 100644 --- a/text/0000-chronicle-module/0000-chronicle-module.md +++ b/text/0000-chronicle-module/0000-chronicle-module.md @@ -20,7 +20,7 @@ building on the data, a huge amount of data is crucial for training and testing. and should be developed. In the following we list some application examples. * From the aspect of financial analysis, one may want to build a model to predict the future prices and/or transaction -* volumes, which needs a large amount of historical data for training and testing. + volumes, which needs a large amount of historical data for training and testing. * Developers can further improve/enhance the framework design and structure by identifying the weakness of them. @@ -45,16 +45,16 @@ Please note: both options are based on shared-nothing-archiecture: * Channel: Unbounded channel per thread. Channel enables the thread to receive events in FIFO fashion. * Executor Runs one operation to collect the events from the channel. Has a loop through the collected events. The -* event’s data-structure enables the executor to fetch the right task which is ready to do progress from the tasks_map. + event’s data-structure enables the executor to fetch the right task which is ready to do progress from the tasks_map. * An event contains a tuple (`actor_id`, `msg_function`, `msg`). `actor_id` is the task_key in the tasks_map, therefore -* we will use the actor_id to fetch the task(actor) from the map. `msg_function` indicates which function in the actor -* should be executed with the params (`msg`, `actor_state`). 
A toy example of adding msg and actor_state: + we will use the actor_id to fetch the task(actor) from the map. `msg_function` indicates which function in the actor + should be executed with the params (`msg`, `actor_state`). A toy example of adding msg and actor_state: ```rust fn add(msg ,actor_state) -> updated_actor_state{ msg + actor_state // 1 + 9 } ``` the complexity of the executor in option#1 is ~O(n) where n is the number of actors that are ready to do progress. * Reactor It should receive the io_events from the tasks somehow, we can have another channel per thread where the tasks -* send the io_events to it or if possible we return the io_events (i.e., socket.write/read, if any) from the poll -* function. For instance: + send the io_events to it or if possible we return the io_events (i.e., socket.write/read, if any) from the poll + function. For instance: ```rust Async::ready(io_events) -> io_events_list.push(io_events) ``` * The reactor should have access to all the sockets in the thread @@ -65,10 +65,10 @@ Please note: both options are based on shared-nothing-archiecture: Note: each actor has its own mpsc channel * Executor Owns a `vector` with all the actors that belong to the executor's thread Has a loop through all the -* actors using SIMD/AVX(if possible) to check on which actors are ready to do progress, we can determine if the actor is -* ready to do progress by checking if there are any events in its channel, so this elimiante the need for notifications -* to wake up given actors, because we are already looping through all the actors. The possibility of leverging SIMD/AVX -* on mutual indepentent actors. + actors using SIMD/AVX(if possible) to check on which actors are ready to do progress, we can determine if the actor is + ready to do progress by checking if there are any events in its channel, so this elimiante the need for notifications + to wake up given actors, because we are already looping through all the actors. 
The possibility of leverging SIMD/AVX + on mutual indepentent actors. the complexity of the executor in option#2 is ~O(n/w) where n is the total number of all the actors that belong to a given thread. and w is the instruction weight, in general w=1 , but w=width in case we want to leverage SIMD/AVX. From a03f14d89c7f6a7235053d5238e03d17fee009e6 Mon Sep 17 00:00:00 2001 From: bingyanglin Date: Tue, 15 Oct 2019 22:15:39 +0800 Subject: [PATCH 5/9] Correct format --- .../0000-chronicle-module.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/text/0000-chronicle-module/0000-chronicle-module.md b/text/0000-chronicle-module/0000-chronicle-module.md index 48273c8e..0d3c2193 100644 --- a/text/0000-chronicle-module/0000-chronicle-module.md +++ b/text/0000-chronicle-module/0000-chronicle-module.md @@ -20,7 +20,7 @@ building on the data, a huge amount of data is crucial for training and testing. and should be developed. In the following we list some application examples. * From the aspect of financial analysis, one may want to build a model to predict the future prices and/or transaction -* volumes, which needs a large amount of historical data for training and testing. + volumes, which needs a large amount of historical data for training and testing. * Developers can further improve/enhance the framework design and structure by identifying the weakness of them. @@ -45,16 +45,16 @@ Please note: both options are based on shared-nothing-archiecture: * Channel: Unbounded channel per thread. Channel enables the thread to receive events in FIFO fashion. * Executor Runs one operation to collect the events from the channel. Has a loop through the collected events. The -* event’s data-structure enables the executor to fetch the right task which is ready to do progress from the tasks_map. + event’s data-structure enables the executor to fetch the right task which is ready to do progress from the tasks_map. 
* An event contains a tuple (`actor_id`, `msg_function`, `msg`). `actor_id` is the task_key in the tasks_map, therefore -* we will use the actor_id to fetch the task(actor) from the map. `msg_function` indicates which function in the actor -* should be executed with the params (`msg`, `actor_state`). A toy example of adding msg and actor_state: + we will use the actor_id to fetch the task(actor) from the map. `msg_function` indicates which function in the actor + should be executed with the params (`msg`, `actor_state`). A toy example of adding msg and actor_state: ```rust fn add(msg ,actor_state) -> updated_actor_state{ msg + actor_state // 1 + 9 } ``` the complexity of the executor in option#1 is ~O(n) where n is the number of actors that are ready to do progress. * Reactor It should receive the io_events from the tasks somehow, we can have another channel per thread where the tasks -* send the io_events to it or if possible we return the io_events (i.e., socket.write/read, if any) from the poll -* function. For instance: + send the io_events to it or if possible we return the io_events (i.e., socket.write/read, if any) from the poll + function. For instance: ```rust Async::ready(io_events) -> io_events_list.push(io_events) ``` * The reactor should have access to all the sockets in the thread @@ -65,10 +65,10 @@ Please note: both options are based on shared-nothing-archiecture: Note: each actor has its own mpsc channel * Executor Owns a `vector` with all the actors that belong to the executor's thread Has a loop through all the -* actors using SIMD/AVX(if possible) to check on which actors are ready to do progress, we can determine if the actor is -* ready to do progress by checking if there are any events in its channel, so this elimiante the need for notifications -* to wake up given actors, because we are already looping through all the actors. The possibility of leverging SIMD/AVX -* on mutual indepentent actors. 
+ actors using SIMD/AVX(if possible) to check on which actors are ready to do progress, we can determine if the actor is + ready to do progress by checking if there are any events in its channel, so this elimiante the need for notifications + to wake up given actors, because we are already looping through all the actors. The possibility of leverging SIMD/AVX + on mutual indepentent actors. the complexity of the executor in option#2 is ~O(n/w) where n is the total number of all the actors that belong to a given thread. and w is the instruction weight, in general w=1 , but w=width in case we want to leverage SIMD/AVX. From f8d01ee765aacce7acf03015ceac5addba7e7ab4 Mon Sep 17 00:00:00 2001 From: bingyanglin Date: Tue, 15 Oct 2019 22:20:54 +0800 Subject: [PATCH 6/9] Correct headers --- text/0000-chronicle-module/0000-chronicle-module.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/text/0000-chronicle-module/0000-chronicle-module.md b/text/0000-chronicle-module/0000-chronicle-module.md index 0d3c2193..74a37e5c 100644 --- a/text/0000-chronicle-module/0000-chronicle-module.md +++ b/text/0000-chronicle-module/0000-chronicle-module.md @@ -1,6 +1,7 @@ -+ Feature name: `chronicle-module` + Start date: 2019-10-14 + RFC PR: -[iotaledger/bee-rfcs#22](https://github.com/iotaledger/bee-rfcs/pull/22) + Bee issue: -[iotaledger/bee#64](https://github.com/iotaledger/bee/issues/64) ++ Feature name: `chronicle-module` ++ Start date: 2019-10-14 ++ RFC PR: [iotaledger/bee-rfcs#22](https://github.com/iotaledger/bee-rfcs/pull/22) ++ Bee issue: [iotaledger/bee#64](https://github.com/iotaledger/bee/issues/64) **To Do**: break 120 characters for each line From 97c75b437e5bff86f25532d21853df604aa801bb Mon Sep 17 00:00:00 2001 From: bingyanglin Date: Tue, 22 Oct 2019 00:33:22 +0800 Subject: [PATCH 7/9] Add more design details --- .../0000-chronicle-module.md | 131 +++++++++--------- 1 file changed, 68 insertions(+), 63 deletions(-) diff --git 
a/text/0000-chronicle-module/0000-chronicle-module.md b/text/0000-chronicle-module/0000-chronicle-module.md index 74a37e5c..29519637 100644 --- a/text/0000-chronicle-module/0000-chronicle-module.md +++ b/text/0000-chronicle-module/0000-chronicle-module.md @@ -1,101 +1,106 @@ -+ Feature name: `chronicle-module` -+ Start date: 2019-10-14 -+ RFC PR: [iotaledger/bee-rfcs#22](https://github.com/iotaledger/bee-rfcs/pull/22) -+ Bee issue: [iotaledger/bee#64](https://github.com/iotaledger/bee/issues/64) +- Feature name: `chronicle-module` +- Start date: 2019-10-14 +- RFC PR: [iotaledger/bee-rfcs#22](https://github.com/iotaledger/bee-rfcs/pull/22) +- Bee issue: [iotaledger/bee#64](https://github.com/iotaledger/bee/issues/64) -**To Do**: break 120 characters for each line +**TO-DO**: Word Wrapping for 120 characters # Summary -This RFC proposes a `chronicle` module to provide a flexible and powerful framework for storing/accessing -historical/new-coming transactions for a long period of time. +This RFC proposes a `chronicle` module to provide a flexible and powerful framework for storing/accessing historical/incoming transactions for a long period of time. # Motivation -In current IOTA nodes, old transactions are removed after snapshots, in order to reduce the storage cost. It does not -mean, however, the old transactions are valueless in real applications. For data analytics point of view, it is -essential to keep all of the historical data (if the cost is affordable) to ensure no information is missed. It is -impossible to ensure that the historical data are useless for the target application before the corresponding -research/analysis on the historical data is done. Also, to apply machine learning (including deep learning) for model -building on the data, a huge amount of data is crucial for training and testing. Hence the chronicle module is important -and should be developed. In the following we list some application examples. 
+In current IOTA nodes, old transactions are removed after snapshots, in order to reduce the storage cost. It does not mean, however, the old transactions are valueless in real applications. For data analytics point of view, it is essential to keep all of the historical data (if the cost is affordable) to ensure no information is missed. It is impossible to ensure that the historical data are useless for the target application before the corresponding research/analysis on the historical data is done. Also, to apply machine learning (including deep learning) for model building on the data, a huge amount of data is crucial for training and testing. Hence the chronicle module is important and should be developed. In the following we list some application examples. -* From the aspect of financial analysis, one may want to build a model to predict the future prices and/or transaction - volumes, which needs a large amount of historical data for training and testing. +- From the aspect of financial analysis, one may want to build a model to predict the future prices and/or transaction volumes, which needs a large amount of historical data for training and testing. +- Developers can further improve/enhance the framework design and structure by identifying the weakness of them. +- A company/organization can collect transaction data and provide customized services. -* Developers can further improve/enhance the framework design and structure by identifying the weakness of them. +This module provides a framework to implement chronicle in an efficient and robust way, as well as transaction/bundle-related metrics, dashboard, and analytics examples. -* A company/organization can collect transaction data and provide customized services. +# Detailed design +The chronicle design should leverage the fundamental crates used in IOTA Bee, including transaction/bundle crates, so as to be consistent with other Bee projects, efficient to perform operations, and easy to maintain. 
The chronicle module subscribes the unconfirmed transactions received by ledger node(s) (achieved by [gossip crates](https://To-DO)), filter out unnecessary transactions, and then store the transactions into cloud databases (ScyllaDB is adopted in our first version). The user must define the _time to live_ (TTL) for different transaction categories, which is defined as how many seconds the transaction should be reserved, after that the transactions will be deleted automatically. If TTL is defined as 0, then the transactions of the category will be reserved until deletion operation is issued by the user. The TTL ranges from 0 to 630,720,000 seconds (20 years). -This module provides a framework to implement chronicle in an efficient and robust way, as well as -transaction/bundle-related metrics, dashboard, and analytics examples. +The filter/TTL behavior should be based on the transaction categories, which are classified by -# Detailed design +- Confirmed/unconfirmed transactions +- Valid/invalid transactions after verification +- Zero/non-zero value transactions + +This crate mainly focuses on the `runtime` implementation, which makes the transaction storing/retrieving/adding/deleting in the database efficient. -The chronicle design should leverage the fundamental crates used in IOTA Bee, including trit/transaction/bundle crates, -so as to be consistent with other Bee projects and easy to maintain. This crate mainly focuses on the `runtime` -implementation, which makes the transaction storing/retrieving/adding/deleting in the database efficient. 
+The high-level of the chronicle event (has `future` trait) +we have two options for the flow loop: -The high-level of the chronicle event (future type) we have two options for the flow loop: +Please note: both options are based on shared-nothing-architecture: -Please note: both options are based on shared-nothing-archiecture: +# Runtime Design Option 1: channel -> executor -> reactor -**1 -** **channel -> executor -> reactor**, whose specs follow. +**Channel** -* Channel: Unbounded channel per thread. Channel enables the thread to receive events in FIFO fashion. +- Unbounded channel per thread. +- Channel enables the thread to receive events in FIFO fashion. -* Executor Runs one operation to collect the events from the channel. Has a loop through the collected events. The - event’s data-structure enables the executor to fetch the right task which is ready to do progress from the tasks_map. -* An event contains a tuple (`actor_id`, `msg_function`, `msg`). `actor_id` is the task_key in the tasks_map, therefore - we will use the actor_id to fetch the task(actor) from the map. `msg_function` indicates which function in the actor - should be executed with the params (`msg`, `actor_state`). A toy example of adding msg and actor_state: - ```rust fn add(msg ,actor_state) -> updated_actor_state{ msg + actor_state // 1 + 9 } ``` the complexity of the - executor in option#1 is ~O(n) where n is the number of actors that are ready to do progress. - -* Reactor It should receive the io_events from the tasks somehow, we can have another channel per thread where the tasks - send the io_events to it or if possible we return the io_events (i.e., socket.write/read, if any) from the poll - function. For instance: - ```rust Async::ready(io_events) -> io_events_list.push(io_events) ``` - * The reactor should have access to all the sockets in the thread +**Executor** +- Runs one operation to collect the events from the channel. +- Has a loop through the collected events. 
+- The event’s data-structure enables the executor to fetch the right task which is ready to do progress from the tasks_map. An event contains a tuple (`actor_id`, `msg_function`, `msg`).
+  - `actor_id` is the task_key in the tasks_map, therefore we will use the actor_id to fetch the task(actor) from the map.
+  - `msg_function` indicates which function in the actor should be executed with the params (`msg`, `actor_state`).
+    A toy example of adding msg and actor_state:
+    ```rust
+    fn add(msg: i64, actor_state: i64) -> i64 {
+        msg + actor_state // 1 + 9 yields the updated actor state 10
+    }
+    ```
+  - The complexity of the executor in Option 1 is ~O(n) where n is the number of actors that are ready to do progress.

+**Reactor**

-**2 -** **executor -> reactor**, whose specs follow.

+- It should receive the io_events from the tasks somehow, we can have another channel per thread where the tasks send the io_events to it or if possible we return the io_events (i.e., socket.write/read, if any) from the poll function. For instance:
+  ```rust
+  Async::ready(io_events) -> io_events_list.push(io_events)
+  ```
+- The reactor should have access to all the sockets in the thread
+
+# Runtime Design Option 2: channel -> executor -> reactor

Note: each actor has its own mpsc channel

-* Executor Owns a `vector` with all the actors that belong to the executor's thread Has a loop through all the
-  actors using SIMD/AVX(if possible) to check on which actors are ready to do progress, we can determine if the actor is
-  ready to do progress by checking if there are any events in its channel, so this elimiante the need for notifications
-  to wake up given actors, because we are already looping through all the actors. The possibility of leverging SIMD/AVX
-  on mutual indepentent actors.

+**Executor**

-  the complexity of the executor in option#2 is ~O(n/w) where n is the total number of all the actors that belong to a
-  given thread.
and w is the instruction weight, in general w=1 , but w=width in case we want to leverage SIMD/AVX.
-* Reactor as above.

+- Owns a `vector` with all the actors that belong to the executor's thread
+- Has a loop through all the actors using SIMD/AVX(if possible) to check on which actors are ready to do progress, we can determine if the actor is ready to do progress by checking if there are any events in its channel, so this eliminate the need for notifications to wake up given actors, because we are already looping through all the actors.
+- The possibility of leveraging SIMD/AVX on mutual independent actors. - the complexity of the executor in Option 2 is ~O(n/w) where n is the total number of all the actors that belong to a given thread. and w is the instruction weight, in general w=1, but w=width in case we want to leverage SIMD/AVX.

+**Reactor**

+- as above.

+The chronicle framework also provides useful metrics for ease of data analytics. Good examples of these metrics are in [tanglescope](https://github.com/iotaledger/entangled/tree/develop/tanglescope) in IOTA entangled project.

# Drawbacks

-- Using this chronicle framework needs cloud to store the historical/new-coming transactions, consuming more power and
-  storage than to maintain a node w/ periodic data deletion in snapshots.
+- Using this chronicle framework needs cloud to store the historical/new-coming transactions, consuming more power and storage than to maintain a node w/ periodic data deletion in snapshots.

# Rationale and alternatives

-- SyllaDB is adopted as the Chronicle database currently
+- ScyllaDB is adopted as the Chronicle database currently
- The user can define the deletion period of the transactions with specific characteristics.
The characteristics contain
-  - **To Do**
+  - **To Do**

# Unresolved questions

-- [Options] Which option do you prefer, notifications-based with channel per thread or iteration-like with channel per
-  actor?
+- Does the TTL range need to be modified?
+- Should the runtime details be separated into different RFCs?
+- [Options] Which option do you prefer, notifications-based with channel per thread or iteration-like with channel per actor?
- [Executor] Should we implement the actor (top level of a future) using async block, or implementing a custom future?
-- [Reactor] What syscalls we should apply?
- - Adopt epoll? (notification based)
- - Adopt io_submit/io_getevents? (Batching io_events, where io_submit is blocking because it does the most work)
- - Adopt Io_uring?
- - Adopt epoll-like syscalls on top of io_uring?
+- [Reactor] What syscalls should we apply?
+  - Adopt epoll? (notification based)
+  - Adopt io_submit/io_getevents? (Batching io_events, where io_submit is blocking because it does the most work)
+  - Adopt Io_uring?
+  - Adopt epoll-like syscalls on top of io_uring?

From 4534de7681332157cccccaa858f886e8692dcc20 Mon Sep 17 00:00:00 2001
From: bingyanglin
Date: Wed, 23 Oct 2019 19:39:03 +0800
Subject: [PATCH 8/9] Change format

---
 .../0000-chronicle-module.md | 65 +++++++++++++------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/text/0000-chronicle-module/0000-chronicle-module.md b/text/0000-chronicle-module/0000-chronicle-module.md
index 29519637..98235be6 100644
--- a/text/0000-chronicle-module/0000-chronicle-module.md
+++ b/text/0000-chronicle-module/0000-chronicle-module.md
@@ -7,21 +7,36 @@

# Summary

-This RFC proposes a `chronicle` module to provide a flexible and powerful framework for storing/accessing historical/incoming transactions for a long period of time.
+This RFC proposes a `chronicle` module to provide a flexible and powerful framework for storing/accessing
+historical/incoming transactions for a long period of time.
# Motivation

-In current IOTA nodes, old transactions are removed after snapshots, in order to reduce the storage cost. It does not
+mean, however, the old transactions are valueless in real applications. From a data analytics point of view, it is
+essential to keep all of the historical data (if the cost is affordable) to ensure no information is missed. It is
+impossible to ensure that the historical data are useless for the target application before the corresponding
+research/analysis on the historical data is done.
Also, to apply machine learning (including deep learning) for model
+building on the data, a huge amount of data is crucial for training and testing. Hence the chronicle module is
+important and should be developed. In the following we list some application examples: 1) From the aspect of
+financial analysis, one may want to build a model to predict the future prices and/or transaction volumes, which
+needs a large amount of historical data for training and testing. 2) Developers can further improve/enhance the
+framework design and structure by identifying their weaknesses. 3) A company/organization can collect transaction
+data and provide customized services.
+
+This module provides a framework to implement chronicle in an efficient and robust way, as well as
+transaction/bundle-related metrics, dashboard, and analytics examples.

# Detailed design

-The chronicle design should leverage the fundamental crates used in IOTA Bee, including transaction/bundle crates, so as to be consistent with other Bee projects, efficient to perform operations, and easy to maintain.
+The chronicle design should leverage the fundamental crates used in IOTA Bee, including transaction/bundle crates, so
+as to be consistent with other Bee projects, efficient to perform operations, and easy to maintain.
The chronicle
+module subscribes to the unconfirmed transactions received by ledger node(s) (achieved by [gossip
+crates](https://To-DO)), filters out unnecessary transactions, and then stores the transactions into cloud databases
+(ScyllaDB is adopted in our first version). The user must define the _time to live_ (TTL) for different transaction
+categories, which is defined as how many seconds the transaction should be reserved; after that, the transactions will
+be deleted automatically. If TTL is defined as 0, then the transactions of the category will be reserved until a
+deletion operation is issued by the user. The TTL ranges from 0 to 630,720,000 seconds (20 years).

The filter/TTL behavior should be based on the transaction categories, which are classified by

@@ -29,14 +44,14 @@ The filter/TTL behavior should be based on the transaction categories, which are
- Valid/invalid transactions after verification
- Zero/non-zero value transactions

-This crate mainly focuses on the `runtime` implementation, which makes the transaction storing/retrieving/adding/deleting in the database efficient.
+This crate mainly focuses on the `runtime` implementation, which makes the transaction
+storing/retrieving/adding/deleting in the database efficient.

-The high-level of the chronicle event (has `future` trait)
-we have two options for the flow loop:
+For the high-level flow of the chronicle event (which has the `future` trait), we have two options for the flow loop:

Please note: both options are based on shared-nothing-architecture:

-# Runtime Design Option 1: channel -> executor -> reactor
+## Runtime Design Option 1: channel -> executor -> reactor

**Channel**

@@ -47,7 +62,8 @@ Please note: both options are based on shared-nothing-architecture:
- Runs one operation to collect the events from the channel.
- Has a loop through the collected events.
-- The event’s data-structure enables the executor to fetch the right task which is ready to do progress from the tasks_map.
An event contains a tuple (`actor_id`, `msg_function`, `msg`).
+- The event’s data-structure enables the executor to fetch the right task which is ready to do progress from the
+  tasks_map. An event contains a tuple (`actor_id`, `msg_function`, `msg`).
  - `actor_id` is the task_key in the tasks_map, therefore we will use the actor_id to fetch the task(actor) from the map.
  - `msg_function` indicates which function in the actor should be executed with the params (`msg`, `actor_state`).
    A toy example of adding msg and actor_state:
@@ -60,32 +76,39 @@

**Reactor**

-- It should receive the io_events from the tasks somehow, we can have another channel per thread where the tasks send the io_events to it or if possible we return the io_events (i.e., socket.write/read, if any) from the poll function. For instance:
+- It should receive the io_events from the tasks somehow; we can have another channel per thread where the tasks send
+  the io_events to it, or, if possible, return the io_events (i.e., socket.write/read, if any) from the poll function.
+  For instance:
  ```rust
  Async::ready(io_events) -> io_events_list.push(io_events)
  ```
- The reactor should have access to all the sockets in the thread

-# Runtime Design Option 2: channel -> executor -> reactor
+## Runtime Design Option 2: channel -> executor -> reactor

Note: each actor has its own mpsc channel

**Executor**

- Owns a `vector` with all the actors that belong to the executor's thread
-- Has a loop through all the actors using SIMD/AVX(if possible) to check on which actors are ready to do progress, we can determine if the actor is ready to do progress by checking if there are any events in its channel, so this eliminate the need for notifications to wake up given actors, because we are already looping through all the actors.
-- The possibility of leveraging SIMD/AVX on mutual independent actors.
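To make the Option 1 flow concrete, here is a minimal single-threaded sketch of an unbounded channel delivering events FIFO and an executor that uses `actor_id` to fetch the task from the tasks_map and applies `msg_function` to (`msg`, actor state). The concrete types (the `i64` state, the `MsgFunction` enum) are illustrative assumptions, not the proposed design:

```rust
// Sketch of the Option 1 executor loop; all types are hypothetical stand-ins.
use std::collections::HashMap;
use std::sync::mpsc;

type ActorId = u64;
type ActorState = i64;

/// Stand-in for `msg_function`: which actor function to run.
enum MsgFunction {
    Add,
}

/// The event tuple from the RFC: (`actor_id`, `msg_function`, `msg`).
struct Event {
    actor_id: ActorId,
    msg_function: MsgFunction,
    msg: i64,
}

/// The toy `add` from the text: fold the message into the actor's state.
fn add(msg: i64, actor_state: ActorState) -> ActorState {
    msg + actor_state // e.g. 1 + 9
}

/// One executor pass: drain the thread's channel in FIFO order and dispatch
/// each event to the actor it addresses in the tasks_map.
fn run_executor(rx: &mpsc::Receiver<Event>, tasks_map: &mut HashMap<ActorId, ActorState>) {
    for event in rx.try_iter() {
        if let Some(state) = tasks_map.get_mut(&event.actor_id) {
            match event.msg_function {
                MsgFunction::Add => *state = add(event.msg, *state),
            }
        }
    }
}
```

`mpsc::channel` gives the unbounded FIFO channel the Channel spec asks for; one such channel (and one tasks_map) would exist per thread under the shared-nothing architecture.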
-- the complexity of the executor in Option 2 is ~O(n/w) where n is the total number of all the actors that belong to a given thread. and w is the instruction weight, in general w=1, but w=width in case we want to leverage SIMD/AVX.
+- Has a loop through all the actors using SIMD/AVX (if possible) to check on which actors are ready to do progress, we
+  can determine if the actor is ready to do progress by checking if there are any events in its channel, so this
+  eliminates the need for notifications to wake up given actors, because we are already looping through all the actors.
+- The possibility of leveraging SIMD/AVX on mutually independent actors.
+- The complexity of the executor in Option 2 is ~O(n/w), where n is the total number of all the actors that belong to
+  a given thread, and w is the instruction weight; in general w=1, but w=width in case we want to leverage SIMD/AVX.

**Reactor**

- as above.

-The chronicle framework also provides useful metrics for ease of data analytics. Good examples of these metrics are in [tanglescope](https://github.com/iotaledger/entangled/tree/develop/tanglescope) in IOTA entangled project.
+The chronicle framework also provides useful metrics for ease of data analytics. Good examples of these metrics are
+in [tanglescope](https://github.com/iotaledger/entangled/tree/develop/tanglescope) in the IOTA entangled project.

# Drawbacks

-- Using this chronicle framework needs cloud to store the historical/new-coming transactions, consuming more power and storage than to maintain a node w/ periodic data deletion in snapshots.
+- Using this chronicle framework needs cloud storage for the historical/incoming transactions, consuming more power
+  and storage than maintaining a node w/ periodic data deletion in snapshots.
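The Option 2 readiness check described above, where each actor owns its own channel and the executor decides an actor is ready to do progress simply by looking at that channel instead of waiting for a wake-up notification, could look roughly like the following scalar (non-SIMD) sketch; the types are hypothetical:

```rust
// Scalar sketch of the Option 2 scan; SIMD/AVX batching is not attempted here.
use std::sync::mpsc;

/// A hypothetical actor: its private state plus its own event channel.
struct Actor {
    state: i64,
    inbox: mpsc::Receiver<i64>,
}

/// One pass over the actor vector: an actor is "ready to do progress" iff its
/// channel holds events. Returns how many actors made progress this pass.
fn scan(actors: &mut [Actor]) -> usize {
    let mut progressed = 0;
    for actor in actors.iter_mut() {
        let mut ready = false;
        // try_recv never blocks, so the loop itself replaces notifications.
        while let Ok(msg) = actor.inbox.try_recv() {
            actor.state += msg;
            ready = true;
        }
        if ready {
            progressed += 1;
        }
    }
    progressed
}
```

Because the scan touches every actor on every pass, its cost is O(n) in this scalar form; the ~O(n/w) figure assumes w readiness checks can be batched per instruction.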
# Rationale and alternatives

From e54c9ee59d43eb24265561ae7076c3dcb2e1bb01 Mon Sep 17 00:00:00 2001
From: bingyanglin
Date: Mon, 28 Oct 2019 16:42:05 +0800
Subject: [PATCH 9/9] Add more descriptions and definitions

---
 .../0000-chronicle-module/0000-chronicle-module.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/text/0000-chronicle-module/0000-chronicle-module.md b/text/0000-chronicle-module/0000-chronicle-module.md
index 98235be6..ac827f80 100644
--- a/text/0000-chronicle-module/0000-chronicle-module.md
+++ b/text/0000-chronicle-module/0000-chronicle-module.md
@@ -3,8 +3,6 @@
 - RFC PR: [iotaledger/bee-rfcs#22](https://github.com/iotaledger/bee-rfcs/pull/22)
 - Bee issue: [iotaledger/bee#64](https://github.com/iotaledger/bee/issues/64)

-**TO-DO**: Word Wrapping for 120 characters
-
 # Summary

 This RFC proposes a `chronicle` module to provide a flexible and powerful framework for storing/accessing
@@ -46,10 +44,12 @@ The filter/TTL behavior should be based on the transaction categories, which are
 This crate mainly focuses on the `runtime` implementation, which makes the transaction
 storing/retrieving/adding/deleting in the database efficient.

+In the chronicle runtime, **scheduler** refers to the component that runs inside a single thread and handles the execution flow of
+(channel -> executor -> reactor), and **ring** refers to the [scylla-visualized-ring](https://docs.scylladb.com/architecture/ringarchitecture/).
 For the high-level flow of the chronicle event (which has the `future` trait), we have two options for the flow loop:

-Please note: both options are based on shared-nothing-architecture:
+Please note: both options are based on shared-nothing-architecture

 ## Runtime Design Option 1: channel -> executor -> reactor

@@ -90,7 +90,7 @@ Note: each actor has its own mpsc channel

 **Executor**

-- Owns a `vector` with all the actors that belong to the executor's thread
+- Owns a `vector` with all the actors that belong to the executor's thread.
- Has a loop through all the actors using SIMD/AVX (if possible) to check on which actors are ready to do progress, we
  can determine if the actor is ready to do progress by checking if there are any events in its channel, so this
  eliminates the need for notifications to wake up given actors, because we are already looping through all the actors.
@@ -100,7 +100,7 @@ Note: each actor has its own mpsc channel

 **Reactor**

-- as above.
+- As above.

 The chronicle framework also provides useful metrics for ease of data analytics. Good examples of these metrics are
 in [tanglescope](https://github.com/iotaledger/entangled/tree/develop/tanglescope) in the IOTA entangled project.
@@ -120,6 +120,10 @@ in [tanglescope](https://github.com/iotaledger/entangled/tree/develop/tanglescop

 - Does the TTL range need to be modified?
 - Should the runtime details be separated into different RFCs?
+- How do we guarantee/verify that all txs are collected in Chronicle nodes? (Subscribe redundant IOTA nodes?)
+- Do we need an extra mechanism to verify that no txs are lost?
+- How do we detect that some txs are lost?
+- How do we recollect the lost txs when exceptions occur (e.g., zmq disconnected)?
 - [Options] Which option do you prefer, notifications-based with channel per thread or iteration-like with channel per actor?
 - [Executor] Should we implement the actor (top level of a future) using async block, or implementing a custom future?
 - [Reactor] What syscalls should we apply?
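Independent of which syscall interface the last question settles on (epoll, io_submit/io_getevents, io_uring), the task-to-reactor hand-off sketched in the Reactor sections, with tasks pushing io_events into a per-thread channel that the reactor drains, can be illustrated with plain std primitives. `IoEvent` is a hypothetical placeholder type, not part of the proposal:

```rust
// Illustrative only: the per-thread io_events channel from the Reactor spec.
use std::sync::mpsc;

/// Placeholder for the socket read/write requests tasks hand to the reactor.
#[derive(Debug, PartialEq)]
enum IoEvent {
    Read { socket_id: u32 },
    Write { socket_id: u32, bytes: Vec<u8> },
}

/// Reactor side: drain everything the tasks pushed this tick, mirroring the
/// `io_events_list.push(io_events)` step in the text.
fn collect_io_events(rx: &mpsc::Receiver<IoEvent>) -> Vec<IoEvent> {
    rx.try_iter().collect()
}
```

The collected list is what would then be submitted to whichever syscall backend is chosen; batching fits io_submit/io_uring naturally, while epoll would instead register the sockets the reactor has access to.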