Commit

Merge branch 'jennifer/2624-add-a-health-indicator-for-testnet-on-tenscan-and-the-gateway' of https://github.com/ten-protocol/go-ten into jennifer/health-check-ui-on-tenscan
Jennievon committed Dec 29, 2023
2 parents e71c46e + d87f71a commit 1e681f3
Showing 24 changed files with 141 additions and 37 deletions.
2 changes: 1 addition & 1 deletion contracts/README.md
@@ -23,7 +23,7 @@ Running the following command will regenerate the bindings in the `generated` di
npx hardhat generate-abi-bindings --output-dir generated
```

The command internally uses the abi and bytecode exporter plugins and searches the path configured in their configs for exporting for relevant files in order to launch the `abigen` executable with the correct paramaters. More info on installing `abigen` can be found [here](https://geth.ethereum.org/docs/dapp/abigen)
The command internally uses the abi and bytecode exporter plugins and searches the path configured in their configs for exporting for relevant files in order to launch the `abigen` executable with the correct parameters. More info on installing `abigen` can be found [here](https://geth.ethereum.org/docs/dapp/abigen)


Additionally you can pass the `noCompile` flag which will disable running the contract compilation beforehand. This allows to build go bindings for abi/bins where the actual solidity source files are missing.
2 changes: 1 addition & 1 deletion design/bridge/bridge_design.md
@@ -202,7 +202,7 @@ When a transaction on the `L2` results in `LogMessagePublished`, the event will
### Alternative approaches

1. Ten only ever pushes the hash of the message. The user has the responsibility of providing the full message which will only be accepted if it matches one of the hashes, if neccessary.
1. Ten only ever pushes the hash of the message. The user has the responsibility of providing the full message which will only be accepted if it matches one of the hashes, if necessary.
* This simplifies gas cost calculations, but the problem described in the `Fees` section remains.
* Contracts can hash their messages before passing them to the `MessageBus` and achieve nearly the same outcome if they want to.
2. Ten only pushes to L2. Messages on L1 are provided signed by the enclave through an RPC and the MessageBus contract verifies that they have been signed by a correct enclave-owned key.
4 changes: 2 additions & 2 deletions design/scratchpad/Design_escape_hatch.md
@@ -9,7 +9,7 @@ It describes the ultimate way by which users can exit Ten.
## High level requirement

The "escape hatch" is the mechanism that kicks in when **all hope is lost**, the last resort.
Something that happens just before the network goes down permanently for some unforseen reason.
Something that happens just before the network goes down permanently for some unforeseen reason.

For example, wen the central sequencer is no longer able to produce blocks and the Ten foundation is unable to replace it with something working in due time.

@@ -29,7 +29,7 @@ The "Escape mode" will be a flag on the Management contract, that can be set und

### Assumptions

There will be at least one node in posession of the master seed that is able to publish it when this event is triggered.
There will be at least one node in possession of the master seed that is able to publish it when this event is triggered.


## High level overview of the solution
2 changes: 1 addition & 1 deletion design/security/high_availability.md
@@ -20,7 +20,7 @@ Introducing startup delays does not help too much in this case, because the oper


An alternative solution is to introduce transparency into the lifecycle events of the sequencer enclaves, such that the Ten network
can assess the likelyhood of bad behaviour.
can assess the likelihood of bad behaviour.

Lifecycle events:
- enclave starting up
2 changes: 1 addition & 1 deletion design/ux/Ten_Gateway.md
@@ -8,7 +8,7 @@ The TG will be a superset of the WE functionality, so this document will only co

The TG will be a [Confidential Web Service](https://medium.com/p/983a2a67fc08), running inside SGX.

The current WE is designed to be used by a single user holding multiple addresses accross potentially multiple wallets.
The current WE is designed to be used by a single user holding multiple addresses across potentially multiple wallets.

The TG must support mutiple users, each with multiple addresses. It can be seen as offering a WE per user.

6 changes: 3 additions & 3 deletions design/ux/user_data_incentives.md
@@ -58,18 +58,18 @@ Node operators could charge fees for this service.
Given that everyone is now expecting this to be a free service, this is unlikely to be something that has a chance.


#### 3. Incentives payed by the protocol.
#### 3. Incentives paid by the protocol.

The network (or protocol) charges fees from user when submitting transactions. This is something that users expect to pay.

The Ten protocol is designed in such a way that it decouples the income from the costs by maintaing a buffer.
The Ten protocol is designed in such a way that it decouples the income from the costs by maintaining a buffer.

We can use this designed mechanism to pay for node usage as well along with the L1 gas fees and the general incentives to follow the protocol.


##### Measuring node usage

As a proxy for a node responding to user requests, we can use a model where a node is payed a percentage of the gas fees that originated from their node.
As a proxy for a node responding to user requests, we can use a model where a node is paid a percentage of the gas fees that originated from their node.

A user that is connected to a node that doesn't respond to requests (like transaction receipts, or events), will leave that node and connect to a different node.

2 changes: 1 addition & 1 deletion docs/README.md
@@ -6,7 +6,7 @@ This is the Ten Doc Site and it looks like [this](https://docs.obscu.ro/).

1. Clone this repository: https://github.com/ten-protocol/go-ten
2. Create your new content as a Markdown file in the `/docs` folder of the repo. Take care with the folder structure.
As a general rule, new titles in the left hand navigation menu should have their content contained in a seperate
As a general rule, new titles in the left hand navigation menu should have their content contained in a separate
subfolder under docs, for example, `/docs/testnet` contains all the Markdown files relation to the testnet docs.
3. To have this new content shown in the left-hand navigation menu you need to modify the file
`/docs/_data/navigation.yml`. Follow the same format to add new headings and content titles. Remember to specify the
2 changes: 1 addition & 1 deletion docs/_docs/testnet/changelog.md
@@ -512,7 +512,7 @@

# February 2023-02-23 (v0.10)
* A list of the PRs merged in this release is as below;
* `d81f5f9a` Run a schedule deploy on the l1, and trigger l2 if succesful (#1129)
* `d81f5f9a` Run a schedule deploy on the l1, and trigger l2 if successful (#1129)
* `481dc317` Wrap leveldb so its error types don't leak into our codebase (#1128)
* `e5d8c398` Resilient to rpc requests while sequencer unknown (#1130)
* `aa3eaea2` Updated go version and ego version. (#1124)
35 changes: 35 additions & 0 deletions go/common/async/timestamp.go
@@ -0,0 +1,35 @@
package async

import (
	"sync"
	"time"
)

// Timestamp is a thread safe timestamp
type Timestamp struct {
	lastTimestamp time.Time
	mutex         sync.RWMutex
}

func NewAsyncTimestamp(lastTimestamp time.Time) *Timestamp {
	return &Timestamp{
		lastTimestamp: lastTimestamp,
		mutex:         sync.RWMutex{},
	}
}

// Mark sets the timestamp with the current time
func (at *Timestamp) Mark() {
	at.mutex.Lock()
	defer at.mutex.Unlock()
	at.lastTimestamp = time.Now()
}

// LastTimestamp returns the last set timestamp
func (at *Timestamp) LastTimestamp() time.Time {
	at.mutex.RLock()
	defer at.mutex.RUnlock()

	newTimestamp := at.lastTimestamp
	return newTimestamp
}
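
Reviewer sketch (not part of the commit): the new `Timestamp` type is meant to be written by one goroutine and read by others. The standalone program below copies the type into `package main` and exercises it from several goroutines at once. The helpers `markConcurrently` and `advancedAfterMarks` are hypothetical names introduced only for this illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Timestamp mirrors the type added in go/common/async/timestamp.go:
// a time.Time guarded by a read/write mutex.
type Timestamp struct {
	lastTimestamp time.Time
	mutex         sync.RWMutex
}

func NewAsyncTimestamp(lastTimestamp time.Time) *Timestamp {
	return &Timestamp{lastTimestamp: lastTimestamp}
}

// Mark sets the timestamp to the current time under the write lock.
func (at *Timestamp) Mark() {
	at.mutex.Lock()
	defer at.mutex.Unlock()
	at.lastTimestamp = time.Now()
}

// LastTimestamp returns the stored time under the read lock.
func (at *Timestamp) LastTimestamp() time.Time {
	at.mutex.RLock()
	defer at.mutex.RUnlock()
	return at.lastTimestamp
}

// markConcurrently (hypothetical helper) starts n writers and n readers
// against the same Timestamp and returns the value seen once all finish.
func markConcurrently(ts *Timestamp, n int) time.Time {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); ts.Mark() }()
		go func() { defer wg.Done(); _ = ts.LastTimestamp() }()
	}
	wg.Wait()
	return ts.LastTimestamp()
}

// advancedAfterMarks (hypothetical helper) reports whether the timestamp
// moved forward from its initial value after n concurrent Mark calls.
func advancedAfterMarks(n int) bool {
	start := time.Now().Add(-time.Minute)
	ts := NewAsyncTimestamp(start)
	return markConcurrently(ts, n).After(start)
}

func main() {
	fmt.Println("timestamp advanced:", advancedAfterMarks(8))
}
```

Running this under `go run -race` is one way to confirm that the locking covers both the read and the write path.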
28 changes: 23 additions & 5 deletions go/enclave/components/batch_registry.go
@@ -5,13 +5,15 @@ import (
"fmt"
"math/big"
"sync"
"time"

"github.com/ethereum/go-ethereum/core/types"
"github.com/ten-protocol/go-ten/go/enclave/storage"

"github.com/ethereum/go-ethereum/core/state"
gethlog "github.com/ethereum/go-ethereum/log"
gethrpc "github.com/ethereum/go-ethereum/rpc"
"github.com/ten-protocol/go-ten/go/common/async"
"github.com/ten-protocol/go-ten/go/common/errutil"
"github.com/ten-protocol/go-ten/go/common/log"
"github.com/ten-protocol/go-ten/go/common/measure"
@@ -24,8 +26,10 @@ type batchRegistry struct {
logger gethlog.Logger
headBatchSeq *big.Int // keep track of the last executed batch to optimise db access

batchesCallback func(*core.Batch, types.Receipts)
callbackMutex sync.RWMutex
batchesCallback func(*core.Batch, types.Receipts)
callbackMutex sync.RWMutex
healthTimeout time.Duration
lastExecutedBatch *async.Timestamp
}

func NewBatchRegistry(storage storage.Storage, logger gethlog.Logger) BatchRegistry {
@@ -42,9 +46,11 @@ func NewBatchRegistry(storage storage.Storage, logger gethlog.Logger) BatchRegis
headBatchSeq = headBatch.SeqNo()
}
return &batchRegistry{
storage: storage,
headBatchSeq: headBatchSeq,
logger: logger,
storage: storage,
headBatchSeq: headBatchSeq,
logger: logger,
healthTimeout: time.Minute,
lastExecutedBatch: async.NewAsyncTimestamp(time.Now().Add(-time.Minute)),
}
}

@@ -75,6 +81,8 @@ func (br *batchRegistry) OnBatchExecuted(batch *core.Batch, receipts types.Recei
if br.batchesCallback != nil {
br.batchesCallback(batch, receipts)
}

br.lastExecutedBatch.Mark()
}

func (br *batchRegistry) HasGenesisBatch() (bool, error) {
@@ -193,3 +201,13 @@ func (br *batchRegistry) GetBatchAtHeight(height gethrpc.BlockNumber) (*core.Bat
}
return batch, nil
}

// HealthCheck checks if the last executed batch was more than healthTimeout ago
func (br *batchRegistry) HealthCheck() (bool, error) {
lastExecutedBatchTime := br.lastExecutedBatch.LastTimestamp()
if time.Now().After(lastExecutedBatchTime.Add(br.healthTimeout)) {
return false, fmt.Errorf("last executed batch was %s ago", time.Since(lastExecutedBatchTime))
}

return true, nil
}
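
Reviewer sketch (not part of the commit): the staleness rule used by both new `HealthCheck` methods is `time.Now().After(last.Add(healthTimeout))`. The version below takes the current time as a parameter so the boundary can be checked deterministically. The names `isStale`, `healthCheck` and `healthyAfter` are hypothetical and exist only in this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// isStale reports whether `last` lies more than `timeout` before `now`.
// This is the same comparison as in the diff, with the clock passed in.
func isStale(last, now time.Time, timeout time.Duration) bool {
	return now.After(last.Add(timeout))
}

// healthCheck (hypothetical) mirrors the (bool, error) shape used in the diff.
func healthCheck(last, now time.Time, timeout time.Duration) (bool, error) {
	if isStale(last, now, timeout) {
		return false, fmt.Errorf("last executed batch was %s ago", now.Sub(last))
	}
	return true, nil
}

// healthyAfter (hypothetical) evaluates the check for an event that
// happened `secs` seconds before a fixed reference time, with a
// one-minute timeout as in the diff.
func healthyAfter(secs int) bool {
	now := time.Date(2023, 12, 29, 12, 0, 0, 0, time.UTC)
	last := now.Add(-time.Duration(secs) * time.Second)
	healthy, _ := healthCheck(last, now, time.Minute)
	return healthy
}

func main() {
	for _, secs := range []int{30, 60, 61} {
		fmt.Printf("event %ds ago -> healthy=%v\n", secs, healthyAfter(secs))
	}
}
```

One consequence worth noting, assuming the constructor values shown in this diff: the timestamp starts at `time.Now().Add(-time.Minute)` and the timeout is `time.Minute`, so the deadline equals the construction time and a check made any measurable time later appears to report unhealthy until the first `Mark()`. That would be consistent with the retry added to `checkHealthStatus` in `integration/simulation/simulation.go`.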
20 changes: 18 additions & 2 deletions go/enclave/components/block_processor.go
@@ -3,9 +3,10 @@ package components
import (
"errors"
"fmt"
"time"

"github.com/ten-protocol/go-ten/go/common/async"
"github.com/ten-protocol/go-ten/go/enclave/core"

"github.com/ten-protocol/go-ten/go/enclave/gas"
"github.com/ten-protocol/go-ten/go/enclave/storage"

@@ -27,7 +28,9 @@ type l1BlockProcessor struct {

// we store the l1 head to avoid expensive db access
// the host is responsible to always submitting the head l1 block
currentL1Head *common.L1BlockHash
currentL1Head *common.L1BlockHash
healthTimeout time.Duration
lastIngestedBlock *async.Timestamp
}

func NewBlockProcessor(storage storage.Storage, cc *crosschain.Processors, gasOracle gas.Oracle, logger gethlog.Logger) L1BlockProcessor {
@@ -48,6 +51,8 @@ func NewBlockProcessor(storage storage.Storage, cc *crosschain.Processors, gasOr
gasOracle: gasOracle,
crossChainProcessors: cc,
currentL1Head: l1BlockHash,
healthTimeout: time.Minute,
lastIngestedBlock: async.NewAsyncTimestamp(time.Now().Add(-time.Minute)),
}
}

@@ -77,9 +82,20 @@ func (bp *l1BlockProcessor) Process(br *common.BlockAndReceipts) (*BlockIngestio

h := br.Block.Hash()
bp.currentL1Head = &h
bp.lastIngestedBlock.Mark()
return ingestion, nil
}

// HealthCheck checks if the last ingested block was more than healthTimeout ago
func (bp *l1BlockProcessor) HealthCheck() (bool, error) {
lastIngestedBlockTime := bp.lastIngestedBlock.LastTimestamp()
if time.Now().After(lastIngestedBlockTime.Add(bp.healthTimeout)) {
return false, fmt.Errorf("last ingested block was %s ago", time.Since(lastIngestedBlockTime))
}

return true, nil
}

func (bp *l1BlockProcessor) tryAndInsertBlock(br *common.BlockAndReceipts) (*BlockIngestionType, error) {
block := br.Block

3 changes: 3 additions & 0 deletions go/enclave/components/interfaces.go
@@ -37,6 +37,7 @@ type L1BlockProcessor interface {
Process(br *common.BlockAndReceipts) (*BlockIngestionType, error)
GetHead() (*common.L1Block, error)
GetCrossChainContractAddress() *gethcommon.Address
HealthCheck() (bool, error)
}

// BatchExecutionContext - Contains all of the data that each batch depends on
@@ -102,6 +103,8 @@ type BatchRegistry interface {
HasGenesisBatch() (bool, error)

HeadBatchSeq() *big.Int

HealthCheck() (bool, error)
}

type RollupProducer interface {
18 changes: 16 additions & 2 deletions go/enclave/enclave.go
@@ -1248,9 +1248,23 @@ func (e *enclaveImpl) HealthCheck() (bool, common.SystemError) {
e.logger.Info("HealthCheck failed for the enclave storage", log.ErrKey, err)
return false, nil
}

// todo (#1148) - enclave healthcheck operations
enclaveHealthy := true
return storageHealthy && enclaveHealthy, nil
l1blockHealthy, err := e.l1BlockProcessor.HealthCheck()
if err != nil {
// simplest iteration, log the error and just return that it's not healthy
e.logger.Info("HealthCheck failed for the l1 block processor", log.ErrKey, err)
return false, nil
}

l2batchHealthy, err := e.registry.HealthCheck()
if err != nil {
// simplest iteration, log the error and just return that it's not healthy
e.logger.Info("HealthCheck failed for the l2 batch registry", log.ErrKey, err)
return false, nil
}

return storageHealthy && l1blockHealthy && l2batchHealthy, nil
}

func (e *enclaveImpl) DebugTraceTransaction(txHash gethcommon.Hash, config *tracers.TraceConfig) (json.RawMessage, common.SystemError) {
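
Reviewer sketch (not part of the commit): the enclave `HealthCheck` above repeats the same "call, log, return false" step for storage, the L1 block processor and the L2 batch registry. One possible way to express that pattern is a loop over named checks, shown below. Everything here (`namedCheck`, `runChecks`, `passing`, `failing`) is hypothetical and is not an API of go-ten.

```go
package main

import (
	"errors"
	"fmt"
)

// namedCheck (hypothetical) pairs a component name with its health check.
type namedCheck struct {
	name  string
	check func() (bool, error)
}

// runChecks runs the checks in order. On the first error it logs and
// returns false, matching the "simplest iteration" behaviour in the diff.
// A check that returns (false, nil) also makes the result false.
func runChecks(checks []namedCheck, logf func(format string, args ...interface{})) bool {
	for _, c := range checks {
		healthy, err := c.check()
		if err != nil {
			logf("HealthCheck failed for %s: %v", c.name, err)
			return false
		}
		if !healthy {
			return false
		}
	}
	return true
}

// passing and failing are stub checks used only for this illustration.
func passing() func() (bool, error) {
	return func() (bool, error) { return true, nil }
}

func failing(msg string) func() (bool, error) {
	return func() (bool, error) { return false, errors.New(msg) }
}

func main() {
	logf := func(format string, args ...interface{}) {
		fmt.Printf(format+"\n", args...)
	}
	checks := []namedCheck{
		{"the enclave storage", passing()},
		{"the l1 block processor", failing("last ingested block was 2m0s ago")},
		{"the l2 batch registry", passing()},
	}
	fmt.Println("healthy:", runChecks(checks, logf))
}
```

The explicit form in the commit is arguably easier to read with only three components; a loop like this mainly pays off if more checks are added later.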
8 changes: 6 additions & 2 deletions go/enclave/storage/storage.go
@@ -266,10 +266,14 @@ func (s *storageImpl) HealthCheck() (bool, error) {
defer s.logDuration("HealthCheck", measure.NewStopwatch())
headBatch, err := s.FetchHeadBatch()
if err != nil {
s.logger.Info("HealthCheck failed for enclave storage", log.ErrKey, err)
return false, err
}
return headBatch != nil, nil

if headBatch == nil {
return false, fmt.Errorf("head batch is nil")
}

return true, nil
}

func (s *storageImpl) FetchHeadBatchForBlock(blockHash common.L1BlockHash) (*core.Batch, error) {
5 changes: 5 additions & 0 deletions go/enclave/txpool/txpool_mock_test.go
@@ -57,6 +57,11 @@ func (m *mockBatchRegistry) HasGenesisBatch() (bool, error) {
panic("implement me")
}

func (m *mockBatchRegistry) HealthCheck() (bool, error) {
// TODO implement me
panic("implement me")
}

func (m *mockBatchRegistry) HeadBatchSeq() *big.Int {
return m.currentBatch.SeqNo()
}
4 changes: 2 additions & 2 deletions go/obsclient/obsclient.go
@@ -86,8 +86,8 @@ func (oc *ObsClient) BatchHeaderByHash(hash gethcommon.Hash) (*common.BatchHeade
return batchHeader, err
}

// HealthStatusOfNode returns the health of the node.
func (oc *ObsClient) HealthStatusOfNode() (bool, error) {
// Health returns the health of the node.
func (oc *ObsClient) Health() (bool, error) {
var healthy *hostcommon.HealthCheck
err := oc.rpcClient.Call(&healthy, rpc.Health)
if err != nil {
2 changes: 1 addition & 1 deletion integration/networktest/env/dev_network.go
@@ -50,7 +50,7 @@ func awaitHealthStatus(rpcAddress string, timeout time.Duration) error {
return fmt.Errorf("failed dial host (%s): %w", rpcAddress, err)
}
defer c.Close()
healthy, err := c.HealthStatusOfNode()
healthy, err := c.Health()
if err != nil {
return fmt.Errorf("failed to get host health (%s): %w", rpcAddress, err)
}
2 changes: 1 addition & 1 deletion integration/networktest/util.go
@@ -24,7 +24,7 @@ func NodeHealthCheck(rpcAddress string) error {
if err != nil {
return err
}
health, err := client.HealthStatusOfNode()
health, err := client.Health()
if err != nil {
return err
}
2 changes: 1 addition & 1 deletion integration/simulation/network/socket.go
@@ -181,7 +181,7 @@ func (n *networkOfSocketNodes) createConnections(simParams *params.SimParams) er
startTime := time.Now()
healthy := false
for ; !healthy; time.Sleep(500 * time.Millisecond) {
healthy, _ = client.HealthStatusOfNode()
healthy, _ = client.Health()
if time.Now().After(startTime.Add(3 * time.Minute)) {
return fmt.Errorf("nodes not healthy after 3 minutes")
}
12 changes: 10 additions & 2 deletions integration/simulation/simulation.go
@@ -16,6 +16,7 @@ import (
"github.com/ten-protocol/go-ten/go/common"
"github.com/ten-protocol/go-ten/go/common/errutil"
"github.com/ten-protocol/go-ten/go/common/log"
"github.com/ten-protocol/go-ten/go/common/retry"
"github.com/ten-protocol/go-ten/go/ethadapter"
"github.com/ten-protocol/go-ten/go/wallet"
"github.com/ten-protocol/go-ten/integration/common/testlog"
@@ -288,8 +289,15 @@ func (s *Simulation) prefundL1Accounts() {

func (s *Simulation) checkHealthStatus() {
for _, client := range s.RPCHandles.ObscuroClients {
if healthy, err := client.HealthStatusOfNode(); !healthy || err != nil {
panic("Client is not healthy: " + err.Error())
err := retry.Do(func() error {
healthy, err := client.Health()
if !healthy || err != nil {
return fmt.Errorf("client is not healthy: %w", err)
}
return nil
}, retry.NewTimeoutStrategy(30*time.Second, 100*time.Millisecond))
if err != nil {
panic(err)
}
}
}
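
Reviewer sketch (not part of the commit): `checkHealthStatus` now polls the node until it is healthy or a timeout expires. The standalone program below shows the same idea with a hand-written loop. `retryUntil`, `checkOnce`, `attemptsUntilHealthy`, `neverHealthy` and `timedOutUnhealthy` are hypothetical names for this illustration and make no claim about how go-ten's `retry` package is implemented. The sketch also treats "unhealthy with a nil error" as its own case, since wrapping a nil error with `%w` renders as `%!w(<nil>)`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errNotHealthy = errors.New("client is not healthy")

// retryUntil (hypothetical) calls fn until it succeeds or the timeout
// has elapsed, sleeping `interval` between attempts.
func retryUntil(fn func() error, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := fn()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out after %s: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

// checkOnce (hypothetical) turns a (bool, error) health result into a
// single error, keeping the two failure modes distinct.
func checkOnce(health func() (bool, error)) error {
	healthy, err := health()
	if err != nil {
		return fmt.Errorf("health call failed: %w", err)
	}
	if !healthy {
		return errNotHealthy
	}
	return nil
}

// attemptsUntilHealthy simulates a node that is unhealthy for the first
// `failures` calls and returns how many attempts were needed.
func attemptsUntilHealthy(failures int) (int, error) {
	attempts := 0
	err := retryUntil(func() error {
		attempts++
		return checkOnce(func() (bool, error) { return attempts > failures, nil })
	}, time.Second, time.Millisecond)
	return attempts, err
}

// neverHealthy simulates a node that never becomes healthy.
func neverHealthy() error {
	return retryUntil(func() error {
		return checkOnce(func() (bool, error) { return false, nil })
	}, 20*time.Millisecond, time.Millisecond)
}

// timedOutUnhealthy reports whether the timeout error still wraps errNotHealthy.
func timedOutUnhealthy() bool {
	return errors.Is(neverHealthy(), errNotHealthy)
}

func main() {
	n, err := attemptsUntilHealthy(3)
	fmt.Println("attempts:", n, "err:", err)
	fmt.Println("never healthy:", neverHealthy())
}
```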