Add FLIP 204 - Smart-Contract-Specified Epoch Switchover Timing #204

Merged
merged 7 commits into from
Oct 20, 2023
237 changes: 237 additions & 0 deletions protocol/20231003-service-account-specified-epoch-switchover.md.md
@@ -0,0 +1,237 @@
---
status: draft
flip: 204 (set to the issue number)
authors: Jordan Schalm ([email protected])
sponsor: Jordan Schalm ([email protected])
updated: 2023-10-03
---

# FLIP 204: Smart-Contract-Specified Epoch Switchover Timing

## Objective

- Increase robustness of Cruise Control System
- Create an explicit target time for epoch switchover, defined by the service account

## Motivation

[Cruise Control: Automated Block Rate & Epoch Timing (Design)](https://www.notion.so/Cruise-Control-Automated-Block-Rate-Epoch-Timing-Design-4dbcb0dab1394fc7b91966d7d84ad48d?pvs=21) defines an existing system for controlling system block production to achieve a target *block rate*, in turn to achieve a target *epoch switchover time*. This system has been deployed on Mainnet since May.

At the time of writing, the target epoch switchover time is inferred based on a baked-in assumption of week-long epochs, and a configurable weekly switchover time. Therefore, each node’s **Process Variable** (switchover time) is determined by a heuristic, which has several downsides:

- In extreme edge cases (timing off by several days), different nodes may disagree about the target Process Variable value.
- Networks with different-length epochs (Canary, Testnet) cannot use Cruise Control at all.

## User Benefit

- Chain-queryable target epoch switchover time
- Increased robustness of Cruise Control System (reliable and consistent epoch and block timing)

## Design Proposal


- The `FlowEpoch` smart contract determines and broadcasts a `TargetEndTime` for each epoch, within the `EpochSetup` event.
- The `cruisectl.BlockTimeController` component reads this `TargetEndTime` and uses it as the Process Variable value for its PID controller, rather than the current heuristic method.

### `TargetEndTime` Definition

Below are two options for how to configure and compute the `TargetEndTime`. Overall the author is in favour of **Option 2.**


#### Option 1: Duration-Only

The configuration consists only of the epoch duration. Each epoch’s `TargetEndTime` is computed based on a reference time/view pair obtained via the `getBlock` API.

```swift
pub struct EpochTimingConfig {
    duration: UInt64 // in seconds
}
```

```swift
// Approach 1.1: compute the target switchover time based on the current time/view.
// Invoked when transitioning into the EpochSetup phase.
pub fun getTargetEndTimeForEpoch(
    curBlock: Block,
    nextEpoch: EpochMetadata,
    config: EpochTimingConfig
): UInt64 {
    let now = curBlock.timestamp // UFix64, in seconds
    let viewsToNextEpochEnd = nextEpoch.finalView - curBlock.view
    let estSecondsToNextEpochEnd = UFix64(viewsToNextEpochEnd) / UFix64(nextEpoch.lengthInViews) * UFix64(config.duration)
    // The target is the current time plus the estimated time remaining.
    return UInt64(now + estSecondsToNextEpochEnd)
}
```
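
To make the arithmetic of the computation above concrete, here is a minimal Go sketch with plain integers. It assumes the intended result is the current time plus the estimated remaining time; `estimateTargetEndTime` and its parameter names are hypothetical, and integer division stands in for the fixed-point math.

```go
package main

// estimateTargetEndTime scales the configured epoch duration by the fraction
// of an epoch's worth of views that remain until the next epoch's final view,
// then adds that estimate to the current UNIX time.
// Hypothetical helper for illustration only. It assumes
// nextEpochFinalView >= curView and epochLengthInViews > 0.
func estimateTargetEndTime(nowUnix, curView, nextEpochFinalView, epochLengthInViews, durationSeconds uint64) uint64 {
	viewsToNextEpochEnd := nextEpochFinalView - curView
	// Multiply before dividing to limit truncation in integer arithmetic.
	estSeconds := viewsToNextEpochEnd * durationSeconds / epochLengthInViews
	return nowUnix + estSeconds
}
```

For example, with 1500 views to go, an epoch length of 1000 views, and a 604800-second (one week) duration, the estimate is 1.5 weeks, or 907200 seconds, after the current block's timestamp.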

```swift
// Approach 1.2: memorize the end time of each epoch.
// Invoked when transitioning into a new epoch.
pub fun memorizeEpochEndTime(curBlock: Block, epoch: EpochMetadata) {
    // Block timestamps are UFix64; store whole seconds.
    epoch.endedAt = UInt64(curBlock.timestamp)
}

// Compute the switchover time based on the last memorized reference timestamp.
pub fun getTargetEndTimeForEpoch(
    refEpoch: EpochMetadata,
    targetEpochCounter: UInt64,
    config: EpochTimingConfig
): UInt64 {
    return refEpoch.endedAt + config.duration * (targetEpochCounter - refEpoch.counter)
}
```

##### Pros

- Simpler configuration
- Does not require manual config changes to account for a durable switchover timing change.

##### Cons

- Drift can accumulate over time
- **Approach 1.2** does not work well with the `resetEpoch` process, as that involves an epoch transition at a non-target time
- Depends on block time API
- **Approach 1.2** requires additional storage/logic in the smart contract
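
The drift concern can be illustrated with a small Go simulation. It assumes the memorized reference is always the most recently ended epoch, so each target is measured from the previous actual end time. `driftAfter` and its inputs are hypothetical and only model the arithmetic, not the contract.

```go
package main

// driftAfter simulates a sequence of epochs in which every target end time is
// computed from the previous epoch's actual end time, and each epoch ends
// overrunSeconds[i] after its target. It returns how far the last actual end
// time lies from a fixed schedule anchored at the reference epoch (t = 0).
// Hypothetical model for illustration only.
func driftAfter(duration int64, overrunSeconds []int64) int64 {
	var lastEndedAt int64 // actual end time of the reference epoch
	var fixedTarget int64 // end time under a fixed, drift-free schedule
	for _, overrun := range overrunSeconds {
		target := lastEndedAt + duration // measured from the last actual end
		lastEndedAt = target + overrun   // the epoch ends slightly off target
		fixedTarget += duration
	}
	return lastEndedAt - fixedTarget
}
```

Under this model the drift is simply the sum of the per-epoch errors: three epochs that each end 30 seconds late leave the schedule 90 seconds behind, and nothing pulls it back unless later epochs happen to end early.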

#### Option 2: Duration & Reference Switchover

The configuration consists of the epoch duration and a reference counter/end-time pair. Each epoch’s `TargetEndTime` is computed solely based on the target epoch’s counter, the reference counter/end-time pair, and the duration.

```swift
pub struct EpochTimingConfig {
    duration: UInt64     // in seconds
    refCounter: UInt64   // the counter of a reference epoch
    refTimestamp: UInt64 // the UNIX timestamp (UTC) at which refCounter ended
}
```

Review comment (Member), on `refCounter`: How is the reference epoch chosen?

Reply (Member Author): It doesn't really matter. We can pick any previous epoch transition. Initially we would just pick the most recent one.

The same reference epoch can be used for as long as we want, so long as there is no discontinuity in epoch timing (very long period of downtime or epoch fallback mode being active for an extended period). If that happens, then we would pick a new reference epoch (the first one after the discontinuity).

```swift
// Compute target switchover time based on offset from reference counter/switchover.
pub fun getTargetEndTimeForEpoch(
    targetEpochCounter: UInt64,
    config: EpochTimingConfig
): UInt64 {
    return config.refTimestamp + config.duration * (targetEpochCounter - config.refCounter)
}
```
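
A minimal Go rendering of the same formula, with a worked example, may help when checking values by hand. The Go type and method names below are assumptions that mirror the Cadence sketch; they are not taken from flow-go.

```go
package main

// EpochTimingConfig mirrors the Option 2 configuration.
// Hypothetical Go type for illustration only.
type EpochTimingConfig struct {
	Duration     uint64 // epoch duration in seconds
	RefCounter   uint64 // counter of the reference epoch
	RefTimestamp uint64 // UNIX time (UTC) at which the reference epoch ended
}

// TargetEndTime returns the target end time of the epoch with the given
// counter. It assumes targetEpochCounter >= RefCounter; an earlier counter
// would underflow the unsigned subtraction.
func (c EpochTimingConfig) TargetEndTime(targetEpochCounter uint64) uint64 {
	return c.RefTimestamp + c.Duration*(targetEpochCounter-c.RefCounter)
}
```

With `Duration` 604800 (one week), `RefCounter` 100, and `RefTimestamp` 1700000000, epoch 103 has a target end time of 1700000000 + 3 × 604800 = 1701814400. The result depends only on the configuration, never on when earlier epochs actually ended.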

##### Pros

- Simple computation
- Drift cannot accumulate over time
- Does not use block time API
- Compatible with `resetEpoch` process

##### Cons

- More complex configuration specification
- Requires manual config changes for durable switchover time changes


### Implementation Plan

#### Smart Contract

- Add `targetEndTime` field to `EpochSetup` event, `EpochMetadata`
- Add config for determining `targetEndTime` to smart contract `ConfigMetadata`
- Add logic to compute `targetEndTime` to `startEpochSetup`
- Add function for service account to adjust new config
- Testing
    - Validate field is set as expected
    - Validate field is computed correctly
    - Validate setter/getter for new config values

#### Core Protocol

- Add `TargetEndTime` field to `EpochSetup` event, `Epoch` API
- Update `EpochSetup` service event conversion function
    - Read `TargetEndTime` field
    - Ensure conversion is backward-compatible
- Update `cruisectl.BlockRateController`
    - Remove `EpochTransitionTime` inference heuristic
    - Replace `EpochTransitionTime` with `time.Duration`, retrieved from `EpochSetup` event
- Add mechanism to set `TargetEndTime` in bootstrapping/sporking process
    - *Comment: currently Cruise Control is disabled by default.*
    - Option 1: Add an optional flag to explicitly specify the desired epoch duration (seconds). We can compute a reference counter/timestamp.
    - Option 2: Compute an initial `duration` config value, based on the committee size and epoch length and expected view rate.
- Update network instantiation
    - Default value for deploying `FlowEpoch`

### Deployment Plan

As usual, deploy to Canary → Testnet → Mainnet

1. Upgrade `FlowEpoch`

<aside>
⚠️ Caution: First, ensure service event conversion logic is tolerant of additional fields (ignores additional fields)

</aside>

2. Upgrade Consensus Nodes

*Comment: Since we are modifying the EpochSetup model, this will likely require a spork.*
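
The caution in step 1 can be sketched in Go. The example uses `encoding/json` purely as a stand-in, since `json.Unmarshal` ignores unknown fields by default. The real service event conversion operates on Cadence values, and the struct and field names below are hypothetical.

```go
package main

import "encoding/json"

// epochSetupV1 is a hypothetical pre-upgrade model without TargetEndTime.
type epochSetupV1 struct {
	Counter   uint64 `json:"counter"`
	FinalView uint64 `json:"finalView"`
}

// epochSetupV2 is a hypothetical post-upgrade model with the new field.
type epochSetupV2 struct {
	Counter       uint64 `json:"counter"`
	FinalView     uint64 `json:"finalView"`
	TargetEndTime uint64 `json:"targetEndTime"` // stays 0 when the field is absent
}

// decodeV1 models an old node reading an event from the upgraded contract:
// the additional field is silently ignored.
func decodeV1(payload []byte) (epochSetupV1, error) {
	var ev epochSetupV1
	err := json.Unmarshal(payload, &ev)
	return ev, err
}

// decodeV2 models a new node reading an event emitted before the upgrade:
// the missing field is left at its zero value.
func decodeV2(payload []byte) (epochSetupV2, error) {
	var ev epochSetupV2
	err := json.Unmarshal(payload, &ev)
	return ev, err
}
```

Both directions matter during a staged rollout: old nodes must tolerate the extra field until they are upgraded, and upgraded nodes must decide how to treat events that carry no target end time.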



### Drawbacks

This proposal assumes that the service account is more reliable than the heuristic currently in use for determining target epoch switchover times.
Many more important system functions already depend on correct operation of the service account.
However, Cruise Control will become susceptible to faults in the service account's defined switchover time, rather than faults in the current heuristic.

### Alternatives Considered

See the two options above, under `TargetEndTime` Definition.

### Performance Implications

None anticipated.

### Dependencies

There are internal dependencies that will need to be updated in lockstep as part of this FLIP. No external dependencies.

### Engineering Impact

* Do you expect changes to binary size / build time / test times?
* Who will maintain this code? Is this code in its own buildable unit?
Can this code be tested on its own?
Is visibility suitably restricted to only a small API surface for others to use?

### Best Practices

N/A.

### Tutorials and Examples

N/A.

### Compatibility

The change will not break compatibility.

### User Impact

N/A.

## Related Issues

N/A.

## Prior Art

N/A.

## Questions and Discussion Topics

- Do you prefer Option 1 or Option 2?
- Do you foresee any significant hurdles beyond those outlined in the Implementation Plan?

### Q&A
#### What happens during a `resetEpoch`?

In **Option 1.1** and **Option 2**, the timing of a particular epoch transition does not affect the target timing for other epochs. Therefore, the `TargetEndTime` computation of an epoch during a spork will not behave differently from any other epoch.

#### Why do we set `TargetEndTime` rather than `TargetStartTime`?

The time information is specified in the `EpochSetup` event, which occurs partway through the current epoch. If we specified a `TargetStartTime`, then the PID Controller’s Process Variable would have an undefined value for part of the epoch, and the Cruise Control system would be unable to function. On Mainnet, this corresponds to about 90% of the duration of an epoch.