Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(conductor)!: re-fetch genesis or stop at set height #1928

Open
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

SuperFluffy
Copy link
Member

@SuperFluffy SuperFluffy commented Jan 24, 2025

Summary

Makes the changes necessary in the monorepo to migrate Forma to Mainnet.

Background

Migrating Forma to Astria's main net requires introducing stop and start logic into the Conductor. A migration using these changes could look like this:

  1. Running against a fresh rollup node, Conductor starts and fetches the initial astria.execution.v2.GenesisInfo from the rollup using the execution API, which contains rollup_start_number = 1, and rollup_stop_number = 1000.
  2. Conductor executes (syncs) the rollup node from rollup block numbers 1 to 999 (1000 exclusive)
  3. Upon executing rollup number 999, conductor resets its inner state and fetches a new astria.execution.v2.GenesisInfo from the rollup with new information from which it will continue executing.

For cases where rollup_stop_number is unset (or equivalently set to 0), conductor will never reset or halt as there is no stop condition.

For cases where halt_at_rollup_stop_number is set in the genesis info, conductor will exit without refetching a new genesis info.

For conductors running in firm-only or in soft-and-firm modes only the firm-executed block triggers a restart or halt (in soft-and-firm mode, upon soft-executing at the rollup stop number no blocks will be executed past it).

For conductors running in soft-only mode the soft-executed block triggers the restart or halt.

The aforementioned rollup changes to be made in astria-geth necessitate changing our current charts config for the following. "Forks" now describes a set of instructions for when and how to stop and restart the EVM at different heights, such that changes (such as network migration) can be made.

config:
        # < non astria-prefixed items... >
        AstriaOverrideGenesisExtraData:
        AstriaRollupName:
        AstriaForks: []
                # - < fork name >:
                #         height: ""
                #         halt: ""
                #         snapshotChecksum: ""
                #         extraDataOverride: ""
                #         feeCollector: ""
                #         EIP1559Params: {}
                #         sequencer:
                #                 chainId: ""
                #                 addressPrefix: ""
                #                 startHeight: ""
                #         celestia:
                #                 chainId: ""
                #                 startHeight: ""
                #                 heightVariance: ""
                #         bridgeAddresses: []

Changes

Added new astria.execution.v2.GenesisInfo protobuf spec

Changes from v1:

  • Renamed field sequencer_genesis_height to sequencer_start_height with the same tag
  • Added fields rollup_start_number, rollup_stop_number, sequencer_chain_id, celestia_chain_id, halt_at_rollup_stop_number.

In astria-core

  • Added astria_core::execution::v2::GenesisInfo. Changes from v1:
    • Removed method sequencer_genesis_block_height
    • Added methods sequencer_start_height, rollup_start_block_number, rollup_stop_block_number, sequencer_chain_id, and celestia_chain_id, and halt_at_rollup_stop_number

In astria-conductor

  • Changed sequencer-to-rollup-height calculation to start from rollup_start_block_number in GenesisInfo, so that Conductor doesn't start from rollup height 1 again after restarting.
  • Removed ASTRIA_CONDUCTOR_SEQUENCER_CHAIN_ID and ASTRIA_CONDUCTOR_CELESTIA_CHAIN_ID env vars from Conductor (chain ID checks will now be performed against the values retrieved from the genesis info)
  • Added restart and halt logic for when the stop block number is reached: Conductor will shut down at the stop height if halt_at_rollup_stop_number is true, and restart otherwise.

In charts

  • Added astriaForks to EVM rollup config, to reflect changes in astria-geth: https://github.com/astriaorg/astria-geth/pull/59/files.
  • Add just command for deploying the EVM rollup restart test, which functions the same as our regular smoke test with a restart at sequencer height 20.

Testing

  • Added miscellaneous unit tests to ensure heights are correctly calculated and whether to restart is successfully determined.
  • Added blackbox tests to ensure conductor correctly restarts in each mode, and continues to execute at the correct height after fetching updated genesis info and commitment state.
  • Added blackbox test in SoftAndFirm mode to ensure the conductor shuts down at the stop height if GenesisInfo.halt_at_rollup_stop_number is true.
  • Added EVM rollup restart test which restarts EVM and Conductor at sequencer height 20 during the regular smoke test.

Changelogs

Changelogs updated.

Breaking Changelist

  • Proto changes are not technically breaking, since fields were only added, but the old genesis info shape will be rejected since it does not contain the chain IDs to validate against the networks the conductor is connecting to. Thus, rollups will need to provide an updated GenesisInfo to conductor (includingastria-geth).
  • Breaks domain type GenesisInfo.
  • Breaks Conductor config.

Related Issues

closes #1841

@github-actions github-actions bot added conductor pertaining to the astria-conductor crate proto pertaining to the Astria Protobuf spec labels Jan 24, 2025
string sequencer_chain_id = 6;
string celestia_chain_id = 7;
// True if the conductor should halt at the stop height instead of attempting restart.
bool halt_at_stop_height = 8;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels somewhat odd having the rollup dictate the behavior of conductor here - but how else to do it? Seems to make sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideologically the conductor is meant to act as a sidecar to the rollup, so this makes sense to me. After all, this functionality is meant to accommodate a rollup's need for forks, as opposed to the conductor's.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there being two potential states and workflows here:

  1. The rollup genesis defines a halt and has no follow up configuration. The chain should just halt. The rollup can halt itself, but continue to receive committment updates, just refuse execution of new blocks.

    The conundrum is, how does conductor handle that? How does it know without being told essentially "finalize this fork before moving on"?

  2. There is a follow up fork configured. If given that information the rollup is technically ready to proceed, there is no real need to halt. However, it creates complications in conductor if there is a change in sequencer chain id. Similar problem as above, how do we make sure we finish validation of firm blocks? Probably going to need an update to configuration of node for rpc urls if any chain id changes or anything.

Maybe we change the name of this to finalize_at_end or something like that? I believe in terms of actual behaivior this is all we are dictating. It's a way to ensure that conductor doesn't restart or anything because execution failed at a halted state, and to ensure that we get firm commitments for all entries in the set before moving forward. Make soft only mode a bit odd i guess?

I dunno, seems awkward but "fine" for a sidecar at least.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention here is that halt means indefinitely stop processing new blocks (ie requires manual intervention). If there is a fork defined to halt and a followup fork after also defined, the rollup will never reach the followup fork. You have to manually intervene, remove the halt flag on the fork, re-init, and restart the rollup. The geth PR is coded to refuse execution of new blocks on a halted fork.

My assumption is that "finalize this fork before moving on" is true at every fork. i.e, at every fork, whether halted or not, soft and firm should come into sync before moving on. I believe this is how @ethanoroshiba coded it but I haven't looked too closely.

The halt_at_stop_height was intended to just be a way for conductor to gracefully shut itself down when you know that it cannot process anything more. Without it then I believe conductor will get stuck in a restarting loop or a crash loop in k8s when it tries to execute a new block and gets an execution error from geth.

i.e, the intention is this field does not add any functionality but just adds a nicety for ops. I'm not strongly tied to it if it makes sense to remove

@SuperFluffy SuperFluffy force-pushed the superfluffy/forma-restart-logic branch from 7a41c53 to 9eb12f1 Compare January 27, 2025 13:12
@ethanoroshiba ethanoroshiba added allow-breaking-proto docker-build used to trigger docker builds on PRs labels Jan 27, 2025
string sequencer_chain_id = 6;
string celestia_chain_id = 7;
// True if the conductor should halt at the stop height instead of attempting restart.
bool halt_at_stop_height = 8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideologically the conductor is meant to act as a sidecar to the rollup, so this makes sense to me. After all, this functionality is meant to accommodate a rollup's need for forks, as opposed to the conductor's.

@github-actions github-actions bot added documentation Improvements or additions to documentation ci issues that are related to ci and github workflows cd labels Jan 27, 2025
@ethanoroshiba ethanoroshiba dismissed their stale review January 28, 2025 14:31

Mission-critical changes for Forma are here, but would still be nice to address some of the other suggestions before merging.

@SuperFluffy SuperFluffy marked this pull request as ready for review January 28, 2025 17:41
@SuperFluffy SuperFluffy requested review from a team as code owners January 28, 2025 17:41
@SuperFluffy SuperFluffy force-pushed the superfluffy/forma-restart-logic branch from 2462a0f to f5051d5 Compare February 4, 2025 19:05
@SuperFluffy SuperFluffy changed the title feat(conductor)!: restart or stop at set height feat(conductor)!: re-fetch genesis or stop at set height Feb 7, 2025
@@ -11,68 +11,76 @@ images:
repo: ghcr.io/astriaorg/astria-geth
pullPolicy: IfNotPresent
tag: 1.0.0
devTag: latest
devTag: pr-59
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: perhaps a specific sha version would be more stable

if commit_level.is_with_firm() {
let _ = map_firm_to_sequencer_height(&genesis_info, &commitment_state)?;
}
if commit_level.is_with_soft() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition is wrong. Soft must update with firm in lock-step even if soft is not set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
allow-breaking-proto cd ci issues that are related to ci and github workflows conductor pertaining to the astria-conductor crate docker-build used to trigger docker builds on PRs documentation Improvements or additions to documentation override-freeze proto pertaining to the Astria Protobuf spec
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Forma migration
5 participants