Initial implementation for Row gossiping #443

Closed
wants to merge 25 commits

Conversation

@Wondertan (Member) commented Jun 30, 2021

Implementation of #43 is ready for review. There is still work remaining to be done, but the required skeleton is implemented and should be reviewed before any further changes. Currently, we skip a few tests related to Polka cases and will unSkip those before merging.

This PR is based on #427 and is targeted at it for convenience. #427 should be merged before this one.

TODO(All this should be done before merging)

  • Make RowSet jsonable
  • The implementation currently sends the whole extended square; send a vertical rectangle instead
  • Add DAHeader to Vote to unSkip a few tests
  • Consider more block exec validation rules
  • Double check on fields to be added to proposal
  • Add more coverage and docs to RowSet
  • Add NumOriginalDataShares to Proposal
  • Disallow sending more than half of the rows to each peer.

@Wondertan Wondertan requested review from liamsi and evan-forbes June 30, 2021 10:40
@Wondertan Wondertan self-assigned this Jun 30, 2021
@@ -1042,7 +1036,7 @@ func (app *badApp) Commit() abci.ResponseCommit {
//--------------------------
// utils for making blocks

func makeBlockchainFromWAL(wal WAL) ([]*types.Block, []*types.Commit, error) {
Wondertan (Member, Author):

@liamsi, I need your feedback on this. Previously, it was possible to build a blockchain from the WAL for testing using msgs only, but now that requires state as well. Please confirm whether this solution is correct.

Member:

As this is only a test, I'm not really concerned about the change. I'm wondering whether this extends to other places using the WAL as well 🤔

Wondertan (Member, Author):

It's not

Comment on lines +1830 to +1834
if cs.ProposalBlockRows.TotalSize() > int(cs.state.ConsensusParams.Block.MaxBytes) {
return fmt.Errorf("propasal for block exceeding maximum block size (%d > %d)",
cs.ProposalBlockRows.TotalSize(), cs.state.ConsensusParams.Block.MaxBytes,
)
}
Wondertan (Member, Author):

Now, instead of receiving parts, we can know in advance if the block exceeds the maximum size just by looking at Proposal.DAHeader, which is good.
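
A hedged sketch of that point, for reference: because the DAHeader fixes the dimensions of the data square, an upper bound on the block size can be computed from the Proposal alone, before a single row is received. The helper name and the fixed shareSize parameter below are illustrative assumptions, not the PR's TotalSize implementation; RowsRoots is the DAHeader field used elsewhere in this diff.

// Illustrative only: bound the block size using nothing but the DAHeader.
// len(dah.RowsRoots) is the width of the extended square, so the extended
// square holds at most width*width shares of a fixed share size.
func maxPossibleBlockSize(dah *types.DataAvailabilityHeader, shareSize int) int {
    width := len(dah.RowsRoots)
    return width * width * shareSize
}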

Member:

So instead of validating whether a row is too large by looking at its number of shares, you lift the check to the whole block and verify that it isn't too large. Interesting. I think that works and gives more assurance than the row level alone. Looping in @adlerjohn to double-check whether we need to change anything here for spec compliance (independent of your changes: we can keep determining the max block size by ConsensusParams.Block.MaxBytes, right?).

Member:

I thought we were planning on eventually removing the DA header from the proposal, since it's not strictly necessary. If we do remove it, then we need another way of determining the block size, which fortunately is determinable from the Header.availableDataOriginalSharesUsed field.

The size of the original data square, availableDataOriginalSquareSize, isn't explicitly declared in the block header. Instead, it is implicitly computed as the smallest power of 2 whose square is at least availableDataOriginalSharesUsed (in other words, its square is the smallest power of 4 that is at least availableDataOriginalSharesUsed).
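
A minimal sketch of that derivation (the function name is illustrative; it simply finds the smallest power of two whose square covers the share count):

// originalSquareSize returns the smallest power of two k with k*k >= sharesUsed.
// E.g. sharesUsed = 11 gives k = 4, since 2*2 = 4 < 11 <= 16 = 4*4.
func originalSquareSize(sharesUsed uint64) uint64 {
    k := uint64(1)
    for k*k < sharesUsed {
        k *= 2
    }
    return k
}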

Wondertan (Member, Author):

@adlerjohn, in the current tendermint implementation (using PartSetHeader), a node checks the block size every time it receives a chunk (part). In the implementation above (using DAHeader), we only need the Proposal and zero chunks (rows) to see that the block exceeds the limit. If we instead rely on Header.availableDataOriginalSharesUsed, we would need to receive the entire block to compute the field, and only then could we reject it.

Comment on lines 1881 to 1911
var commit *types.Commit
switch {
case cs.Height == cs.state.InitialHeight:
// We're creating a proposal for the first block.
// The commit is empty, but not nil.
commit = types.NewCommit(0, 0, types.BlockID{}, nil)
case cs.LastCommit.HasTwoThirdsMajority():
// Make the commit from LastCommit
commit = cs.LastCommit.MakeCommit()
default: // This shouldn't happen.
return added, fmt.Errorf("no commit for the previous block")
}

cs.ProposalBlock = block
cs.ProposalBlockRows, err = block.RowSet(context.TODO(), cs.dag)
cs.ProposalBlock = cs.state.MakeBlock(
cs.Proposal.Height,
data.Txs,
data.Evidence.Evidence,
data.IntermediateStateRoots.RawRootsList,
data.Messages,
commit,
cs.Validators.GetProposer().Address,
)

// TODO(Wondertan): This is unnecessary in general, but for now it writes needed fields
// and specifically NumOriginalDataShares, which likely should be part of the proposal
cs.ProposalBlockRows, err = cs.ProposalBlock.RowSet(context.TODO(), mdutils.Mock())
if err != nil {
return false, err
return added, err
}
cs.ProposalBlockParts = cs.ProposalBlock.MakePartSet(types.BlockPartSizeBytes)

Wondertan (Member, Author):

This part is important to review. We discussed that the Proposal would need to include LastCommit if row gossiping is implemented, but the node can also make it itself; is this correct?

Wondertan (Member, Author):

Also, I see a reason to add NumOriginalDataShares to the Proposal; otherwise, validators would need to run ComputeShares themselves to get the number and add it to the Header.

Comment on lines +187 to +189
// This test injects an invalid field into the block and checks whether the state discards it by voting nil
func TestStateBadProposal(t *testing.T) {
t.Skip("Block Executor doesn't have any validation for types.Data fields and we can't inject bad data there")
Wondertan (Member, Author):

Pointing to this: I currently don't have ideas on what validation rules to add to the block executor for types.Data specifically.

@Wondertan (Member, Author) commented Jun 30, 2021

NOTE: the proto-breaking check failure is expected

@Wondertan (Member, Author):

NOTE: I bet EDS caching in Block should fix the timeout issues in CI

@liamsi (Member) left a comment:

Did a first preliminary pass: this is extra dope and I'm amazed by how quickly this was put together!

I think before we merge this, we should:

  • run (realistic) experiments with the current block gossiping
  • run (realistic) experiments and tests with the changed block gossiping here
  • (although slightly orthogonal) better understand how we want to move forward with the full storage nodes and how we want to store the data on tendermint nodes

Also, I wonder if we should instead try to propose the changes suggested here and here to the tendermint team directly?

@marbar3778 @tessr is that something the tendermint team would be interested in, or is erasure coding off the table and you're aiming to improve gossiping via other means?

Comment on lines +159 to +165
for i, r := range rs.rows {
r.ForEachShare(func(j int, share []byte) {
shares[(i*size)+j] = share
})
}
return rsmt2d.ImportExtendedDataSquare(
shares,
Member:

This is only to work around the fact that rsmt2d does not support incrementally handling rows directly, right?

Wondertan (Member, Author):

Yep, we only need to pass the shares there. Also, that should change to Repair.
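
For reference, the loop above flattens the square row by row: the share at row i, position j of a size-by-size extended square lands at flat index i*size + j. A small standalone sketch of that mapping (assuming each row carries exactly size shares; this is an illustration, not the PR's code):

// flattenRows lays out a size-by-size extended square row by row so the result
// can be handed to rsmt2d's import/repair entry points as a single slice.
func flattenRows(rows [][][]byte) [][]byte {
    size := len(rows) // extended square width; each inner slice has size shares
    shares := make([][]byte, size*size)
    for i, row := range rows {
        for j, share := range row {
            shares[i*size+j] = share
        }
    }
    return shares
}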

Comment on lines +215 to +217
if !rs.DAHeader.RowsRoots[row.Index].Equal(&root) {
return false, ErrInvalidRow
}
Member:

Note to myself: what happens after we detect an invalid row?

Wondertan (Member, Author):

You can swap the row for a part in that question. We can try following tendermint's behavior.

@@ -384,7 +385,7 @@ func byzantineDecideProposalFunc(t *testing.T, height int64, round int32, cs *St
// Avoid sending on internalMsgQueue and running consensus state.

// Create a new proposal block from state/txs from the mempool.
block1, blockParts1, _ := cs.createProposalBlock()
block1, blockParts1, blockRows1 := cs.createProposalBlock(cs.privValidatorPubKey.Address())
Member:

Not asking to do this in this PR, but I'm wondering whether the blockParts will be removed entirely as part of this work. This also trickles into storage, I guess, since tendermint currently stores the data in parts 🤔

Wondertan (Member, Author):

Yes, that should also affect storing.

@evan-forbes (Member) left a comment:

mad props. I think I have a much clearer picture now that it's basically already implemented 😅

Sorry I don't have much to add or comment on; I went through each change and everything seems rational. I'll try to re-review after chewing on it more and getting a better grasp of the consensus reactor.

Any failing non-e2e tests just seem flaky.

As for the e2e tests, the CI isn't posting logs, so I figured I'd post some here. It looks like there are some nil pointer dereferences occurring while handling messages.

full01 logs
generating ED25519 keypair...done
peer identity: 12D3KooWP8efcDSYcrhnJxobCSgCtY1ZvA5uGf8rtJqGZspbSAA1
I[2021-07-07|23:48:18.764] Successfully initialized IPFS repository     module=main ipfs-path=ipfs
I[2021-07-07|23:48:19.671] Successfully created embedded IPFS node      module=main ipfs-repo=ipfs
I[2021-07-07|23:48:19.672] Version info                                 module=main software= block=11 p2p=8
I[2021-07-07|23:48:19.681] Starting Node service                        module=main impl=Node
I[2021-07-07|23:48:19.682] Starting StateSyncShim service               module=statesync impl=StateSyncShim
I[2021-07-07|23:48:19.682] Starting StateSync service                   module=statesync impl=StateSync
I[2021-07-07|23:48:20.020] Executed block                               module=state height=1000 validTxs=11 invalidTxs=0
I[2021-07-07|23:48:20.021] Committed state                              module=state height=1000 txs=11 appHash=99E8778DB43EF2EE8797F2EBE67C3034C2670347315361228BAEDF6D07509E8E
I[2021-07-07|23:48:20.047] Executed block                               module=state height=1001 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:20.047] Committed state                              module=state height=1001 txs=7 appHash=DF439F355ED9C29CCF0B8D2562EE772EC5F88FC2FE9FAA5706E340C2CF7799A0
I[2021-07-07|23:48:20.075] Executed block                               module=state height=1002 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:20.076] Committed state                              module=state height=1002 txs=7 appHash=DF61E17EF23DE2E4CE6E10828043830C0045F12051D1CE1FF09C4128D82C8A71
I[2021-07-07|23:48:20.104] Executed block                               module=state height=1003 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:20.105] Committed state                              module=state height=1003 txs=7 appHash=FBA99B42D277EB6C14444EF39A8FFD0D4C3373C40AC134AD2E9BF3F318302430
I[2021-07-07|23:48:20.132] Executed block                               module=state height=1004 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:20.133] Committed state                              module=state height=1004 txs=7 appHash=0996D3AE39D50B34E6669853B8FA5BFBC423108ABF7ED96A569715B42EB0A035
I[2021-07-07|23:48:20.163] Executed block                               module=state height=1005 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:20.165] Committed state                              module=state height=1005 txs=7 appHash=EB1CDE49EB0BA750E500EBF13A2B3859ABBD5B2AFB0C5700A5FC9E9A088F6995
I[2021-07-07|23:48:20.196] Executed block                               module=state height=1006 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:20.198] Committed state                              module=state height=1006 txs=7 appHash=498850D69E85E8CC00CFFA921ECF0C72ED48C72DE3B344F48B04BEC0E53DD691
I[2021-07-07|23:48:20.225] Executed block                               module=state height=1007 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:20.226] Committed state                              module=state height=1007 txs=7 appHash=37E3B36FC00D0AD4B1A64FE94F8F1B3877B6931D9E494724A6D6CC9A489D9DFA
I[2021-07-07|23:48:20.252] Executed block                               module=state height=1008 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:20.254] Committed state                              module=state height=1008 txs=7 appHash=239E0EA404DF6B8C6777091C440D679D82A49F6BF05C7964C4D5E26B2EE8629C
I[2021-07-07|23:48:20.280] Executed block                               module=state height=1009 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:20.281] Committed state                              module=state height=1009 txs=7 appHash=AFDA7FF9F989343376E181AD37130EF1F51572771B8EAC48DAC5AFC49B608B7E
I[2021-07-07|23:48:20.307] Executed block                               module=state height=1010 validTxs=6 invalidTxs=0
I[2021-07-07|23:48:20.307] Updates to validators                        module=state updates=32DC06149F04267667E5653B361373206B1536C6:50
I[2021-07-07|23:48:20.309] Committed state                              module=state height=1010 txs=6 appHash=54E872FADD688FA35F6E98135180B466AAE90C5FC711390C5BCF7FEA96DCAC06
E[2021-07-07|23:48:20.887] CONSENSUS FAILURE!!!                         module=consensus err="runtime error: invalid memory address or nil pointer dereference" stack="goroutine 11472 [running]:\nruntime/debug.Stack(0xc0212753a0, 0x25034e0, 0x3a0f350)\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x9f\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).receiveRoutine.func2(0xc01322a380, 0x2a72f68)\n\t/src/tendermint/consensus/state.go:720 +0x57\npanic(0x25034e0, 0x3a0f350)\n\t/usr/local/go/src/runtime/panic.go:969 +0x1b9\ngithub.com/lazyledger/lazyledger-core/consensus.(*Reactor).broadcastNewValidBlockMessage(0xc01321cd80, 0xc01322a448)\n\t/src/tendermint/consensus/reactor.go:443 +0x67\ngithub.com/lazyledger/lazyledger-core/consensus.(*Reactor).subscribeToBroadcastEvents.func2(0x263c320, 0xc01322a448)\n\t/src/tendermint/consensus/reactor.go:415 +0x45\ngithub.com/lazyledger/lazyledger-core/libs/events.(*eventCell).FireEvent(0xc0196e6460, 0x263c320, 0xc01322a448)\n\t/src/tendermint/libs/events/events.go:198 +0x1e3\ngithub.com/lazyledger/lazyledger-core/libs/events.(*eventSwitch).FireEvent(0xc000ae6850, 0x27e7d34, 0xa, 0x263c320, 0xc01322a448)\n\t/src/tendermint/libs/events/events.go:158 +0xa7\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).enterCommit(0xc01322a380, 0x3f3, 0x0)\n\t/src/tendermint/consensus/state.go:1520 +0x971\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).addVote(0xc01322a380, 0xc00c7f5220, 0xc006cf22d0, 0x28, 0xc021275ac8, 0xe35ea7, 0xc01322a438)\n\t/src/tendermint/consensus/state.go:2159 +0xbe5\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).tryAddVote(0xc01322a380, 0xc00c7f5220, 0xc006cf22d0, 0x28, 0x3b9eae0, 0x34de33d7, 0xed8783444)\n\t/src/tendermint/consensus/state.go:1954 +0x59\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).handleMsg(0xc01322a380, 0x2c343e0, 0xc020fce7d0, 0xc006cf22d0, 0x28)\n\t/src/tendermint/consensus/state.go:820 +0x865\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).receiveRoutine(0xc01322a380, 0x0)\n\t/src/tendermint/consensus/state.go:753 +0x7d6\ncreated by github.com/lazyledger/lazyledger-core/consensus.(*State).OnStart\n\t/src/tendermint/consensus/state.go:393 +0x896\n"
E[2021-07-07|23:48:39.007] Error on broadcastTxCommit                   module=rpc err="timed out waiting for tx to be included in a block"
E[2021-07-07|23:48:45.516] Error on broadcastTxCommit                   module=rpc err="timed out waiting for tx to be included in a block"
E[2021-07-07|23:48:49.008] Error on broadcastTxCommit                   module=rpc err="timed out waiting for tx to be included in a block"
E[2021-07-07|23:48:49.785] Stopping peer for error                      module=p2p peer="Peer{MConn{10.186.73.4:26656} 0a9c99096b50a6d72a24d5aa5286d3f7022b3555 out}" err=EOF
E[2021-07-07|23:48:49.785] Stopping peer for error                      module=p2p peer="Peer{MConn{10.186.73.5:26656} 8773a83e2f9fa4bc7a3e94303eb4df33af294288 out}" err=EOF
validator02 logs
I[2021-07-07|23:47:49.387] Starting SignerServer service                impl=SignerServer
I[2021-07-07|23:47:49.387] Remote signer connecting to tcp://0.0.0.0:27559 
D[2021-07-07|23:47:49.387] SignerDialer: Reconnection failed            retries=1 max=100 err="dial tcp 0.0.0.0:27559: connect: connection refused"
D[2021-07-07|23:47:50.387] SignerDialer: Reconnection failed            retries=2 max=100 err="dial tcp 0.0.0.0:27559: connect: connection refused"
D[2021-07-07|23:47:51.388] SignerDialer: Connection Ready               
generating ED25519 keypair...done
peer identity: 12D3KooWPuqfdneFeVXwk9oMuGfqadLdG727nnsF4pTUBtuzcp4L
I[2021-07-07|23:47:51.399] Successfully initialized IPFS repository     module=main ipfs-path=ipfs
I[2021-07-07|23:47:52.558] Successfully created embedded IPFS node      module=main ipfs-repo=ipfs
I[2021-07-07|23:47:52.559] Version info                                 module=main software= block=11 p2p=8
I[2021-07-07|23:47:52.570] Starting Node service                        module=main impl=Node
I[2021-07-07|23:47:52.572] Starting StateSyncShim service               module=statesync impl=StateSyncShim
I[2021-07-07|23:47:52.572] Starting StateSync service                   module=statesync impl=StateSync
E[2021-07-07|23:47:52.674] Stopping peer for error                      module=p2p peer="Peer{MConn{10.186.73.5:26656} 8773a83e2f9fa4bc7a3e94303eb4df33af294288 out}" err=EOF
I[2021-07-07|23:47:59.918] Executed block                               module=state height=1000 validTxs=11 invalidTxs=0
I[2021-07-07|23:47:59.918] Committed state                              module=state height=1000 txs=11 appHash=99E8778DB43EF2EE8797F2EBE67C3034C2670347315361228BAEDF6D07509E8E
I[2021-07-07|23:48:01.575] Executed block                               module=state height=1001 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:01.576] Committed state                              module=state height=1001 txs=7 appHash=DF439F355ED9C29CCF0B8D2562EE772EC5F88FC2FE9FAA5706E340C2CF7799A0
I[2021-07-07|23:48:03.117] Executed block                               module=state height=1002 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:03.118] Committed state                              module=state height=1002 txs=7 appHash=DF61E17EF23DE2E4CE6E10828043830C0045F12051D1CE1FF09C4128D82C8A71
I[2021-07-07|23:48:05.152] Executed block                               module=state height=1003 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:05.153] Committed state                              module=state height=1003 txs=7 appHash=FBA99B42D277EB6C14444EF39A8FFD0D4C3373C40AC134AD2E9BF3F318302430
I[2021-07-07|23:48:06.781] Executed block                               module=state height=1004 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:06.782] Committed state                              module=state height=1004 txs=7 appHash=0996D3AE39D50B34E6669853B8FA5BFBC423108ABF7ED96A569715B42EB0A035
I[2021-07-07|23:48:08.565] Executed block                               module=state height=1005 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:08.566] Committed state                              module=state height=1005 txs=7 appHash=EB1CDE49EB0BA750E500EBF13A2B3859ABBD5B2AFB0C5700A5FC9E9A088F6995
I[2021-07-07|23:48:10.265] Executed block                               module=state height=1006 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:10.266] Committed state                              module=state height=1006 txs=7 appHash=498850D69E85E8CC00CFFA921ECF0C72ED48C72DE3B344F48B04BEC0E53DD691
E[2021-07-07|23:48:11.264] Failed to provide to DHT                     module=consensus height=1002 err="context canceled"
[previous line repeated 32 times in total between 23:48:11.264 and 23:48:11.266]
E[2021-07-07|23:48:11.266] Providing Block didn't finish in time and was terminated module=consensus height=1002
I[2021-07-07|23:48:11.856] Executed block                               module=state height=1007 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:11.862] Committed state                              module=state height=1007 txs=7 appHash=37E3B36FC00D0AD4B1A64FE94F8F1B3877B6931D9E494724A6D6CC9A489D9DFA
I[2021-07-07|23:48:13.498] Executed block                               module=state height=1008 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:13.499] Committed state                              module=state height=1008 txs=7 appHash=239E0EA404DF6B8C6777091C440D679D82A49F6BF05C7964C4D5E26B2EE8629C
I[2021-07-07|23:48:15.592] Executed block                               module=state height=1009 validTxs=7 invalidTxs=0
I[2021-07-07|23:48:15.593] Committed state                              module=state height=1009 txs=7 appHash=AFDA7FF9F989343376E181AD37130EF1F51572771B8EAC48DAC5AFC49B608B7E
I[2021-07-07|23:48:17.204] Executed block                               module=state height=1010 validTxs=6 invalidTxs=0
I[2021-07-07|23:48:17.204] Updates to validators                        module=state updates=32DC06149F04267667E5653B361373206B1536C6:50
I[2021-07-07|23:48:17.205] Committed state                              module=state height=1010 txs=6 appHash=54E872FADD688FA35F6E98135180B466AAE90C5FC711390C5BCF7FEA96DCAC06
I[2021-07-07|23:48:18.784] Executed block                               module=state height=1011 validTxs=5 invalidTxs=0
I[2021-07-07|23:48:18.785] Committed state                              module=state height=1011 txs=5 appHash=0DEA959BBD42E163234AD3C1C46539B6DE1C646C693E1063AFB01FFECE7B024A
E[2021-07-07|23:48:19.766] Failed to provide to DHT                     module=consensus height=1007 err="context canceled"
[previous line repeated 33 times in total between 23:48:19.766 and 23:48:19.768]
E[2021-07-07|23:48:19.768] Providing Block didn't finish in time and was terminated module=consensus height=1007
E[2021-07-07|23:48:28.876] Error on broadcastTxCommit                   module=rpc err="timed out waiting for tx to be included in a block"
E[2021-07-07|23:48:33.502] Error on broadcastTxCommit                   module=rpc err="timed out waiting for tx to be included in a block"
seed02 logs
generating ED25519 keypair...done
peer identity: 12D3KooWSKZKJNGD45HbTA8PEBTGgETi6HQ6cBo5aYB6ixh4KLb8
I[2021-07-07|23:47:46.008] Successfully initialized IPFS repository     module=main ipfs-path=ipfs
I[2021-07-07|23:47:46.947] Successfully created embedded IPFS node      module=main ipfs-repo=ipfs
I[2021-07-07|23:47:46.947] Version info                                 module=main software= block=11 p2p=8
I[2021-07-07|23:47:46.957] Starting Node service                        module=main impl=Node
I[2021-07-07|23:47:46.957] Starting StateSyncShim service               module=statesync impl=StateSyncShim
I[2021-07-07|23:47:46.958] Starting StateSync service                   module=statesync impl=StateSync
E[2021-07-07|23:47:47.058] Stopping peer for error                      module=p2p peer="Peer{MConn{10.186.73.4:26656} 0a9c99096b50a6d72a24d5aa5286d3f7022b3555 out}" err=EOF
E[2021-07-07|23:48:17.063] Stopping peer for error                      module=p2p peer="Peer{MConn{10.186.73.4:26656} 0a9c99096b50a6d72a24d5aa5286d3f7022b3555 out}" err=EOF
E[2021-07-07|23:48:17.262] CONSENSUS FAILURE!!!                         module=consensus err="runtime error: invalid memory address or nil pointer dereference" stack="goroutine 1116 [running]:\nruntime/debug.Stack(0xc013ed93a0, 0x25034e0, 0x3a0f350)\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x9f\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).receiveRoutine.func2(0xc013f32a80, 0x2a72f68)\n\t/src/tendermint/consensus/state.go:720 +0x57\npanic(0x25034e0, 0x3a0f350)\n\t/usr/local/go/src/runtime/panic.go:969 +0x1b9\ngithub.com/lazyledger/lazyledger-core/consensus.(*Reactor).broadcastNewValidBlockMessage(0xc000c6ba80, 0xc013f32b48)\n\t/src/tendermint/consensus/reactor.go:443 +0x67\ngithub.com/lazyledger/lazyledger-core/consensus.(*Reactor).subscribeToBroadcastEvents.func2(0x263c320, 0xc013f32b48)\n\t/src/tendermint/consensus/reactor.go:415 +0x45\ngithub.com/lazyledger/lazyledger-core/libs/events.(*eventCell).FireEvent(0xc0196ba8e0, 0x263c320, 0xc013f32b48)\n\t/src/tendermint/libs/events/events.go:198 +0x1e3\ngithub.com/lazyledger/lazyledger-core/libs/events.(*eventSwitch).FireEvent(0xc0001990a0, 0x27e7d34, 0xa, 0x263c320, 0xc013f32b48)\n\t/src/tendermint/libs/events/events.go:158 +0xa7\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).enterCommit(0xc013f32a80, 0x3e8, 0x0)\n\t/src/tendermint/consensus/state.go:1520 +0x971\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).addVote(0xc013f32a80, 0xc01a3bcfa0, 0xc01b88cf90, 0x28, 0x21b4787, 0xc00055d8c0, 0xc013fae480)\n\t/src/tendermint/consensus/state.go:2159 +0xbe5\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).tryAddVote(0xc013f32a80, 0xc01a3bcfa0, 0xc01b88cf90, 0x28, 0x3b9eae0, 0xf9d3666, 0xed8783441)\n\t/src/tendermint/consensus/state.go:1954 +0x59\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).handleMsg(0xc013f32a80, 0x2c343e0, 0xc0011d2b50, 0xc01b88cf90, 0x28)\n\t/src/tendermint/consensus/state.go:820 +0x865\ngithub.com/lazyledger/lazyledger-core/consensus.(*State).receiveRoutine(0xc013f32a80, 0x0)\n\t/src/tendermint/consensus/state.go:753 +0x7d6\ncreated by github.com/lazyledger/lazyledger-core/consensus.(*State).OnStart\n\t/src/tendermint/consensus/state.go:393 +0x896\n"
E[2021-07-07|23:50:17.061] Stopping peer for error                      module=p2p peer="Peer{MConn{10.186.73.4:26656} 0a9c99096b50a6d72a24d5aa5286d3f7022b3555 out}" err=EOF
E[2021-07-07|23:52:17.061] Stopping peer for error                      module=p2p peer="Peer{MConn{10.186.73.4:26656} 0a9c99096b50a6d72a24d5aa5286d3f7022b3555 out}" err=EOF

Also, the full02 node isn't booting up. full02 only connects via a seed node after height 1000, so it relies on some genesis state. It could be that the initial state couldn't be validated; that was at least the cause here.

if part == nil {
logger.Error("Could not load part", "index", index,
"blockPartSetHeader", blockMeta.BlockID.PartSetHeader, "peerBlockPartSetHeader", prs.ProposalBlockPartSetHeader)
rs, err := b.RowSet(context.TODO(), mdutils.Mock())
Member:

are we using a mock here just to not bother saving the data via PutBlock?

Wondertan (Member, Author):

Yep. Also, I aim to change that before merging. I don't like the current approach I've taken with RowSet. It is not practical for some cases like this.

@Wondertan Wondertan force-pushed the hlib/block-propagation-2 branch from d0c602a to a74bc01 on July 9, 2021 08:18
@liamsi (Member) commented Aug 17, 2021

@Wondertan are you OK with closing this PR as well? I'd keep the branch around as we might pick that up in the future again.

@liamsi liamsi closed this Aug 17, 2021
@rootulp rootulp deleted the hlib/block-propagation-3 branch September 22, 2022 14:49