Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many Trying to notify an import but the channel is full during Spammening on Kusama #6600

Open
AndreiEres opened this issue Nov 21, 2024 · 7 comments

Comments

@AndreiEres
Copy link
Contributor

image

txpool produced many warnings on Kusama, cc @michalkucharczyk

Logs in Grafana
https://grafana.teleport.parity.io/goto/duhHadnNg?orgId=1

@michalkucharczyk
Copy link
Contributor

The problem is with this channel:

pub fn import_notification_stream(&self) -> EventStream<ExtrinsicHash<B>> {
const CHANNEL_BUFFER_SIZE: usize = 1024;
let (sink, stream) = channel(CHANNEL_BUFFER_SIZE);
self.import_notification_sinks.lock().push(sink);
stream
}

Which is later consumed by transaction networking protocol:

pub async fn propagate_transaction_notifications<Block, ExPool>(
transaction_pool: Arc<ExPool>,
tx_handler_controller: sc_network_transactions::TransactionsHandlerController<
<Block as BlockT>::Hash,
>,
telemetry: Option<TelemetryHandle>,
) where
Block: BlockT,
ExPool: MaintainedTransactionPool<Block = Block, Hash = <Block as BlockT>::Hash>,
{
// transaction notifications
transaction_pool
.import_notification_stream()
.for_each(move |hash| {
tx_handler_controller.propagate_transaction(hash);
let status = transaction_pool.status();
telemetry!(
telemetry;
SUBSTRATE_INFO;
"txpool.import";
"ready" => status.ready,
"future" => status.future,
);
ready(())
})
.await;
}

This probably gets clogged and may lead to #5486.

We would need some improvements here. I don't see easy fix for this.

This issue is probably one more reason to start working on new transaction protocol.

@bkchr
Copy link
Member

bkchr commented Nov 21, 2024

I think this should fix the issue. Or at least reduce its likelihood.

@michalkucharczyk
Copy link
Contributor

I think this should fix the issue. Or at least reduce its likelihood.

Reading status is indeed under the same lock as import. But status operation itself is very cheap.

Also the conflicting lock during import is relatively cheap (comparing to whole work done in submit).
It just protects importing the transaction to the graph:

let imported = self.pool.write().import(tx)?;

which is here:
pub fn import(&mut self, tx: Transaction<Hash, Ex>) -> error::Result<Imported<Hash, Ex>> {

So by saying "both are cheap" - I try to say they don't interfer with each other to the extent that cause import is blocking status.

I also quickly tested proposed fix locally with zombienet (rpc + 3 collators + spamming transactions), and I still can see warning Trying to notify an import but the channel is full.

The fix still makes sense - there is no reason to update telemetry with every transaction. But from what I can say it will not solve this problem.

Maybe we should just remove the warning and accept that propagating single transactions on import notification won't happen for every tx. They will be propagate in the batch, which is happening periodically here:

self.propagate_transactions();

@bkchr
Copy link
Member

bkchr commented Nov 22, 2024

I also quickly tested proposed fix locally with zombienet (rpc + 3 collators + spamming transactions), and I still can see warning Trying to notify an import but the channel is full.

Hmm :D

Basically we are just forwarding from one channel to another, weird that this is not be able to keep up with the speed we are importing tx.

@michalkucharczyk
Copy link
Contributor

michalkucharczyk commented Nov 22, 2024

I think it goes down to:

for to_send in to_send {
let _ = self
.notification_service
.send_sync_notification(who, vec![to_send].encode());

and then to:
pub fn send_sync_notification(&self, message: impl Into<Vec<u8>>) {
let mut lock = self.inner.sync_channel.lock();
if let Some(tx) = lock.as_mut() {
let message = message.into();
let result = tx.try_send(NotificationsSinkMessage::Notification { message });
if result.is_err() {
// Cloning the `mpsc::Sender` guarantees the allocation of an extra spot in the
// buffer, and therefore `try_send` will succeed.
let _result2 = tx.clone().try_send(NotificationsSinkMessage::ForceClose);
debug_assert!(_result2.map(|()| true).unwrap_or_else(|err| err.is_disconnected()));
// Destroy the sender in order to not send more `ForceClose` messages.
*lock = None;
}
}
}

So maybe, once the network channel to the other peer is closed, the other side of the sync_channel in notifications_service stops consuming event? And the import_notification_sink can get clogged.

But I speculate here, as I am not yet familiar with this part of codebase.

@michalkucharczyk
Copy link
Contributor

michalkucharczyk commented Nov 22, 2024

And the size of SYNC_CHANNEL is relatively small comparing to number of txs imported during the spam test:

const SYNC_NOTIFICATIONS_BUFFER_SIZE: usize = 2048;

@bkchr
Copy link
Member

bkchr commented Nov 24, 2024

We only spawn one handler that iterates over all tx import notifications and not one per connection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants