CPS-???? | Block Delay Centralisation #943
base: master
This CPS does not provide data showing that 1 s is insufficient for blocks to propagate anywhere in the world given the required hardware, connection, and configuration. Without such data it is impossible to determine whether a change is needed, or what solution would work.
I often get 1000 ms ping times just crossing India's border, though AWS Mumbai is probably exempt from such delays, which I imagine result from "great firewall" style packet inspection by the newly founded and somewhat ill-equipped surveillance state here. (P.S. This has become an additional reason why our pool is administered in India while all its nodes are in Europe, which supports the author's premise fairly well.)
This block was produced by my BP earlier today. It was full at 86.57 kB, containing 64 transactions and 66.17 kB of scripts: https://cexplorer.io/block/c740f9ce8b25410ddb938ff8c42e12738c18b7fd040ae5224c53fb45f04b3ba0
These are the delays (measured from the beginning of the slot) before each of my own relays included this block in its chain:
The average propagation delay reported by nodes pinging the data to pooltool was 1.67 seconds: https://pooltool.io/realtime/11169975
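For reference, "delay from the beginning of the slot" figures like those above can be reproduced from raw adoption timestamps. Below is a minimal Python sketch; the Shelley-era constants (era start at slot 4492800 at Unix time 1596059091, 1-second slots) are my assumptions about mainnet and should be checked against the genesis configuration:

```python
# Assumed mainnet Shelley parameters (verify against the genesis files):
SHELLEY_START_SLOT = 4_492_800
SHELLEY_START_TIME = 1_596_059_091  # Unix seconds
SLOT_LENGTH_S = 1

def slot_to_unix_time(slot: int) -> int:
    """Wall-clock time (Unix seconds) at which a Shelley-era slot begins."""
    return SHELLEY_START_TIME + (slot - SHELLEY_START_SLOT) * SLOT_LENGTH_S

def propagation_delay_s(slot: int, received_unix_time: float) -> float:
    """Delay from the start of the block's slot until a relay adopted it."""
    return received_unix_time - slot_to_unix_time(slot)

# Hypothetical example: a relay adopting a block 1.67 s after its slot began.
slot = 110_000_000
print(round(propagation_delay_s(slot, slot_to_unix_time(slot) + 1.67), 2))  # -> 1.67
```

The same arithmetic underlies what pooltool reports, assuming relays' clocks are NTP-synchronized.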
@TerminadaDrep Could you add the above delay metrics to the CPS? Empirical data would help strengthen its case. Also, could you indicate whether your BP is locally controlled or runs on a VPS? I'm guessing it is locally controlled.
This doesn't say which country the BP is in.
Also, I'm not sure we should target low-spec VPS nodes. Is that an aim of Cardano? Or even VPSs at all?
Good nodes require good control over the hardware and software, which VPSs don't really offer. Some providers on this list are known for particularly poor performance, and virtualization adds overhead.
Moreover, configuration tuning can help with latency (tracing, mempool size, TCP slow start, congestion control, etc.), so more details are needed.
Overall, I believe we cannot conclude that 1 s is not enough from this single data point.
As a counter-example, here is a full 86.37 kB SMAUG block with 97 transactions that propagated in 0.46 s on average:
https://pooltool.io/realtime/11147794
My furthest relay, in Singapore, received it in 550 ms.
And most of my blocks propagate faster than that.
I'm not saying intercontinental propagation is irrelevant here. It is measurable, and it is bounded below by the speed of light. So I also disagree with one of the points in the CPS, namely the expectation of future transmission and latency improvements.
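To put a rough number on that physical bound: a back-of-envelope great-circle calculation, assuming signals in fibre travel at about two-thirds of c (roughly 200,000 km/s) and ignoring routing detours and queuing, already gives an RTT floor of about 165 ms between, say, Sydney and Frankfurt:

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points on Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Light in optical fibre travels at roughly 2/3 c, i.e. ~200,000 km/s.
FIBRE_SPEED_KM_S = 200_000

def min_rtt_ms(lat1, lon1, lat2, lon2):
    """Lower bound on round-trip time, ignoring routing detours and queuing."""
    d = great_circle_km(lat1, lon1, lat2, lon2)
    return 2 * d / FIBRE_SPEED_KM_S * 1000

# Sydney <-> Frankfurt (approximate coordinates): RTT floor is ~165 ms
print(round(min_rtt_ms(-33.87, 151.21, 50.11, 8.68)))
```

Real paths are longer than the great circle and add switching and queuing delay, so observed RTTs sit well above this floor; the point is that the floor itself cannot improve.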
It is interesting that the delay is clearly in the Australian part of the internet. Perhaps the Aussie National Broadband Network (NBN) was more congested than usual at the time.
Certainly this block saw worse-than-usual propagation delays.
I added some more examples, which include pooltool data for a couple of pools in Japan.
I also added a new "Arguments against" item No. 5 to discuss the extra infrastructure costs a BP in Australia must bear to reduce the disadvantage inherent in the current Ouroboros implementation.
@gufmar I'm not an expert on networks, but I really don't think we should be relying on improvements to network throughput here, because of the rebound effect: demand will very likely increase to consume any extra "slack" (e.g. Leios, block size increases, and even non-blockchain demand). If network capacity doubles but so does demand, we can easily find ourselves here again.
Yes, sorry if I was unclear. I meant that I disagree with the argument made at https://github.com/cardano-foundation/CIPs/blob/e7bf9b4c103f3841f2d8364e78905c1183ee9526/CPS-XXXX/README.md#arguments-against-correcting-this-unfairness
because I don't expect significant improvements in network latency, which is the main limiting factor here given the TCP ACK round trips; throughput matters much less.
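To illustrate why latency rather than throughput dominates, here is an idealized slow-start model (my simplification; real node-to-node connections are persistent, so the congestion window may already be open). Even a full block needs several ACK round trips before all of it is on the wire:

```python
import math

def rtts_to_send(block_bytes: int, mss: int = 1448, initcwnd: int = 10) -> int:
    """Round trips needed to push a payload through TCP slow start,
    assuming no loss and a congestion window that doubles each RTT
    (initial window of 10 segments per RFC 6928)."""
    segments = math.ceil(block_bytes / mss)
    sent, cwnd, rounds = 0, initcwnd, 0
    while sent < segments:
        sent += cwnd   # one full window sent per round trip
        cwnd *= 2      # slow start: window doubles each RTT
        rounds += 1
    return rounds

# A full 86.57 kB block like the example above: 60 segments,
# so 3 round trips (10 + 20 + 40 segments).
print(rtts_to_send(86_570))  # -> 3
```

At a 300 ms intercontinental RTT, those three data rounds alone cost about 0.9 s, before counting the header announcement and block-body request of the Ouroboros mini-protocols. Doubling link bandwidth changes none of these numbers; only lower RTT does.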
Some data is needed to prove Australia's case, so that it can be reproduced and candidate solutions evaluated.
For example, this AWS datacenter-to-datacenter round-trip latency map does not seem sufficient to prove the point:
I agree that data would help, but I want to point out that using AWS as the benchmark doesn't seem appropriate, since the goal is precisely to avoid having AWS host most of the block producers.
P.S. I'm not suggesting you meant to use AWS as the benchmark; I just felt this point should be made explicit.
Also, we can't be sure these hop times aren't measured between AWS backbone networks, and therefore exclude the time it takes unaffiliated traffic to enter and exit those backbones or cross the "last mile" of retail internet service.
The point was that if AWS datacenter-to-datacenter round-trip latency already exceeded 1 s between any two points in the world, that alone would have proven the CPS's point, because it is close to the best case connectivity-wise (independently of the centralisation issue). But that's not the case, so more data is needed. I didn't mean to say anything else; that was your interpretation. So I think the map was appropriate, precisely to show that more data is indeed needed. See my original quote:
Unfortunately not this week, because I'm involved in other things, but I can assure you we have plenty of latency and propagation data: not just general latency, but actual mainnet block propagation times as transmitted and received via the Ouroboros mini-protocols.
We have 2.5 years of history covering a range of normal and extraordinary network situations, for small and large blocks, from zero up to maximum script execution units.
The Gantt chart shown previously here is just one visualization for one block. I'm happy to invite you to a workshop call where we go through some of these data points, computed averages for predefined edge cases, etc.