CPS-???? | Block Delay Centralisation #943

Open
wants to merge 15 commits into
base: master
from
109 changes: 109 additions & 0 deletions CPS-XXXX/README.md
@@ -0,0 +1,109 @@
---
CPS: ?
Title: Block Delay Centralisation
Status: Open
Category: Consensus
Authors:
- Terminada <[email protected]>
Proposed Solutions: []
Discussions:
- https://github.com/cardano-foundation/CIPs/pull/943
- https://forum.cardano.org/t/problem-with-increasing-blocksize-or-processing-requirements/140044
Created: 2024-10-22
License: CC-BY-4.0
---

## Abstract
An underlying assumption in the design of Cardano's Ouroboros protocol is that the probability of a stake pool being permitted to update the ledger is proportional to the relative stake of that pool. However, the current implementation does not properly realise this design goal.

Minor network delays endured by some participants cause them to face a greater number of fork battles. The result is that more geographically decentralised participants do not obtain participation that is proportional to their relative stake. This is both a fairness and security issue.

## Problem
The [Ouroboros Praos paper](<https://eprint.iacr.org/2017/573.pdf>) states that it is the "*probability of being permitted to issue a block*" that "is proportional to the relative stake a player has in the system". However, it is clear from the security analysis section that it is the _probability of contributing to the ledger_ that is supposed to be proportional to the relative stake of the participant. This is an important distinction because it matters little if a player has stake-weighted permission to make blocks if the protocol later disproportionately drops that player's blocks despite their honest performance.

With the current Ouroboros implementation where slots have 1 second duration, it is not uncommon to have consecutive slot leaders that are only 1 second apart. Remote participants may be unable to receive the previous block within 1 second and so these block producers will make blocks that result in forks. Ouroboros settles such forks by preferring the fork whose terminal block has the lowest VRF value. This seems like a fair method because it is deterministic, neither player can alter the outcome, and each player has an equal chance of winning. However, the problem is that the more geographically decentralised players will face a disproportionately greater number of "fork battles". This in turn means that they will get more of their blocks dropped. The net result is that more geographically decentralised participants do not receive stake weighted _probability of contributing to the ledger_. This is not only unfair, but also represents a deviation from the Ouroboros security analysis assumptions.

This might seem like a minor problem, but the effect is significant. If the majority of the network resides in the USA and Europe with close connectivity and less than 1 second propagation delays, then those participant pools will see 5% of their blocks suffering "fork battles", which will only occur when another pool is awarded the exact same slot (i.e. a "slot battle"). They will lose half of these battles on average, causing 2.5% of their blocks to get dropped, or "orphaned".

However, a pool that happens to reside on the other side of the world, where network delays might be just over 1 second, will suffer "fork battles" not only with pools awarded the same slot, but also with pools awarded the slot before and the slot after. In other words, this geographically decentralised pool will suffer 3 times the number of fork battles, amounting to 15% of its blocks, and resulting in 7.5% of its blocks getting dropped. The numbers are even worse for a pool suffering 2 second network delays because it will suffer 5 times the number of "fork battles" and see 12.5% of its blocks "orphaned". This not only results in an unfair reduction in rewards, but also a reduction of the same magnitude in contribution to the ledger.
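
The percentages above can be reproduced with a short calculation. This is only a sketch of the arithmetic used in this section: it assumes the 5% figure is the per-slot chance that some other pool is also a leader, treats the chances for neighbouring slots as simply additive (a linear approximation), and assumes each fork battle is lost half of the time. The function names are illustrative and are not taken from any Cardano software.

```python
# Assumed: the 5% same-slot figure quoted above, i.e. one block
# every 20 one-second slots on average.
ACTIVE_SLOT_COEFF = 0.05


def contested_slots(delay_slots: int) -> int:
    """Slots whose leaders can fork against our block: our own slot
    plus `delay_slots` slots on either side of it."""
    return 2 * delay_slots + 1


def fork_battle_rate(delay_slots: int, f: float = ACTIVE_SLOT_COEFF) -> float:
    """Approximate share of our blocks that end up in a fork battle."""
    return contested_slots(delay_slots) * f


def orphan_rate(delay_slots: int, f: float = ACTIVE_SLOT_COEFF) -> float:
    """Approximate share of our blocks dropped, losing half of all battles."""
    return fork_battle_rate(delay_slots, f) / 2


if __name__ == "__main__":
    for delay in (0, 1, 2):
        print(
            f"delay of {delay} slot(s): "
            f"{fork_battle_rate(delay):.1%} fork battles, "
            f"{orphan_rate(delay):.1%} orphaned"
        )
```

With delays of 0, 1 and 2 slots this gives 2.5%, 7.5% and 12.5% orphaned blocks, matching the figures in the text. A more careful model would use the probability that at least one competing leader exists in the window rather than a plain sum.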
Contributor

This CPS does not provide data showing that 1s is not enough for blocks to propagate anywhere in the world with required hardware, connection and configuration. Without such data, it's impossible to determine if it's needed or not, and what solution would work.

Collaborator

@rphair rphair Dec 3, 2024

I often get 1000ms ping times just crossing India's border: though AWS Mumbai is probably exempt from such delays, which I imagine result from "great firewall" type packet inspection from the newly founded & somewhat ill-equipped surveillance state here. (p.s. this has become an additional reason why our pool is administered in India but all nodes are in Europe: which supports the author's premise fairly well.)

Author

This block was produced by my BP earlier today. It was full at 86.57kB in size, containing 64 transactions, and 66.17kB of scripts: https://cexplorer.io/block/c740f9ce8b25410ddb938ff8c42e12738c18b7fd040ae5224c53fb45f04b3ba0

These are the delays (from beginning of the slot) before each of my own relays included this block in their chains:

  • Relay 1 (ARM on same LAN) → Delayed=0.263s
  • Relay 2 (AMD on adjacent LAN) → Delayed=0.243s
  • Relay 3 (ARM approx 5 Km away) → Delayed=0.288s
  • Relay 4 (AMD Contabo vps in USA) → Delayed=2.682s
  • Relay 5 (ARM Netcup vps in USA) → Delayed=1.523s

The average propagation delay by nodes pinging the data to pooltool was 1.67 seconds: https://pooltool.io/realtime/11169975

@TerminadaDrep Could you add the above delay metrics to the CPS? I think having empirical data would help strengthen the case of this CPS. Also, could you indicate whether your BP is locally controlled or in a vps? I'm guessing it is locally controlled.

Contributor

@SmaugPool SmaugPool Dec 3, 2024

This doesn't say in which country is the BP.
Also I'm not sure we should target low spec VPS nodes, is that an aim of Cardano? Or even just VPSs?

Good nodes require a good control of the hardware and software, which VPSs don't really offer. Some in this list are particularly known to provide bad performance, and virtualization adds overhead.

Moreover configuration optimization can help with latency (tracing, mempool size, TCP slow start, congestion control, etc..), so more details are needed.

Overall I believe we cannot conclude that 1s is not enough with just this data point.

As a counter-example here is a SMAUG full 86.37 KB block with 97 transactions propagated in average in 0.46s:
https://pooltool.io/realtime/11147794

My furthest relay in Singapore received it in 550ms.

And most of my blocks propagate quicker than that.

Contributor

I'm not generally saying intercontinental propagation is irrelevant here. It actually is measurable and enforced by speed of light physical law. So I also disagree with one of the points in the CPS, in expectations of future transmission and latency improvements.

Author

It is interesting that the delay is clearly in the Australian part of the internet. Perhaps the Aussie national broadband network (NBN) was more congested than usual at this time.

Certainly this block had worse than usual propagation delays.

Author

I added some more examples which include pooltool data for a couple of pools in Japan.

I also added a new "Arguments against" No: 5. to discuss the extra infrastructure cost requirements for a BP in Australia to try to reduce the disadvantage that is inherent to the current Ouroboros implementation.

> So I also disagree with one of the points in the CPS, in expectations of future transmission and latency improvements.

@gufmar I'm not an expert on networks, but I really don't think we should be relying on improvements to network throughput here due to the rebound effect. Demand will very likely increase to use up any extra "slack" (eg, Leios, block size increases, and even non-blockchain demand). If network capacity doubles but so does demand, we can easily find ourselves here again.

Contributor

Yes sorry if I was unclear. I wanted to say I disagree with the argument made at https://github.com/cardano-foundation/CIPs/blob/e7bf9b4c103f3841f2d8364e78905c1183ee9526/CPS-XXXX/README.md#arguments-against-correcting-this-unfairness

because I don't expect we will see significant improvements in network latency which is the main limiting factor here with the TCP ACK back and forth packets, not so much the Throughput.


Even the high quality infrastructure of a first world country like Australia is not enough to reliably overcome this problem due to its geographical location. But is it reasonable to expect all block producers across the world to receive blocks in under one second whenever the internet becomes congested, or if block size is increased following parameter changes? Unfortunately, the penalty for a block producer that cannot sustain this remarkable feat of less than 1 second block receipt and propagation, is 3 times as many "fork battles" resulting in 7.5% "orphaned" blocks rather than 2.5%.
Contributor

Some data is needed to prove the Australia's case and to be able to reproduce it and evaluate working solutions.

Contributor

For example this AWS datacenter to datacenter round trip latency map does not seem to be enough to prove the point:
[image: AWS datacenter to datacenter round trip latency map]

@fallen-icarus fallen-icarus Dec 3, 2024

I agree with data helping, but I want to point out that using AWS as the benchmark doesn't seem appropriate since the goal is to not have AWS control most of the block producers.

P.S. I'm not suggesting you mean to use AWS as the benchmark, I just felt like this point should be made explicit.

Collaborator

@rphair rphair Dec 3, 2024

Also we can't be sure these hop times aren't between AWS back-end networks: and therefore not including time spent for unaffiliated traffic to enter & exit backbone networks or cross the "last mile" of retail Internet services.

Contributor

@SmaugPool SmaugPool Dec 3, 2024

> I agree with data helping, but I want to point out that using AWS as the benchmark doesn't seem appropriate since the goal is to not have AWS control most of the block producers.
>
> P.S. I'm not suggesting you mean to use AWS as the benchmark, I just felt like this point should be made explicit.

The point was that if AWS datacenter to datacenter round trip latency was already more than 1s between 2 points in the world, it would have been enough to prove the CPS point, because it's close to the best case connectivity wise (independently from the centralization issue). But that's not the case, so more data is needed. I didn't mean to say anything else, you are interpreting, so I think it was appropriate, to show that more data is indeed needed. See my original quote:

> For example this AWS datacenter to datacenter round trip latency map does not seem to be enough to prove the point:

Contributor

Unfortunately not this week, because involved in other things, but I can assure you, we have plenty of latency and propagation data. Not only general latency but actual mainnet block propagation times as transmitted and received via Ouroboros mini protocols.
And we have it for 2.5 years of history, covering a bunch of different normal and extraordinary network situations. For small and large blocks, with no, up to max script execution units.
The previously showed gantt chart here is just one visualization for one block. I'm happy to invite to a workshop call where we go through some of these data points, computed averages for predefined edge cases etc.


Considering that most stake pools are competing over 1% or less in fees, these are big numbers. The obvious solution for the remote pool is to move its block producer to a server housed in USA or Europe. This illustrates not only the centralisation problem created, but also the reduction in security that flows from running a block producer on someone else's computing hardware.

## Examples
1. This block was produced by my locally controlled block producer in Australia. [It was a full block 86.57kB in size, containing 64 transactions, and 66.17kB of scripts](<https://cexplorer.io/block/c740f9ce8b25410ddb938ff8c42e12738c18b7fd040ae5224c53fb45f04b3ba0>)

These are the delays (from beginning of the slot) before each of _my own relays_ included this block in their chains by extending their tips:

- Relay 1 (ARM on same LAN) --> Delayed=0.263s
- Relay 2 (AMD on adjacent LAN) --> Delayed=0.243s
- Relay 3 (ARM approx 5 Km away) --> Delayed=0.288s
- Relay 4 (AMD Contabo vps in USA) --> Delayed=2.682s
- Relay 5 (ARM Netcup vps in USA) --> Delayed=1.523s

The above block delay values were obtained using [this script](<https://github.com/TerminadaPool/cardano-node-debian/blob/main/bin/cn-monitor-block-delay>) running on each relay.

The [average propagation delay by nodes pinging such data to pooltool was 1.67 seconds](<https://pooltool.io/realtime/11169975>)

Fortunately, on this occasion no other block producer was leader for the subsequent slot. If there had been, they probably would not have received my block in time and consequently would have produced their block upon the previous one, creating a fork.
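
As a rough illustration of what the relay delays listed above imply, the following sketch counts how many subsequent slots would already have started before the block reached each relay. It assumes 1 second slots, as stated in this CPS, and the simplification that a leader co-located with the relay forges exactly at the start of its slot. The helper name `slots_missed` is hypothetical.

```python
import math

SLOT_SECONDS = 1.0  # slot duration stated in this CPS

# Delays reported above, in seconds from the beginning of the slot.
relay_delays = {
    "Relay 1 (ARM on same LAN)": 0.263,
    "Relay 2 (AMD on adjacent LAN)": 0.243,
    "Relay 3 (ARM approx 5 Km away)": 0.288,
    "Relay 4 (AMD Contabo vps in USA)": 2.682,
    "Relay 5 (ARM Netcup vps in USA)": 1.523,
}


def slots_missed(delay_s: float, slot_s: float = SLOT_SECONDS) -> int:
    """Number of following slots that begin before the block has arrived."""
    return math.floor(delay_s / slot_s)


if __name__ == "__main__":
    for name, delay in relay_delays.items():
        print(f"{name}: {delay:.3f}s -> {slots_missed(delay)} following slot(s) at risk")
```

Under these assumptions, a leader near Relay 5 in the next slot, or a leader near Relay 4 in either of the next two slots, would have built on the previous block.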

Note that this example is worse than average. Maybe the general internet was more congested than usual on this occasion? [Pooltool reports the following average propagation delays for TERM pool](<https://pooltool.io/pool/08f05bcfaada3bb5c038b8c88c6b502ceabfd9978973159458c6535b/metrics>):
- Producer = 1.141s
- Receiver = 0.722s

2. [2nd last block produced by TERM pool - Australia](<https://cexplorer.io/block/43a4d4cfe86fd577f6445eeabe5ad35e31620ce9393e436a6317bba2ee95d463>) - [pooltool average propagation time = 1.38s](<https://pooltool.io/realtime/11175932>)

3. [Last block produced by TERM pool - Australia](<https://pooltool.io/realtime/11176750>) resulted in a "fork battle". Note that pooltool reported the average propagation time for this TERM block at 0.87s. IOGP pool was leader for the next slot, but it evidently did not receive the TERM block in time and consequently produced its block upon the previous one, creating a fork. Unfortunately for TERM pool, IOGP pool's block had the lower VRF value, so it won the "fork battle" and TERM pool's block was orphaned.

4. [Last block produced by JSP pool - Japan](<https://cexplorer.io/block/3dd07cd06fc4ba282e8f45e1a37328a1238646caa71ef5f85c360573b9544c9c>) - [pooltool average propagation = 1.08s](<https://pooltool.io/realtime/11171397>)

5. [Last block produced by AICHI pool - Japan](<https://cexplorer.io/block/7205814b191344b5d3292acdd5ee8394ff5e338e4a869d51501915d4d7a7dad3>) - [pooltool average propagation = 1.55s](<https://pooltool.io/realtime/11171393>)

## Goals
Cardano should live up to its [11 blockchain tenets](<https://iohk.io/en/blog/posts/2024/10/11/the-11-blockchain-tenets-towards-a-blockchain-bill-of-rights/>) proposed by Prof Aggelos Kiayias, in particular T8 and T11 which speak to treating participants fairly without asymmetries.

## Possible Solutions
1. Modify how stake pools calculate their slot leadership by including ```mod 2``` or ```mod 3``` in the formula so that only every second or third slot is a possible leader slot. Then adjust other parameters so that the Cardano network still produces a block every 20 seconds on average.

A consequence of this change would be an increased number of true "slot battles", where two pools are awarded the exact same slot. However such battles are fairly settled by preferring the lowest block VRF and cannot be gamed by a malicious actor.

2. Increase the slot duration to 2 or 3 seconds. However this could have consequences for dapps that have assumed a slot will always be 1 second.
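
The parameter adjustment implied by possible solution 1 can be sketched as follows. This assumes that "adjust other parameters" means scaling the per-slot leadership coefficient by the slot spacing, which is an interpretation and not something specified in this CPS. It also reuses the simple window model from the Problem section, where a delay of `d` slots exposes a block to leaders within `d` slots on either side. All names are illustrative.

```python
def adjusted_coefficient(f: float, spacing: int) -> float:
    """Per-eligible-slot coefficient that keeps the overall block rate
    unchanged when only every `spacing`-th slot can have a leader."""
    return f * spacing


def mean_block_interval(f_eligible: float, spacing: int, slot_seconds: float = 1.0) -> float:
    """Average seconds between blocks when eligible slots are `spacing` apart."""
    return spacing * slot_seconds / f_eligible


def eligible_slots_in_window(delay_slots: int, spacing: int) -> int:
    """Eligible slots within +/- `delay_slots` of an eligible slot,
    counting that slot itself."""
    return 2 * (delay_slots // spacing) + 1


if __name__ == "__main__":
    f = 0.05  # assumed current coefficient: one block per 20 slots
    for spacing in (1, 2, 3):
        f_eligible = adjusted_coefficient(f, spacing)
        interval = mean_block_interval(f_eligible, spacing)
        print(f"mod {spacing}: coefficient {f_eligible:.2f}, mean interval {interval:.1f}s")
        for delay in (0, 1, 2):
            n = eligible_slots_in_window(delay, spacing)
            print(f"  delay {delay} slot(s): {n} contested eligible slot(s)")
```

With a spacing of 3, pools delayed by up to 2 slots contest only their own slot, the same as a pool with no delay, while the higher per-slot coefficient reflects the increased number of true "slot battles" noted above.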

## Considerations
1. What is a reasonable expectation for block producers housed across the world to achieve in terms of network delays? Is 2 seconds of network delay enough time before block producers get penalised, or would 3 seconds be more appropriate?

2. What internet infrastructure and geographical location does Cardano consider as a minimum requirement in order to fairly contribute to its ledger?

3. It would seem to be an advantage to encourage fair participation from people residing in countries on the other side of the globe because this might provide resilience against political or other attacks localised to Europe or USA.

4. Would there be any impact on the Ouroboros 5 second delta parameter?

5. Would it be appropriate to make the window in which the block VRF tie breaker rule is applied also the same 2 or 3 seconds so that only blocks with the exact same slot number are deterministically settled using the block VRF, otherwise the node would prefer its current block? Such a change would remove the ability of malicious actors to deliberately cause a fork with the previous block when they know their block VRF is lower.

6. What effect could any change have on the solution to [issue #2913: Consensus should favor expected slot height to ward off delay attack](<https://github.com/IntersectMBO/ouroboros-network/issues/2913>)?
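
The difference between the tie breaker as described in the Problem section and the variant raised in consideration 5 can be sketched as below. This is a simplified model for two competing forks of equal length, not the node's actual chain selection code. The `Tip` type, both function names and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tip:
    slot: int  # slot number of the fork's terminal block
    vrf: int   # block VRF value, modelled as an integer; lower wins


def simplified_current_rule(current: Tip, candidate: Tip) -> Tip:
    """Tie breaker as described in the Problem section: between forks of
    equal length, prefer the terminal block with the lower VRF value."""
    return candidate if candidate.vrf < current.vrf else current


def same_slot_only_rule(current: Tip, candidate: Tip) -> Tip:
    """Variant from consideration 5: apply the VRF comparison only when
    both terminal blocks share a slot, otherwise keep the current tip."""
    if candidate.slot == current.slot and candidate.vrf < current.vrf:
        return candidate
    return current


if __name__ == "__main__":
    ours = Tip(slot=100, vrf=900)         # made-up values for illustration
    late_rival = Tip(slot=101, vrf=300)   # forged one slot later, lower VRF
    print(simplified_current_rule(ours, late_rival))  # switches to the rival
    print(same_slot_only_rule(ours, late_rival))      # keeps the current tip
```

Under the variant, a block forged in a later slot cannot displace an already adopted tip merely by having a lower VRF value, which is the behaviour consideration 5 asks about.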

### Arguments against correcting this unfairness
1. It doesn't matter where the block producer is warehoused because block production is like a virtual service that can be run from anywhere. What really matters is ownership of the pledge and stake, not ownership of the computing hardware. If you live on the other side of the world, just rent a virtual server in Frankfurt or Los Angeles for your block producer.
- Centralising Cardano infrastructure to data centres potentially hands control over the software, as well as selective control over block propagation between nodes, to BigTech data centre owners.

2. The internet infrastructure is centred in USA and USA is more politically stable and less likely to have its internet infrastructure compromised through acts of war. If conflict between USA, China, and Russia ensues then the undersea cables to Japan, Australia and New Zealand could get damaged. Therefore it makes sense that Cardano block production should be slightly advantaged in USA and slightly disadvantaged in Japan, Australia and New Zealand.
- Japan, Australia and New Zealand are very politically stable and it might actually be a good idea to _fairly_ incentivise participation from people living in those areas. If internet connectivity was to be affected by cyber warfare, the targets of such attacks may not be predictable today.
- The bottom line is that if I have 0.001% of stake in the system then I should get 0.001% of access to the system. Likewise with scaling that up, if good actors possess 51% of the stake then they should have 51% of the control, as this is a fundamental assumption protecting Cardano. I hope that the Cardano community won't step away from this fundamental mathematics based approach in favour of some sort of subjective human inference about where they think block producers should be warehoused for political or other reasons.

3. Internet transmission is improving so in the future 1 second might be sufficient for block propagation across the entire globe.
- This unfairness problem doesn't seem hard to fix. If / when internet speeds improve then it should be straightforward to recalibrate the software again.

4. Won't halving or one-third'ing the number of potential leader slots reduce the number of blocks and therefore reduce the throughput?
- Possible solution 1 also involves adjusting other parameters so that the target rate of block production remains unchanged. Actually, it seems likely that the realised throughput would slightly increase. The reason for this is that [possible solution 1](#Possible-Solutions) would eliminate forks caused by 1 second propagation delays which currently are wasted throughput.

5. Anyone running a stake pool should use an Enterprise Ethernet True Business Grade point-to-point Fibre Service with High Class of Service (CoS), i.e. a 1:1 contended service with guarantees around contention, frame delay, and a few other parameters. If you do this from Australia then you should be able to get your block propagation times to less than 1 second.
- Well, maybe by spending an extra $800/month on a high CoS fibre link it might be possible to achieve almost as low block propagation times as someone in USA/EU using a cheap consumer grade home internet connection costing $50/month. So I can either pay an extra $800/month to get fewer fork battles, or I can save on the up-front network connection costs but lose a similar amount of money in rewards due to the extra fork battles. Either way, a disadvantage persists if I house my pool in a geographically decentralised area such as Australia. The way to rectify this competitive disadvantage is to warehouse my BP in a data centre in USA/EU.

6. If my pool is located close to the majority then I am benefitting from this unfairness so why should I vote for this to be fixed?
- See section: [Goals](#Goals). Hopefully most people in Cardano want to build a fair system for everyone no matter where they live.

## Copyright
This CPS is licensed under [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode).
