[Bounty; performance] Remove fast block propagation/transaction processing from p2pool #19

Closed
jtoomim opened this issue Dec 5, 2018 · 37 comments


@jtoomim
Owner

jtoomim commented Dec 5, 2018

Introduction

P2pool contains a lot of unnecessary code for handling transactions. This code is the main source of p2pool's performance problems: it causes shares to propagate and process slowly, which increases p2pool's share orphan/DOA rates and reduces mining fairness. It also prevents p2pool from scaling to large block sizes.

The core of the problem is that p2pool encodes transactions that are included in multiple shares using a special 2- or 3-byte scheme, and this encoded list is included in the share's metadata OP_RETURN commitment. The encoded transaction list is redundant information, as the transactions are already committed to via the merkle root hash.

The encoding scheme uses a (share_height_delta, transaction_index) pair to tell p2pool where it can find the TXID for a transaction (by lookup in a previous share). Using this scheme, p2pool was able to transmit shares (which can be decoded into full blocks) faster than pre-Xthin/Compact-Blocks bitcoind was able to transmit blocks. In the p2p layer, each node keeps a dictionary of txid:transaction mappings and sends every transaction it receives to each of its peers, ensuring that a recipient can decode a share into a block as soon as the share arrives, without needing to request any additional transactions over the network.
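As a rough illustration, the encoding can be sketched like this (a simplified reconstruction with made-up names, not p2pool's actual code; here a delta of 0 marks a transaction that is new in the current share):

```python
def encode_tx_refs(tx_hashes, prev_share_txids):
    """prev_share_txids: list of txid lists from recent shares, newest first.
    Returns (new_hashes, refs) where each ref is (share_height_delta, index)."""
    new_hashes, refs = [], []
    for txid in tx_hashes:
        for delta, prev in enumerate(prev_share_txids, start=1):
            if txid in prev:
                # found in an earlier share: reference it by (delta, index)
                refs.append((delta, prev.index(txid)))
                break
        else:
            # first time seen: full TXID goes into this share's new list
            refs.append((0, len(new_hashes)))
            new_hashes.append(txid)
    return new_hashes, refs

def decode_tx_refs(new_hashes, refs, prev_share_txids):
    # Reverse the mapping: delta 0 looks into this share's new_hashes,
    # delta k looks into the k-th previous share's txid list.
    return [new_hashes[i] if d == 0 else prev_share_txids[d - 1][i]
            for d, i in refs]
```

The point of the bounty is that this entire layer of indirection (and the share-chain searches it requires) can be deleted outright.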

Now that we have schemes like Compact Blocks and Graphene (written in C++), p2pool's Python-based fast-block propagation is entirely obsolete. Having p2pool do processing on a per-transaction basis means that the system's performance is limited by how fast Python code can process transactions relative to p2pool's 30-60 sec share interval. That constraint is far tighter than Bitcoin's, where C++ code only has to process transactions fast enough to keep up with a 600 sec block interval.

Overview of changes

The changes to be made fall into four main categories: Changes to the share data structure, changes to block reconstruction, changes to the p2p protocol layer, and (optionally) adding a mechanism for block/share reconstruction and validation that is asynchronous and outside the latency-critical path.

Share structure changes

The core of the change is that two fields in the share_info data structure need to be removed: new_transaction_hashes and transaction_hash_refs. There's a lot of code in various places that touches these fields, such as Share.generate_transaction(...), all of which needs to be removed or replaced. Most of it is in p2pool/data.py.

This is a hard forking change. Because of the extent of the changes, and because so much of the change will be code deletion instead of addition, it may be better to roll out this change in a backward-incompatible fashion, and just start a new share chain instead of using the typical hard fork upgrade mechanism. But if you think you can do it as a more typical hard fork, that's cool too.

Block reconstruction

As the list of transactions in a share will not be stored in the same way, some changes might be necessary to the getblocktemplate, stratum, and block reconstruction code to make sure that all of the information needed to reassemble blocks is still retained.

P2P changes

After the share changes have been made, it will no longer be necessary for peers to send transactions to each other. This involves the update_remote_view_of_my_mining_txs(...) function; the update_remote_view_of_my_known_txs(...) function; the have_tx, losing_tx, forget_tx, and remember_tx messages, all in p2p.py; and some code in work.py for keeping track of transactions in getblocktemplate results.

Asynchronous validation

[Not yet written]

Bounty

This description is not yet complete. I'll add more information later.

I am offering a $5000 total bounty for this project. Milestone payments are an option.

@dfoderick

I don't understand the task yet, so let me ask some questions to get started.
Is this the forest data structure, or something else?
Currently, is p2pool getting txs from the p2pool p2p network or from the node's getblocktemplate? Is that the source of the inefficiency?
What is the proposed solution? To replace p2pool's tx propagation with a call to get a compact block from the node instead?
Basically, what would I need to look at to understand the problem?

@jtoomim
Owner Author

jtoomim commented Dec 20, 2018

Basically, what would I need to look at to understand the problem?

@dfoderick I've updated the original description to answer some of your more general questions.

Is this the forest data structure?

No, not the forest data structure. The forest is for storing the share chain itself. Elements in the forest are individual Share objects. The core modification is in the Share object.

Currently, is p2pool getting txs from the p2pool p2p network or from the node's getblocktemplate?

P2pool nodes get txs from getblocktemplate, and then immediately share those transactions with each of their peers using a remember_tx message. According to the p2pool protocol, all txs must be marked as "remembered" over a TCP peer connection before a share which uses one of those txs as a new_transaction_hash can be transmitted.
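Schematically, the per-connection invariant looks something like this (an illustrative sketch, not the actual p2p.py code):

```python
class PeerConnection(object):
    """Sketch of the protocol rule: every new_transaction_hash in a share
    must already be 'remembered' on this connection before the share may
    be sent to the peer."""
    def __init__(self):
        self.remembered = set()  # txids the remote has acknowledged

    def handle_remember_tx(self, txids):
        self.remembered.update(txids)

    def handle_forget_tx(self, txids):
        self.remembered.difference_update(txids)

    def may_send_share(self, new_transaction_hashes):
        return all(h in self.remembered for h in new_transaction_hashes)
```

Removing tx forwarding means this whole remembered/forgotten bookkeeping (and the bandwidth to keep it synchronized) goes away.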

Is that the source of the inefficiency?

The inefficiency comes at several levels. A big one is serializing and deserializing the list of varints for the transaction_hash_refs. The transaction_hash_refs and new_transaction_hashes elements also take up a lot of RAM (a few hundred MB on CPython, and a few GB on pypy). Searching previous shares for transactions to see if each transaction can be encoded as a transaction_hash_ref instead of a new_transaction_hash also takes quite a bit of time. Sending transactions to each peer, checking to see which peers each transaction needs to be sent to, and so forth also takes up a lot of both bandwidth and CPU time, though that's generally not in the latency-critical code path.
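For reference, the varints in question are CompactSize-style integers; a minimal sketch (not pack.py's actual implementation) of the per-element encode/decode work involved:

```python
import struct

def write_varint(n):
    # CompactSize encoding: 1, 3, 5, or 9 bytes depending on magnitude
    if n < 0xfd:
        return struct.pack('<B', n)
    if n <= 0xffff:
        return b'\xfd' + struct.pack('<H', n)
    if n <= 0xffffffff:
        return b'\xfe' + struct.pack('<I', n)
    return b'\xff' + struct.pack('<Q', n)

def read_varint(data, ofs=0):
    # Returns (value, next_offset); decoding is inherently sequential,
    # which is part of why long ref lists are slow to (de)serialize.
    first = data[ofs]
    if first < 0xfd:
        return first, ofs + 1
    width, fmt = {0xfd: (2, '<H'), 0xfe: (4, '<I'), 0xff: (8, '<Q')}[first]
    return struct.unpack_from(fmt, data, ofs + 1)[0], ofs + 1 + width
```

Doing this in a Python loop for every transaction reference in every share is what shows up at the top of the profile.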

What is the proposed solution? To replace p2pool's tx propagation with a call to get a compact block from the node instead?

Just submit the block to local bitcoind only on the node that actually mined the block, and let bitcoind take care of block propagation.

@kr1z1s

kr1z1s commented Dec 21, 2018

The inefficiency comes at several levels. A big one is serializing and deserializing the list of varints for the transaction_hash_refs. The transaction_hash_refs and new_transaction_hashes elements also take up a lot of RAM (a few hundred MB on CPython, and a few GB on pypy). Searching previous shares for transactions to see if each transaction can be encoded as a transaction_hash_ref instead of a new_transaction_hash also takes quite a bit of time. Sending transactions to each peer, checking to see which peers each transaction needs to be sent to, and so forth also takes up a lot of both bandwidth and CPU time, though that's generally not in the latency-critical code path.

According to my estimates, one of the most latency-critical pieces of code is receiving (parsing) transactions from GBT. When the mempool is large, the processor core running p2pool is loaded at 100%.

@jtoomim
Owner Author

jtoomim commented Dec 21, 2018

I would like to keep this thread focused on how to implement, not why. Toward this end, I created a separate issue for discussing share/block cross-validation issues, and I deleted/moved comments in this thread that are about the validation issue to that issue. Discussion on the lack of block validation in p2pool should be done in #21.

Repository owner deleted a comment from CryptoManiac Dec 21, 2018
Repository owner deleted a comment from rldleblanc Dec 21, 2018
Repository owner deleted a comment from kr1z1s Dec 21, 2018
@jtoomim
Owner Author

jtoomim commented Dec 21, 2018

@kr1z1s GBT parsing is pretty fast now most of the time. I added caching of transactions in GBT a while ago, so parsing each transaction in a GBT response is just a dictionary lookup unless the transaction is being seen for the first time. helper.py:getwork() does the GBT parsing:

2018-12-21 04:32:00.856972 5.139 ms for helper.py:getwork(). Cache: 687 hits 90 misses, 776 known_tx 1 unknown 778 cached

Parsing a GBT request with 100% new transactions takes about 300 ms per MB of block size. That's significant, but far from the biggest problem right now. Both ABC and BU have replacement RPCs under development which will eliminate this problem, but using them will require p2pool not to know all transactions, and to use only the merkle path instead of the full transaction list.
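Conceptually, the caching is just this (a simplified sketch, not the actual helper.py code):

```python
class TxCache(object):
    """Parse each raw transaction only the first time it's seen in a GBT
    response; on every later sighting it's a single dict lookup."""
    def __init__(self, parse):
        self.parse = parse    # the expensive deserialization function
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, raw_tx):
        if raw_tx in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[raw_tx] = self.parse(raw_tx)
        return self.cache[raw_tx]
```

Between consecutive GBT calls most of the mempool is unchanged, so the hit rate is high (687 hits vs 90 misses in the log line above).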

On the other hand, most of the total time for assembling a new stratum job (which is done by work.py:get_work()) comes from calling data.py:Share.generate_transaction(), and the slowest parts of that are creating the transaction_hash_refs data (33.4 ms) and doing a bunch of other transaction-related work (49.4 ms):

2018-12-21 04:31:59.334079 109.645 ms for work.py:get_work()
2018-12-21 04:31:59.444500 84.052 ms for data.py:generate_transaction(). Parts: 0.405 0.775 33.433 0.065 49.374

@jtoomim
Owner Author

jtoomim commented Dec 22, 2018

I posted a profiling logfile from running p2pool on BTC for one day here:

https://toom.im/files/pypyp2poolfeatherlog.txt

Except for the select.epoll call (which is just waiting for an event to process), all of the top 4 time hoggers are calls to pack.py. The #5 hog is p2p.py:208(update_remote_view_of_my_known_txs).

When the mempool is large, the processor core running p2pool is loaded at 100%

That's because p2pool needs to send each one of the transactions in the block template to its peers, and needs to search previous shares for a previous use of each of those transactions, not because parsing the transactions is particularly slow.

@rldleblanc

I've run across some syntax that I'm not familiar with and google isn't being helpful. Please let me know what this is called or help me understand it a little better.

https://github.com/jtoomim/p2pool/blob/master/p2pool/node.py#L237

There are some functions that are named only _ as in def _(arg):. I understand that _ can mean the last value returned, or discard the result, but I'm not sure what it means here. These always seem to have a decorator attached to them, so I suppose it is some sort of shortcut.

Also, which branch should we be working on? I'm using 1mb_segwit on all my pools, but it doesn't seem to be merged into master.

@jtoomim
Owner Author

jtoomim commented Dec 22, 2018 via email

@rldleblanc

That makes sense now.

Can a miner get lucky enough to solve the same share more than once? Or once a miner submits a share, a new set of work is sent to them (getblocktemplate, etc)? Or do they keep hashing on it hoping to get a better difficulty?

@jtoomim
Owner Author

jtoomim commented Dec 23, 2018 via email

@rldleblanc

rldleblanc commented Dec 23, 2018 via email

@jtoomim
Owner Author

jtoomim commented Dec 23, 2018 via email

@rldleblanc

Unless you change the share chain into a share DAG. However, it's tricky to set up the rules for the DAG such that miners have an incentive to always work on the most recent BTC block, and to encourage all miners to always include the shares from other miners. I would be interested in seeing such a proposal. I have a few ideas on this myself. However, I'd rather focus on getting transactions out of share objects before working on DAGs.

From what I understand, the miners are not including any part of the share chain (one of the reasons it can be easy to poison the chain); it is basically just a linked list of shares that meet the share difficulty, and the first one wins. The stratum server is the boss: it can expire work that the miner is working on, and can basically force the miners to always work on new work.

Technically a miner could ignore the stratum messages and keep working on the same work, but nonce and extranonce would run out pretty quick, then they would have to submit the block to a local bitcoind, etc. Seems like too much work for people who don't even want to run their own bitcoind, so it seems like incentive enough. I don't see it as a big concern. Please help me see where I'm missing things.

I probably won't work on things tomorrow, so I think I'll have the new data model ready with some functions for benchmarking by the middle of next week. If it doesn't have the memory or performance profile I expect, then I'll scrap it and just rip out and bandaid the current stuff. I think I can have full 32 MB BCH blocks with sub-10ms processing while staying under 200 MB of RAM. Since I'd have to touch a bunch of code anyway...

@jtoomim
Owner Author

jtoomim commented Dec 23, 2018 via email

@jtoomim
Owner Author

jtoomim commented Dec 24, 2018

@rldleblanc via email:

We still need to keep all the transaction data so that we can build a block for submitting to bitcoind, even if it's not being used for validation? Or can you just send a block header to submitblock with the work id, and it will just assume the transactions that were sent in the getblocktemplate request? If it's the latter, then we only need to compute the merkle root and can then throw away all the transactions.

Yes, for now we need to keep all the transaction data for our own block templates.

P2pool submits blocks with two methods:

  1. Via the bitcoind p2p protocol
  2. Via the bitcoind rpc protocol

https://github.com/jtoomim/p2pool/blob/1mb_segwit/p2pool/bitcoin/helper.py#L164

IIRC, the p2p protocol submission requires the serialized transactions as a raw string, and the rpc protocol requires the serialized transactions as a hex-encoded string.

Currently, it's set up so that the got_response() function is redefined for each getblocktemplate call in a scope that includes the original candidate share object and the transactions from the getblocktemplate call. This seems like a reasonable approach, so I suggest keeping it.

Something we may want to do when implementing this: instead of keeping the transaction information in the share_info['share_data'] structure (which becomes part of the p2p serialization for the share, and becomes consensus-level data), we may want to just have the list of transactions as an optional regular python attribute of the share object. This way, for our own block templates, we can create share objects that have this share.transactions list, and when we want to submit a block, we can just use that share.transactions list. For shares/blocks we get over the network, we can check to see if hasattr(share, 'transactions') before trying to submit it as a block, and if we don't have that list of transactions, we can either skip it or (in the future) request it from a peer.
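Roughly like this (an illustrative sketch with assumed names, not a finished design):

```python
class Share(object):
    """Sketch: transactions is an *optional* plain Python attribute, not part
    of the serialized (consensus-level) share_info data."""
    def __init__(self, share_info, transactions=None):
        self.share_info = share_info
        if transactions is not None:
            self.transactions = transactions

def maybe_submit_block(share, submit_fn):
    # Locally-built shares carry the full tx list from getblocktemplate;
    # shares received over the network may not, in which case we skip the
    # submission (or, in the future, request the txs from a peer).
    if hasattr(share, 'transactions'):
        submit_fn(share.transactions)
        return True
    return False
```

The key property is that a share received from the network deserializes fine without any transaction list, while the node that mined the block still has everything it needs to call submitblock.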

In the future, we may add support for block submission interfaces that do not require all the transactions. There are three different bitcoind RPC interfaces for getting and returning work:

  1. Bitcoin Core-derived getblocktemplate and submitblock
  2. Bitcoin Unlimited's getminingcandidate and submitminingsolution
  3. Bitcoin ABC's improved getblocktemplate WIP

The first one provides and requires a full list of transactions. The other two methods only provide the coinbase, the header, and the merkle path (same as stratum).

Currently, we're using method 1 in all cases. At some point in the future, we may want to add support for 2 or 3 when they're available. In those cases, we won't need to keep the full list of transactions.

@rldleblanc

I believe I have all the transactions removed from the share info and have it done with a soft fork so that we don't have to start a new chain. I just need to set up another peer and test that the peers are no longer passing the transaction data.

@jtoomim
Owner Author

jtoomim commented Dec 27, 2018

Seriously? That's possible? Doesn't the code check that the share_info transaction list can regenerate the header's merkle root?

@jtoomim
Owner Author

jtoomim commented Dec 27, 2018

You can set up multiple nodes on the same machine by changing the worker port and the p2p port:

pypy run_p2pool.py --net bitcoincash -w 9350 --p2pool-port 9351 -n 127.0.0.1:9348

-n will make it connect to the specified peer. -w sets the worker port and web UI port. --p2pool-port sets the p2pool-to-p2pool inter-node communication listening port.

@rldleblanc

The current transactions are stored in bitcoin_data (I think), and it uses that to generate the merkle root when it submits to bitcoind, so I just had to go through all the places that expect the transaction data in the share chain and branch based on the share version number. I ran into a lot of hidden spots when reading the shares from disk at startup, but I have several blocks on Litecoin testnet that have transactions in them, so they are passing just fine.

https://chain.so/block/LTCTEST/ad00fef2582031e97eb5485f20a2a9b41ef7af5b09edc9631693a6a3985d2890

@jtoomim
Owner Author

jtoomim commented Dec 27, 2018

bitcoin_data is a module. It's p2pool/bitcoin/data.py, as opposed to p2pool/data.py. Forrestv used

from p2pool.bitcoin import data as bitcoin_data

in order to avoid a namespace conflict. bitcoin_data.calculate_merkle_link(...) is just a function found in that module.

I'm fairly sure your soft fork attempt is going to fail because of this line:
https://github.com/jtoomim/p2pool/blob/1mb_segwit/p2pool/data.py#L498

If my understanding is correct, that line will cause any nodes not running your code to reject shares from nodes that are running your code. That makes this a hard fork, not a soft fork.

so I just had to go through all the places and branch where it is expecting the transaction data in the share chain based on the share version number

Branching on share version number sounds kinda messy. We will eventually want to delete all of the code for old share version numbers once each of the major p2pools has forked.

@rldleblanc

Yes, nodes not running my code will reject the share, but they would reject it anyway because it is an unsupported share version, right? Once enough nodes are upgraded, then the share version will kick in and the transactions will no longer be included, until that time, it runs as it always has.

As for that line, I changed the code above it so that other_tx_hashes doesn't include transactions from the share if it is the new version, and so far it's passing. I'll add some debug code to make sure that it is actually passing; I'm not hitting the assert a few lines above it. That may change when I get this second node up and running (thanks for the info; I was going through the command line options when you sent the message).

I don't like all the branching either, and I'd like to remove all that code even for the older stuff too. I added one property to Node that makes it a lot easier. I hope to have the pull request for you to look at and then we can go over it and see what else is left.

@rldleblanc

Yes, you are right, changing the share version number is a hard fork. I was thinking that was a soft fork. Sorry about that.

@jtoomim
Owner Author

jtoomim commented Dec 28, 2018

As for that line, I changed the code above it so that other_tx_hashes doesn't include transactions from the share if it is the new version, and so far it's passing

That does not quite sound correct. In the new code, you still need to check to ensure that the merkle_link plus the coinbase matches the header's merkle root hash value. We just no longer need to verify that the merkle_link matches the other_transaction_hashes.
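I.e., something like this (a simplified sketch using raw byte hashes; the real code is bitcoin_data.calculate_merkle_link and friends). Since the coinbase is always leaf 0 of the merkle tree, each sibling hash in the link is concatenated on the right:

```python
import hashlib

def dsha256(data):
    # Bitcoin's double-SHA256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root_from_link(coinbase_hash, branch):
    # Walk up the tree: the coinbase side is always on the left, so each
    # level is dsha256(our_hash + sibling_hash).
    h = coinbase_hash
    for sibling in branch:
        h = dsha256(h + sibling)
    return h
```

The check that must survive the fork is `merkle_root_from_link(coinbase_txid, merkle_link) == header_merkle_root`; only the comparison against other_transaction_hashes is dropped.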

@rldleblanc

When I added the peer node it started failing. I'm working through that chunk of code now.

@jtoomim
Owner Author

jtoomim commented Jan 15, 2019

Estimated pool-wide DOA percent on BTC p2pool right now is 33%! That's about 18 PH/s that someone is not getting accurately credited for.

http://ml.toom.im:9332/static/graphs.html?Day

My guess is that someone has a large number of distinct workers on one p2pool node without knowing that p2pool has to do 100-400ms of work for each distinct worker on a node. @rldleblanc et al, if you've been wondering why I've been so adamant about stripping out the tx forwarding code, this is why. This user is probably going to have a bad experience with p2pool after they notice how much revenue they're losing, and they're going to leave p2pool and never come back.

@kr1z1s do you know who this person is? They could probably use some help tuning their setup. Specifically, reducing the number of unique workers per node will help them a lot. (It's okay to have 1,000 S9s on a single node, as long as all of the S9s are mining to the same address. But 10 S9s on different addresses will cause that node's performance to be poor.)

@rldleblanc

@jtoomim I understand the problem. The issue I'm seeing is not so much in the tx forwarding as it is the building of the work for the miners since it is going through all the TXs and doing so very inefficiently. I've gone through all the code and got most of it updated for Python 3. I'm going through the unittests right now finding new issues and making sure they all pass. Then I need to run the code to find anything not covered by the unittest. Once that is done, then I'll refactor the get_work code path which should have huge improvements. I'm also implementing slots everywhere so that even if people aren't using Pypy, they will still get some of the benefits. I'm trying to get through it as fast as I can.

@jtoomim
Owner Author

jtoomim commented Jan 15, 2019

The tx forwarding is one of the reasons why p2pool goes through all the txes. The merkle link stuff is a second reason.

But yeah, I'm really looking forward to having these issues be resolved. I'd love to be able to mine a 32 MB block on BCH with p2pool, even if it's just on testnet.

@rldleblanc

One of the optimizations that Pypy does is effectively implement __slots__ for all classes. Instead of using a dict to store instance variables, which is expensive in terms of CPU and memory, you define all the variables that will be used in the class in the __slots__ class variable. When all inherited classes define __slots__, no per-instance dict is created, and you save memory and CPU. It also prevents fat-fingering a variable name and then wondering why things don't work. http://book.pythontips.com/en/latest/__slots__magic.html
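For example:

```python
class PointDict(object):
    # ordinary class: every instance carries a __dict__
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots(object):
    # slotted class: fixed attribute layout, no per-instance __dict__
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x, self.y = x, y

p = PointSlots(1, 2)
# assigning to a misspelled attribute (e.g. p.xx = 3) raises
# AttributeError instead of silently creating a new attribute
```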

@jtoomim
Owner Author

jtoomim commented Jan 15, 2019

Slots is mostly for RAM usage, not CPU. I don't think we need to worry about RAM usage any longer, especially on CPython.

@rldleblanc

Slots are trivial to add, so I just added them as I went through all the lines of code.

Just as an update for the Python 3 upgrade: I've got all the unittests passing in p2pool/test/{bitcoin,utils}. I did have ./run_p2pool.py --help running first to clean up most of the syntax issues (it's not working right now because I had to separate some things into strings and bytes() for the unittests). So, I just need to get the three unittests in p2pool/test/ passing and then find all the bugs not covered by the unittests. I think I'm on the downhill side of this upgrade (~60% complete, I'd guess). I hope to have the Python 3 version ready to start testing early next week.

@dfoderick

Just as an update for the Python3 upgrade,

Python3 will be a great upgrade. Let me know when you have it checked into a branch. I can do some basic testing with it.

@rldleblanc

So, the UPnP code is really old and crusty. Since it's not a critical component, I'm going to disable it for now so that I can get the Python 3 migration done. Do you have any heartburn over that?

Once I get things all settled and the big things done, I can go back and work on it. It just uses SOAP, and there aren't any really good SOAP libraries any more, so you'd have to install dependencies, which seems like overkill for a feature I'm not sure is even used that much. I could be totally wrong though.

@jtoomim
Owner Author

jtoomim commented Jan 22, 2019

I don't think many people are using UPNP. Go ahead and remove it.

@rldleblanc

@jtoomim Do you want me to temporarily disable it or remove it completely from the code and have it no longer be an option. I just want to make sure I'm clear on your intent.

@jtoomim
Owner Author

jtoomim commented Jan 22, 2019

Up to you.

@rldleblanc

Posted PR #25 for the Python 3 upgrade. Sorry it took so long. Some personal issues came up, and P2Pool ate most of the Twisted errors, making debugging a lot more painful. Please test the code and help me shake out any missed bugs.

@jtoomim
Owner Author

jtoomim commented Jul 1, 2019

Bounty has been paid. PR #22 has been rebased, and placed into https://github.com/jtoomim/p2pool/tree/remove_tx_forwarding. I will probably redirect master to this commit soon. Edit: it's now in master.

@jtoomim jtoomim closed this as completed Jul 1, 2019