development of unit test to verify bm1366 is able to find a solution for a known block #167
… valid block. Include script for block validation.
…root_be needs to be reversed. confirmed by validation script as well as unit test running on real hardware (bitaxe 205 ultra, bm1366)
….merkle_root_be needs to be reversed. confirmed by validation script as well as unit test running on real hardware (bitaxe 205 ultra, bm1366)" This reverts commit b922e3e.
After further testing it seems there is no bug in construct_bm_job and the current code is correct. This means something else is wrong with my unit test. I need to further investigate. |
ok, it seems the merkle_root copied from blockchain.info needs to be in reversed order. It's not yet fully clear to me why. However, unit test is working again and is verifying the nonce and version again ;-) |
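A likely explanation, based on the general Bitcoin convention rather than anything confirmed in this thread: block explorers display hashes such as the merkle root in the reverse byte order of how they are serialized in the block header. A minimal sketch of the reversal, where `reverse_bytes` is a hypothetical helper and not a function from ESP-Miner:

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse a byte buffer in place, e.g. a 32-byte merkle root copied
 * from a block explorer. Hypothetical helper, not part of ESP-Miner. */
static void reverse_bytes(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len / 2; i++) {
        uint8_t tmp = buf[i];
        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = tmp;
    }
}
```

Applying it twice restores the original buffer, which makes it easy to test both byte orders against the validation script.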
The way the nonce range is split up across the chain using the bits in the chip address doesn't really make sense to me. It almost seems as if we can't set a single chip to hash the entire 32 bit nonce range? While I don't think it affects mining currently, it would be nice to understand this more. |
As known already, the chip is somehow using its chip address to calculate the nonce range to hash.
I see overlapping nonce ranges (results) if I just increase the chip address incrementally, bit by bit. So IMHO a single chip is hashing just 1/128 of the entire 32 bit nonce range, as it's using the chip address to calculate that range (as a kind of suffix/prefix?). That kind of makes sense, as the chip is designed to be in a chain. One design goal would be to avoid overlapping nonce ranges, and a simple way to do that is to use the unique chip address. I wonder if there is a register to configure this somehow.
Are you sure it's fixed to 1/128th of the 32 bit nonce range? What about hashboards that have chains with fewer chips? The S19k Pro only has 77 BM1366 chips in the chain. |
Is there anywhere a hashboard dump/log of the init sequence of the S19k Pro available?
No, it is (hopefully) not fixed. IMHO there should be a way to configure it, perhaps a config register or something. See my previous posts: on my Bitaxe I see a completely different nonce range when I change the chip address from 0xC0 to 0xC2, but I get an overlapping nonce range when I change the chip address from 0xC0 to 0xC1. As you can see below, the nonce ranges for chip addresses 0xC0 and 0xC2 both overlap with the nonce range of chip address 0xC1, but the nonce ranges of chip addresses 0xC0 and 0xC2 do not overlap each other at all. That's why I assume the chip is configured to split up the entire 32 bit nonce range into smaller unique chunks, one chunk per chip address (0x00 ... 0xFE), as outlined in your session with NebulaMiner.
Nonce 2752546077 -> is in nonce range of chip address C1 and C2 |
I don't have one yet, but it's definitely on my todo list. Maybe I can get that this week 🤞
that would make sense.. I really hope such a thing exists.
Yes, that's what Nebula was going over in the session; the nonce range is split up with even addresses. 0xC1 is odd and so it would be in the middle of the range of 0xC0. The question is how do these ranges get setup to be 2? Can we change that to be a much bigger range? Ideally all 32 bits since we only have one chip on the Bitaxe. Maybe it's hardcoded and they just don't hash a big part of the range on the kPro? Back on the S17, each chip would hash the entire 32bit range (and then loop around), starting with the address. It was up to the miner to send new work in time so as not to do redundant hashing. I had hoped that was how it worked on the XP, but it doesn't appear to be the case. I suspect this change is because now version is being incremented internally. |
Here is the kPro parsed hexdump and the Saleae Logic file |
Sure enough, it sends 127 setaddress commands, incrementing by two. Even though you can clearly see in the previous sequence only 77 chips respond. |
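As a rough illustration of that address assignment, the sketch below only generates the list of addresses (0x00, 0x02, ...). The function name is hypothetical and the actual setaddress command framing is deliberately left out, since it is not reproduced in this thread:

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `out` with up to `count` chip addresses starting at 0x00 and
 * spaced `step` apart. Stops early if `out` is full or the next address
 * no longer fits in 8 bits. Returns the number of addresses written.
 * Hypothetical helper for illustration only. */
static size_t fill_chip_addresses(uint8_t *out, size_t cap,
                                  size_t count, unsigned step)
{
    size_t n = 0;
    unsigned addr = 0;
    while (n < count && n < cap && addr <= 0xFF) {
        out[n++] = (uint8_t)addr;
        addr += step;
    }
    return n;
}
```

With `count = 127` and `step = 2` the last address written is 0xFC, so the scheme uses the even addresses regardless of how many chips actually respond.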
Thanks Skot for sharing the hexdump. It was very helpful and I think I got something working. I compared the s19k-pro (77 chips) hexdump with another hexdump of an s19xp-luxos (110 chips). Both init sequences are very similar. The s19xp is doing some magic autotune stuff, while the s19k-pro loads some predefined PLL0 parameters. All in all, it's the same well-known procedure, with almost all parameters the same. Besides a different IO driver strength configuration, I noticed a register write to register "10". As per the documentation in your bm1397 repo, register 10 is called "Hash Counting Number". Interesting: why would something write to a counter register? I would expect the control board to read it. S19k-pro: S19xp-luxos:
Setting any of the bits 21..31 to high makes the hashing very sloooow. It seems to prevent the chip's internal parallel processing somehow. Setting only some of the 20 bits to high gives me a different nonce range pattern per chip address. It's not yet fully clear to me what pattern is used. The fact that the S19k-Pro with 77 chips is using a 6 bit mask and the S19xp-luxos with 110 chips a 5 bit mask seems to confirm that the S19k-Pro is hashing a bigger nonce range per chip address than the S19xp-luxos. Anyway, a full pass over 2^32 nonces and 2^16 rolled versions on my Bitaxe took (689521 - 2281) / 1000 = 687.24 sec. Below is the serial log of my session. It's somewhat pretty to see how the Bitaxe is hashing through the entire 32 bit nonce range from lowest to highest version until rollover ;-) Next stop would be to get it implemented into ESP-Miner ;-)
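As a sanity check on that timing, the hash rate it implies can be worked out directly. This is only a sketch: it assumes the two timestamps are in milliseconds and that every (nonce, version) pair was hashed exactly once, and `implied_hashrate` is a hypothetical name:

```c
#include <stdint.h>

/* Implied hash rate in H/s for a run that covered
 * 2^(nonce_bits + version_bits) hashes between two millisecond
 * timestamps. Requires nonce_bits + version_bits < 64. */
static double implied_hashrate(uint32_t start_ms, uint32_t end_ms,
                               unsigned nonce_bits, unsigned version_bits)
{
    double elapsed_s = (double)(end_ms - start_ms) / 1000.0;
    double hashes = (double)(1ULL << (nonce_bits + version_bits));
    return hashes / elapsed_s;
}
```

Calling `implied_hashrate(2281, 689521, 32, 16)` gives roughly 410 GH/s, which is at least a plausible order of magnitude for a single BM1366.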
Ooo! this sounds promising. I'm going to have to take a closer look at your writeup tonight. In the meantime here is a XP dump. pretty sure this is from stock firmware. |
Thanks Skot for sharing the dump. I'll read it tomorrow. Now I need to take some rest It's already after midnight in my timezone ;-) |
ok, thanks once again for sharing the XP hexdump. As I can see, the XP is using 55 AA 51 09 00 10 00 00 15 1C 02 to init register 10. That's the same as being used in Bitaxe firmware as of today. It seems the XP hexdump was used to develop the chip init Bitaxe is using (it follows the same structure/pattern). To recap what we have so far:
It's interesting that the XP stock firmware and luxos are using different patterns to init reg. 10 on the XP (for the same number of chips). So it seems it's not just the number of high bits that matters. This needs further testing and more hexdumps with different numbers of chips (if available). |
Ok, it seems I was wrong yesterday (maybe I was too tired last night). I updated my post above to avoid confusion. After more testing today, it turns out that the bitmask seems to be only 4 bits:
…pdate documentation
…t_nonce_space" This reverts commit d9f26b1.
…ss in log as it might be confusion.
@skot IMHO this PR is completed and ready for your review. While I was working on this PR I noticed some interesting stuff I would like to share as well.
I hope my comments in the source code are sufficient to get you an understanding what I did or was aiming for. If not, just let me know and I will rework this PR. Btw, I did not change bm1366_init to use BM1366_set_nonce_mask because it's messing up the job timing (BM1366_FULLSCAN_MS). It might be needed to rework the current job timing to adapt dynamically for different nonce range sizes instead of using a magic number. |
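One way the job timing could adapt to the nonce range size instead of relying on a fixed value is to derive it from the size of the search space and the expected hash rate. The sketch below is an assumption about how that could look, not the code in this PR; `fullscan_ms` and its parameters are hypothetical names:

```c
/* Time in milliseconds needed to scan 2^(nonce_bits + version_bits)
 * hashes at `hashrate_hs` hashes per second.
 * Requires nonce_bits + version_bits < 64 and hashrate_hs > 0.
 * Hypothetical helper; ESP-Miner currently uses a fixed constant. */
static double fullscan_ms(double hashrate_hs,
                          unsigned nonce_bits, unsigned version_bits)
{
    double space = (double)(1ULL << (nonce_bits + version_bits));
    return space / hashrate_hs * 1000.0;
}
```

The job interval would then shrink or grow with the configured nonce range, for example halving when the range is reduced by one bit.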
@MoellerDi I just want to confirm that the implication of your findings is that right now BitAxes will never find the correct nonce, is that correct? Would you mind updating your forked repo with the .bin releases? |
Can I ask how it would be possible to set this? "#A chip with chip address < 0x80 calculates only nonces ending with 2,4,6,8,0. A chip with chip address >= 0x80 calculates nonces ending with 1,3,5,7,9. Except register 0x10 is configured to combine the entire 32 bit nonce range into 1 single search range and assign it to chip address 0x00 (like in this unit test)." I am interested in testing this. I have a few devices and would like to set one to < 0x80, one to >= 0x80 and one with 0x10. I could not find where this would be possible. Any pointers? Thanks. |
No, that's not correct. The goal of this PR (unit test) was actually to prove the Bitaxe is able to find the correct nonce (and version). However, it seems it is not testing the full nonce space in a single chip configuration. |
There is no easy setting in the current firmware to change the chip address if that's what you are looking for. The chip address is assigned during the init phase of the chips/chain. If you want to change it, you would need to build a custom firmware for that. A good starting point would be the code used in this PR and/or the dev branch in my forked repo. |
I don't think there's any advantage to searching more of the (sub) space - it won't increase the chance of finding a block. Each hash has the same probability regardless where it is in the nonce space. |
Actually there are ranges that produce blocks more often than others. This can be found in the nonce pattern topic from a few years back. Bitmex Research also did a good paper on this. I believe it may be a Bitmain ASIC thing, but it is interesting nonetheless. https://www.reddit.com/r/Bitcoin/comments/adddja/the_weird_nonce_pattern/ There was most definitely a range of nonces that produced a strange outcome. My question is this: if we are able to set the range and we have more than one device, would it not be better to ensure each one is trying a different range? I can't speak for all miners, but I know some larger operations and they do not start at nonce 0. |
There cannot be nonce ranges that produce more blocks than others, otherwise mining would be inherently broken. That reddit article concluded that. |
This pattern is just a selection bias. They find more golden nonces in that pattern because that's all they are checking. |
hey guys, you should have pinged me on this Issue ;) To clarify this "nonce space splitting" with Bitmain chip, here under is what I am almost (95%) sure of. There are 3 different stages of splitting for a chain (a chain is equal to a Hashboard) :
Split 1: this is done by the logical chip address given by the CB FW at the beginning to every ASIC in the chain. It is an 8 bit space (256 values) that each ASIC will add (on the 8 msb of the 32 bits) as an offset to the first nonce value to begin hashing. So each ASIC in the chain will start at a different offset. It does not restrict full nonce space hashing: if you let the chip run for enough time, it will hash the full nonce space (2^32 values). So in a single ASIC config (Bitaxe mono chip) we can use any chip address, as long as we allow enough time to hash. In a multi ASIC config (BitaxeHex, ...) we have to make sure to split evenly by providing chip addresses with an increment of round(256/num_asic) and let the chain run for 1/increment the time of a single chip; waiting longer will make each ASIC hash headers already hashed by the previous ASIC on the chain... That was the purpose of my "hex" branch from 9 months ago. I will rebase this work on the current "hex_v302" branch soon. When the number of ASICs on the chain is not a power of 2, there is always a "left over" at the end of the 32 bit nonce space that is never hashed by the chain. If we let the chain hash for more time, the last ASIC in the chain will cover this "left over space", at the cost of having all other ASICs in the chain double hash space already done by the neighbouring ASIC... (I don't know if it is very clear.)

Split 2: each ASIC knows (by HW design) how many cores it has, so each core starts hashing at a different offset. This is all hardcoded in the ASIC. But each ASIC has a formula to extract from the nonce value which core found it. For ASICs that do not have a power of 2 count of cores, the same effect as for "Split 1" appears for the "left over space"...

Split 3: each core has a fixed number of "small cores" (by HW design). We have to distinguish 2 cases, for chips with (BM1366/68) or without (BM1397) version rolling:
|
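A minimal sketch of the "Split 1" arithmetic as described above. Both function names are hypothetical, and the assumption that the chip address lands in the top 8 bits of the starting nonce is exactly the part that is questioned for the BM1366 elsewhere in this thread:

```c
#include <stdint.h>

/* Address spacing for `num_asic` chips on a chain:
 * round(256 / num_asic) using integer math. Requires num_asic >= 1. */
static unsigned address_increment(unsigned num_asic)
{
    return (256u + num_asic / 2u) / num_asic;
}

/* Starting nonce for a chip, ASSUMING the chip address is used as an
 * offset on the 8 most significant bits of the 32 bit nonce. */
static uint32_t start_nonce(uint8_t chip_address)
{
    return (uint32_t)chip_address << 24;
}
```

For 4 chips the spacing is 64, so the addresses 0x00, 0x40, 0x80 and 0xC0 tile the nonce space exactly. For 6 chips the spacing rounds to 43, so the last chip gets a slightly different share than the others, which is the "left over" effect for chip counts that are not a power of 2.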
this will conflict with the "hex_302" branch, so the merge in master should be done knowing that fact
unsigned char init7[7] = {0x55, 0xAA, 0x40, 0x05, 0x00, 0x00, 0x1C};
_send_simple(init7, 7);
//set chip address
yes, this is exactly what I was talking about
It was the purpose of my "hex" branch from 9 months ago.
This also has to be done for other chips, not only the BM1366. The BM1397 and BM1368 may also be used in multi chip designs in the future.
static int _calculate_chip_number(unsigned int actual_chip_count)
{
    int i = 0;
    if(actual_chip_count == 1)
The algo for this function could be written using integer algebra instead of this if / else if / else if structure.
I don't really understand the final usage of it, but as it is only for the added test (will not be used by actual mining), why not.
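For illustration, here is what an integer-arithmetic version could look like. It assumes the if / else if chain rounds the actual chip count up to the next power of two; the full function body is not shown here, so that behaviour is an assumption, and `round_up_pow2` is a hypothetical name:

```c
/* Round `n` up to the next power of two (1 -> 1, 3 -> 4, 77 -> 128).
 * Returns 1 for n == 0 and saturates at 2^31 for larger inputs.
 * Sketch only; assumed to match the intent of _calculate_chip_number. */
static unsigned round_up_pow2(unsigned n)
{
    unsigned p = 1;
    while (p < n && p < 0x80000000u) {
        p <<= 1;
    }
    return p;
}
```

Under that assumption, both the 77-chip and the 110-chip hashboards discussed above would map to 128.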
This was my understanding too, based on what we have talked about. I have seen this to be the case on the BM1397. I think what @MoellerDi is saying though is this isn't the case on the 1366.. it seems like those chips start hashing at the nonce offset provided by the address, but then do not continue to 2^32.. it sounds like they wrap around much before that. Have you specifically seen version rolling chips hash nonces all the way from address offset to 2^32? This would be ideal in the single chip situation like the Bitaxe. |
No, I have never seen it specifically; I can neither confirm nor refute that... |
based on the results of my testing (using this unit test) I cannot confirm this. I see the BM1366 is hashing/looping the same nonce space over and over again.
Yes, I noticed the logical chip address is somehow used to filter the nonce space, but I'm not sure if it's actually an offset on the 8 msb. I noticed that a chip with logical chip address 0x00 (like the current Bitaxe firmware) is producing nonces ending with 0,2,4,6,8. If a logical chip address >= 0x80 is assigned, it's producing nonces ending with 1,3,5,7,9. using chip address: 0x00:
using chip address: 0x80:
Anyway, this PR/code needs - for sure - more development (I'm just a hobbyist, not a developer) but it's IMHO a good starting point. It shows reg 0x10 is somewhat changing the way how the chip is selecting/filtering the nonce range. FYI, I made kind of proof-of-concept firmware in my repo (dev branch) if you like to test it out. It's functional firmware supporting full 32 bit nonce range hashing including dynamic job scheduling. It's running for 1-2 weeks on my bitaxe w/o issues. |
I am always very cautious with this kind of bit handling; there are so many ways to get it wrong just through an endianness error!
Register 0x10 is Hash Counting Number : https://github.com/skot/BM1397/blob/master/registers.md#hash-counting-number Can you elaborate about this effect of this register? maybe it is just the source of your nonce space problem. |
The current BM1366 code is writing 0x0000151C to this HCN register: https://github.com/skot/ESP-Miner/blob/master/components/bm1397/bm1366.c#L511 Maybe we should write another value, given that we have a single chip in the chain... Can you try to write 0 and see if it solves your issue? Note that neither the BM1397 (Max) nor the BM1368 (Supra) code changes this HCN default value of 0x0000_0000... |
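The frame quoted earlier in this thread, 55 AA 51 09 00 10 00 00 15 1C 02, lines up with that value. The sketch below pulls the register number and the 32 bit value out of such a frame. The field layout is an assumption inferred from these two data points, the checksum byte is not verified, and `reg_write_t` and `parse_reg_write` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of an 11-byte register write frame:
 * [0..1] preamble 55 AA, [2] header, [3] length, [4] chip address,
 * [5] register, [6..9] value (most significant byte first), [10] checksum. */
typedef struct {
    uint8_t chip_address;
    uint8_t reg;
    uint32_t value;
} reg_write_t;

/* Returns 0 on success, -1 if the frame does not look like a write. */
static int parse_reg_write(const uint8_t *f, size_t len, reg_write_t *out)
{
    if (len != 11 || f[0] != 0x55 || f[1] != 0xAA) {
        return -1;
    }
    out->chip_address = f[4];
    out->reg = f[5];
    out->value = ((uint32_t)f[6] << 24) | ((uint32_t)f[7] << 16) |
                 ((uint32_t)f[8] << 8) | (uint32_t)f[9];
    return 0;
}
```

Parsing the quoted frame this way yields register 0x10 and value 0x0000151C, matching the value written by the current firmware.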
That's already done. The unit test in this PR is actually writing a value based on chip count. It tries to select a nonce range as big as possible.
Writing a zero to reg 0x10 will result in no hashing at all. What issue are you referring to? I do not have any issue, I just want to share what I got. The purpose of the unit test in this PR is to verify that the firmware/bm1366 chip is able to find the correct nonce/version combination as well as the correct hash for block #839900. It shows how I used reg. 0x10 to configure the bm1366 chip to use the full 32 bit nonce range. The key part is "void BM1366_set_nonce_mask(int chip_count)". As you can see, I'm using a magic number 0x15FF to calculate the offset, as outlined below.
The intention of this PR is to share my findings with the dev team and to get a possible bug addressed. I was working on a unit test to prove my Bitaxe is actually able to find the correct nonce and version for any given stratum notification. As I didn't have time to wait for years ;-) I took data from an already resolved block (merkle_root, ntime) and mixed it with the related stratum notification from public-pool.io targeting the same block. Block #839900 was selected for my unit test, but any given block where the stratum notification as well as the correct and valid merkle_root and ntime are known should do.

During the unit test implementation, I noticed that the bm1366 chip is actually not hashing the whole nonce space but only a small part of it. It turns out the chip is (IMHO) using its chip address to divide the nonce space into smaller portions (this was also highlighted at: https://youtu.be/6o92HhvOc1I?t=5710). It seems to do so even if a single chip is not in a chain with others, as my unit test shows.
Anyway, in order to prove the Bitaxe is able to find a solution for any given stratum notification, the unit test changes the chip address of a single bm1366 on-the-fly during the test to emulate a bigger chain and to jump into all the different nonce spaces. This was working fine, and it took just a couple of minutes to hash through the whole nonce space for the already confirmed working merkle_root. Sadly no success: the expected nonce/version combination did not show up.
I was wondering why the bm1366 still was not able to find the correct nonce (and version), so I double checked the code referenced by the unit test and found a possible bug in construct_bm_job (to be confirmed by one of the core devs). I used a script (also included in this PR) to verify hash values and found the merkle_root only needs its endian words swapped and should not be in reversed order. Anyway, a suggestion for a fix is included in this PR. With it applied to my code, my unit test was finally able to find the correct nonce and version, and my goal was achieved. If I find time I might try to develop a kind of chip address switching feature to actually cover the whole nonce space on my Bitaxe.

Below is some selected output from my unit test; a longer version of the log is included in this PR.
Update, as I reverted my changes to construct_bm_job, this unit test is not yet fully functional and I need to rework this stuff
Update2, it seems the merkle_root copied from blockchain.info needs to be in reversed order. It's not yet fully clear to me why. However, unit test is working again and is verifying the nonce and version again ;-)
According to blockchain.info, nonce 3529540887 and version 0x2a966000 are the correct solution for block #839900, which was found by a pool called "Poolin" two days ago.
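As a quick plausibility check on that version value, the sketch below tests whether a rolled version differs from a base version only in the rollable bits. The mask 0x1fffe000 (the 16 bits reserved by BIP320) and the base version 0x20000000 are assumptions, not values taken from this PR, and the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED version rolling mask: the 16 bits reserved by BIP320. */
#define VERSION_MASK 0x1fffe000u

/* True if `rolled` differs from `base` only inside VERSION_MASK. */
static bool version_within_mask(uint32_t base, uint32_t rolled)
{
    return ((base ^ rolled) & ~VERSION_MASK) == 0;
}

/* The 16 bit rolled value carried inside the version field. */
static uint32_t rolled_counter(uint32_t rolled)
{
    return (rolled & VERSION_MASK) >> 13;
}
```

Under those assumptions, 0x2a966000 is a valid rolled version of 0x20000000, and its 16 bit rolled field is 0x54B3.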
Possible follow-up / next steps (IMHO):
The code used to calculate midstate hashes for bm1397 based hardware should be re-checked and verified, as I noticed my "fix" is causing the bm1397 related unit test "validate bm job construction" to FAIL. Sadly I do not have any bm1397 based hardware available to test/verify this on my own.

Disclaimer: This PR was developed and tested on my Bitaxe Ultra 205 only and needs further testing on other supported hardware.