-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make gateware-flash "FAILED!" on TinyFPGA BX #137
Comments
FWIW, my theory is that it actually does write all the bytes, but it doesn't read all the bytes, due to the way that In your results, it looks like what is read back is offset by 32 bytes, compared with what was written... ... but maybe there is also a problem with short writes too. Ewen |
It seems to be some issue with the gateware without the firmware that is causing issue... |
There is some type of weird wrapping going on here?
|
@zrecore We've previously found (at least in older) Of note, (1024*176)-180192 = 32 Ie, the file it's writing out is just short of 176KiB, and that's itself just short of 44 * 4096 KiB blocks. So I wonder if something about the file length is provoking this problem (eg, that files that are a multiple of some internal block work better). It's definitely interesting that there's a "wrap around" pattern in your output. That might possibly be caused by repeated byte sequences in the file to be written, or data read back out of flash from a previous attempt that succeeded/failed). The offset where it's failling ( Ewen |
Looks like that's reproducible -- the upload image for
We probably need to fix both those issues, and then see what else is left as issues (I think there's a read timeout issue as well as a write timeout, so I think we probably have to fix both of those). Ewen Testing here:
|
I did wonder about the timing when I later saw a tweet about you working on To the best of my recollection Also from the code reading last night, very early FTR, I emailed TinyFPGA yesterday to ask about the choice of 1.0s timeouts. I've not heard back yet, but given they were originally 0.2s early on, it feels like 1.0s was just "5 times larger" / "mostly don't hit timeouts now". Rather than any specific reason for 1.0s as a short timeout. (My current hunch is 5.0s for read timeout, and 10.0s for write timeout would make more sense.) Ewen
|
I need to dig out my tinyfpga-bx from a box somewhere to test myself. Probably be a couple of days before that happens... |
After some experimentation, I'm pretty much convinced that there's an issue with writing partial "minor sectors". If I detect the minor sector, append enough 0x00 bytes to fill out the "minor sector", and then write/read, the write of the So that confirms my hunch that it's failing on the very last write (if it doesn't fail due to an earlier timeout, which still happens about 1 time in 10 for me; I've not yet adjusted the timings). Of note, the way that these writes are being done is in I've not yet traced the program flow enough to understand where there's an issue with writing shorter sectors. Based on your trace it almost looks like it gets the address wrong or something like that. Also of note, the progress bar on programming for me seems to proceed at about 35KB/s, which implies that in practice it's not limited to 9600bps :-) Ewen PS: For reasons I don't understand
where that trace() routine is a hack I added to my local copy to get some trace-to-file-with-arguments functionality to be able to debug this:
|
@mithro, It turns out that The relevant SPI flash command command (0x02) says:
which means we inherit all of the quirks of writing the page from the SPI flash, including:
So that exactly explains the "wrap around" behaviour that you were seeing: if we give the flash an address which is a partial sector address, offset into the sector, the SPI flash chip itself will wrap it around the end of the sector to the start. From that I conclude that somewhere in the partial sector write case we're ending up passing the wrong address to the flash chip. I'm unclear if that's in the Ewen |
FTR, in the failing mode, this is what
as dumped with this kludge, plus some of my other kludges (but the sector padding disabled):
injected into The corresponding write failure for that run is:
in case we want to try to align individual bits of the hex dumps. I started trying to read the verilog which handles the USB serial to SPI bridge, but it's a twisty maze of nested state machines, with no documentation, so it's a bit hard to be certain what's happening at any given point from a casual glance. Given it's trivial to pad to a full sector in Python (and I discovered we should pad to the "erased" value -- ie, Ewen |
In this write command (from above):
and the remainder of the sector is the data to write, here So I think the write command itself, as issued by Of note, the read report in that request above gave 12 * 16 octets back (192) rather than 14 * 16 octets back (224), which implies maybe we got a short read. But at a glance the first 192 octets did match; it was just short in that case. Versus some of your other cases where it was offset. So maybe we're triggering edge behaviour in either the TinyFPGA bootloader or the SPI flash with partial sectors. Ewen PS: If you want to play with it, https://github.com/ewenmcneill/TinyFPGA-Bootloader/tree/partial-sector-write-debug-hacks has a push of this kludgy debugging code as I was testing with it. In
|
Trying to figure out what I can do to contribute a fix. Fiddling around w/ the ewencmneill/TinyFPGA-Bootloader version of tinyprog...
|
@zrecore Your results look like you're getting a short read back, at the end. Interestingly it seems to be on the final sector too. I didn't do anything about changing the timeouts in my experiments (on the partial-sector-write-debug-hacks branch), and because I was hoping to hear back from TinyFPGA in reply to my email about the timeout choice. I never did hear anything back in response to my email, and from what I could see they were originally 0.2s/0.2s read/write timeouts, which is painfully too short, and became 1s/1s read/write timeouts just to make them longer. So I think we could bump those timeouts up a bit more without breaking anything. You might want to try playing with the serial timeouts, and see if that makes a difference. My gut feeling is 2s read timeout, 5s write timeout make sense (longer for write, as a write timeout causing a partial write is more fatal). I've been meaning to turn that experiment branch into an actual cleaned up pull request (to always pad to a full sector/write a full sector, since there seems no reason not to do so -- the whole sector is always erased above, and would be erased prior to any subsequent rewrite). And probably also increase the timeouts too. But life has intervened the last couple of weeks, and I've not even plugged my TinyFPGA BX in at all... maybe in the next week. Making an actual pull request/merging it, would potentially allow us to do an actual release with a fix (mithro's time permitting; IIRC he's at a conference at present). And that'd also make it easier to test if it helped. Ewen |
Either the TinyProg-BootLoader state machines, or the SPI flash chip used on the TinyFPGA BX, seems unable to *properly* program a partial minor sector (ie, writing less than 256 octets), leading to a failed write/verify cycle. This seems to happen at the end of any programming that is not an exact multiple of 256 octets. Since the whole 256 octets is already erased in program_sectors(), we pad out the short write to a full 256 octets (with 0xff, which is the value read for "erased", to reduce wear on on the flash cells). For more detail on diagnosing this, see: timvideos/litex-buildenv#137
I made an actual pull request with a cleaned up version (and longer timeouts): tinyfpga/TinyFPGA-Bootloader#52 It seems to work for me (the pull request has a bunch of test results). @zrecore FYI, I noticed when I was doing that that the previous kludge branch I'd pushed had an " Ewen |
make_gateware_output.txt
make_gateware-flash_output.txt
It seems the final write doesn't actually write all bytes, as read comes back missing some final bytes.
The text was updated successfully, but these errors were encountered: