-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add binutils encoding/match verification for all instructions description. #258
Conversation
…22ac94 git-subtree-dir: ext/binutils-gdb/include/opcode git-subtree-split: eabd522ac949549199802378ee3a04cd55dc3275
…ils-gdb/include/opcode'
git-subtree-dir: ext/binutils-gdb/opcodes git-subtree-split: ddce56a70c12095b8f72c2f48ad4194df9a62386
Signed-off-by: Afonso Oliveira <[email protected]>
Unfortunately, we can't add these files in the repo since they are GPL licensed. Perhaps we can use a submodule with a shallow fetch to avoid the space overhead? |
Signed-off-by: Afonso Oliveira <[email protected]>
Signed-off-by: Afonso Oliveira <[email protected]>
@dhower-qc I updated it, but I believe that the shallow fetch will have to be at local submodule initialization. |
Thanks. I've asked our legal to team to review this setup, just to be sure that we aren't inadvertently tainting UnifiedDB with GPL. I'll get back when I have their opinion (should be a few days). |
Great! If we also want to add this to the regression test, I have that done with just one thing missing. I was not able to add pyyaml to singularity neither through pip install nor apt-get install -y --no-install-recommends python3-yaml. Do you think you can set that up if this goes through? |
For libraries, I've been installing them outside the container so we don't have to rebuild it every time a dependence changes. You'll see in bin/setup that it runs We should be able to replicate this for python. |
Signed-off-by: Afonso Oliveira <[email protected]>
Signed-off-by: Afonso Oliveira <[email protected]>
Added the regression test part. |
Signed-off-by: Afonso Oliveira <[email protected]>
4b52917
to
f7d3af6
Compare
#269 should fix this regression test failing |
Sorry for the run-around; I added some files in #270 in order to support python dependencies better (also needed by pre-commit that @kbroch-rivosinc is adding). With that, you can run |
I'll wait for your merge into main and then rebase to remove the python part. |
Signed-off-by: Afonso Oliveira <[email protected]>
9e8a4c9
to
3496283
Compare
Signed-off-by: Afonso Oliveira <[email protected]>
Just an update: I have heard back from our legal team, but they have more questions. Appreciate your patience while we sort it out. |
Thank you for the feedback. This already functional with the newest PRs, but let me know when and if you get the greenlight so I make it mergeable. |
@dhower-qc any news on this? |
Yes, the advice is that:
In summary, yuck. I wonder if there is a way to compare instead again LLVM? @lenary, any thoughts on this? |
You also definitely don't want to bring in all of LLVM, so whether we're comparing against LLVM or against binutils, I think the best approach is to use environment/configure variables to point at an external source/build tree (maybe both?) which testing can use, without modifying that tree. This is how the LLVM test suite works for e.g. SPEC which is a proprietary test suite (and others). As for actually testing in this manner against LLVM, there are a few ways forward. Probably the best is to dump the info from the RISC-V tablegen descriptions into JSON (this would have to be done with I would need to more closely examine the RISC-V tablegen output, to make sure it has what we need for validation, but I believe it should, having frequently read the expanded tablegen information. This does not solve the problem of "is binutils correct" at all. Another approach is that, given Clang/LLVM and GCC/Binutils aim to be compatible with each other, to use the assembler ( |
Yuck :) I believe this is of great utility for the UDB and I truly would like to find a way of fitting it without harming the UDB with the whole licensing spaghetti, maybe keeping it outside the repo (as @lenary described) and just adding some external reference would be possible? I believe we will also have this problem with LLVM, no? Even with the more permissive licensing I guess we will have to have some cautions. I've actualy looked into LLVM to find a way of doing this before, but due to my lack of knowledge I couldn't find what I needed and dropped it.
A question on this, how would I be able to dump all the instructions out of this disassembler, any files I should look for? I've tried getting the info from LLVM but dropped it due to my (non-existing) progress, I believe the first pointer is already enough for me to give it a second shot, I'll keep you guys updated on how it goes. |
With an LLVM source tree, and having built
This will get you absolutely everything in the tablegen files for RISC-V (and some of the target-independent definitions), so is about 180MB. The tablegen programming model is a bit weird, classes/instances are more for prototypical inheritance than java-style inheritance, and things can inherit from multiple classes and get the properties of all of them. For a first clue, the The encoding information is in the Similarly, the Extension info is in instances of |
(You asked for "dump all the instructions out the disassembler", which the closest way to do so is the tablegen information, via json - normally tablegen generates C++ which is invoked by other hand-written C++) Part of my envisioning with the second assembler/disassembler approach, is that you use the database to generate valid or invalid bitpatterns (in an object) or instructions (as text), and pass those in to the tools, and look at the output. This sort of approaches the problem in more of a black box manner, but also has throughput difficulties if you want to scan the entire <=32-bit instruction space. |
Closing this due to the licensing problems. |
I added subtrees for binutils to have the necessary information. I know this adds a lot of unnecessary information, but the other options were submodules (which is around 3gb) or copy pasting the the files(which is not updatable).
The script is using only binutils to generate all the encondings. Warnings are instructions that the script can't parse (yet), and errors are for differences between binutils and the yamls. (Which surprisingly exist on the DB right now).
Should I add this to the regression test?