[Draft] Script to generate instr_dict.json from riscv-opcodes using UDB data #328

BrianAnakPintar
Contributor

This is a draft PR for #300

I included an instr_dict.json that I generated from riscv-opcodes, with data as of Nov 28, 2024.

There are two main scripts involved here.

  1. generate_instr_dict.py: This script generates a file called data.json, which is the equivalent of instr_dict.json from riscv-opcodes but built from UDB data.
  2. sorter.py: This script writes sorted copies of data.json and instr_dict.json so that the two files are easier to diff.
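To illustrate the idea behind the sorting step, here is a minimal sketch of normalizing two JSON documents so that a textual diff only shows real differences. It is not the code in sorter.py; the helper names `normalize` and `dump_sorted` are hypothetical, and sorting list elements assumes that list order carries no meaning for the comparison.

```python
import json


def normalize(obj):
    """Recursively sort dict keys and list elements (hypothetical helper)."""
    if isinstance(obj, dict):
        return {key: normalize(obj[key]) for key in sorted(obj)}
    if isinstance(obj, list):
        items = [normalize(item) for item in obj]
        # Sort by the serialized form so mixed or nested items stay comparable.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return obj


def dump_sorted(obj):
    """Serialize with a stable key order and indentation for line-based diffs."""
    return json.dumps(normalize(obj), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative data only, not taken from either generated file.
    left = {"b": {"extension": ["ext_b", "ext_a"]}, "a": 1}
    right = {"a": 1, "b": {"extension": ["ext_a", "ext_b"]}}
    print(dump_sorted(left) == dump_sorted(right))
```

If the order of a list such as variable_fields matters for the comparison, the list branch should return `items` unsorted instead.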

Known issues:
I use this site to diff the two files, and here are some immediate issues:

  1. Sometimes UDB lists more extensions for an instruction than riscv-opcodes does, for example for add.uw. IIRC, this is because one extension encapsulates the other.
  2. Missing instructions: see "Vector Load/Store Segment Instruction mnemonics are missing" (#314).
  3. It is unclear how to derive variable_fields. I'm unsure how to handle cases like imm, which can appear as immxx (where xx is the size of the range) but can also turn into bimm12hi and bimm12lo for instructions like beq. There are more cases like these.
  4. rori has an incorrect encoding. I'll open a separate pull request for this.
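For the first issue, a small report of where the extension lists disagree can make the pattern easier to see than a raw diff. The sketch below assumes both files are dictionaries keyed by instruction name with an "extension" list per entry; the function name `extension_diffs` and the sample data are hypothetical.

```python
def extension_diffs(udb, opcodes):
    """Map each shared instruction to (only_in_udb, only_in_opcodes).

    Instructions whose extension sets match are left out of the result.
    """
    diffs = {}
    for name in sorted(udb.keys() & opcodes.keys()):
        udb_ext = set(udb[name].get("extension", []))
        opc_ext = set(opcodes[name].get("extension", []))
        if udb_ext != opc_ext:
            diffs[name] = (sorted(udb_ext - opc_ext), sorted(opc_ext - udb_ext))
    return diffs


if __name__ == "__main__":
    # Placeholder names; real entries would come from data.json and instr_dict.json.
    udb = {"insn_x": {"extension": ["ext_a", "ext_b"]}, "insn_y": {"extension": ["ext_c"]}}
    opcodes = {"insn_x": {"extension": ["ext_a"]}, "insn_y": {"extension": ["ext_c"]}}
    for name, (extra_udb, extra_opc) in extension_diffs(udb, opcodes).items():
        print(name, "only in UDB:", extra_udb, "only in riscv-opcodes:", extra_opc)
```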

These are currently the roadblocks that I am facing and I can't seem to resolve them. Feel free to give feedback or modify the script if necessary.

There are also some redundant functions, such as find_first_match, which I previously used but no longer need; I kept them for now in case anyone finds them useful. I also tried to add type annotations to most of the functions to make them easier to follow.

@AFOliveira
Collaborator

I think this is a great step towards making the UDB an even better source of information!

Answering your points:

  1. I think this is expected. If we are trying to replicate exactly what riscv-opcodes generates, we can simply exclude the extra extensions in the script, though IDK if there are any advantages in doing so.
  2. This should be fixed as soon as "Add V extension missing isntructions" (#316) is merged.
  3. Variable description is still a WIP, so all these names might change soon. However, if you want to do it for the current version: riscv-opcodes has a mapping that translates those fields into more readable (and parseable) names: https://github.com/riscv/riscv-opcodes/blob/9226b0d091b0d2ea9ccad6f7f8ca1283a3b15e88/constants.py#L88C1-L148C1. You can also check https://github.com/drom/riscv/blob/master/lib/fieldo.js for more info, or my own draft PR in this repo, "Added parse.py, constants.py and Makefile modifications to generate yaml files." (#21), to see how I did the mapping from riscv-opcodes to the current format; it might help in converting back to riscv-opcodes.
  4. This is a great catch. Thanks!
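As a rough illustration of point 3, a lookup from field name to bit range is enough to tell which bits of an encoding are variable. The table below is a hand-copied subset that should be checked against the linked constants.py; the bit ranges are an assumption here, and the names `ARG_LUT`, `variable_mask` and `fixed_mask` are hypothetical.

```python
# Field name -> (msb, lsb). Assumed subset; verify against riscv-opcodes constants.py.
ARG_LUT = {
    "rd": (11, 7),
    "rs1": (19, 15),
    "rs2": (24, 20),
    "bimm12hi": (31, 25),
    "bimm12lo": (11, 7),
}


def variable_mask(fields):
    """OR together the bit ranges covered by the given variable fields."""
    mask = 0
    for field in fields:
        msb, lsb = ARG_LUT[field]
        width = msb - lsb + 1
        mask |= ((1 << width) - 1) << lsb
    return mask


def fixed_mask(fields):
    """Bits of a 32-bit encoding not covered by any variable field."""
    return ~variable_mask(fields) & 0xFFFFFFFF


if __name__ == "__main__":
    # The split branch immediate occupies two separate ranges, hence two names.
    print(hex(fixed_mask(["bimm12hi", "rs1", "rs2", "bimm12lo"])))
```

With this subset, the four fields of a branch such as beq leave only the funct3 and opcode bits fixed, which is why the immediate has to be described by two field names rather than one.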

On another note, when this turns into a non-draft PR, I don't think that .json file should be merged in.

As I said, this is truly a great step in the right direction, thank you!
