Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

symbolic opcodes #52

Open
mmcloughlin opened this issue Mar 8, 2024 · 6 comments
Open

symbolic opcodes #52

mmcloughlin opened this issue Mar 8, 2024 · 6 comments

Comments

@mmcloughlin
Copy link

I am interested in generating semantics for symbolic opcodes, for example where fields corresponding to immediate values or shift amounts are variable. Symbolic register indexes may be interesting as well.

Is this something you've considered already?

I have a work-in-progress PR that demonstrates feasibility. It would need some cleanup but if you agree with the general approach I'd be happy to work on preparing it for review. Please see the PR description for more details on the changes.

mmcloughlin#1

Thanks!

@ncough
Copy link
Collaborator

ncough commented Mar 11, 2024

Hi, thanks for giving this feature a shot! Funnily enough, this is something we have been playing around with for the last few weeks. We began with an approach much like yours, but ran into issues as things became more symbolic. In particular, the current partial evaluator is dependent on bitvector widths and loop bounds being concrete. It's not clear to me whether you'll also run into these issues in your use case, though I suspect shifts may create some problems.

If so, we have sidestepped the problem through an analysis to identify when these concrete values are necessary and a subsequent transform to partially duplicate the program for each concrete possibility. The branch is here https://github.com/UQ-PAC/aslp/tree/taint, with the relevant stages in libASL/symbolic_lifter.ml and libASL/req_analysis.ml. This branch attempts to treat the opcode as entirely symbolic, to effectively extract a lifter, so it also includes changes to support symbolic register indexes.

Either way, we'd be keen for an interface for mixed symbolic/concrete opcodes as you suggest and an implementation based on your approach is definitely feasible.

@mmcloughlin
Copy link
Author

@ncough Please accept my belated thank you for your reply!

A lot has changed since March, so I'm wondering what your recommendation would be now for implementing an interface that can generate semantics for partially symbolic opcodes? The case I'd be most interested in is when immediate values or shift amounts are symbolic.

I can revive my attempt from earlier this year, but I am curious if the new infrastructure you've developed makes this potentially easier? I know you now have the capability to generate entire lifters in other backend languages, but for our use case we are still ideally looking for concise semantics for small families of opcodes.

Thank again.

@ncough
Copy link
Collaborator

ncough commented Sep 25, 2024

Thanks for coming back to this @mmcloughlin, we are still interested in this work.

The situation is roughly the same as before. To reiterate, your earlier attempt will be the easiest approach as long as the assumptions made by the partial evaluator still hold for your symbolic opcode, e.g., all derived bitvector widths and loop bounds are concrete. If these assumptions do not hold for a given symbolic opcode then the partial evaluator will crash. For the most part, the symbolic fields you are interested in should not invalidate these assumptions, but it may be a trial-and-error process to determine this for certain.

@mmcloughlin
Copy link
Author

Thanks for the update @ncough! I will likely take a look at reviving this very soon.

@mmcloughlin
Copy link
Author

mmcloughlin commented Oct 5, 2024

Anecdotally, it appears to work exactly as you'd hope for some instructions, but others give horrible output. The function DecodeBitMasks seems to be the culprit in the problematic cases so far, and looking at it I'm not surprised.

Experimental PR mmcloughlin#4 adds support for an "opcode template" syntax. So you get for example:

ASLi> :sem A64 0x122:9|sh:1|imm:12|0xa4:10
Decoding instruction A64 0x122:9|sh:1|imm:12|0xa4:10
bits ( 64 ) imm__3 ;
if eq_bits.0 {{ 1 }} ( sh [ 0 +: 1 ],'0' ) then {
imm__3 = ZeroExtend.0 {{ 12,64 }} ( imm [ 0 +: 12 ],64 ) ;
}  else {
if eq_bits.0 {{ 1 }} ( sh [ 0 +: 1 ],'1' ) then {
imm__3 = ZeroExtend.0 {{ 24,64 }} ( append_bits.0 {{ 12,12 }} ( imm [ 0 +: 12 ],'000000000000' ),64 ) ;
}  else {
assert FALSE ;
}
}
__array _R [ 4 ] = add_bits.0 {{ 64 }} ( __array _R [ 5 ],imm__3 ) ;

On the other hand, 0x124:9|N:1|R:6|S:6|0xa4:10 leads to very nasty semantics due to logic immediate encoding, which triggers DecodeBitMasks.

I realize there may not be much to be done here, but I'm curious if you have any suggestions?
I notice that you implemented an override for HighestSetBit. I wonder if an alternative version of DecodeBitMasks would be friendlier for ASLp. The spec actually ships with a AltDecodeBitMasks function, so that might be a place to start.

@ncough
Copy link
Collaborator

ncough commented Nov 27, 2024

Sorry for the delay here. Unfortunately, I haven't really got a good suggestion for this. If you can override DecodeBitMasks to an equivalent representation that is better suited to your use case, then that might help. I believe we did something similar for the offline partial evaluation implementation in the past, using AltDecodeBitMasks as you suggest. Alternatively, you could also introduce a new primitive to implement some sub-expression of DecodeBitMasks (or the entire thing) and support that primitive in your use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants