symbolic opcodes #52

mmcloughlin · 2024-03-08T19:50:46Z

I am interested in generating semantics for symbolic opcodes, for example where fields corresponding to immediate values or shift amounts are variable. Symbolic register indexes may be interesting as well.

Is this something you've considered already?

I have a work-in-progress PR that demonstrates feasibility. It would need some cleanup but if you agree with the general approach I'd be happy to work on preparing it for review. Please see the PR description for more details on the changes.

mmcloughlin#1

Thanks!

ncough · 2024-03-11T13:17:37Z

Hi, thanks for giving this feature a shot! Funnily enough, this is something we have been playing around with for the last few weeks. We began with an approach much like yours, but ran into issues as things became more symbolic. In particular, the current partial evaluator is dependent on bitvector widths and loop bounds being concrete. It's not clear to me whether you'll also run into these issues in your use case, though I suspect shifts may create some problems.

If so, we have sidestepped the problem through an analysis to identify when these concrete values are necessary and a subsequent transform to partially duplicate the program for each concrete possibility. The branch is here https://github.com/UQ-PAC/aslp/tree/taint, with the relevant stages in libASL/symbolic_lifter.ml and libASL/req_analysis.ml. This branch attempts to treat the opcode as entirely symbolic, to effectively extract a lifter, so it also includes changes to support symbolic register indexes.

Either way, we'd be keen for an interface for mixed symbolic/concrete opcodes as you suggest and an implementation based on your approach is definitely feasible.

mmcloughlin · 2024-09-23T03:28:46Z

@ncough Please accept my belated thank you for your reply!

A lot has changed since March, so I'm wondering what your recommendation would be now for implementing an interface that can generate semantics for partially symbolic opcodes? The case I'd be most interested in is when immediate values or shift amounts are symbolic.

I can revive my attempt from earlier this year, but I am curious if the new infrastructure you've developed makes this potentially easier? I know you now have the capability to generate entire lifters in other backend languages, but for our use case we are still ideally looking for concise semantics for small families of opcodes.

Thank again.

ncough · 2024-09-25T23:29:03Z

Thanks for coming back to this @mmcloughlin, we are still interested in this work.

The situation is roughly the same as before. To reiterate, your earlier attempt will be the easiest approach as long as the assumptions made by the partial evaluator still hold for your symbolic opcode, e.g., all derived bitvector widths and loop bounds are concrete. If these assumptions do not hold for a given symbolic opcode then the partial evaluator will crash. For the most part, the symbolic fields you are interested in should not invalidate these assumptions, but it may be a trial-and-error process to determine this for certain.

mmcloughlin · 2024-09-25T23:36:40Z

Thanks for the update @ncough! I will likely take a look at reviving this very soon.

mmcloughlin · 2024-10-05T03:39:59Z

Anecdotally, it appears to work exactly as you'd hope for some instructions, but others give horrible output. The function DecodeBitMasks seems to be the culprit in the problematic cases so far, and looking at it I'm not surprised.

Experimental PR mmcloughlin#4 adds support for an "opcode template" syntax. So you get for example:

ASLi> :sem A64 0x122:9|sh:1|imm:12|0xa4:10
Decoding instruction A64 0x122:9|sh:1|imm:12|0xa4:10
bits ( 64 ) imm__3 ;
if eq_bits.0 {{ 1 }} ( sh [ 0 +: 1 ],'0' ) then {
imm__3 = ZeroExtend.0 {{ 12,64 }} ( imm [ 0 +: 12 ],64 ) ;
}  else {
if eq_bits.0 {{ 1 }} ( sh [ 0 +: 1 ],'1' ) then {
imm__3 = ZeroExtend.0 {{ 24,64 }} ( append_bits.0 {{ 12,12 }} ( imm [ 0 +: 12 ],'000000000000' ),64 ) ;
}  else {
assert FALSE ;
}
}
__array _R [ 4 ] = add_bits.0 {{ 64 }} ( __array _R [ 5 ],imm__3 ) ;

On the other hand, 0x124:9|N:1|R:6|S:6|0xa4:10 leads to very nasty semantics due to logic immediate encoding, which triggers DecodeBitMasks.

I realize there may not be much to be done here, but I'm curious if you have any suggestions?
I notice that you implemented an override for HighestSetBit. I wonder if an alternative version of DecodeBitMasks would be friendlier for ASLp. The spec actually ships with a AltDecodeBitMasks function, so that might be a place to start.

ncough · 2024-11-27T01:48:04Z

Sorry for the delay here. Unfortunately, I haven't really got a good suggestion for this. If you can override DecodeBitMasks to an equivalent representation that is better suited to your use case, then that might help. I believe we did something similar for the offline partial evaluation implementation in the past, using AltDecodeBitMasks as you suggest. Alternatively, you could also introduce a new primitive to implement some sub-expression of DecodeBitMasks (or the entire thing) and support that primitive in your use case.

mmcloughlin mentioned this issue Mar 11, 2024

aslt changes + grammar + testing #53

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

symbolic opcodes #52

symbolic opcodes #52

mmcloughlin commented Mar 8, 2024

ncough commented Mar 11, 2024 •

edited

Loading

mmcloughlin commented Sep 23, 2024

ncough commented Sep 25, 2024

mmcloughlin commented Sep 25, 2024

mmcloughlin commented Oct 5, 2024 •

edited

Loading

ncough commented Nov 27, 2024

symbolic opcodes #52

symbolic opcodes #52

Comments

mmcloughlin commented Mar 8, 2024

ncough commented Mar 11, 2024 • edited Loading

mmcloughlin commented Sep 23, 2024

ncough commented Sep 25, 2024

mmcloughlin commented Sep 25, 2024

mmcloughlin commented Oct 5, 2024 • edited Loading

ncough commented Nov 27, 2024

ncough commented Mar 11, 2024 •

edited

Loading

mmcloughlin commented Oct 5, 2024 •

edited

Loading