Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demo example test case #146

Merged
merged 19 commits into from
Jan 22, 2024
Merged

Demo example test case #146

merged 19 commits into from
Jan 22, 2024

Conversation

ailrst
Copy link
Contributor

@ailrst ailrst commented Dec 4, 2023

This adds the example loosely based on http_parse_basic as a test case with correct and incorrect examples. Only 3/8 of the tests pass due to the short timeout. SystemTests now also runs all boogie files present in the test directory instead of just the output file, to support overriding generated definitions.
Those examples are also setup to use a recursive Makefile to lift them.
There's also some more example specs included for the different memcpy implementations I've tried that didn't work, a version of this can be converted to a test case eventually too.

Comment on lines +17 to +20
// because named wrong in generated file
function {:inline} in_bounds64(base: bv64, len: bv64, i: bv64) returns (bool) {
(if bvule64(base, bvadd64(base, len)) then (bvule64(base, i) && bvult64(i, bvadd64(base, len))) else (bvule64(base, i) || bvult64(i, bvadd64(base, len))))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is named wrong in the generated file, that's a bug that should be fixed? But this also implies that the test is using boogie-use-lambda-stores which is not enabled for the tests by default? If you want to run the tests with that, you should probably set up a new test suite or something?

@l-kent
Copy link
Contributor

l-kent commented Dec 5, 2023

How long a timeout is needed for these tests to pass?

I like having the makefiles to lift these new tests, and it would be good to generalise that to all the other tests, but it could probably be improved so you don't need to put a makefile in every variation folder? Just one in the parent where you define which variations we want for that example? There's also a bunch of stuff relating to the Boogie in the makefiles which doesn't seem like right place for that.

@ailrst
Copy link
Contributor Author

ailrst commented Dec 13, 2023

How long a timeout is needed for these tests to pass?

With --boogie-use-lambda-stores it only takes two seconds, without its e.g. 2.24 minutes for incorrect/malloc_memcpy_strlen_memset_free/clang_O2 on my laptop. The other two incorrect ones are just under 30 seconds.

I like having the makefiles to lift these new tests, and it would be good to generalise that to all the other tests, but it could probably be improved so you don't need to put a makefile in every variation folder? Just one in the parent where you define which variations we want for that example? There's also a bunch of stuff relating to the Boogie in the makefiles which doesn't seem like right place for that.

I have an idea to clean up the makefiles, the boogie stuff is convenient for testing things more quickly than modifying the test code.

GIT_ROOT?=$(realpath ../../../../)

GCC ?= aarch64-linux-gnu-gcc
CLANG ?= clang
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous script this was clang-15 so we should maintain this unless we want to update to the latest version and update all the examples to match (I don't want to do that right now unless we find something interesting with the newer version). The default package named 'clang' on ubuntu is clang 10 or something I think?

C_SOURCE ?=$(realpath $(wildcard *.c))
SPEC ?=$(realpath $(wildcard *.spec))
EXTRA_SPEC ?=$(realpath $(wildcard *.bpl))
BASIL_FLAGS ?= --boogie-use-lambda-stores
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be the default? You may want this for testing the lambda stores, but make it its own thing

Comment on lines 35 to 37
clean: cleanbin
rm -rf $(NAME).bpl
rm -rf $(NAME)_result.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this should be the default clean behaviour, because running the tests via the make files is not the main way to do so, but lifting the tests via the make file is. The default make behaviour is lifting only, which also doesn't match up with this clean behaviour.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this to be in the repo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely don't want this in the repo

@l-kent
Copy link
Contributor

l-kent commented Jan 12, 2024

This seems to lift the clang variant (just that one, not clang_o2 etc.) to x86 binaries instead of armv8.

@l-kent
Copy link
Contributor

l-kent commented Jan 12, 2024

You need to update the new test cases for the clang variation, and I'm still not quite sure what the intention of the new test cases are? Is it to be run with --boogie-use-lambda-stores flag? If so, it should probably be in its own test suite that uses that flag? Are we expecting any of it to pass right now? In the near future?

All the makefile stuff seems good now.

@ailrst
Copy link
Contributor Author

ailrst commented Jan 15, 2024

You need to update the new test cases for the clang variation, and I'm still not quite sure what the intention of the new test cases are? Is it to be run with --boogie-use-lambda-stores flag? If so, it should probably be in its own test suite that uses that flag? Are we expecting any of it to pass right now? In the near future?

Its not about use-lambda-stores, this is an example we've used a few times in demos so i want to make sure it keeps working. Everything should pass, however 3 time out with the 10 sec vc time limit, these all pass with a 1 minute time-limit.

All of the clang tests are failing at type-check because of a mistranslation, mov w0, wzr is getting translated as Var("R0", Imm(64)), Int(0, 32)) and then basil translates that directly to R0, Gamma_R0 := 0bv32, true;. This should be fixed eventually probably. Also, 4 of these are the 'incorrect' cases passing spuriously, so we should also fix the test cases so it actually looks for the assertion failed message rather than just not containing "verified" updated system tests to do this.

@l-kent
Copy link
Contributor

l-kent commented Jan 16, 2024

Its not about use-lambda-stores, this is an example we've used a few times in demos so i want to make sure it keeps working. Everything should pass, however 3 time out with the 10 sec vc time limit, these all pass with a 1 minute time-limit.

We probably don't need all the variants then, perhaps just one? Even then, it might make sense to put in its own test suite if it needs a longer time limit? It doesn't make sense to keep it in the test suite as it currently is if it will always time out.

All of the clang tests are failing at type-check because of a mistranslation, mov w0, wzr is getting translated as Var("R0", Imm(64)), Int(0, 32))

I don't get this when I lift it, it seems like you're using an old version of BAP/the ASLi plugin?

@ailrst
Copy link
Contributor Author

ailrst commented Jan 16, 2024

I don't get this when I lift it, it seems like you're using an old version of BAP/the ASLi plugin?

Seems like it

@l-kent
Copy link
Contributor

l-kent commented Jan 18, 2024

We probably don't need all the variants then, perhaps just one? Even then, it might make sense to put in its own test suite if it needs a longer time limit? It doesn't make sense to keep it in the test suite as it currently is if it will always time out.

Any thoughts on this bit?

@ailrst
Copy link
Contributor Author

ailrst commented Jan 18, 2024

I do want to make sure a future change doesn't make the other variants break, and if we find a way to improve the encoding so it is easier to verify then it would be nice to have the tests around to see if there is a perf improvement; but for that purpose it would probably be better to create a new test suite, I just don't have time to do that at the moment. Currently on my machine (i5-13600H) the -O2 variants verify in under 10s so we could keep those. Ideally I want to increase the timeout and run the tests in parallel so they complete in a reasonable amount of time.

                                           verifyTime
                                               
solver                 compiler                      
axioms                 clang                   20.695
                       clang_O2                 9.043
                       clang_no_plt_no_pic     20.557
                       clang_pic               19.829
                       gcc                     11.360
                       gcc_O2                   5.328
                       gcc_no_plt_no_pic       10.835
                       gcc_pic                 18.564
theory                 clang                   48.496
                       clang_O2                10.189
                       clang_no_plt_no_pic     49.063
                       clang_pic              120.582
                       gcc                     12.110
                       gcc_O2                   7.862
                       gcc_no_plt_no_pic       12.148
                       gcc_pic                 19.841
theory-non-extensional clang                   48.716
                       clang_O2                10.183
                       clang_no_plt_no_pic     47.989
                       clang_pic              120.600
                       gcc                     12.218
                       gcc_O2                   7.920
                       gcc_no_plt_no_pic       11.914
                       gcc_pic                 20.004

@l-kent
Copy link
Contributor

l-kent commented Jan 22, 2024

I agree with all that, that makes sense to do eventually.

@l-kent l-kent merged commit e14bdcc into main Jan 22, 2024
1 check passed
@l-kent l-kent deleted the demo-example-test-case branch January 22, 2024 04:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants