Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission for smtinterpol #45

Merged
merged 9 commits into from
Jun 18, 2024
Merged

Submission for smtinterpol #45

merged 9 commits into from
Jun 18, 2024

Conversation

jhoenicke
Copy link
Member

@jhoenicke jhoenicke commented May 15, 2024

The smtinterpol submission.

@jhoenicke jhoenicke self-assigned this May 15, 2024
@bobot bobot added the submission Submissions for SMT-COMP label May 21, 2024
@bobot
Copy link
Contributor

bobot commented May 21, 2024

Hi @jhoenicke , could you rebase your submission? For simplicity instead of being in the tests, the submission summary is now presented as a specific comment.

@jhoenicke jhoenicke force-pushed the jhoenicke-smtinterpol branch from 17c006b to 40bb385 Compare May 22, 2024 00:51
@jhoenicke
Copy link
Member Author

@bobot, I'm not sure about the Docker requirement. Is the solver meant to be run with docker run. Because this adds something like 4 seconds overhead just for running solver -version.

Since we have a java application as a portable jar archive, is it okay if our tar.gz submission contains only the jar file and we just assume someone installed java 8 or higher on the path, or do we need to include java in the archive, which is about 300 MB and also architecture specific?

Our docker file is currently just using the public java docker eclipse-tumerin:8 and then copying the jar file and the start script to some directory. Is this what you had in mind?

@mpreiner
Copy link
Contributor

@jhoenicke As far as I understand the Dockerfile is only for building the solver binaries.

@bobot
Copy link
Contributor

bobot commented May 24, 2024

The Dockerfile mention were an idea that we forgot to remove in the last iteration of the rules. I'm really sorry . The binaries must already be built. The computers are Ubuntus, we are going to provide the exact name of the image today. I will make an announcement later today.

@jhoenicke jhoenicke changed the title Create smtinterpol Submission for smtinterpol May 26, 2024
@jhoenicke jhoenicke marked this pull request as ready for review May 26, 2024 09:56
@mbromber
Copy link
Contributor

mbromber commented May 27, 2024

JFYI: Your solver link does not seem to work. I get the following error:

Forbidden

You don't have permission to access this resource.

Maybe it is only accessible from your university's intranet?

@jhoenicke
Copy link
Member Author

Fixed. I always have to set permission manually and forgot to set it.

@mbromber
Copy link
Contributor

Thanks. It works now.

@jhoenicke jhoenicke force-pushed the jhoenicke-smtinterpol branch from 9e28d35 to fef3875 Compare May 31, 2024 09:06
Copy link

github-actions bot commented May 31, 2024

Summary of modified submissions

SMTInterpol

  • 14 authors
  • website: https://ultimate.informatik.uni-freiburg.de/smtinterpol
  • Participations
    • UnsatCore
      • Arith
        • all
      • Equality
        • all
      • Equality+LinearArith
        • all
      • Equality+NonLinearArith
        • all
      • QF_Datatypes
        • all
      • QF_Equality
        • all
      • QF_Equality+LinearArith
        • all
      • QF_Equality+NonLinearArith
        • all
      • QF_LinearIntArith
        • all
      • QF_LinearRealArith
        • all
      • QF_NonLinearIntArith
        • all
      • QF_NonLinearRealArith
        • all
    • SingleQuery
      • Arith
        • all
      • Bitvec
        • all
      • Equality
        • all
      • Equality+LinearArith
        • all
      • Equality+MachineArith
        • ABV
        • AUFBV
        • AUFBVDTLIA
        • AUFBVDTNIA
        • AUFBVDTNIRA
        • UFBV
        • UFBVDT
        • UFBVLIA
      • Equality+NonLinearArith
        • all
      • QF_Bitvec
        • all
      • QF_Datatypes
        • all
      • QF_Equality
        • all
      • QF_Equality+Bitvec
        • all
      • QF_Equality+LinearArith
        • all
      • QF_Equality+NonLinearArith
        • all
      • QF_LinearIntArith
        • all
      • QF_LinearRealArith
        • all
      • QF_NonLinearIntArith
        • all
      • QF_NonLinearRealArith
        • all
    • ModelValidation
      • QF_ADT+BitVec
        • all
      • QF_ADT+LinArith
        • all
      • QF_Bitvec
        • all
      • QF_Datatypes
        • all
      • QF_Equality
        • all
      • QF_Equality+Bitvec
        • all
      • QF_Equality+LinearArith
        • all
      • QF_Equality+NonLinearArith
        • all
      • QF_LinearIntArith
        • all
      • QF_LinearRealArith
        • all
      • QF_NonLinearIntArith
        • all
      • QF_NonLinearRealArith
        • all
    • Incremental
      • Arith
        • all
      • Bitvec
        • all
      • Equality
        • all
      • Equality+LinearArith
        • all
      • Equality+NonLinearArith
        • all
      • QF_Bitvec
        • all
      • QF_Equality
        • all
      • QF_Equality+Bitvec
        • all
      • QF_Equality+Bitvec+Arith
        • all
      • QF_Equality+LinearArith
        • all
      • QF_Equality+NonLinearArith
        • all
      • QF_LinearIntArith
        • all
      • QF_LinearRealArith
        • all
      • QF_NonLinearIntArith
        • all

jhoenicke added 3 commits May 31, 2024 12:05
- New version uploaded
- Added Contact
- Added new BV logics to model-validation track
martinjonas pushed a commit that referenced this pull request Jun 10, 2024
#84: Create cvc5-cloud
#74: Draft STP submission
#70: draft yicesQS submission
#68: Create STP-CNFLS
#66: Yices2 SMTCOMP 2024 Submission
#65: Z3-alpha draft PR
#64: Solver submission: cvc5
#63: submission iProver
#61: OSTRICH 1.4
#60: SMT-RAT submission
#57: Amaya's submission for SMT-COMP 2024
#55: plat-smt submission
#54: Add 2024 Bitwuzla submission.
#53: 2024 solver participant submission: OpenSMT
#52: Z3-Noodler submission
#51: Submission Colibri
#45: Submission for smtinterpol
#42: Adding Algaroba to SMTCOMP 2024
@martinjonas
Copy link
Contributor

@jhoenicke We have executed the latest version of smtinterpol on a randomly chosen subset of 20 single-query benchmarks from each logic where it participates. The benchmarks are also scrambled by the competition scrambler (with seed 1). You can find the results here: https://www.fi.muni.cz/~xjonas/smtcomp/smtinterpol.table.html#/table

Quick explanation:

  • Green status means that the result agrees with the (set-info :status _) annotation from the benchmark.
  • Blue status means that the benchmark has annotation (set-info :status unknown).
  • By clicking on the result (e.g. false, true, ABORTED, …) you can see the command-line arguments with which your solver was called and its output on the benchmark.
  • By clicking on the benchmark name (i.e., *scrambled*.yml), you can see the details of the benchmark including its contents (by clicking on the file link in input_files) and the name of the original bennchmark before scrambling (e.g., # original_files: 'non-incremental/AUFBVFP/20210301-Alive2-partial-undef/ph7/583_ph7.smt2').

Please check whether there are some discrepancies, such as missing/extra logics, unexpected aborts or unknowns, and similar. If you update the solver, let me know and I can execute further test runs. We still have plenty of time for several follow-up test runs.

@martinjonas
Copy link
Contributor

@jhoenicke Here are also results of the test run on a small subset of incremental benchmarks: https://www.fi.muni.cz/~xjonas/smtcomp/smtinterpol_inc.table.html#/table

The status column has the following meaning:

  • DONE (X correct): the solver terminated successfully and correctly decided X (check-sat) queries before that.
  • SOMETHING_ELSE (X correct) (e.g., TIMEOUT (10 correct) or ABORTED (10 correct)): The solver did not terminate successfully, but correctly decided X (check-sat) queries before it crashed or was killed.
  • WRONG: There was at least one answer to (check-sat) that did not agree with the status specified in the benchmark.

Again, if there are discrepancies (missing/extra logics, unexpected results), please let me know.

@jhoenicke
Copy link
Member Author

@martinjonas Thanks for the preliminary runs. Do you by any chance already have the error.log files?

@martinjonas
Copy link
Contributor

@martinjonas Thanks for the preliminary runs. Do you by any chance already have the error.log files?

I can obtain them if you want. Do you need SingleQuery, Incremental, or both?

@martinjonas
Copy link
Contributor

Do you need SingleQuery, Incremental, or both?

Nevermind, I got them both. :-)

Single-query: https://www.fi.muni.cz/~xjonas/smtcomp/processed_results/smtinterpol/smtinterpol.2024-06-12_14-26-36.files.tar.gz
Incremental: https://www.fi.muni.cz/~xjonas/smtcomp/processed_results/smtinterpol_inc/smtinterpol_inc.2024-06-12_14-29-11.files.tar.gz

@jhoenicke
Copy link
Member Author

jhoenicke commented Jun 12, 2024

It looks like the zip file in the archive are all empty. I would suspect that most of the BV benchmarks trigger an output to error.log. And all runs should create at least an empty error.log file. Is this working? Could this be a permission problem, that we can't write to the docker image?

@martinjonas
Copy link
Contributor

It looks like the zip file in the archive are all empty. I would suspect that most of the BV benchmarks trigger an output to error.log. And all runs should create at least an empty error.log file. Is this working? Could this be a permission problem, that we can't write to the docker image?

Ah, it seems that there is some problem of our configuration of cluster for returning the generated files. The file gets created but is not returned from the computation nodes to the server. I will look into that.

@martinjonas
Copy link
Contributor

@jhoenicke We have finished test runs of model-validation and unsat-core generation tracks. You can find the results here:

As before, you can click on the status of the benchmark to see the output of your solver. If you find any discrepancies or extra/missing logics, please let me know.

Note that we selected only SAT benchmarks for model validation and only UNSAT for unsat-core generation. As a result of that, some logics do not contain any benchmarks. So do not be surprised if you have subscribed to one of these logics and you do not have any result for it. In particular, the logics are:

For model-generation: QF_UFFP, QF_UFBVDT, QF_UFDTNIA, QF_NIRA.

For unsat-core: UFBVDT, FPLRA, QF_AUFBVFP, QF_ABVFPLRA, QF_UFBVDT, QF_UFDTNIA, QF_UFDT, QF_NIA, QF_SNIA.

@jhoenicke
Copy link
Member Author

Hi Martin, thanks for the logs. I found two problems:

  1. the links to the yml-Files are not working, so it's impossible to see which benchmark it is.
  2. it seems that for model-validation, the model is not checked at all. It reports correct for QF_ABV/scrambled63358.smt2, even though the model production crashed.

@martinjonas
Copy link
Contributor

martinjonas commented Jun 13, 2024

1. the links to the yml-Files are not working, so it's impossible to see which benchmark it is.

Thanks for the report. It should be ok now.

2. it seems that for model-validation, the model is not checked at all. It reports correct for QF_ABV/scrambled63358.smt2, even though the model production crashed.

Yes, this is true. We are just running the solver to check whether it can be executed and not doing any extra post-processing yet. The true in the table corresponds to the result of the (check-sat) result. I see why this can be confusing and it would be better to show ERROR in this case. I will check whether we can do something about that.

@jhoenicke
Copy link
Member Author

I updated the solver binaries for our final submission.

@martinjonas martinjonas merged commit 91ed77f into master Jun 18, 2024
5 checks passed
@martinjonas martinjonas deleted the jhoenicke-smtinterpol branch June 18, 2024 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
submission Submissions for SMT-COMP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants