I actually love this paper & work -- but I'd love it if the transformation were model-count-preserving! #3

Open
msoos opened this issue Feb 25, 2024 · 8 comments


msoos commented Feb 25, 2024

Hi,

Long story short, I would love to use this tool as a preprocessor for model counting. I could fix it myself, I guess, but maybe you can do it faster -- I'd need it to be count-preserving, i.e. the model count of the formula must not change. I think it should be possible, and maybe not too hard. It turns out that SBVA can significantly reduce the treewidth of a formula, which can make it a lot more tractable for model counting. See: https://github.com/Laakeri/sharpsat-td and its associated research paper, https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CP.2021.8

If you use this branch: https://github.com/msoos/sharpsat-td/tree/preprocessor you can get the preprocessor of SharpSAT-TD. Run it on a formula and the simplified CNF will be written to the console. Then run either SBVA + flowcutter or just flowcutter. You will see that the treewidth is VERY different, down to half on certain formulas. Since the running time is roughly exponential in the treewidth, SBVA could reduce the counting time to about its square root, e.g. instead of 1 million seconds to count, 1000 seconds. Kinda neat.

Let me know what you think,

Mate

PS: I'll also make it deterministic, turn it into a library, and all that, because a time-based timeout is non-deterministic, which makes debugging impossible -- and determinism is a hard requirement for anyone doing industrial work. A library also makes it reusable from more than just a bash script (which, again, is hard to make deterministic, to abort properly, etc.).

@Laakeri for visibility, since SSTD is his tool
@kuldeepmeel for visibility


msoos commented Feb 25, 2024

Demo of treewidth-reduction:

git clone https://github.com/msoos/sharpsat-td/
cd sharpsat-td
git checkout preprocessor
mkdir build
cd build
rm -rf CMake* src cmake* ganak* sharp* Make*
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DSTATIC=ON ..
make -j4
./preprocessor mc2022_track1_126.cnf > out
./sbva -i out -o out2

# now compile SSTD again, but this time on the main branch
git checkout main
make -j8

# run on preprocessed instance
./sharpSAT -decot 3 -decow 100 -tmpdir . -cs 3500  out
c o Parsed 285 vars, 1695 clauses, and 0 weights.
[stuff]
c o COEF: 8.947 Width: 88

# run on preprocessed + SBVA-preprocessed instance
./sharpSAT -decot 3 -decow 100 -tmpdir . -cs 3500  out2
c o Parsed 400 vars, 787 clauses, and 0 weights.
[stuff]
c o COEF: 156.287 Width: 62

As you can see, the treewidth goes from 88 -> 62. Also, SharpSAT-TD recognized that the tree decomposition works much better in the second case, so it increased the coefficient of the TD weight -- but that's a side note; it's obviously a lot better.

BTW, the dual preprocessing could be disabled, but it doesn't matter. There is a significant reduction in the number of clauses, too, but that's expected of SBVA of course. The fun part is the TW (treewidth) reduction.

Find mccomp track1 2022 files here: https://datashare.tu-dresden.de/index.php/s/7iZSNgKT27FEsGX

This specific file is attached here:
mc2022_track1_126.cnf.gz

I hope the above helps in understanding the importance of SBVA for model counting,

Mate

ndrewh added a commit that referenced this issue Feb 25, 2024
This is experimental and untested. (see #3)

ndrewh commented Feb 25, 2024

Hey @msoos, I did a quick patch that adds an option (-c) that should preserve model count. I haven't tested it on your benchmarks or with your solver, but I thought I'd share it now in case it is helpful to you.

b0d1631 (on branch preserve_model_count)

It adds one extra clause per added variable (described in a comment).

This does mean that it may now perform transformations that do not reduce formula size. E.g. if it adds a 2x3 BVA grid, it would remove 6 clauses, add 6 clauses (normally 5, now +1), and add 1 variable. If this turns out to be undesirable for your problems, you may want to adjust the lower bound on the reduction size here:

SBVA/sbva.cc

Lines 658 to 660 in becc0e0

if (matched_lits->size() <= 2 && matched_clauses->size() <= 2) {
continue;
}
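
For reference, the rough clause accounting behind that bound (a back-of-the-envelope sketch, with m matched literals and n matched clauses):

    % one BVA replacement step on an m x n "grid"
    \text{clauses removed} = m \cdot n
    \text{clauses added} = m + n \;\;(+1 \text{ for the extra clause added by } \texttt{-c})
    % net reduction; for the 2x3 example above: 2*3 - (2 + 3 + 1) = 0
    \Delta_{\text{clauses}} = m \cdot n - (m + n + 1)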


msoos commented Feb 25, 2024

Wow, nice, thanks, that sounds fantastic! Let me test this tomorrow. I'll keep this issue open because I wanna test and report back some numbers for you so you see the difference :)


msoos commented Feb 27, 2024

Hi,

First of all, this is great, thanks! I am hacking on it a bit to give it an interface. Some quick feedback regarding C++: this is something I have bumped into myself, and it is extremely painful.

When you want to enlarge a data structure by one element, potentially many times, doing this:

a.resize(n);
...
a.resize(n+1);
...
a.resize(n+2);

This can actually re-allocate and re-copy everything (almost) every time, depending on the standard-library implementation. It's very stupid, but it is what it is. Instead, you can do this:

a.insert(a.end(), n, elem());
...
a.insert(a.end(), 1, elem());
...
a.insert(a.end(), 1, elem());

This grows the allocation geometrically: it first allocates some power-of-two number of bytes, and once that is used up, it allocates 2x that, etc. This way you are way more efficient -- growing 1-by-1 via resize() can end up taking quadratic time overall. Sad day. Notice that push_back() basically does the insert(..., 1, ...) thing above, which is why it is so efficient.

I think the above may be happening in your code here (doing it 2-by-2 I think?):

            num_vars += 1;
            int new_var = num_vars;

            // Prepare to add new clauses.
             [...]

            lit_to_clauses->resize(num_vars * 2);
            lit_count_adjust->resize(num_vars * 2);
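
To illustrate (a toy sketch only -- not your actual data structures, and the numbers are made up):

    #include <cstdio>
    #include <vector>

    int main() {
        // Growing by a fixed amount via resize() each round: depending on the
        // standard library, this may reallocate and copy the whole buffer on
        // (almost) every call.
        std::vector<int> a;
        for (int v = 1; v <= 1000000; v++)
            a.resize(v * 2);

        // Appending at the end instead: capacity grows geometrically, so the
        // total work stays amortized linear.
        std::vector<int> b;
        for (int v = 1; v <= 1000000; v++)
            b.insert(b.end(), 2, 0); // or two push_back() calls

        printf("a: %zu elements, b: %zu elements\n", a.size(), b.size());
        return 0;
    }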

BTW, I am planning to have an API like this, I hope it's OK:

    // This is how to add a CNF clause by clause
    void init_cnf(uint32_t num_vars, Config config);
    void add_cl(const std::vector<int>& cl_lits);
    void finish_cnf();

    std::vector<int> get_cnf(uint32_t& ret_num_vars, uint32_t& ret_num_cls);

Where Config is the set of config params you can normally pass via the command line. It should be ready by tomorrow. It's working already, but I want to have the library installed, CMake set up to automagically find a local version of the compiled (but not yet installed) library, and everything compiling on Windows, Mac, and Linux.
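
Usage would look roughly like this (a sketch only -- the header name, class name, and output encoding below are placeholders, not final):

    #include <cstdint>
    #include <vector>
    #include "sbva.h" // placeholder header name

    int main() {
        SBVA::Config config;       // the same knobs as the command-line options
        SBVA::CNF cnf;             // placeholder class name
        cnf.init_cnf(3, config);   // 3 variables
        cnf.add_cl({ 1,  2});      // (x1 OR x2)
        cnf.add_cl({ 1,  3});      // (x1 OR x3)
        cnf.add_cl({-2, -3});      // (-x2 OR -x3)
        cnf.finish_cnf();          // run the transformation

        uint32_t num_vars = 0, num_cls = 0;
        // flat literal array; I'm assuming clauses are 0-terminated, DIMACS-style
        std::vector<int> simplified = cnf.get_cnf(num_vars, num_cls);
        return 0;
    }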


ndrewh commented Feb 27, 2024

Awesome! Thanks for finding the resize issue. Your changes sound great.

One thing I am curious about: have you tried with -n? This disables the heuristic tiebreaking (i.e. it becomes equivalent to regular BVA). I would be really interested to hear whether your problems benefit from the tiebreaking or not.


msoos commented Feb 27, 2024

OK, I am now done with the refactor: https://github.com/meelgroup/SBVA

Changes:

  • builds on all platforms
  • builds a library
  • builds a manpage
  • installs binary, header, library, and manpage to the expected locations
  • timeout is now measured in deterministic steps instead of wall-clock time (not tested much)
  • git revision is printed on startup (along with compiler info, so I can reproduce bugs)
  • --help prints usage information
  • command line parsing is more robust thanks to argparse
  • input & output are now positional arguments; if omitted, they behave as before
  • memory expansion should no longer be quadratic (likely doesn't make a big difference, but anyway)
  • builds without any warnings
  • clang-tidy says the code is clean
  • namespaces etc. are used to make sure we don't get name-mangling conflicts
  • both dynamic and static binaries & libraries can be built (except on Windows, where only a static lib & bin are supported)
  • the Eigen library is included in the source tree so it can be compiled more easily
  • added test.cpp as an example of how to use the library

You can find it here: https://github.com/meelgroup/SBVA

TODO:

  • check for memory leaks
  • check for over/underflow
  • fuzz it

The system can be compiled with the clang sanitizers (added as a CMake flag), which I will do. Perhaps there are leaks or under/overflows. I'll hook this up to my model counting fuzzer, and that should help.

I will now do the above testing, add it to our model counting preprocessor, and run a full performance test to see what happens. I'll keep this issue open for both the checks above and the performance tests, so I can get back to you.

ndrewh added a commit that referenced this issue Feb 29, 2024
This is experimental and untested. (see #3)

msoos commented Mar 4, 2024

I promised to get back to you with results -- sorry it took so long. I integrated the version of SBVA that I have into my pipelines. In my minor revision of SBVA, I removed the ->operator[]() calls and use memory-managed structures everywhere except for the swappable arrays. Those are still using raw pointers, even though myvector.swap(othervector) is a constant-cost operation that only swaps the internal pointers -- I think there is a gcc optimization bug at play, but I didn't want to dig through the assembly code. Anyway, no memory leaks, which is important for a library; it didn't affect the process-based system you built, since the kernel frees it all at the end anyway.
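
(For reference, this is the constant-cost swap I mean -- a toy demo:)

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> a(1000000, 1), b(10, 2);
        const int* pa = a.data();
        const int* pb = b.data();
        a.swap(b); // O(1): only the internal buffer pointers are exchanged
        // a now owns b's old buffer and vice versa; no elements are copied
        printf("%d %d\n", a.data() == pb, b.data() == pa); // prints: 1 1
        return 0;
    }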

Overall, SBVA does help, though I haven't tuned it much yet. I set it to about a 10s timeout (200*1e6 steps), and I made the cutoffs here configurable, as you suggested:

            if (matched_lits.size() <= config.matched_lits_cutoff &&
                    matched_clauses->size() <= config.matched_cls_cutoff) {
                continue;
            }

In model counting, we don't mind having somewhat more clauses as long as we have fewer variables, so the savings need to be a bit higher than usual. I set these to 4 each, except for --sbvaclcut 3, where the clause cutoff was set to 3. Seemingly 4 works better, though. I will also try different combinations and lower/higher cutoffs, especially for the lits -- sorry, I haven't done it justice yet.

A CDF plot of GANAK with/without SBVA is here: run.pdf. Where the legend says --sbva 0, SBVA is turned off; where it says --sbva 200, it's turned on with a relatively low cutoff (sorry, I'll bump it later). I'll also check with/without the "Structured" part, as you asked before, and set different limits.

Note that this is the 2023 model counting competition set, with a 1h timeout and 4GB RAM. The performance we get here matches that of the best solvers.


msoos commented Mar 4, 2024

I have now checked: regarding SBVA vs BVA, BVA solves one problem fewer. Not such a dramatic difference, but a difference nevertheless, so I'll stick with SBVA. Note that counting is a lot less noisy than a SAT check, because we have to walk the whole formula -- it's not like SAT, where you can accidentally stumble on a satisfying assignment and be happy. It's more akin to UNSAT, although in my experience even less noisy.
