Hanging Collective Transitive Rules #78
Thank you for the detailed explanation.

Thanks, Robin.
Hello, I looked into it a bit.

First Method

The compilation step, a #P-complete problem, causes the long processing time. I tested this with both SDD and dSharp (d-DNNF) compilation.

There might be a bug that makes some ground programs unnecessarily larger, something Robin and I were looking into, but I have yet to confirm whether that occurs here too. The ProbLog problem you gave seems to just scale terribly, and so does the compilation process. Below are some statistics.

SDD

- Excluding nodes 7 and 8, using SDD without minimization
- Excluding nodes 7 and 8, using SDD with minimization
- Excluding node 8, using SDD without minimization
- Excluding node 8, using SDD with minimization
- Original problem, using SDD without minimization
- Original problem, using SDD with minimization

dSharp

- Excluding nodes 7 and 8
- Excluding node 8
- Original problem

Second Method

The issue is with grounding the program (so not even cycle breaking). Apparently, as can also be seen in the profile report, it spends most of its time in methods of the Term class.

Since this is not cycle-breaking related, and Robin knows the most about the grounding, I'll reassign him. I will revisit the First Method to see if the ground program contains anything unnecessary, but I have to attend to some other tasks first, so it might take a few days.

Code I used to analyse each step

```python
import time
from problog.formula import LogicDAG
from problog.program import PrologFile
from problog.logic import Term
from problog.engine import DefaultEngine
from problog.ddnnf_formula import DDNNF
from problog.cnf_formula import CNF
from problog.sdd_formula import SDD


class Timer(object):
    """Context manager that prints the wall-clock time of its block."""

    def __init__(self, msg):
        self.message = msg
        self.start_time = None

    def __enter__(self):
        self.start_time = time.time()

    def __exit__(self, *args):
        print("%s: %.4fs" % (self.message, time.time() - self.start_time))


def main(filename, k):
    model = PrologFile(filename)
    # default label_all=False
    engine = DefaultEngine(label_all=True)
    with Timer("parsing"):
        db = engine.prepare(model)

    print("\n=== Queries ===")
    queries = engine.query(db, Term("query", None))
    print("Queries:", ", ".join([str(q[0]) for q in queries]))

    print("\n=== Evidence ===")
    evidence = engine.query(db, Term("evidence", None, None))
    print("Evidence:", ", ".join(["%s=%s" % ev for ev in evidence]))

    print("\n=== Ground Program ===")
    with Timer("ground"):
        gp = engine.ground_all(db)
    print(gp)

    semiring = None
    if k == 'sdd':
        print("\n=== SDD ===")
        with Timer("SDD"):
            kc = SDD.create_from(gp, engine=engine, semiring=semiring, sdd_auto_gc=False)
        sdd_manager = kc.get_manager().get_manager()
        print(f"SDD size (all / live / dead): {sdd_manager.size()} / {sdd_manager.live_size()} / {sdd_manager.dead_size()}")
        print(f"SDD node (all / live / dead): {sdd_manager.count()} / {sdd_manager.live_count()} / {sdd_manager.dead_count()}")
    else:
        print("\n=== DAG ===")
        with Timer("DAG"):
            dag = LogicDAG.createFrom(gp)
        print(dag)

        print("\n=== DDNNF ===")
        with Timer("DDNNF"):
            cnf = CNF.create_from(dag, engine=engine, semiring=semiring)
            cnf_file_path = "./analysis_file_cnf.out"
            with open(cnf_file_path, "w") as cnf_file:
                cnf_file.write(cnf.to_dimacs())
            print(f"Printed CNF file to {cnf_file_path}")
            kc = DDNNF.create_from(cnf, engine=engine, semiring=semiring)

    print("\n=== Evaluation ===")
    with Timer("Evaluation"):
        result = kc.evaluate(semiring=semiring)
    print(result)


if __name__ == "__main__":
    import argparse

    argparser = argparse.ArgumentParser()
    argparser.add_argument("filename")
    argparser.add_argument("-k")
    args = argparser.parse_args()
    main(**vars(args))
```
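For reference (the file name here is just a placeholder, not something from the original report): saving the above as `analyse.py` and running `python analyse.py model.pl -k sdd` exercises the SDD branch, while omitting `-k` (or passing any other value) goes through the LogicDAG, CNF, and d-DNNF pipeline instead.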
Thanks for the update.
About method 2: tabling doesn't help ProbLog here, as the 3rd argument keeps changing. It's even possible that tabling adds to the slowdown once the tables grow big.
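To illustrate the point, here is a loose Python analogy (my own sketch, not ProbLog's actual tabling machinery): a table keyed on the full call pattern never gets a hit when one argument is a "seen" list that grows on every recursive call, so it only adds lookup and storage overhead.

```python
from functools import lru_cache

# Toy direct-acquaintance facts on nodes 0..3 (hypothetical data).
EDGES = {(0, 2), (2, 3), (3, 1)}
NODES = range(4)

@lru_cache(maxsize=None)          # stand-in for tabling: keyed on (a, b, seen)
def knows(a, b, seen):
    if (a, b) in EDGES:
        return True
    return any(
        knows(c, b, seen + (c,))  # 'seen' is different on every branch
        for c in NODES
        if c not in seen and (a, c) in EDGES
    )

print(knows(0, 1, (0,)))          # True
print(knows.cache_info().hits)    # 0 -- the table never pays off
```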
@krishnangovindraj The exponential grounding time (and increased time due to memoizing the list) makes sense, but do you think that's all there is to it? |
I have let an instance of the second method run for a week now and it is still going. |
I was able to run the first method to completion on a larger machine.
I've finished tracking down a bug today. I hope to revisit this one in a few days, but it might take a while. Something I did already notice, and which is perhaps related, is the following: the ground program of

```prolog
:- use_module(library(lists)).
1/2::throw([1]).
t :- throw(L), member(4, L).
query(t).
```

should be empty, since `member(4, [1])` fails and `t` can therefore never be true.
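A minimal way to reproduce that observation, using the same API calls as the analysis script above (this is my own sketch, with the model inlined via `PrologString`):

```python
from problog.program import PrologString
from problog.engine import DefaultEngine

MODEL = """
:- use_module(library(lists)).
1/2::throw([1]).
t :- throw(L), member(4, L).
query(t).
"""

engine = DefaultEngine(label_all=True)
db = engine.prepare(PrologString(MODEL))

# If member/2 were resolved away at grounding time, the ground program
# would contain nothing for t, since t is deterministically false.
print(engine.ground_all(db))
```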
Hey All,
I am having some issues with transitive rules and I was wondering if my results were expected or if there may be a bug lurking here.
I have a transitive rule that I have tried to implement in two different ways, both of which have unfavorable outcomes (one consumes all available memory, the other gets stuck in a presumably infinite loop).
All methods were run using the CLI from the ProbLog Python package (from PyPI, version 2.2.2).
Any help you can provide in getting either of these methods working (or any other method of dealing with collective transitivity) would be greatly appreciated.
General Model
This is a small synthetic model with the end goal of inferring whether two people know each other.
The data and structure for this model comes from the psl-examples repository:
https://github.com/linqs/psl-examples/tree/develop/simple-acquaintances
The specific rules that my initial ProbLog rules are based on come from here:
https://github.com/linqs/psl-examples/blob/develop/simple-acquaintances/cli/simple-acquaintances.psl
These examples will only use three predicates:

- `knows/2` - The final target predicate, indicating two people know each other.
- `knows_l1/2` - An intermediate predicate for `knows/2`.
- `knows/3` - Used in the second method to carry a list of seen nodes.

In the examples I will provide, I stripped the rules and data down to the smallest set that still causes these issues (I have not fully tested every possible subset of the data, since these runs can take a while, but I have cut it down considerably).
First Method
The first method uses a very straightforward approach to transitivity, with the hope that memoization will stop cycles from happening (see the sketch at the end of this section for why memoization alone is not enough).
The probabilistic facts enumerate [0, 9] for each argument, excluding self references and the query pair (0, 1).
The full file is attached:
problog_transitive_method1.txt
I also attached the output of a profile (py-spy top):
problog_transitive_method1_profile_top.txt
When run, it hangs on the output:
The process will continue to use ~100% of a core and will keep accumulating memory until there is no more or it is killed.
On my machine, this took about 50 GB of RAM and swap combined.
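As a side note on the memoization hope, here is a loose Python analogy (my own sketch; ProbLog's engine actually tables calls and breaks cycles before compilation, so this is not what happens internally): plain memoization cannot stop a cyclic recursion, because a result is only cached once the call returns, and a call stuck in a cycle never returns.

```python
from functools import lru_cache
import sys

EDGES = {(0, 2), (2, 0)}   # a two-node cycle; nothing reaches node 1
NODES = (0, 1, 2)

@lru_cache(maxsize=None)
def knows(a, b):
    if (a, b) in EDGES:
        return True
    # Re-enters knows(0, 1) while it is still running: the cache has no
    # entry for it yet, so the recursion never bottoms out.
    return any(knows(a, c) and knows(c, b) for c in NODES if c not in (a, b))

sys.setrecursionlimit(100)
try:
    knows(0, 1)
except RecursionError:
    print("memoization alone does not break the cycle")
```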
Second Method
The second method uses a commonly recommended pattern for transitivity in Prolog: keeping a list of previously seen nodes.
The probabilistic facts enumerate [0, 4] for each argument, excluding self references and the query pair (0, 1).
The full file is attached:
problog_transitive_method2.txt
I also attached the output of a profile (py-spy top):
problog_transitive_method2_profile_top.txt
When run, it hangs on the output:
So it looks like it is stuck in grounding.
The process will continue to use ~100% of a core, but will not use more RAM.
I ran this for about 12 hours until I killed it.
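A rough back-of-envelope estimate of why this grounding explodes (my own arithmetic, not from any profiling): with a seen-list, the grounder effectively enumerates simple paths, and in a complete graph on n nodes the number of simple paths between two fixed endpoints is the sum over k of P(n-2, k) ordered choices of intermediate nodes, which grows factorially. The facts here exclude self references and the (0, 1) pair, so the true counts are a bit smaller, but the growth rate is the point.

```python
from math import factorial

# Simple paths between two fixed nodes in a complete graph on n nodes:
# choose k intermediate nodes in order, for k = 0 .. n-2.
def simple_paths(n):
    return sum(factorial(n - 2) // factorial(n - 2 - k) for k in range(n - 1))

print(simple_paths(5))    # 16      -- method 2's node range [0, 4]
print(simple_paths(10))   # 109601  -- method 1's node range [0, 9]
```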