Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saul inference: moving beyond LBJava's inference #401

Open
wants to merge 71 commits into
base: master
Choose a base branch
from
Open

Saul inference: moving beyond LBJava's inference #401

wants to merge 71 commits into from

Conversation

danyaljj
Copy link
Member

@danyaljj danyaljj commented Sep 25, 2016

Items to finish:

Results on ER (L+I):

Exactly the same numbers in the readme file:

[info] ===============================================
[info] Evaluating PerConstrainedClassifier$
[info] 
[info]  Label   Precision Recall   F1   LCount PCount
[info] ----------------------------------------------
[info] false       98.788 99.752 99.267  61178  61775
[info] true        89.088 62.362 73.367   1990   1393
[info] ----------------------------------------------
[info] Accuracy    98.574   -      -      -     63168
[info] ===============================================
[info] Evaluating OrgConstrainedClassifier$
[info] 
[info]  Label   Precision Recall   F1   LCount PCount
[info] ----------------------------------------------
[info] false       98.714 99.622 99.166  61896  62465
[info] true        66.714 36.871 47.494   1272    703
[info] ----------------------------------------------
[info] Accuracy    98.358   -      -      -     63168
[info] ===============================================
[info] Evaluating LocConstrainedClassifier$
[info] 
[info]  Label   Precision Recall   F1   LCount PCount
[info] ----------------------------------------------
[info] false       98.375 99.643 99.005  60760  61543
[info] true        86.646 58.472 69.824   2408   1625
[info] ----------------------------------------------
[info] Accuracy    98.073   -      -      -     63168
[info] ===============================================
[info] Evaluating WorksForRelationConstrainedClassifier$
[info] 
[info]  Label   Precision Recall   F1   LCount PCount
[info] ----------------------------------------------
[info] false       90.783 84.588 87.576    850    792
[info] true        25.989 38.655 31.081    119    177
[info] ----------------------------------------------
[info] Accuracy    78.947   -      -      -       969
[info] ===============================================
[info] Evaluating LivesInRelationConstrainedClassifier$
[info] 
[info]  Label   Precision Recall   F1   LCount PCount
[info] ----------------------------------------------
[info] false       76.581 94.509 84.605    692    854
[info] true        66.957 27.798 39.286    277    115
[info] ----------------------------------------------
[info] Accuracy    75.439   -      -      -       969
[info] ===============================================

@kordjamshidi
Copy link
Member

Do you have any documentation that we can start reading from there?

@danyaljj
Copy link
Member Author

danyaljj commented Nov 6, 2016

Yes, see the changes.

@kordjamshidi
Copy link
Member

ok, I'll do.

the inference starts from the head object. This function finds the objects of type `INPUT_TYPE` which are
connected to the target object of type `HEAD_TYPE`. If we don't define `filter`, by default it returns all
objects connected to `HEAD_TYPE`. The filter is useful for the `JointTraining` when we go over all
global objects and generate all contained object that serve as examples for the basic classifiers involved in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: objects

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

In Saul, the constraints are defined for the assignments to class labels.
A constraint classifiers is a classifier that predicts the class labels with regard to the specified constraints.
In Saul, the constraints are defined for the assignments to class labels. In what follows we outine the details of operators
which help us define the constraints. Before jumping into the details, note that you have to have the folling import
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: following

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.


In the above definition, `on` and `is` are keywords.

Here different variations of this basic, but there are different variations to it:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: didn't understand this sentence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol fixed.

| `ForAll` | For **all** the elements in the collection it applies the constraints. In other words, the constrain should hold for **all** elements of the collection. | `textAnnotationNode.ForAll { x: TextAnnotation => Some-Constraint-On-x }` |
| `Exists` | The constrain should hold for **at least one** element of the collection. | `textAnnotationNode.Exists { x: TextAnnotation => Some-Constraint-On-x }` |
| `AtLest(k: Int)` | The constrain should hold for **at least `k`** elements of the collection. | `textAnnotationNode.AtLeast(2) { x: TextAnnotation => Some-Constraint-On-x }` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: AtLeast(k: Int) in the first column.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

There are just the definitions of the operations. If you want to see real examples of the operators in actions see [the definitions of constraints for ER-example](https://github.com/IllinoisCogComp/saul/blob/master/saul-examples/src/main/scala/edu/illinois/cs/cogcomp/saulexamples/nlp/EntityRelation/EntityRelationConstraints.scala).

**Tip:** Note whenever the constrained inference is infeasible (i.e. the constraints are overlly tight), we use the default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: overly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -19,48 +19,48 @@ object ClassifierUtils extends Logging {
def apply[T <: AnyRef](c: (Learnable[T], Iterable[T])*) = {
c.foreach {
case (learner, trainInstances) =>
logger.info(evalSeparator)
logger.info("Training " + learner.getClassSimpleNameForClassifier)
println(evalSeparator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the change from logger.info to println intentional? No strong opinion, just wanted to confirm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it because it was messing up the formatting of the output results. But now in retrospect, I think only the ones related to testing was needed. I returned back the rest.

val instanceIsInvolvedInConstraint = instancesInvolved.exists { set =>
set.exists {
case x: T => x == t
case everythingElse => false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can use _ here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

case (singleConstraint, ins) =>
ins union getInstancesInvolved(singleConstraint).asInstanceOf[Set[Any]]
}
case c: AtMost[_, _] =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, I miss something here, but could you explain a bit: all these have the same body? AtMost, AtLeast, forAll...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Merged them into one by adding an extra type.

@danyaljj
Copy link
Member Author

danyaljj commented Nov 9, 2016

Applied the comments. Let me know if you have any other comments.

object TestConstraintClassifier extends ConstrainedClassifier[String, String] {
override def subjectTo = None
override val solverType = OJAlgo
override lazy val onClassifier = TestClassifier
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change the onClassifier to baseClassifier

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this rename.

object OrgConstrainedClassifier extends ConstrainedClassifier[ConllRawToken, ConllRelation] {
override lazy val onClassifier = EntityRelationClassifiers.OrganizationClassifier
override def pathToHead = Some(-EntityRelationDataModel.pairTo2ndArg)
override def subjectTo = Some(EntityRelationConstraints.relationArgumentConstraints)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I want to say subjectTo = Some(worksForConstraint)? what will be the syntax? workForContatint needs an input parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another relevant comment, how can I just express true or false i.e. constant expressions as constraints. Please add these to the documentation also.

Copy link
Member Author

@danyaljj danyaljj Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I want to say subjectTo = Some(worksForConstraint)? what will be the syntax? workForContatint needs an input parameter.

Regarding the first questions (as also mentioned in the documentation), the definition of the constraints starts with Node and use ForEach operator. If you want to define "worksForConstraints", instead of having it as a function:

  def worksForConstraint(x: ConllRelation) = {
    (WorksForClassifier on x isTrue) ==> ((PersonClassifier on x.e1 isTrue) and (OrganizationClassifier on x.e2 isTrue))
  }

it should be written as

  def worksForConstraint = EntityRelationDataModel.pairs.ForEach { x: ConllRelation => 
    (WorksForClassifier on x isTrue) ==> ((PersonClassifier on x.e1 isTrue) and (OrganizationClassifier on x.e2 isTrue))
  }

and then you can do subjectTo = Some(worksForConstraint) ....

Another relevant comment, how can I just express true or false i.e. constant expressions as constraints. Please add these to the documentation also.

You can't (and you shouldn't) define constants.

Copy link
Member

@kordjamshidi kordjamshidi Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not mapping the case with constant True to the case with no constraints and the case with constant False to an infeasible solution message? I think it would more expressive and robust to cover all possible logical expressions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not mapping the case with constant True to the case with no constraints and the case with constant False to an infeasible solution message? I think it would more expressive and robust to cover all possible logical expressions.

I can almost surely guarantee that you never need to use constant True or constant False. Can you come up with an example that either of these constants are necessary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my view, it is a matter of completeness of the representation. And those are the basic cases that your system should not fail to address those.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If something is never needed why is it a "basic case" and "matter of completeness"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you give me a concrete example that it's needed, I won't be convinced.

Copy link
Member

@kordjamshidi kordjamshidi Nov 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give me a concrete application example that a hypothesis which says h=True is useful? Why the most general hypothesish=True is discussed and mentioned at all!!
Your argument looks like this to me.

Copy link
Member

@kordjamshidi kordjamshidi Nov 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If something is never needed why is it a "basic case" and "matter of completeness?"

I missed this sentence, now I see it. I think the confusion comes from where we use first-order logic inside a different formalism that is ILP, therefore, it seems we can look at it only very practically without thinking if our constraint representation is even sound. From my view when you talk about logical expressions in any context the first expressions to think of are True and False and if I can not represent these, I am not sure what I am talking about then. We probably need to ask a third or more opinions on this. I do not have concrete examples.

@kordjamshidi
Copy link
Member

For Foreach, I think the documentation is not clear enough, I can not see easily that I alway need to start writing any kind of constraint with Foreach expression, maybe you should say this above the table. Also, why this should be the case? I guess having ForAll and Foreach with two different semantics here will be confusing, from the logical expressions perspective. Unless this is technically impossible to change, I would suggest changing this.

@danyaljj
Copy link
Member Author

For Foreach, I think the documentation is not clear enough, I can not see easily that I alway need to start writing any kind of constraint with Foreach expression, maybe you should say this above the table. Also, why this should be the case? I guess having ForAll and Foreach with two different semantics here will be confusing, from the logical expressions perspective. Unless this is technically impossible to change, I would suggest changing this.

Yes there is a conceptual difference.

  • Collection.ForEach{ x => .... } applies the constraint on each single instance.
  • Collection.ForAll{ x => .... } applies the constraint on all the instances.

For Nodes we often want the first one. In many other cases we use the second one. I will clarify this in the documentation.

Bhargav Mangipudi and others added 3 commits January 20, 2017 18:02
Update Badge example to use new Constraint convention.
Update the Saul Inference PR
Copy link
Contributor

@bhargav bhargav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I agree with @kordjamshidi 's comment about the ambiguity of ForEach and ForAll. ForEach is implemented as a quantifier that you can apply to nodes.

But ForAll is implemented as a conjunction of constraint clauses. If we plan to keep this convenience notation, we should rename to avoid confusion.

Ideally, we should be able to do node.ForAll as a quantifier.

|----------|------------|---------|---|
| `ForEach` | This operator works only on `Node`s. For each single instance in the node. This is often times one of the starting points for defining constraints. So if you are defining using a constrained classifier with head type `HEAD_TYPE`, we the definition of the constraint have to start with the node corresponding to this type. | `textAnnotationNode.ForEach { x: TextAnnotation => Some-Constraint-On-X }` |
| `ForAll` | For **all** the elements in the collection it applies the constraints. In other words, the constrain should hold for **all** elements of the collection. | `textAnnotationNode.ForAll { x: TextAnnotation => Some-Constraint-On-x }` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo: constrain -> constraint in this and the next 4 lines.

object TestConstraintClassifier extends ConstrainedClassifier[String, String] {
override def subjectTo = None
override val solverType = OJAlgo
override lazy val onClassifier = TestClassifier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this rename.

}

/** find all the instances used in the definiton of the constraint. This is used in caching the results of inference */
private def getInstancesInvolved(constraint: Constraint[_]): Set[_] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getInstancesInvolved and getClassifiersInvolved methods do not depend on ConstrainedClassifer (They don't need to be here). We should add these methods to the trait/abstract class for Constraint and have them implemented there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What abstract class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see; that is doable, but not sure if we would gain significant anything from that.....

@danyaljj
Copy link
Member Author

@bhargav I think the introduction of ForAll is confusing. I can totally hide (just like the old times), and things should be simpler in that case. That sounds any good?

@danyaljj
Copy link
Member Author

Updated the documentation to make the confusion between ForAll and ForEach clear.

@bhargav
Copy link
Contributor

bhargav commented Jan 26, 2017

Overall the change look good. I want to do a test on the JVM memory usage after these changes. I'll do a comparative run on some of our examples. ETA: 1-2 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants