
Eliminate E_list #971

Merged
merged 9 commits on Jul 18, 2019

Conversation

@ampli (Member) commented Jul 12, 2019

The main change here is the elimination of the E_list struct. I think the code is much clearer without it.
See my post in issue 718.
Add expression traversal API calls per the proposal there. Since most of the lg_exp_*() API consists of inline functions, the Exp struct definition still needs to be public.
(But maybe these calls can be made non-inline and moved to link-includes.h along with Exp_type and an opaque Exp definition.)

While at it, change copy_Exp() and exp_compare() to iterate over the expression operands instead of making recursive calls, to avoid extra stack usage on very long operand lists.

This change increases the Exp struct back to 32 bytes (it got reduced to 24 bytes in the recent PR #969).
If needed, it can be reduced again to 24 bytes using this definition:

struct Exp_struct
{
        Exp *operand_next; /* Next same-level operand. */
        float cost;   /* The cost of using this expression. */
        Exp_type type:8; /* One of three types: AND, OR, or connector. */
        char dir;      /* The connector connects to the left ('-') or right ('+'). */
        bool multi;    /* TRUE if a multi-connector (for connector). */
        union
        {
                Exp *operand_first; /* First operand (for non-terminals). */
                condesc_t *condesc; /* Only needed if it's a connector. */
        };
};

I will check the exact effect of the Exp struct size when I implement my cost-definition cleanup proposal in issue #783.

The total speedup is small - about 3% for the benchmarks of the en basic & fixes batches.

N.B. If desired, I can also send a PR for the needed change in LGDictExpContainer.cc.

@ampli (Member Author) commented Jul 12, 2019

I forgot to add changes for the SAT parser.
I can append them here.

Note also that I didn't really convert the commented-out functions (like the ones for infix notation), nor the corpus stuff (they got partially converted by the initial elimination of the u union, which was done by a global replacement over all the files).

@ampli (Member Author) commented Jul 12, 2019

I can append them here.

Still checking the SAT parser conversion - I will be able to finish only tomorrow.

@ampli (Member Author) commented Jul 13, 2019

It turns out I first must fix issue #932, as the SAT parser somehow mishandles long expression chains. It looks like some subexpressions are skipped, but I'm not sure yet. I will have to inspect a detailed debug output from the last good commit and the first bad commit. This may take me several days more.

Meanwhile, please disregard this PR.

@linas (Member) commented Jul 15, 2019

commented-out functions (like the ones for infix notation)

Maybe it is time to remove that.

the corpus stuff

It seems very unlikely that this will ever be revived, and it should almost surely be removed.

@ampli (Member Author) commented Jul 18, 2019

It turns out I first must fix issue #932

Fixed in PR #972.

The main work for removing E_list in the SAT parser was fixing join_alternatives() and finding the next X_node in insert_connectors(). The problem in both was that top-level Exp node copies must be used when there are no E_lists, because the next-operand pointers in the Exp nodes themselves must be changed.

BTW, the classic parser is now very much faster than the SAT parser, even for long sentences.
And I have WIP changes that make the classic parser much faster still (they still need cleanup).

I also have WIP changes to make the SAT parser much faster (they also still need cleanup). But it seems that the classic parser will still be much faster after that. (I'm not sure about their relative speed after I implement my next ideas for speeding both of them up further, because the speedup potential in each is big.)

Meanwhile, please disregard this PR.

It can be merged now.

@linas merged commit 72d1f03 into opencog:master on Jul 18, 2019
@linas (Member) commented Jul 18, 2019

much faster

I was keeping a diary of performance measurements and it was really quite fun to watch things get faster. Very sadly, I accidentally deleted it a few weeks ago, and don't have backups :-(

@linas (Member) commented Jul 18, 2019

After this merge, I get a build failure:

../../../link-grammar/sat-solver/util.cpp: In function ‘Exp* null_exp()’:
../../../link-grammar/sat-solver/util.cpp:50:3: sorry, unimplemented: non-trivial designated initializers not supported
   };
   ^
Makefile:478: recipe for target 'util.lo' failed

This is for gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)

@ampli (Member Author) commented Jul 18, 2019 via email

@linas (Member) commented Jul 18, 2019

now the classic parser is very much faster than the SAT parser even for long sentences.

If this is the case, there is a question: should the SAT parser continue to be maintained, or should it be removed? It was originally created to overcome perceived performance issues. But I've never actually used it in any large-scale projects, and I mostly don't track how its performance changes as a result of your work.

The only reason I can think of to keep SAT around is that, for some unknown, futuristic variant of LG that allows link-crossing (non-planar graphs), maybe that would be easier to implement in SAT than in the classic parser. Right now, it's kind of hard to imagine why this would be needed, at least for natural language.

The only other strange variation I can think of is the case where every word can belong to 100 different word-classes, in which case there is an awful combinatoric explosion. But it's unclear that SAT could offer any benefit for this, either...

@ampli (Member Author) commented Jul 18, 2019

commented-out functions (like the ones for infix notation)

Maybe it is time to remove that.

the corpus stuff

It seems very unlikely that this will ever be revived, and it should almost surely be removed.

Removal added to my TODO list.

@ampli (Member Author) commented Jul 18, 2019

now the classic parser is very much faster than the SAT parser even for long sentences.

If this is the case, there is a question: should the SAT parser continue to be maintained, or should it be removed? It was originally created to overcome perceived performance issues. But I've never actually used it in any large-scale projects, and I mostly don't track how its performance changes as a result of your work.

No huge performance change over time. Mainly bug fixes to make its output equivalent to the classic parser, and maintenance when it gets broken due to LG library changes.

It needs to be made several times faster to be faster on short sentences. On long sentences, I think that even 10 times faster would not help. But both are definitely possible, and it may be technically interesting.

The only reason I can think of to keep SAT around is that, for some unknown, futuristic variant of LG that allows link-crossing (non-planar graphs), maybe that would be easier to implement in SAT than in the classic parser. Right now, it's kind of hard to imagine why this would be needed, at least for natural language.

It is very easy to allow link-crossing: it is mainly a matter of commenting out some code.

The only other strange variation I can think of is the case where every word can belong to 100 different word-classes, in which case there is an awful combinatoric explosion. But it's unclear that SAT could offer any benefit for this, either...

The current SAT parser also has performance problems with a huge number of connectors. However, both parsers may be enhanced to be much more scalable.

should the SAT parser continue to be maintained, or should it be removed?

I don't expect that any of my current WIPs will severely break the SAT parser like the Exp changes did
(maybe if we eliminate maxcost, but even that may be relatively easy to fix).
So I think it can be maintained for now, and be removed only if a non-compatible enhancement is done to the classic parser (that is hard to fix in the SAT parser).

ampli added a commit to ampli/link-grammar that referenced this pull request Jul 19, 2019
@ampli mentioned this pull request Jul 19, 2019
@ampli (Member Author) commented Jul 19, 2019

N.B. If desired, I can also send a PR for the needed change in LGDictExpContainer.cc.

It seems it is more severely out of date, since I get:
'link-grammar/dict-api.h' file not found

Please advise.

@linas (Member) commented Jul 19, 2019

? It's installed in /usr/local/include/link-grammar/dict-api.h

@linas (Member) commented Jul 19, 2019

what needs to change is that we should no longer install /usr/local/include/link-grammar/dict-structures.h

@ampli (Member Author) commented Jul 19, 2019

? It's installed in /usr/local/include/link-grammar/dict-api.h

Ah, I didn't install LG, on purpose, to make sure I would not get system headers by mistake in my development code... (but now that hopefully all the <...> includes for LG headers have been converted to "...", I guess such a problem is not likely).

Since the relevant header files are now in subdirectories under link-grammar, it is not just a matter of adding a symlink to the link-grammar directory in the devel code. But the workaround is easy (still without installing).

@ampli (Member Author) commented Jul 19, 2019

what needs to change is that we should no longer install /usr/local/include/link-grammar/dict-structures.h

This can be done as follows, please tell me if this is fine:

  1. Move the declarations of the lg_exp*() functions to dict-api.h.
  2. Move the definitions of the inline lg_exp*() functions to dict-utils.c.
  3. Move Exp_type to dict-api.h.
  4. Move the forward declarations to dict-api.h, except the one for condesc_t, which should be moved elsewhere.
  5. Move Dict_node_struct to dict-api.h, unless an API will be added for it.
  6. ??? Move insert_dict() elsewhere.
  7. Regarding expression_stringify(): in a WIP I added stringify_cost() to return a string version of the cost argument in a consistent manner. Related questions:
    -- Should such functions actually be in the form stringify_*()? In that case I can use this opportunity to change expression_stringify() to stringify_expression(). (Else I will change stringify_cost() to cost_stringify() to be consistent.)
    -- I would like to change expression_stringify() and similar future functions to return a static TLS string, because otherwise such functions are tedious to use due to the need to capture their value and free() it afterwards (with the disadvantage that the returned value is destroyed on the next call).
    -- print_expression() seems to be unneeded in the API, especially if expression_stringify() returns a static (TLS) string.
  8. After all of these moves, remove dict-structures.h, as nothing will remain in it.

@linas (Member) commented Jul 19, 2019

I can take care of porting the opencog code to the new API; I don't expect it to be hard, just 6-12 lines of code total.

no longer install /usr/local/include/link-grammar/dict-structures.h

Sorry, I was confused; installing that file is just fine, I see no problem with it.

stringify

Sure, rename as needed for consistency. Yes, using a static with TLS sounds like a good idea.

@ampli (Member Author) commented Jul 19, 2019

Sure, rename as needed for consistency. Yes, using a static with TLS sounds like a good idea.

But should I rename the existing expression_stringify() to stringify_expression() (an API change, but no one uses it), or the new (yet-unpublished) cost_stringify() to stringify_cost()?

ampli added a commit to ampli/link-grammar that referenced this pull request Jul 20, 2019
It is obsolete and unsupported.
Per discussion at PR opencog#971.
@linas (Member) commented Jul 20, 2019

should I rename the existing

I have no opinion. Most of the rest of the API is dictionary_blah and linkage_blah, so expression_blah follows that pattern.

@ampli (Member Author) commented Jul 20, 2019

So I will leave the name expression_stringify(), but I would like to change its result type to const char * (instead of plain char *). I will also add const char *cost_stringify(double cost) to dict-api.h.
Both will return static TLS strings.

@ampli (Member Author) commented Jul 20, 2019

These are found in dict-api.c but not in link-grammar.def.

void print_expression(const Exp *);
char *expression_stringify(const Exp *);

But the comment at the start of this file says this is a public API.

In any case, I think that print_expression() should be removed from the LG library altogether, and printf("%s", expression_stringify()) should be used instead (as it will be fixed to return a static TLS value).

@linas (Member) commented Jul 20, 2019

print_expression() should be removed

yes sounds good

But the comment at the start of this file says this is a public API.

I'm pretty sure no one is using it.

@ampli (Member Author) commented Jul 20, 2019

These are found in dict-api.c but not in link-grammar.def.

void print_expression(const Exp *);
char *expression_stringify(const Exp *);

But the comment at the start of this file says this is a public API.

I'm pretty sure no one is using it.

I still don't understand, as link-grammar.def contains only this lg_exp_*() function:
lg_exp_get_string

So this whole lg_exp_*() API is not actually in any use?!

@ampli (Member Author) commented Jul 20, 2019

I can take care of porting the opencog code to the new API; I don't expect it to be hard, just 6-12 lines of code total.

I guess that I will need to add the lg_exp_*() functions to link-grammar.def.

But I think that we should **also move them to link-includes.h**, so the Python test suite will be able to use them. At least lg_exp_stringify() (as I currently have it in the WIP) would be useful for testing dicts (another thing that I would like to add to the test suite).

@linas (Member) commented Jul 20, 2019

link-grammar.def

In the past, Apple was the only OS that paid attention to what was exported, so any omissions in this file had to wait for Apple users to report them. Very recently, some version of Linux showed up that also pays attention to this file...

lg_exp_get_string

The only time this function was ever used by anybody was in opencog, during debugging. It is not used normally, and is not in the code base. I think it is very unlikely that there are any other users out there, or that anyone would ever use it except during debugging.

move things to link-includes.h

I'd rather not. The stuff in link-includes.h is a fairly clean, well-supported API with some promises about forward and backward compatibility. The stuff in dict-api.h is a bit more experimental, and it has one and only one user -- opencog -- and I do not expect any other users to show up. I would rather keep it off to the side, and not put it too close to the "main" API. The main API is enough for all users; I don't want to confuse them with this other, somewhat strange and currently a bit buggy API. (It's "buggy" because it currently doesn't support regexes. Currently, even opencog is buggy, because it does not handle optional connectors correctly, and opencog also ignores costs. Finally, the two opencog subsystems that use it need to be redesigned anyway, because they do not work well; they are using the dict to generate sentences...)

@linas (Member) commented Jul 20, 2019

In fact, if you are bored and looking for something new to do -- the language generation subsystem needs help. The idea is that you give it a bag of concepts, relations, actions, and these get serialized into some grammatically valid sentence. (E.g. (Concept "Amir") (Concept "pizza") (Action "eat") (Time "past"), and a sentence falls out: "Amir ate pizza".) The "bag of stuff" is informal and is not strictly defined; any sentence that approximates the bag of stuff is acceptable. There are some interesting things to think about regarding what the "bag of stuff" needs to be, and how to convert that into sentences. The current code for this is called "sureal" (surface realization) and "microplanning" (to plan out multiple sentences) -- they don't work very well.

@ampli (Member Author) commented Jul 20, 2019

I thought that the intention was to use the lg_exp_*() functions to make the expression traversal independent of the Exp_struct internals, so there will be no need to change it again if Exp_struct is changed again.
(I also thought it would further be a good idea to not export the Exp_struct internals at all, and I can implement this if desired.)

Is this still the case?

@linas (Member) commented Jul 20, 2019

I think the code is using the lg_exp_* functions already, and I am happy to leave it as static inline for now. There is no need for a binary ABI for this particular interface at this time. Mostly, just keep things simple and minimize work. Effort should go elsewhere.

@ampli (Member Author) commented Jul 20, 2019

if you are bored and looking for something new to do

The "problem" is that I am not bored at all with the LG stuff. But I'm open to interleaving it with other tasks; see below.

Currently I am trying to convert existing WIP branches, some of them very old, into clean pull requests.
For many months I have been trying to do this as fast as possible, and a main obstacle is that I mostly have to send the PRs serially, because of their (sometimes heavy) interdependency. After I send each of them, I "git pull" and then merge/rebase many branches. I don't do it ahead of time because I didn't find a way to ensure there will be no inconsistencies, and I have had bad surprises; also, I often need to change PRs before they are applied (I don't complain about that - this is just fine). But doing a "git pull" and rebase afterwards turned out to be foolproof.

Now to the "see below" stuff.
Maybe by interleaving additional tasks I can do more things while I am waiting for PRs to be applied.
So I will take a look at that.

@ampli (Member Author) commented Jul 20, 2019

I think the code is using the lg_exp_* functions already

You are right - I located that.
So I will just add them to the .def file so they will work when compiled on any version of Linux and other *nix systems.

@ampli (Member Author) commented Jul 20, 2019

BTW, one of my TODOs is a simple way to eliminate the link-grammar.def file altogether, in a way compatible with both *nix and Windows systems.

@linas (Member) commented Jul 20, 2019

I am not bored at all

Well, that's good! You are good at what you do, and I was just thinking that there are these other systems that need this kind of love and attention.

I mostly have to send the PRs serially

I can try to review them faster, if it helps. Also, I am comfortable with 95% of the changes that you make, so you could start merging some of these yourself, to speed things up. There is very little in your pull reqs that I ever look at and find questionable or objectionable.

@ampli (Member Author) commented Jul 20, 2019

I can try to review them faster, if it helps.

Yes, it would help a lot, as the number of PRs I would like to eventually send is astonishingly big, and I have a big list of TODOs (among other things, almost every open issue implies at least one TODO).

There is very little in your pull reqs that I ever look at and find questionable or objectionable.

The problem is that I'm never sure what it will be, and I have had poor luck in guessing that...
But I'm ready to fix things after the fact; e.g., in the case of cost_stringify() I could just submit an additional PR that makes it use a fixed printing format (I did this several times with other stuff that I wrote and then totally reimplemented, sometimes more than once for the same stuff).

@linas (Member) commented Jul 20, 2019

poor luck in guessing that...

Getting it right 90% of the time is not "poor luck".

@ampli (Member Author) commented Aug 8, 2019

It seems I introduced a bug into read-sql.c in this PR, as the elimination of E_list was not supposed to solve the "unknown word" overhead of micro-discrim-unk. I'm checking that and will send a fix if needed.

EDIT: Fixed in PR #988.
