-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cost adjust #1352
base: master
Are you sure you want to change the base?
Cost adjust #1352
Conversation
IT seems I answered in the wrong box and pressed the wrong button! |
if (NULL == dj) continue; | ||
|
||
// Look at the right-going connectors. | ||
for (Connector* rcon = dj->right; rcon; rcon=rcon->next) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the left
connectors are ignored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this was a prototype and I did not do those.
for (Connector* wrc = lkg->link_array[l].rc; wrc; wrc=wrc->next) | ||
{ | ||
// Skip if no match. | ||
if (strcmp(rcon->desc->string, wrc->desc->string)) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I wrote in the general response (in the wrong box...), there is 1-1 match between condesct_t elements and connector strings, disregarding if the connectors are multi or not. Thus the condesc_t strings don't include@
.
So there is a need for a different comparison.
I propose a simpler algo:
The multi-connector costs can be taken from a hash table stored in |
When you say
Does this table currently exist, or would it have to be created? |
It doesn't exist. (BTW, I already used |
OK, thanks. I'm going to keep this open, as a to-do item. Finishing it would not be hard - an afternoon of creating the table and hooking it all up. But nothing depends on it right now, so I don't feel like cluttering up the current code with this new stuff. |
FYI, this did eventually become urgent, and an alternate work-around was done in commit 1dcdd15 which is adequate for now. |
In any case I added this to my TODO list. |
(EDIT: Since I have already written it in the wrong box, I leave it here...)
It seems there are several problems in this code.
Also, there are several other problems related to it.
Regarding
condesc_t::cncost
- this is an improper place to put a multi-connector cost: There is a singlecondesc_t
element for each connector string (i.e. all multi and non-multi connectors with the same connector string share a singlecondesc_t
element). However, each multi-connector in the dict may have its own cost, and these costs may be different.There is also a problem to store the cost in the connector due to the 64 bytes limitation.
A multi-connector cost table could be added to disjuncts, but there is a space constraint there too.
So the only good place for multi-connector costs seems to be in
Sentence_s
- a hash table for that can be added.Among the problems that are related:
Disjunct_struct
because this will create a cost mess if relinking is needed (e.g. for implementing phantoms). To that end, anadjust_cost
field could be added toDisjunct_struct
, but again there is a space problem there.!disjunct
) should be updated, to be consistent with the adjusted disjunct costs. However, it is not clear how to textually represent multi-matches of multi-connectors. (I also discovered a bug there regarding multi-connectors that has no implications in the current dict/corpora and in your optional multi-connector examples).Regarding the code, it seems it has several things which are wrong (or that I don't understand).
In any case, the code can be simplified and this will save fixing these things.
I will add my comments to the code itself.