
Resize tracon count table #1403

Merged
merged 5 commits into from
Jan 30, 2023
Conversation

linas
Member

@linas linas commented Jan 30, 2023

Based on the measurements made in #1402, we now have an accurate estimate for the number of tracons that will actually be used during parsing, both for conventional and for MST parsing. Implement those estimates.

@linas
Member Author

linas commented Jan 30, 2023

@ampli I am merging this now, because it seems like the right thing to do. But please review; I do appreciate the comments you make.

@linas linas merged commit d80760e into opencog:master Jan 30, 2023
@linas linas deleted the counting-table branch January 30, 2023 22:30
@ampli
Member

ampli commented Jan 31, 2023

I found several problems.
It seems there is now a slowdown for null_count>1 (it needs a much bigger table in that case), and the resize doesn't work as needed.
Since I'm going to sleep soon, I will just post some partial findings.
I used the sentence And yet he should be always ready..., with arguments -lim=10000 -v=5 -de=table_alloc.
Note the debug message from table_alloc():

...
verbosity set to 5
debug set to table_alloc
link-grammar: Info: Dictionary found at ./data/en/4.0.dict
limit set to 10000
link-grammar: Info: Dictionary version 5.12.1, locale en_US.UTF-8
link-grammar: Info: Library version link-grammar-5.12.1. Enter "!help" for help.
#### Finished tokenizing (174 tokens)
++++ Split sentence                              0.10 seconds
++++ Finished expression pruning                 0.00 seconds
++++ Built disjuncts                             0.07 seconds
++++ Eliminated duplicate disjuncts              0.04 seconds
++++ Encoded for pruning                         0.08 seconds
++++ power pruned (for 0 nulls)                  0.05 seconds
++++ Built mlink_table                           0.00 seconds
++++ power pruned (for 0 nulls)                  0.01 seconds
++++ pp pruning                                  0.00 seconds
++++ power pruned (for 0 nulls)                  0.00 seconds
++++ Built mlink_table                           0.00 seconds
++++ power pruned (for 0 nulls)                  0.01 seconds
++++ Encoded for parsing                         0.00 seconds
++++ Initialized fast matcher                    0.00 seconds
Trace: table_alloc: Connector table size 4194304
Trace: table_alloc: Connector table size 1
++++ Counted parses (0 w/0 nulls)                0.62 seconds
++++ Finished parse                              0.00 seconds
No complete linkages found.
++++ Finished expression pruning                 0.00 seconds
++++ Built disjuncts                             0.06 seconds
++++ Eliminated duplicate disjuncts              0.04 seconds
++++ Encoded for pruning (one-step)              0.11 seconds
++++ power pruned (for 1 null)                   0.05 seconds
++++ Built mlink_table                           0.00 seconds
++++ power pruned (for 1 null)                   0.01 seconds
++++ pp pruning                                  0.00 seconds
++++ power pruned (for 1 null)                   0.01 seconds
++++ Built mlink_table                           0.00 seconds
++++ power pruned (for 1 null)                   0.01 seconds
++++ Encoded for parsing                         0.00 seconds
++++ Initialized fast matcher                    0.00 seconds
Trace: table_alloc: Connector table size 8388608
Trace: table_alloc: Connector table size 1
Trace: table_alloc: Connector table size 1
link-grammar: Warning: insanely large tracon hash table size: 33554432
++++ Counted parses (2147483647 w/1 null)       14.39 seconds
++++ Built parse set                             1.98 seconds
...

In addition, if the following is as intended, then I don't understand it:

* Provide an estimate for the number of Table_tracon entries that will
* be needed.
*
* The number of entries actually used was measured in discussion
* https://github.com/opencog/link-grammar/discussions/1402
* Based on this, an upper bound on the entries needed is
* 3 * num_disjuncts * log_2(num_words)
* i.e. more than this is almost never needed. A lower bound is
* 0.5 * num_disjuncts * log_2(num_words)
* i.e. more than this is *always* needed.
*
* In both conventional and MST dictionaries, more than 500K entries is
* almost never needed. In a handful of extreme cases, 2M was observed.
*/
static unsigned int estimate_tracon_entries(Sentence sent)
{
	unsigned int nwords = sent->length;
	unsigned int log2_nwords = 0;
	while (nwords) { log2_nwords++; nwords >>= 1; }
	unsigned int tblsize = 3 * log2_nwords * sent->num_disjuncts;
	if (tblsize < 512) tblsize = 512; // Happens rarely on short sentences.
	return tblsize;
}

	unsigned int num_elts = estimate_tracon_entries(sent);
	sent->Table_tracon_pool =
		pool_new(__func__, "Table_tracon",
		         num_elts, sizeof(Table_tracon),
		         /*zero_out*/false, /*align*/false, /*exact*/false);

You allocate here, at once, a memory block equal to the upper bound.
Even sentences that need only half of the upper bound get the full upper bound as the initial allocation.
And if even a single additional entry is needed, pool_alloc() will allocate another memory block equal to the upper bound!
In my opinion, the lower bound should clearly be used here.

I also found that the ru batch runs now slower by ~6%.

Regarding table size:

uint32_t table_size;
size_t table_mask;

You have changed it from size_t to uint32_t (the mask inconsistently remained size_t).
I intentionally made it size_t, so it can get over 2^31, for the generation mode (and for regular parsing with nulls of long sentences).
When I use link-generator -s 8 -l en --verbosity=5 --debug=table_stat I get:
link-grammar: Warning: insanely large tracon hash table size: 134217728.

linas added a commit that referenced this pull request Jan 31, 2023
@linas
Member Author

linas commented Jan 31, 2023

size_t hash

OK, I did not understand. Restored in 1193ddd

@linas
Member Author

linas commented Jan 31, 2023

For me, And yet he should be always ready... parses without nulls; and power-prune left only 74 disjuncts.

I added print statements to count.c here:

ctxt->table_size *= 2; /* Double the table size */

to see if the table doubles past 32 bits. I haven't seen that yet. Whatever; I will change things to use size_t and remove the warning.

Also, link-generator crashed because of a null pointer deref; I worked around this in 7e2e9d3 -- I don't know why this happened.

I removed the large-table warning and went to size_t tables in commit 8b87635

I changed the pool chunk size in 7951ed8 ... I had not previously understood this was a chunk size. Perhaps it might be better to hard-code this size, after all?

The last four changes were direct pushes into the repo; would it be better if I had made these into pull reqs, instead?

linas added a commit that referenced this pull request Jan 31, 2023
linas added a commit that referenced this pull request Jan 31, 2023
@ampli
Member

ampli commented Jan 31, 2023

For me, And yet he should be always ready... parses without nulls; and power-prune left only 74 disjuncts.

I meant the famous sentence from fix-long:
And yet he should be always ready to have a perfectly terrible scene, whenever we want one, and to become miserable, absolutely miserable, at a moment’s notice, and to overwhelm us with just reproaches in less than twenty minutes, and to be positively violent at the end of half an hour, and to leave us for ever at a quarter to eight, when we have to go and dress for dinner when, after that, one has seen him for really the last time, and he has refused to take back the little things he has given one, and promised never to communicate with one again, or to write one any foolish letters, he should be perfectly broken-hearted, and telegraph to one all day long, and send one little notes every half-hour by a private hansom, and dine quite alone at the club, so that every one should know how unhappy he was.

But now I see the reason for Trace: table_alloc: Connector table size 1:

static void table_grow(count_context_t *ctxt)
{
	table_alloc(ctxt, 0);

	size_t reqsz = 1ULL << logsz;
	if (0 < logsz && reqsz <= ctxt->table_size) return; // It's big enough, already.
	lgdebug(+D_COUNT, "Connector table size %lu\n", reqsz);
#if HAVE_THREADS_H && !__EMSCRIPTEN__
	// Install a thread-exit handler, to free kept_table on thread-exit.
	static once_flag flag = ONCE_FLAG_INIT;
	call_once(&flag, make_key);
	if (NULL == kept_table)
		tss_set(key, &kept_table);
#endif /* HAVE_THREADS_H && !__EMSCRIPTEN__ */
	if (logsz == 0)
		ctxt->table_size *= 2; /* Double the table size */
	else
		ctxt->table_size = reqsz;

Regarding the table size print at line 198, the intention was to print its size after the doubling (the previous code did that).

The original code has this:
#define MAX_LOG2_TABLE_SIZE ((sizeof(size_t)==4) ? 25 : 34)

Now there is no limit. On 32-bit systems it is not a good idea to let it grow too big.

	 * FYI: the new tracon tables are (much?) smaller than the older
	 * connector tables, so maybe this reuse is no longer needed?

Note that we observed the problem of big allocation when the table already used tracons.

if (tblsize < 512) tblsize = 512; // Happens rarely on short sentences.

The original minimum size was 4096, found by profiling. I will check it again later.
The reason for the ru slowdown is not yet clear to me, but changing it to 4096 didn't solve the slowdown.

More later.

@ampli
Member

ampli commented Feb 1, 2023

Perhaps it might be better to hard-code this size, after all?

I don't find a benefit in hard-coding that.
It depends on the expected number of elements and their distribution.
If you use very big blocks, you may waste virtual memory and maybe need more pages in memory (I'm not sure about the impact). Too-small allocations have too much allocation overhead. However, if you use the zero_out argument, too-big allocations cause CPU overhead due to clearing unused memory.

@ampli
Member

ampli commented Feb 1, 2023

For me, And yet he should be always ready... parses without nulls; and power-prune left only 74 disjuncts.

I added print statements to count.c here:

ctxt->table_size *= 2; /* Double the table size */

to see if the table doubles past 32 bits. I haven't seen that yet. Whatever; I will change things to use size_t and remove the warning.

Also, link-generator crashed because of a null pointer deref; I worked around this in 7e2e9d3 -- I don't know why this happened.

I removed the large-table warning and went to size_t tables in commit 8b87635

I changed the pool chunk size in 7951ed8 ... I had not previously understood this was a chunk size. Perhaps it might be better to hard-code this size, after all?

The last four changes were direct pushes into the repo; would it be better if I had made these into pull reqs, instead?

For small changes like typos, comment fixes, code formatting, and insignificant changes (like making initializations at declarations instead of in the code body), direct pushes seem fine. But for more significant changes, I think PRs are preferable.

@ampli
Member

ampli commented Feb 1, 2023

I don't know why this happened.

Mysterious to me too, for now.
BTW, I have a branch that replaces the detection of unused disjuncts (the current code is wrong), but I still need to complete it. And before I can send it, I have to send the PR improving the generation speed (which is also not complete...).

@linas
Member Author

linas commented Feb 1, 2023

Perhaps it might be better to hard-code this size, after all?

I don't find a benefit in hard coding that.

You had originally set it to 16384:

	sent->Table_connector_pool =
		pool_new(__func__, "Table_connector",
		         /*num_elements*/16384, sizeof(Table_connector),
		         /*zero_out*/false, /*align*/false, /*exact*/false);

I changed this to be dynamic, but I misunderstood how pools worked when I made this change. The number passed to pool_new seems to be a chunk size. With pool reuse, the size given in that initial pool_new then governs everything that comes after, forevermore. If it's too small, then that's bad... I'm setting it back to 16384 now ...

Let me know if my logic is still bad, here.

linas added a commit to linas/link-grammar that referenced this pull request Feb 1, 2023
@linas
Member Author

linas commented Feb 1, 2023

#define MAX_LOG2_TABLE_SIZE ((sizeof(size_t)==4) ? 25 : 34)

OK, I'm putting this back now. See #1405

@ampli
Member

ampli commented Feb 1, 2023

Perhaps it might be better to hard-code this size, after all?

I don't find a benefit in hard coding that.

You had originally set it to 16384:

Sorry, I totally misunderstood you. For some reason, I thought that we are talking about hard coding a value in the memory-pool management...

With pool-reuse, the size in that initial pool_new then governs everything that comes after, forevermore.

pool_reuse() leaves the currently allocated memory as is and reuses the allocated blocks. It allocates more blocks (of the same size) if needed. The next pool_reuse(), if any, does the same thing.

We can think of other strategies for the pool block size that may have a benefit:

  1. Increase the block size on each allocation. This would reduce the number of allocations if the initially guessed block size was too small.
  2. Round the requested block sizes to the nearest power of 2, less the malloc() overhead area, in an attempt to get the most out of the allocated virtual memory. This would also tend to use only a few distinct block sizes and hopefully reduce fragmentation.

@linas
Member Author

linas commented Feb 1, 2023

Perhaps it might be better to hard-code this size, after all?

I don't find a benefit in hard coding that.

You had originally set it to 16384:

We can think of other strategies for the pool block size that may have a benefit:

A fixed size of 16384 seems OK. (Or maybe 16383 would be better.) Why? See the "alloced table connectors" graph in
#1402 (comment)

repost:

djc-dj-cnt-scaled

Each horizontal line is a re-alloc of 16K elements. Counting the lines, I see that there are 12, i.e. 12 allocs cover almost all cases. The most useful thing to do would be to free all but the bottom 12 blocks during reuse, and possibly adjust the block size at that time. I will write this up more clearly in #1406

(The bottom-most line extends far to the left, cut off in the graph, showing that 16K is much too large for small sentences. But that seems OK...)

@linas linas mentioned this pull request Feb 1, 2023
@ampli
Member

ampli commented Feb 1, 2023

The most useful thing to do would be to free all but the bottom 12 blocks during re-use, and possibly adjust the block size at that time.

But note that free/malloc cycles are costly, so maybe it is better not to free.
In addition, my memory-pool implementation has a subtle problem (most probably insignificant for us): it keeps the next-block pointers at the end of blocks. This may cause extra page faults when following the block chain in order to free the blocks (if they got swapped out); free() may touch the blocks too, but at their start (I'm not sure). So maybe it is better to put the chaining pointer at the block start instead. And if the common free() implementation doesn't touch the blocks, then maybe it is even better to have a dedicated pointer block instead of block chaining (similar to a file system), but this starts to get complex. Putting the chaining pointer at the block start can also save space when element alignment is requested, because the pointer doesn't need to be aligned to the element size, while in the current implementation the space reserved for chaining (at the block end) is equal to an element size.

BTW, the current memory-pool element alignment code is buggy, and I have a branch that fixes it (it needs to be converted to a PR). Other code locations also use non-optimal variable alignments (to be fixed in the same PR).

@linas
Member Author

linas commented Feb 1, 2023

Most of this conversation should be moved to #1406

@linas
Member Author

linas commented Feb 5, 2023

Here's the "final" graph showing the actual bounds implemented in `estimate_tracon_entries()` in `count.c`, circa line 150. The y-axis is given by `pool_num_elements_issued(sent->Table_tracon_pool)` in `count.c`.

djc-dj-cnt-bound-act
