
Resize tracon count table #1403

Merged
merged 5 commits into from
Jan 30, 2023
Conversation

linas
Member

@linas linas commented Jan 30, 2023

Based on the measurements made in #1402, we now have an accurate estimate for the number of tracons that will actually be used during parsing, both for conventional and for MST parsing. Implement those estimates.

@linas
Member Author

linas commented Jan 30, 2023

@ampli I am merging this now, because it seems like the right thing to do. But please review; I do appreciate the comments you make.

@linas linas merged commit d80760e into opencog:master Jan 30, 2023
@linas linas deleted the counting-table branch January 30, 2023 22:30
@ampli
Member

ampli commented Jan 31, 2023

I found several problems.
It seems there is now a slowdown for null_count>1 (it needs a much bigger table in that case), and the resize doesn't work as needed.
Since I'm going to sleep soon, I will just post some partial findings.
I used the sentence And yet he should be always ready..., with arguments -lim=10000 -v=5 -de=table_alloc.
Note the debug message from table_alloc():

...
verbosity set to 5
debug set to table_alloc
link-grammar: Info: Dictionary found at ./data/en/4.0.dict
limit set to 10000
link-grammar: Info: Dictionary version 5.12.1, locale en_US.UTF-8
link-grammar: Info: Library version link-grammar-5.12.1. Enter "!help" for help.
#### Finished tokenizing (174 tokens)
++++ Split sentence                              0.10 seconds
++++ Finished expression pruning                 0.00 seconds
++++ Built disjuncts                             0.07 seconds
++++ Eliminated duplicate disjuncts              0.04 seconds
++++ Encoded for pruning                         0.08 seconds
++++ power pruned (for 0 nulls)                  0.05 seconds
++++ Built mlink_table                           0.00 seconds
++++ power pruned (for 0 nulls)                  0.01 seconds
++++ pp pruning                                  0.00 seconds
++++ power pruned (for 0 nulls)                  0.00 seconds
++++ Built mlink_table                           0.00 seconds
++++ power pruned (for 0 nulls)                  0.01 seconds
++++ Encoded for parsing                         0.00 seconds
++++ Initialized fast matcher                    0.00 seconds
Trace: table_alloc: Connector table size 4194304
Trace: table_alloc: Connector table size 1
++++ Counted parses (0 w/0 nulls)                0.62 seconds
++++ Finished parse                              0.00 seconds
No complete linkages found.
++++ Finished expression pruning                 0.00 seconds
++++ Built disjuncts                             0.06 seconds
++++ Eliminated duplicate disjuncts              0.04 seconds
++++ Encoded for pruning (one-step)              0.11 seconds
++++ power pruned (for 1 null)                   0.05 seconds
++++ Built mlink_table                           0.00 seconds
++++ power pruned (for 1 null)                   0.01 seconds
++++ pp pruning                                  0.00 seconds
++++ power pruned (for 1 null)                   0.01 seconds
++++ Built mlink_table                           0.00 seconds
++++ power pruned (for 1 null)                   0.01 seconds
++++ Encoded for parsing                         0.00 seconds
++++ Initialized fast matcher                    0.00 seconds
Trace: table_alloc: Connector table size 8388608
Trace: table_alloc: Connector table size 1
Trace: table_alloc: Connector table size 1
link-grammar: Warning: insanely large tracon hash table size: 33554432
++++ Counted parses (2147483647 w/1 null)       14.39 seconds
++++ Built parse set                             1.98 seconds
...

In addition, if the following is as intended, then I don't understand it:

* Provide an estimate for the number of Table_tracon entries that will
* be needed.
*
* The number of entries actually used was measured in discussion
* https://github.com/opencog/link-grammar/discussions/1402
* Based on this, an upper bound on the entries needed is
* 3 * num_disjuncts * log_2(num_words)
* i.e. more than this is almost never needed. A lower bound is
* 0.5 * num_disjuncts * log_2(num_words)
* i.e. more than this is *always* needed.
*
* In both conventional and MST dictionaries, more than 500K entries is
* almost never needed. In a handful of extreme cases, 2M was observed.
*/
static unsigned int estimate_tracon_entries(Sentence sent)
{
	unsigned int nwords = sent->length;
	unsigned int log2_nwords = 0;
	while (nwords) { log2_nwords++; nwords >>= 1; }
	unsigned int tblsize = 3 * log2_nwords * sent->num_disjuncts;
	if (tblsize < 512) tblsize = 512; // Happens rarely on short sentences.
	return tblsize;
}

	unsigned int num_elts = estimate_tracon_entries(sent);
	sent->Table_tracon_pool =
		pool_new(__func__, "Table_tracon",
		         num_elts, sizeof(Table_tracon),
		         /*zero_out*/false, /*align*/false, /*exact*/false);

You allocate here, at once, a memory block equal to the upper bound.
Even sentences that need only half of the upper bound get the full upper bound as the initial allocation.
And if even a single additional entry is needed, pool_alloc() will allocate another memory block equal to the upper bound!
In my opinion, the lower bound should clearly be used here.

I also found that the ru batch runs now slower by ~6%.

Regarding table size:

uint32_t table_size;
size_t table_mask;

You have changed it from size_t to uint32_t (the mask inconsistently remained size_t).
I intentionally made it size_t, so it can get over 2^31, for the generation mode (and for regular parsing with nulls of long sentences).
When I use link-generator -s 8 -l en --verbosity=5 --debug=table_stat I get:
link-grammar: Warning: insanely large tracon hash table size: 134217728.

linas added a commit that referenced this pull request Jan 31, 2023
@linas
Member Author

linas commented Jan 31, 2023

size_t hash

OK, I did not understand. Restored in 1193ddd

@linas
Member Author

linas commented Jan 31, 2023

For me, And yet he should be always ready... parses without nulls; and power-prune left only 74 disjuncts.

I added print statements to count.c here:

ctxt->table_size *= 2; /* Double the table size */

to see if the table doubles past 32 bits. I haven't seen that yet. Whatever; I will change things to use size_t and remove the warning.

Also, link-generator crashed because of a null pointer deref; I worked around this in 7e2e9d3 -- I don't know why this happened.

I removed the large-table warning and went to size_t tables in commit 8b87635

I changed the pool chunk size in 7951ed8 ... I had not previously understood this was a chunk size. Perhaps it might be better to hard-code this size, after all?

The last four changes were direct pushes into the repo; would it be better if I had made these into pull reqs, instead?

linas added a commit that referenced this pull request Jan 31, 2023
linas added a commit that referenced this pull request Jan 31, 2023
@ampli
Member

ampli commented Jan 31, 2023

For me, And yet he should be always ready... parses without nulls; and power-prune left only 74 disjuncts.

I meant the famous sentence from fix-long:
And yet he should be always ready to have a perfectly terrible scene, whenever we want one, and to become miserable, absolutely miserable, at a moment’s notice, and to overwhelm us with just reproaches in less than twenty minutes, and to be positively violent at the end of half an hour, and to leave us for ever at a quarter to eight, when we have to go and dress for dinner when, after that, one has seen him for really the last time, and he has refused to take back the little things he has given one, and promised never to communicate with one again, or to write one any foolish letters, he should be perfectly broken-hearted, and telegraph to one all day long, and send one little notes every half-hour by a private hansom, and dine quite alone at the club, so that every one should know how unhappy he was.

But now I see the reason for Trace: table_alloc: Connector table size 1:

static void table_grow(count_context_t *ctxt)
{
	table_alloc(ctxt, 0);

	size_t reqsz = 1ULL << logsz;
	if (0 < logsz && reqsz <= ctxt->table_size) return; // It's big enough, already.
	lgdebug(+D_COUNT, "Connector table size %lu\n", reqsz);
#if HAVE_THREADS_H && !__EMSCRIPTEN__
	// Install a thread-exit handler, to free kept_table on thread-exit.
	static once_flag flag = ONCE_FLAG_INIT;
	call_once(&flag, make_key);
	if (NULL == kept_table)
		tss_set(key, &kept_table);
#endif /* HAVE_THREADS_H && !__EMSCRIPTEN__ */
	if (logsz == 0)
		ctxt->table_size *= 2; /* Double the table size */
	else
		ctxt->table_size = reqsz;

Regarding the table size print at line 198, the intention was to print its size after the doubling (the previous code did that).

The original code has this:
#define MAX_LOG2_TABLE_SIZE ((sizeof(size_t)==4) ? 25 : 34)

Now there is no limit. On 32-bit systems it is not a good idea to let it grow too big.

	 * FYI: the new tracon tables are (much?) smaller than the older
	 * connector tables, so maybe this reuse is no longer needed?

Note that we observed the problem of big allocation when the table already used tracons.

if (tblsize < 512) tblsize = 512; // Happens rarely on short sentences.

The original minimum size was 4096, found by profiling. I will check it again later.
The reason for the ru slowdown is not yet clear to me, but changing it to 4096 didn't solve the slowdown.

More later.

@ampli
Member

ampli commented Feb 1, 2023

Perhaps it might be better to hard-code this size, after all?

I don't find a benefit in hard-coding that.
It depends on the expected number of elements and their distribution.
If you use very big blocks, you may waste virtual memory and maybe need more pages in memory (I'm not sure about the impact). Too-small allocations have too much allocation overhead. However, if you use the zero_out argument, too-big allocations cause CPU overhead due to clearing unused memory.

@ampli
Member

ampli commented Feb 1, 2023

For me, And yet he should be always ready... parses without nulls; and power-prune left only 74 disjuncts.

I added print statements to count.c here:

ctxt->table_size *= 2; /* Double the table size */

to see if the table doubles past 32 bits. I haven't seen that yet. Whatever; I will change things to use size_t and remove the warning.

Also, link-generator crashed because of a null pointer deref; I worked around this in 7e2e9d3 -- I don't know why this happened.

I removed the large-table warning and went to size_t tables in commit 8b87635

I changed the pool chunk size in 7951ed8 ... I had not previously understood this was a chunk size. Perhaps it might be better to hard-code this size, after all?

The last four changes were direct pushes into the repo; would it be better if I had made these into pull reqs, instead?

For small changes like typos, comment fixes, code formatting, and insignificant changes (like making initializations at declarations instead of in the code body), direct pushes seem fine. But for more significant changes, I think PRs are preferable.

@ampli
Member

ampli commented Feb 1, 2023

I don't know why this happened.

Mysterious to me too, for now.
BTW, I have a branch that replaces the detection of unused disjuncts (the current code is wrong), but I still need to complete it. And before I can send it, I have to send the PR improving the generation speed (which is also not complete...).

@linas
Member Author

linas commented Feb 1, 2023

Perhaps it might be better to hard-code this size, after all?

I don't find a benefit in hard coding that.

You had originally set it to 16384:

	sent->Table_connector_pool =
		pool_new(__func__, "Table_connector",
		         /*num_elements*/16384, sizeof(Table_connector),
		         /*zero_out*/false, /*align*/false, /*exact*/false);

I changed this to be dynamic, but I misunderstood how pools worked when I made this change. The number passed to pool_new seems to be a chunk size. With pool reuse, the size given in that initial pool_new then governs everything that comes after, forevermore. If it's too small, then that's bad... I'm setting it back to 16384 now ...

Let me know if my logic is still bad, here.

linas added a commit to linas/link-grammar that referenced this pull request Feb 1, 2023
@linas
Member Author

linas commented Feb 1, 2023

#define MAX_LOG2_TABLE_SIZE ((sizeof(size_t)==4) ? 25 : 34)

OK, I'm putting this back now. See #1405

@ampli
Member

ampli commented Feb 1, 2023

Perhaps it might be better to hard-code this size, after all?

I don't find a benefit in hard coding that.

You had originally set it to 16384:

Sorry, I totally misunderstood you. For some reason, I thought that we are talking about hard coding a value in the memory-pool management...

With pool-reuse, the size in that initial pool_new then governs everything that comes after, forevermore.

pool_reuse() leaves the currently allocated memory as is and reuses the allocated blocks. It allocates more blocks (of the same size) if needed. The next pool_reuse(), if any, does the same thing.

We can think of other strategies for the pool block size that may have a benefit:

  1. Increase the block size on each allocation. This would reduce the number of allocations if the initially guessed block size was too small.
  2. Round the requested block sizes to the nearest power of 2, less the malloc() overhead area, in an attempt to get the most out of the allocated virtual memory. This would also tend to use only a few distinct block sizes and hopefully reduce fragmentation.

@linas
Member Author

linas commented Feb 1, 2023

Perhaps it might be better to hard-code this size, after all?

I don't find a benefit in hard coding that.

You had originally set it to 16384:

We can think of other strategies for the pool block size that may have a benefit:

A fixed size of 16384 seems OK. (Or maybe 16383 would be better.) Why? See the "alloced table connectors" graph in
#1402 (comment)

repost:

djc-dj-cnt-scaled

Each horizontal line is a re-alloc of 16K elements. Counting the lines, I see that there are 12, i.e. 12 allocs cover almost all cases. The most useful thing to do would be to free all but the bottom 12 blocks during reuse, and possibly adjust the block size at that time. I will write this up more clearly in #1406

(The bottom-most line extends far to the left, cut off in the graph, showing that 16K is much too large for small sentences. But that seems OK...)

@linas linas mentioned this pull request Feb 1, 2023
@ampli
Member

ampli commented Feb 1, 2023

The most useful thing to do would be to free all but the bottom 12 blocks during re-use, and possibly adjust the block size at that time.

But note that free/malloc cycles are costly, so maybe it is better not to free.
In addition, my memory-pool implementation has a subtle problem (most probably insignificant for us): it keeps the next-block pointers at the end of blocks. This may cause extra page faults when following the block chain in order to free the blocks (if they got swapped out); free() may touch the blocks too, but at their start (I'm not sure). So maybe it is better to put the chaining pointer at the block start instead. And if the common free() implementation doesn't touch the blocks, then maybe it is even better to have a dedicated pointer block instead of block chaining (similar to a file system), but this starts to get complex. Putting the chaining pointer at the block start can also save space when element alignment is requested, because the pointer doesn't need to be aligned to the element size, while in the current implementation the space reserved for chaining (at the block end) is equal to an element size.

BTW, the current memory-pool element alignment code is buggy, and I have a branch that fixes it (it needs to be converted to a PR). Other code locations also use non-optimal variable alignments (to be fixed in the same PR).

@linas
Member Author

linas commented Feb 1, 2023

Most of this conversation should be moved to #1406

@linas
Member Author

linas commented Feb 5, 2023

Here's the "final" graph showing the actual bounds implemented in `estimate_tracon_entries()` in `count.c`, circa line 150. The y-axis is given by `pool_num_elements_issued(sent->Table_tracon_pool)` in `count.c`.

djc-dj-cnt-bound-act
