Restoring fast feature counting #89

bmschmidt · 2015-12-03T21:23:07Z

In wrapping up all the various bookworm calls into a command-line executable over the summer, I removed the ability to ingest unigrams.

I've now restored that, but without the system calls to @organisciak's "fast_featurecounter.sh". Instead, it just calls a relocated version of his (older?) function write_word_ids_from_feature_counts.

For rebuilds of Hathi, I'm not so worried about this: what we really want to do is not rebuild the vocabulary list at all, but instead to just use the file that we've now created.

For JSTOR DFR, the Underwood corpus, or other potential feature-count bookworms, however, we may want the faster version. I don't really know how much slower the current approach is.

My preference would be to redefine that function so that the external wrappers keep working. But we could also just switch back to using the Makefile to dispatch if that is easier. (It may be, because the current version is configured to read from stdin.)
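
For anyone picking this up, here is a rough sketch of what a redefined function could look like if it consumed feature counts from any iterable of lines (stdin included). This is not the code in the repository: the name `write_word_ids_sketch`, the tab-separated `document, word, count` input layout, the `id, word, count` output layout, and the `max_words` cap are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, TextIO


def write_word_ids_sketch(lines: Iterable[str], out: TextIO,
                          max_words: int = 1_000_000) -> int:
    """Sum counts per word and write `id<TAB>word<TAB>count`, most frequent first.

    Hypothetical helper: the input is assumed to be tab-separated with the
    word and its count in the last two fields.
    """
    totals: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, raw_count = fields[-2], fields[-1]
        try:
            count = int(raw_count)
        except ValueError:
            continue  # skip lines whose count is not an integer
        totals[word] += count

    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:max_words]
    for word_id, (word, count) in enumerate(ranked, start=1):
        out.write(f"{word_id}\t{word}\t{count}\n")
    return len(ranked)
```

Calling `write_word_ids_sketch(sys.stdin, sys.stdout)` would keep the stdin behavior, while a wrapper could pass an open file instead. Note that this holds the whole vocabulary in memory, which may be exactly the cost the shell version avoids.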


bmschmidt · 2018-11-16T18:02:34Z

Fold into #134

bmschmidt closed this as completed Nov 16, 2018
