In wrapping up all the various bookworm calls into a command-line executable over the summer, I removed the ability to ingest unigrams.
I've now restored that, but not using the system calls to @organisciak's "fast_featurecounter.sh". Instead, it just calls a moved version of his (older?) function write_word_ids_from_feature_counts.
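For anyone reading along, here is a minimal sketch of roughly what that kind of pass does: walk the feature-count files, total the counts, and write out a vocabulary list with integer word IDs. This is not the actual bookworm code; the tab-separated "word&lt;TAB&gt;count" input format, the file names, and the output layout are all assumptions for illustration.

```python
# Hypothetical sketch only -- the real write_word_ids_from_feature_counts
# in the bookworm codebase may read a different format entirely.
import glob
from collections import Counter

def write_word_ids_from_feature_counts(input_glob, out_path="words.txt"):
    totals = Counter()
    for path in glob.glob(input_glob):
        with open(path, encoding="utf-8") as f:
            for line in f:
                try:
                    word, count = line.rstrip("\n").split("\t")
                    totals[word] += int(count)
                except ValueError:
                    continue  # skip malformed lines
    with open(out_path, "w", encoding="utf-8") as out:
        # Most frequent words get the lowest IDs.
        for word_id, (word, count) in enumerate(totals.most_common(), start=1):
            out.write(f"{word_id}\t{word}\t{count}\n")
```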
For rebuilds of Hathi, I'm not so worried about this: what we really want to do is not rebuild the vocabulary list at all, but instead to just use the file that we've now created.
For Jstor DFR, the Underwood corpus, or other potential feature-count bookworms, however, we may want the faster version. I don't know what the cost is here, really.
My preference would be to do this by redefining that function, so that the external wrappers can keep working. But we could also just switch back to dispatching through the Makefile if that is easier. (It may be, because the current version is configured to read from stdin.)
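The redefinition option could look something like the sketch below: keep the same Python entry point so external wrappers don't change, but delegate the counting to fast_featurecounter.sh by streaming the feature-count files to its stdin. The script path, arguments, and output handling here are placeholders, not the actual setup.

```python
# Hypothetical sketch of redefining the function as a thin wrapper around
# the faster shell script, which reads from stdin and writes to stdout.
import subprocess

def write_word_ids_from_feature_counts(input_paths, out_path="words.txt"):
    with open(out_path, "w", encoding="utf-8") as out:
        proc = subprocess.Popen(
            ["bash", "scripts/fast_featurecounter.sh"],  # assumed location
            stdin=subprocess.PIPE,
            stdout=out,
        )
        # Stream every feature-count file to the script's stdin.
        for path in input_paths:
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    proc.stdin.write(chunk)
        proc.stdin.close()
        if proc.wait() != 0:
            raise RuntimeError("fast_featurecounter.sh failed")
```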