explainST for multiword/extragrade subgroups #15
Comments
Adding an "acrudoxic" record to the subgroup formative element dictionary seems to "fix" the formatting, but the result is not auto-concatenated with the "plinthic" explanation.
Need to figure out @dylanbeaudette's workflow and plans for additions and updates to these dictionaries.
There are at least three sub-issues here:
It is wide-open. My original goal was to have a simple / concise definition for all taxa, starting with what I could glean from textbooks (Buol et al., Schaetzl and Anderson, etc.). The first draft can (should IMHO) specify simple (atomic) and compound formative elements, favoring longer matches ("acrudoxic" vs. "acr"). This specific example shouldn't be a problem because only the subgroup formative element dictionary will be used on the subgroup chunk of taxa text. Explanations will include a compact description of compound formative elements (mostly relevant at the subgroup). Long-term:
I'm going to spend a little time right now looking at the non-exported functions related to
Made some minor upgrades / fixes to the "explanation" of multi-term SG taxa, and added a "?" placeholder for incomplete dictionary entries. Prior code would only flag missing entries (NA, as opposed to empty-string). Example call:
cat(explainST('acrudoxic plinthic kandiudults'))
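To make the NA vs. empty-string distinction concrete, here is a minimal sketch of the placeholder idea. It is written in Python for illustration only and is not the package's R code; the helper names `is_missing` and `display_definition` are hypothetical, and NaN/None stand in for R's NA.

```python
import math

def is_missing(definition):
    """True for None, NaN (stand-ins for NA), or a blank string."""
    if definition is None:
        return True
    if isinstance(definition, float) and math.isnan(definition):
        return True
    return isinstance(definition, str) and definition.strip() == ""

def display_definition(definition, placeholder="?"):
    """Return the definition, or the placeholder when it is incomplete."""
    return placeholder if is_missing(definition) else definition

# Both kinds of incomplete entry get the same placeholder.
shown = [display_definition(d) for d in (None, "", "some definition")]
```

Treating blank strings and NA-like values the same way means a half-filled dictionary row shows up as "?" in the explanation instead of silently printing nothing.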
I don't think a fully generic atomic system is worthwhile at this point, for more or less the same reason as in your comment below on context-dependent meanings of some elements.
This is about as far as my reasoning went on it. My understanding is the searches are constrained. I didn't take the time to trace it to the origin of the spurious match but rather guessed what it was doing. Just saw your fix!
That sounds good for a conceptual basis and I support all that. Thinking back to where I was when I made this issue, I was referring more to the nuts and bolts of having a process for updating, and an idea of what the metadata for each of the columns in those tables is. For instance in subgroup we have:
However, I don't think those columns are currently defined in e.g. the man page for Essentially: what do they mean, and how do we know when/how to fill them in for the currently-blank ones?
So, in the current datasets order to subgroup we have:
Good long term goal, and I think even context-dependent definitions of formative elements could probably be handled eventually.
There is a lot of potential there. This should probably be a separate issue. Does it relate to #10?
Great. Further discussion / planning would greatly benefit from pen and paper, or at least from being in front of a whiteboard. I'd like to move "planning" content and ideas into a more permanent location / issues so that we can close this issue; the core problem is (I think) resolved.
I can confirm that 274 out of 275 multiword subgroups work as expected. The one that doesn't work is a subgroup in I have committed two relevant files in d6cc06a and am closing this issue.
Since my updates to fix parsing for all subgroups in the #14 "rearrangement", explainST "works" for all modern taxa. However, it still does some funky stuff. I BELIEVE this is at least in part because of the specific definitions in the lookup table for formative elements. For instance, "acr" is expected to be a great group element, so when it is used in extragrade subgroups, e.g. "acrudoxic plinthic ...", the formatting is wrong: "acr" matches "acrudoxic" and the empty space is positioned wrong. Some special handling will probably be needed to "explain" and define relevant explanations for multiword/extragrade subgroups.
I am fairly sure these types of things can be resolved with some careful manual review of "likely suspects" and random spot checks and mostly making new entries in the lookup table as appropriate to handle unintended matches (#5 related). E.g. "acrudoxic" actually comprises 3 formative elements within itself. Do we match them all and somehow concatenate? Or do we simply define "acrudoxic" as some aggregate formative element with a unique definition?
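As a rough illustration of the second option (treat "acrudoxic" as one aggregate element and prefer it over the shorter "acr"), here is a minimal Python sketch. It is not how explainST is implemented: the dictionary contents, the prefix-based matching rule, and the function names `match_longest` and `match_subgroup` are all assumptions made for the example.

```python
# Hypothetical miniature of a subgroup formative-element dictionary.
# The definitions are placeholders, not the package's actual entries.
SUBGROUP_ELEMENTS = {
    "acr": "<definition of acr>",
    "acrudoxic": "<definition of acrudoxic>",
    "plinthic": "<definition of plinthic>",
}

def match_longest(word, dictionary):
    """Return the longest dictionary key that `word` starts with, or None."""
    candidates = [key for key in dictionary if word.startswith(key)]
    return max(candidates, key=len) if candidates else None

def match_subgroup(subgroup_text, dictionary):
    """Match each space-separated subgroup word against the dictionary."""
    return [(word, match_longest(word, dictionary))
            for word in subgroup_text.split()]

# With both "acr" and "acrudoxic" present, the longer entry wins.
matches = match_subgroup("acrudoxic plinthic", SUBGROUP_ELEMENTS)
```

Under these assumptions each word of a multiword subgroup resolves to a single entry, so the per-word explanations can be joined in order without the shorter "acr" entry shifting the alignment.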