Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the expected behaviour when combining use_base_merging and several import module types? #1189

Open
gouttegd opened this issue Feb 22, 2025 · 6 comments
Assignees
Labels
documentation needs discussion Discussion, specification, or clarification required to proceed

Comments

@gouttegd
Copy link
Contributor

I am reviewing the the way the standard Makefile works regarding the import modules, and something puzzles me.

The type of import modules (e.g. slme, minimal, mirror, etc.) can be declared both at the level of the entire import_group and at the level of each individual module, as in:

import_group:
  module_type: slme    # <- default module type for all modules that do not have an explicit type
  products:
    - id: ro
    - id: pato
    - id: fbbt
      module_type: minimal

This will produce, in the standard Makefile, the following rules:

# Default rule, using the group-level module type;
# that rule will be used to produce ro_import.owl and pato_import.owl.
$(IMPORTDIR)/%_import.owl: $(MIRRORDIR)/%.owl $(IMPORTDIR)/%_terms_combined.txt
        $(ROBOT) ... # ROBOT pipeline to produce a SLME module

# Specific rule for the FBbt import module
$(IMPORTDIR)/fbbt_import.owl: $(MIRRORDIR)/fbbt.owl $(IMPORTDIR)/fbbt_terms_combined.txt
        $(ROBOT) ... # ROBOT pipeline to produce a minimal module

So far, so good.

But now let’s use base merging:

import_group:
  use_base_merging: True
  module_type: slme
  products:
    - id: ro
    - id: pato
    - id: fbbt
      module_type: minimal

This generates three rules:

# The rule that creates the (SLME) merged import module
$(IMPORTDIR)/merged_import.owl: $(MIRROR)/merged.owl $(IMPORTDIR)/merged_terms_combined.txt
        $(ROBOT) ... # ROBOT pipeline to produce a SLME module

# A generic rule to produce a SLME module
$(IMPORTDIR)/%_import.owl: $(MIRRORDIR)/%.owl $(IMPORTDIR)/%_terms_combined.txt
        $(ROBOT) ... # ROBOT pipeline to produce a SLME module

# A specific rule for the FBbt minimal import module
$(IMPORTDIR)/fbbt_import.owl: $(MIRRORDIR)/fbbt.owl $(IMPORTDIR)/fbbt_terms_combined.txt
        $(ROBOT) ... # ROBOT pipeline to produce a minimal module

Question is: what is even the point of the last two rules? They are not used anywhere. Under use_base_merging=True the only import module that matters is the merged_import.owl. In the example above, imported FBbt terms would end up in the merged import module, extracted using the SLME method, regardless of the module_type: minimal parameter declared in the ODK configuration file.

In fact, as soon as use_base_merging is used, I don’t think it even makes sense to have a per-module module type. How would that even work?

And if mixing the use of use_base_merging with per-module module types is not supposed to be supported, then I think we should document that, and maybe even add a check in odk.py to detect this kind of thing and warn the users that they are using a configuration that makes no sense.

@gouttegd gouttegd self-assigned this Feb 22, 2025
@gouttegd gouttegd added documentation needs discussion Discussion, specification, or clarification required to proceed labels Feb 22, 2025
@gouttegd
Copy link
Contributor Author

The case illustrated above is not merely a theoretical one. It happens for example in CL, which uses base merging but declares the omo import with the mirror module type:

import_group:
  use_base_merging: TRUE
  products:
    - id: pr
      # make_base, mirror_from and other parameters omitted for brevity
    - id: go
    - id: uberon 
    - id: ro
    - id: pato
    - id: ncbitaxon
    - id: ncbitaxondisjoints
    - id: omo
      module_type: mirror

The intention here may have been to use the entirety of OMO without “extracting” any kind of module, but that is not what actually happens. The omo_import.owl module is in fact never created at all (the rule that creates it exists in the Makefile, but is never called), and what is really imported from OMO is a SLME module (bundled in the merged_import.owl module) created from the list of OMO terms in $(IMPORTDIR)/omo_terms_combined.txt.

In fact, even if the omo_import.owl rule was actually called, it would not produce the desired result, because here is the rule:

$(IMPORTDIR)/omo_import.owl: $(MIRRORDIR)/merged.owl $(IMPORTDIR)/omo_terms_combined.txt
        if [ $(IMP) = true ]; then $(ROBOT) merge -i $< query --update ../sparql/preprocess-module.ru --updat>
    $(ANNOTATE_CONVERT_FILE); fi

Notice how it uses the merged mirror as input? The rule would actually merely create a copy of the entire merged mirror!

@gouttegd
Copy link
Contributor Author

It also happens in Uberon, which declares

  • omo as a mirror module;
  • orcidio as a custom module.

For the OMO module, it happens the same thing as with CL above: the omo_import.owl module is never created, and the imported OMO terms are extracted by SLME in the merged import module.

For the ORCIDIO module, it’s even more funny. In principles, imported ORCIDIO terms would end up in the merged_import.owl module, except that in the case of ORCIDIO, the terms are individuals, which are explicitly excluded from the SLME process (slme_individuals: exclude). Instead, the Uberon edit file explicitly imports the orcidio_import.owl module, which is supposed to be generated by a custom rule that was added in the custom uberon.Makefile. But that rule is never implicitly called from anywhere! In particular, it is not called when invoking make refresh-imports (which only refresh the $(IMPORT_FILES), which under use_base_merging=True only lists the merged_import.owl module).

So the only way to refresh the ORCIDIO import is to explicitly call make refresh-orcidio.

As a result, the ORCIDIO import in Uberon has not been refreshed since the last time someone thought about this module, 10 months ago.

@gouttegd
Copy link
Contributor Author

gouttegd commented Feb 22, 2025

I see three possible course of actions here:

(A) Decide that mixing base merging and per-import module type is not supported, period.

That is, when you do that:

import_group:
  use_base_merging: True
  module_type: slme
  products:
    - id: ro
    - id: pato
    - id: fbbt
      module_type: minimal

then the fbbt-specific module type is silently ignored and FBbt is imported as a SLME module extracted from merged bases as for all the other modules.

This is, in effect, the current situation. I would have no objection to keeping things like that, but I think it should at least be properly documented. (And the Makefile template should be fixed so that it does not contain useless rules that only serve to increase confusion.)

(B) Likewise, but we error out instead of silently ignoring the problem.

We add some logic in the odk.py script to detect cases like the one above and error out with an “invalid configuration” message (something like “use of module_type at the individual module level is incompatible with use_base_merging”).

(C) Decide that it is in fact allowed to use per-import module type at the same time as base merging.

If we go that route, I think the only sensible behaviour is to exclude modules that have their own module type from the merged import module. In the example above, that would mean that we would create both: a merged_import.owl module obtained from SLME extraction from ro-base and pato-base, and a fbbt_import.owl module obtained from minimal extraction from FBbt.

My concern with that approach is that it would undoubtedly break existing configurations. For example, in both Uberon and CL this would make all OMO terms (that are currently present in merged_import.owl) suddenly disappear from the ontology entirely, until their edit files are updated to explicitly import omo_import.owl.

@matentzn
Copy link
Contributor

This was bad design on my part, sorry.. Maybe the best way of handling the situation would be a flag for each import to denote it should not be incorporated in the merged import module (a more explicit variant of your (C), like exclude_from_merged: true or some such), these being treated as stand-alones. This will have the least ripple effect in a new version, as I am sure there are many installations that have accidentally maintained module_type etc on their modules even if using base merging.

@gouttegd
Copy link
Contributor Author

a flag for each import to denote it should not be incorporated in the merged import module (a more explicit variant of your (C), like exclude_from_merged: true or some such)

I don’t like the fact that such a flag would be somewhat redundant¹ (the mere fact that an import has its own module_type should be enough to flag it as “exclude-me-from-the-merged-module”), but if we want to avoid breaking existing installations I tend to agree that this is the best solution.


¹There are several cases of similar redundant options in the config file that at some point I’d like to get rid of, like for example the way we describe a component that is generated from mappings:

components:
  products:
    - filename: my-sssom-derived-components.owl
      use_mappings: True
      mappings:
        - mapping_set_from_which_the_component_is_derived.owl

The mere presence of a non-empty mappings list should be enough to infer that the component is to be derived from mappings, there is no real need for that use_mappings. Such redundant options create the possibility of having invalid/inconsistent configuration.

@gouttegd
Copy link
Contributor Author

So I propose:

For 1.6: Statu quo. We preserve the existing behaviour (option A above: basically, module_type when used on a specific module while use_base_merging is also used has no effect), we simply make it more obvious by (a) not generating the rules that are not in fact used and (b) explicitly document that module_type on specific modules is incompatible with use_base_merging and is ignored.

For later (maybe 1.7): Add an option to explicitly exclude a module from the main merged module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation needs discussion Discussion, specification, or clarification required to proceed
Projects
None yet
Development

No branches or pull requests

2 participants