citations
slobentanzer committed Dec 9, 2023
1 parent a6a47d7 commit 39f37d0
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions content/24.sup.note.4.md
@@ -17,15 +17,15 @@ In addition, some features may be incompatible, and thus, one centrally maintain
With BioCypher, each of the above languages can be adopted as the basis for a particular knowledge graph; in fact, we use the Biolink model as a basic ontology.
Inside our framework, these languages can be freely and transparently exchanged, modified, extended, and hybridised, as we show in several of our case studies (e.g., “Tumour board” extends Biolink with Sequence Ontology and Disease Ontology).

- 3) KG frameworks provide a means to build KGs, similar to the idea of BioCypher 14;[@doi:10.1101/631812];[@doi:10.1101/2020.08.17.254839];[@doi:10.1186/s12859-022-04932-3].
+ 3) KG frameworks provide a means to build KGs, similar to the idea of BioCypher 14;[@doi:10.1101/631812;@doi:10.1101/2020.08.17.254839;@doi:10.1186/s12859-022-04932-3].
However, most tie themselves tightly to a particular standard format or modelling language ecosystem, thereby inheriting many of the limitations described above.
The Knowledge Graph Hub provides a data loader pipeline, KGX allows conversion of KGs between different technical formats, and RTX-KG2 builds a fixed semantically standardised KG; all three adhere to the Biolink model [@doi:10.1101/2020.08.17.254839;@doi:10.1186/s12859-022-04932-3].
Bio2BEL is an extensive framework to transform primary databases into BEL [@doi:10.1101/631812].
PheKnowLator is the only tool that is conceptually similar to BioCypher in that it allows the creation of knowledge graphs under different data models 14.
However, it appears to be aimed at knowledge representation experts, requiring considerable bioinformatics and ontology expertise.
While being fully customisable, it does not feature flexible recombination of modular components.

- The strategy of subgraph extraction to yield smaller, user-specific KGs has been implemented previously, for instance by CROssBAR (v1), ROBOKOP, and the BioThings Explorer [@doi:10.1093/nar/gkab543];[@doi:10.1093/bioinformatics/btz604];[@doi:10.1186/s12859-018-2041-5].
+ The strategy of subgraph extraction to yield smaller, user-specific KGs has been implemented previously, for instance by CROssBAR (v1), ROBOKOP, and the BioThings Explorer [@doi:10.1093/nar/gkab543;@doi:10.1093/bioinformatics/btz604;@doi:10.1186/s12859-018-2041-5].
However, these rely on single (and thus enormous) harmonised KGs for extracting the subgraphs as opposed to BioCypher’s modular approach [@doi:10.1111/cts.12592].
While the “top-down” approach of first building a massive KG and then extracting subgraphs from it is a valid means to arrive at a particular knowledge representation, the effort involved is detrimental to efficiency and democratisation of the process.
A secondary consequence of this large primary effort is that alternative representations of the initial KG will probably not be attempted, hindering flexible knowledge representation.
@@ -37,7 +37,7 @@ We aim to close this gap by providing an agile and modular framework that facili

There exist alternatives to workflows that involve KGs.
While the premise of our manuscript is that KGs are an important part of sustainable and trustworthy machine learning in the biomedical sciences, “zero domain knowledge” approaches such as UniHPF [@doi:10.48550/arXiv.2211.08082] can do without prior knowledge in their inference process.
- Whether methods that forego knowledge representation entirely can be as good or better than methods that use knowledge representation is still a matter of discussion [@doi:10.1038/s41551-022-00942-x];[@doi:10.1101/2022.05.01.489928];[@doi:10.1101/2022.12.07.22283238];[@doi:10.48550/arxiv.2210.09338];[@doi:10.1016/j.artint.2021.103627];[@doi:10.48550/arXiv.2205.15952];[@doi:10.1093/bioinformatics/btac001].
+ Whether methods that forego knowledge representation entirely can be as good or better than methods that use knowledge representation is still a matter of discussion [@doi:10.1038/s41551-022-00942-x;@doi:10.1101/2022.05.01.489928;@doi:10.1101/2022.12.07.22283238;@doi:10.48550/arxiv.2210.09338;@doi:10.1016/j.artint.2021.103627;@doi:10.48550/arXiv.2205.15952;@doi:10.1093/bioinformatics/btac001].
One aspect that is apparent from modern developments in large language models is that prior knowledge-free models appear to be very data hungry; while billion parameter models are very impressive in their text and image processing capabilities, we do not nearly have enough data in molecular biomedicine to train a GPT-like model, even if we had the funds to train it.
In addition, even in prior knowledge-free deep models, a semantically enriched knowledge graph can still play a role and be useful as an in-process component [@doi:10.1609/aaai.v36i10.21286].
To address these and other performance-related questions, we want to facilitate the creation of benchmarks and standard datasets through the modular nature of our framework.
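
All three changes in this commit follow the same pattern: runs of separately bracketed citations joined by semicolons (`[@a];[@b]`) are merged into one bracketed group (`[@a;@b]`). A minimal sketch of how that rewrite could be automated is shown below. The function name `merge_citations` and the regular expression are illustrative assumptions; they are not part of the repository or of any citation tool, and the commit itself may well have been made by hand.

```python
import re

# Two or more bracketed citation groups joined by ";", e.g. "[@a];[@b];[@c]".
# Hypothetical pattern written for this illustration only.
_ADJACENT_GROUPS = re.compile(r"\[@[^\]\[]+\](?:;\[@[^\]\[]+\])+")


def merge_citations(text: str) -> str:
    """Collapse runs like '[@a];[@b]' into a single '[@a;@b]' group."""

    def _join(match: re.Match) -> str:
        # Drop the outermost brackets of the run, then remove the inner
        # "];[" seams so only ";" separates the citation keys.
        inner = match.group(0)[1:-1].replace("];[", ";")
        return f"[{inner}]"

    return _ADJACENT_GROUPS.sub(_join, text)


if __name__ == "__main__":
    before = "BioThings Explorer [@doi:10.1093/nar/gkab543];[@doi:10.1093/bioinformatics/btz604]."
    print(merge_citations(before))
```

A lone citation such as `[@doi:10.1101/631812]` is left untouched, because the pattern only matches when at least one further bracketed group follows directly after a semicolon.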
