Nothing too congealed as of yet, but I think a powerful tool for users of database-dependent tools, developers of those tools, and curators of those databases might have the following features:
- management of downloaded/installed databases via a system daemon, akin perhaps to the Docker daemon, to promote a consistent interface for managing and retrieving databases;
- a daemon able to report useful information about the version and change history of a database, and to restore a database to an earlier version on demand, so that earlier analyses can be fully repeated;
- APIs and language bindings in Java, Python, Perl, and C (tools like Protocol Buffers make this somewhat easier) so that developers can interrogate the daemon (if necessary) to resolve references to the databases their bioinformatics tools and pipelines need;
- distributed storage of databases, via IPFS perhaps, to prevent traffic and bandwidth bottlenecks across cluster environments and elsewhere, with the daemon perhaps able to make intelligent decisions about which databases are hosted locally and which are pulled from the distributed network on demand;
- a secure, content-based addressing system, so that the same system can distribute both open and closed databases, and so that the integrity of the data can be assured.
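To make the last point concrete, content-based addressing of the kind Git and IPFS use can be sketched in a few lines: the address of a blob is simply a cryptographic digest of its bytes, so integrity verification falls out for free. The `ContentStore` class and its method names below are hypothetical illustrations, not part of any existing tool:

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address of a blob is the
    SHA-256 digest of its bytes, so corruption or tampering is
    detectable on every retrieval."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Integrity check: recompute the digest before trusting the bytes.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"integrity check failed for {address}")
        return data


store = ContentStore()
addr = store.put(b">seq1\nACGT\n")
assert store.get(addr) == b">seq1\nACGT\n"
```

Because the address is derived purely from the content, the same addressing scheme serves open and closed databases alike; for a closed database, encrypting the bytes before hashing changes only the content being addressed, not the mechanism.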
Right now I imagine a system that's a bit like a mash-up of Git and the user experience of Docker, but for big databases instead of containers, perhaps running on top of IPFS to handle distribution.
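The Git-like versioning side of that mash-up might look something like the sketch below: each named database keeps an append-only history of (version, content-address) records, so a pipeline can either take the latest release or pin an exact one for reproducibility. Everything here (the `DatabaseRegistry` class, the database name, the placeholder addresses) is a hypothetical illustration:

```python
class DatabaseRegistry:
    """Toy Git-like registry: an append-only version history per
    database, so any past release can be resolved and restored."""

    def __init__(self):
        self._history = {}  # name -> list of (version, content address)

    def publish(self, name: str, version: str, address: str) -> None:
        self._history.setdefault(name, []).append((version, address))

    def resolve(self, name: str, version: str = None) -> str:
        history = self._history[name]
        if version is None:          # default to the latest release
            return history[-1][1]
        for v, addr in history:      # otherwise pin an exact version
            if v == version:
                return addr
        raise KeyError(f"{name}:{version} not found")


reg = DatabaseRegistry()
reg.publish("uniprot-sprot", "2016_04", "sha256:aaa")
reg.publish("uniprot-sprot", "2016_05", "sha256:bbb")
assert reg.resolve("uniprot-sprot") == "sha256:bbb"
assert reg.resolve("uniprot-sprot", "2016_04") == "sha256:aaa"
```

A pipeline that records the version it resolved at run time can later re-fetch exactly the same bytes, which is the "fully repeat earlier analyses" property described above.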