Akash <> JZFS: Verifiable, Version Control Data Pipeline #577
taoshengshi started this conversation in AI - Artificial Intelligence
Thank you for opening up this discussion, @taoshengshi. Is there a direct ask from this discussion? Is this a flow that you have tested on the Akash Network yet?
Introduction
There is a need to establish the end-to-end reasoning behind data-driven decisions, in particular reasoning that incorporates data supplied by many parties.
Several trends make this need increasingly urgent.
First, the trend towards using public and open source data to substantiate decisions increases the utility of expressing these decisions transparently and verifiably.
Second, decisions are increasingly automated, with IoT, smart contract, and AI subsystems forming parts of the decision-making pipeline. Explicitly recording this pipeline is necessary to substantiate trust in the end result.
Third, generative AI heightens the need to rigorously track the provenance of data to combat misinformation.
Fourth, decisions in critical areas such as environmental sustainability and AI ethics and safety rely on rapidly evolving research. This carries an imperative to lay out methodologies explicitly so that they can be reproduced, challenged, improved, and rapidly applied.
Fifth, the introduction of more powerful and general zero-knowledge systems increases the necessity of tracing decisions end-to-end, so that proven claims relying on little revealed information can be put into context.
Version Control pipeline
Pipelines are a simple way to use models for inference. They are objects that abstract away most of the complex code in the library behind a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction, and Question Answering.
There are two categories of pipeline abstractions to be aware of:
The general pipeline, which is the most powerful object and encapsulates all other pipelines.
Task-specific pipelines, which are available for audio, computer vision, natural language processing, and multimodal tasks.
Data Lineage is an interactive tool that gives a holistic view of how data flows through JZFS and Akash.
With Data Lineage, you can:
Architecture
Data-driven decision making requires combining trusted information using arbitrary compute operations. Here, we introduce a protocol called Operad, which allows end-to-end tracing of data provenance over computational pipelines. The protocol models pipelines as morphisms in symmetric monoidal categories (operad morphisms) anchored in content-addressed data types. This allows both reproducible and non-reproducible real-world processes to be modeled as wiring diagrams, in which elements of the process diagram relate to elements of data and provenance wiring diagrams with a corresponding structure. Together, data, transformations, and provenance form a three-layer system. This allows a rigorous examination of the flow of trust through complex, multi-party computational processes.
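To make the wiring-diagram idea concrete, here is a minimal Python sketch of how pipeline steps might compose in sequence and in parallel, which are the two operations a symmetric monoidal category provides. It is an illustration only and not the Operad protocol's implementation: the helpers then, par, and swap, and the toy steps clean, count, and join, are all hypothetical names introduced for this example.

```python
from typing import Callable, Tuple

# Assumption: a pipeline step is a function from a tuple of input wires
# to a tuple of output wires. This is an illustrative model only.
Morphism = Callable[[Tuple], Tuple]


def then(f: Morphism, g: Morphism) -> Morphism:
    """Sequential composition: the outputs of f become the inputs of g."""
    return lambda xs: g(f(xs))


def par(f: Morphism, n_f: int, g: Morphism) -> Morphism:
    """Parallel composition: f consumes the first n_f wires, g the rest."""
    return lambda xs: f(xs[:n_f]) + g(xs[n_f:])


def swap(xs: Tuple) -> Tuple:
    """Symmetry: exchange two wires."""
    a, b = xs
    return (b, a)


# Hypothetical toy steps, each mapping wires to wires.
clean = lambda xs: (xs[0].strip().lower(),)
count = lambda xs: (len(xs[0]),)
join = lambda xs: (f"{xs[0]}:{xs[1]}",)

# Two steps side by side, followed by a step that merges their outputs.
pipeline = then(par(clean, 1, count), join)
result = pipeline(("  Akash ", "jzfs"))

# Swapping the input wires first routes each value to the other branch.
swapped_result = then(swap, pipeline)(("jzfs", "  Akash "))
```

Because every step has the same shape (wires in, wires out), larger diagrams can be built from smaller ones without changing how any individual step is written.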
Data Layer
The data layer refers to the content itself, such as the input information provided by end users or API endpoints.
Provenance Layer (JZFS)
The provenance layer validates the final outcome of the data based on community standards and provides a seal of authenticity.
Transformation Layer (Akash)
The transformation layer shows how data is processed through code or an AI model.
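The three layers can be sketched together in a few lines of Python. This is a rough illustration under stated assumptions, not the JZFS or Akash API: content_id (SHA-256 over canonical JSON), Traced, source, and apply_step are hypothetical names, and a real deployment would use its own content-addressing and attestation schemes.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Tuple


def content_id(value) -> str:
    """Hypothetical content address: SHA-256 of the value's canonical JSON."""
    payload = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Traced:
    value: object             # data layer: the content itself
    cid: str                  # content address of the value
    step: str                 # transformation layer: which step produced it
    inputs: Tuple[str, ...]   # provenance layer: content addresses of the inputs


def source(value) -> Traced:
    """Wrap raw input data that no pipeline step produced."""
    return Traced(value, content_id(value), "source", ())


def apply_step(name: str, fn: Callable, *args: Traced) -> Traced:
    """Run a transformation and record which inputs it consumed."""
    out = fn(*(a.value for a in args))
    return Traced(out, content_id(out), name, tuple(a.cid for a in args))


# Two parties supply data; two steps combine and reduce it.
readings_a = source([1, 2, 3])
readings_b = source([4, 5])
merged = apply_step("concat", lambda x, y: x + y, readings_a, readings_b)
total = apply_step("sum", sum, merged)
```

Following total.inputs back to merged.cid, and merged.inputs back to the two source addresses, reconstructs the full trail from the final result to the data each party supplied.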
Use Cases
Collect data and prove where it came from
Effortlessly create forms and data streams with a provenance trail
Protect your data and code with a digital fingerprint
Everything you do in Operad.ai is content addressed through IPFS so you can detect the smallest changes
Use powerful AI models in a verifiable environment
Prove the origins of AI-generated content to attribute credit... or blame
Share on your terms, charge for every use or every API call
Keep your data and code private. Allow others to use it, and get paid whenever it is run or accessed
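The digital fingerprint use case above rests on content addressing: the identifier is derived from the bytes themselves, so any edit produces a different identifier. A minimal sketch, assuming a plain SHA-256 hex digest as the fingerprint. IPFS itself identifies content by CIDs built on multihashes, which this example does not reproduce, and the fingerprint function and sample data are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Assumed fingerprint: SHA-256 hex digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample data: two versions differing by a single character.
original = b"temperature,21.5\n"
edited = b"temperature,21.6\n"

original_id = fingerprint(original)
edited_id = fingerprint(edited)

# Identical bytes always give the same fingerprint; any change gives a new one.
unchanged = fingerprint(b"temperature,21.5\n") == original_id
changed = edited_id != original_id
```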
PS
GitData Labs
Embrace Data-Centric AI with GitData
https://gitdata.ai/
JZFS
A Git-like version control file system for data lineage & data collaboration.
https://github.com/GitDataAI/jzfs