
A2C2 - Natural Language-Instructed Autonomous Agent for Computer Control

This repository is for 👨‍💻 developing / 🛠️ constructing / 🧪 testing and 🚀 moonshotting ideas for our bachelor thesis: Natural Language-Instructed Autonomous Agent for Computer Control (A2C2).

As part of the Machine Learning Operations module, we developed a prototype of an A2C2. We integrated several tools from the module so that the development of the prototype is represented as an ML pipeline.

Motivation

Why do we need an A2C2?

  • The ultimate AI application
  • Assistant in using computer systems
  • Helpful in everyday tasks

Current challenges on the way to an A2C2

  • Data Generation - How and where to collect training data?
  • Dynamic Action Inference - How can the actions relevant for the instruction be determined?
  • Refinement with the User - Where does it require further information from the user?
  • User Interaction for Critical Tasks - When are further enquiries to the user necessary?

Goal

  • User-friendly Chatbot
  • Critical Task Detection
  • Missing Information Detection
  • Basic Pipeline for ViT Training

Components


UI

  • Screen Captioning
  • Chatbot
  • I/O Execution

Data Storage

  • Storing Experience Embeddings

Planning

  • Task Decomposition & Refinement

Web-Crawler

  • Gathering real-life Data

ViT Training

  • Model Fine-Tuning

Conversational Validation Component

  • Critical Task Detection
  • Missing Information Detection

LLM & VLM

Browser

Components & ML Tools


  1. UI interacts with planning component through REST

    • UI with Tkinter, Python & pyautogui
    • Interaction through REST with FastAPI
  2. Planning component uses RAG to gather more information

    • Data storage (decomposition prompts & planning prompts) with oxen.ai
  3. Conversational validator checks whether a task involves a critical action or is missing information

  4. Planning component uses models for user instruction interpretation & visual analysis

    • GPT-4 Vision (first)
    • YOLOv8 (fine-tuned, but not yet optimized for use with the planning component)
  5. Web-crawler interacts with browser to gather training data

    • Gathering training data with Selenium
      • Data storage (data from web crawling) with oxen.ai
  6. Model is fine-tuned, stored and re-deployed

    • Hyperparameter tuning with Ray Tune & wandb
  7. (Steps 5 & 6)

    • Workflow with GitHub Actions (we first tried Airflow on Google Cloud, unfortunately without success, so we use GitHub Actions instead; the workflow runs automatically, but without time-based triggering)
  8. (Steps 1-6)

    • Automated Testing with GitHub Actions (CI/CD pipeline for deployment; CI: Flake8 checks whether the Python syntax is correct; CD: triggered by semantic release, builds executables for Windows and macOS)
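
To make step 3 more concrete, here is a minimal sketch of what a conversational validator could look like. It is not the implementation used in this repository: the keyword list, the `REQUIRED_SLOTS` table, and the names `ValidationResult` and `validate` are all hypothetical and only illustrate the two checks (critical action, missing information) with simple rules.

```python
from dataclasses import dataclass, field

# Hypothetical keyword list: the actual prototype may detect critical tasks differently.
CRITICAL_KEYWORDS = ("delete", "format", "pay", "purchase", "send", "uninstall")

# Hypothetical table of the information each task type needs before it can run.
REQUIRED_SLOTS = {
    "send_email": ("recipient", "subject"),
    "delete_file": ("path",),
}


@dataclass
class ValidationResult:
    is_critical: bool
    missing: list = field(default_factory=list)

    @property
    def needs_user_input(self) -> bool:
        # Ask the user when the task is critical or information is missing.
        return self.is_critical or bool(self.missing)


def validate(instruction: str, task_type: str, slots: dict) -> ValidationResult:
    """Flag critical instructions and list required slots that are still empty."""
    words = instruction.lower().split()
    is_critical = any(keyword in words for keyword in CRITICAL_KEYWORDS)
    required = REQUIRED_SLOTS.get(task_type, ())
    missing = [name for name in required if not slots.get(name)]
    return ValidationResult(is_critical=is_critical, missing=missing)


result = validate("Send an email to Anna", "send_email", {"recipient": "Anna"})
print(result.is_critical, result.missing, result.needs_user_input)
# prints: True ['subject'] True
```

In this sketch the chatbot would ask the user for confirmation or for the missing subject whenever `needs_user_input` is true, and only then hand the task back to the planning component.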

Install Guide

TODO