diff --git a/README.md b/README.md
index a9822bc..e5ad481 100644
--- a/README.md
+++ b/README.md
@@ -2,22 +2,22 @@ huggingface-mlrun
 
-In this demo we will be showcasing how we used LLMs to turn call center conversation audio files of customers and agents into valueable data in a single workflow orchastrated by MLRun.
+This demo showcases how to use LLMs to turn audio files from call center conversations between customers and agents into valuable data, all in a single workflow orchestrated by MLRun.
 
-MLRun will be automating the entire workflow, auto-scale resources as needed and automatically log and parse values between the workflow different steps.
+MLRun automates the entire workflow, auto-scales resources as needed, and automatically logs and parses values between the different workflow steps.
 
-By the end of this demo you will see the potential power of LLMs for feature extraction, and how easy it is being done using MLRun!
+By the end of this demo you will see the potential power of LLMs for feature extraction, and how easily you can do this with MLRun!
 
-We will use:
-* [**OpenAI's Whisper**](https://openai.com/research/whisper) - To transcribe the audio calls into text.
-* [**Flair**](https://flairnlp.github.io/) and [**Microsoft's Presidio**](https://microsoft.github.io/presidio/) - To recognize PII for filtering it out.
-* [**HuggingFace**](https://huggingface.co/) - as the main machine learning framework to get the model and tokenizer for the features extraction. The demo uses [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) as the LLM to asnwer questions.
-* and [**MLRun**](https://www.mlrun.org/) - as the orchastraitor to operationalize the workflow.
+This demo uses:
+* [**OpenAI's Whisper**](https://openai.com/research/whisper) — To transcribe the audio calls into text.
+* [**Flair**](https://flairnlp.github.io/) and [**Microsoft's Presidio**](https://microsoft.github.io/presidio/) — To recognize PII so it can be filtered out.
+* [**HuggingFace**](https://huggingface.co/) — The main machine-learning framework used to get the model and tokenizer for feature extraction. The demo uses [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) as the LLM to answer questions.
+* [**MLRun**](https://www.mlrun.org/) — The orchestrator that operationalizes the workflow.
 
-The demo contains a single [notebook](./notebook.ipynb) that covers the entire demo.
+The entire demo is contained in a single [notebook](./notebook.ipynb).
 
-Most of the functions are being imported from [MLRun's hub](https://docs.mlrun.org/en/stable/runtimes/load-from-hub.html) - a wide range of functions that can be used for a variety of use cases. You can find all the python source code under [/src](./src) and links to the used functions from the hub in the notebook.
+Most of the functions are imported from [MLRun's function hub](https://docs.mlrun.org/en/stable/runtimes/load-from-hub.html), which contains a wide range of functions that can be used for a variety of use cases. All functions used in the demo include links to their source in the hub. All of the Python source code is under [/src](./src).
 
 Enjoy!
 
 ___
@@ -29,7 +29,7 @@ This project can run in different development environments:
 * Inside GitHub Codespaces
 * Other managed Jupyter environments
 
-### Install the code and mlrun client
+### Install the code and the mlrun client
 
 To get started, fork this repo into your GitHub account and clone it into your development environment.
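The Flair/Presidio bullet above is where PII gets recognized and masked. The demo delegates this step to a hub function, but as a rough standalone sketch of the same idea using Presidio's public API directly (the `presidio-analyzer` and `presidio-anonymizer` packages plus a spaCy English model are assumed to be installed; the transcript snippet is made up):

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# A made-up transcript snippet containing PII.
text = "Hi, this is John Smith. You can reach me at 212-555-0199."

# Detect PII entities (person names, phone numbers, etc.) in the text.
analyzer = AnalyzerEngine()
findings = analyzer.analyze(text=text, language="en")

# Replace each detected entity with a type placeholder before any
# downstream feature extraction sees the text.
anonymizer = AnonymizerEngine()
result = anonymizer.anonymize(text=text, analyzer_results=findings)

print(result.text)
# Expected to print something like:
# "Hi, this is <PERSON>. You can reach me at <PHONE_NUMBER>."
```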
@@ -37,17 +37,17 @@ To install the package dependencies (not required in GitHub codespaces) use:
 
     make install-requirements
 
-If you prefer to use Conda use this instead (to create and configure a conda env):
+If you prefer to use Conda, use this instead (to create and configure a conda env):
 
     make conda-env
 
 > Make sure you open the notebooks and select the `mlrun` conda environment
 
-### Install or connect to MLRun service/cluster
+### Install or connect to the MLRun service/cluster
 
 The MLRun service and computation can run locally (minimal setup) or over a remote Kubernetes environment.
 
-If your development environment support docker and have enough CPU resources run:
+If your development environment supports Docker and has sufficient CPU resources, run:
 
     make mlrun-docker
 
@@ -57,10 +57,10 @@ If your environment is minimal, run mlrun as a process (no UI):
 
     [conda activate mlrun &&] make mlrun-api
 
-For MLRun to run properly you should set your client environment, this is not required when using **codespaces**, the mlrun **conda** environment, or **iguazio** managed notebooks.
+For MLRun to run properly, you should set your client environment. This is not required when using **codespaces**, the mlrun **conda** environment, or **iguazio** managed notebooks.
 
 Your environment should include `MLRUN_ENV_FILE= ` (point to the mlrun .env file
-in this repo), see [mlrun client setup](https://docs.mlrun.org/en/latest/install/remote.html) instructions for details.
+in this repo); see [mlrun client setup](https://docs.mlrun.org/en/latest/install/remote.html) instructions for details.
 
-> Note: You can also use a remote MLRun service (over Kubernetes), instead of starting a local mlrun,
-> edit the [mlrun.env](./mlrun.env) and specify its address and credentials
+> Note: You can also use a remote MLRun service (over Kubernetes) instead of starting a local mlrun:
+> edit the [mlrun.env](./mlrun.env) and specify its address and credentials.
diff --git a/notebook.ipynb b/notebook.ipynb
index 2753526..02b6ead 100644
--- a/notebook.ipynb
+++ b/notebook.ipynb
@@ -154,7 +154,7 @@
     "\n",
     "> Note: Multiple GPUs (`gpus` > 1) automatically deploy [OpenMPI](https://www.open-mpi.org/) jobs for **better performance and GPU utilization**.\n",
     "\n",
-    "There are not many functions under the source directory. That's because most of the code in this project is imported from [**MLRun's Functions Hub**](https://www.mlrun.org/hub/) — a collection of reusable functions and assets that are optimized and tested to simplify and accelate the move to production!"
+    "There are not many functions under the source directory. That's because most of the code in this project is imported from [**MLRun's Function hub**](https://www.mlrun.org/hub/) — a collection of reusable functions and assets that are optimized and tested to simplify and accelerate the move to production!"
    ]
   },
   {
@@ -1167,14 +1167,6 @@
     "* [x] **Anonymization** - Anonymize the text before inferring.\n",
     "* [x] **Analysis** - Perform question answering for feature extraction using Falcon-40B."
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2f13c10d-9f21-4c1a-8c62-b49c31880ca4",
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
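Both the README and the notebook lean on MLRun's function hub. For readers new to the hub, a minimal sketch of importing and running a hub function from a project might look like this (the project name, the `hub://transcribe` identifier, and the input path are assumptions for illustration; the notebook shows the exact calls used in the demo):

```python
import mlrun

# Create (or load) a project that groups the workflow's functions and artifacts.
# The project name here is illustrative.
project = mlrun.get_or_create_project("call-center-demo", context="./")

# Pull a reusable function from MLRun's function hub instead of writing it
# from scratch. "hub://transcribe" is an assumed identifier for the
# Whisper-based transcription function; check the hub for the exact name.
project.set_function("hub://transcribe", name="transcribe")

# Run it as a tracked job: MLRun logs the inputs, parameters, and outputs,
# which can then be passed to the next step of the workflow.
run = project.run_function(
    "transcribe",
    inputs={"data_path": "./data/calls"},  # hypothetical directory of audio files
)
print(run.outputs)
```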