forked from opea-project/GenAIComps
ArangoDB Integration #3
Draft
aMahanna wants to merge 37 commits into main from arangodb
Merged
* initial commit * updating feedback management readme to match arango * Removing comments above import * Working API test and updated readme * Working docker compose file * Docker compose creating network and docker image * code review * update readme & dev yaml * delete dev files * Delete arango_store.py --------- Co-authored-by: Anthony Mahanna <[email protected]>
* Initial commit * remove unnecessary files * code review * update: `prompt_search` * new: `ARANGO_PROTOCOL` * README * cleanup --------- Co-authored-by: lasyasn <[email protected]> Co-authored-by: Anthony Mahanna <[email protected]>
aMahanna pushed a commit that referenced this pull request on Nov 26, 2024
* Adds an endpoint for image ingestion Signed-off-by: Melanie Buehler <[email protected]> * Combined image and video endpoint Signed-off-by: Melanie Buehler <[email protected]> * Add test and update README Signed-off-by: Melanie Buehler <[email protected]> * fixed variable name for embedding model (#1) Signed-off-by: okhleif-IL <[email protected]> * Fixed test script Signed-off-by: Melanie Buehler <[email protected]> * Remove redundant function Signed-off-by: Melanie Buehler <[email protected]> * get_videos, delete_videos --> get_files, delete_files (#3) Signed-off-by: okhleif-IL <[email protected]> * Updates test per review feedback Signed-off-by: Melanie Buehler <[email protected]> * Fixed test Signed-off-by: Melanie Buehler <[email protected]> * Add support for audio files multimodal data ingestion (#4) * Add support for audio files multimodal data ingestion Signed-off-by: dmsuehir <[email protected]> * Update function name Signed-off-by: dmsuehir <[email protected]> --------- Signed-off-by: dmsuehir <[email protected]> * Change videos_with_transcripts to ingest_with_text Signed-off-by: Melanie Buehler <[email protected]> * Add image support to video ingestion with transcript functionality Signed-off-by: Melanie Buehler <[email protected]> * Update test and README Signed-off-by: Melanie Buehler <[email protected]> * Updated for review suggestions Signed-off-by: Melanie Buehler <[email protected]> * Add two tests for ingest_with_text Signed-off-by: Melanie Buehler <[email protected]> * LVM TGI Gaudi update for prompts without images (#7) * LVM Gaudi TGI update for prompts without images Signed-off-by: dmsuehir <[email protected]> * Wording Signed-off-by: dmsuehir <[email protected]> * Add a test Signed-off-by: dmsuehir <[email protected]> --------- Signed-off-by: dmsuehir <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change dummy image to be b64 encoded instead of the url (#9) 
Signed-off-by: dmsuehir <[email protected]> * Updates based on review feedback (#10) Signed-off-by: dmsuehir <[email protected]> * Test fix (#11) Signed-off-by: dmsuehir <[email protected]> --------- Signed-off-by: Melanie Buehler <[email protected]> Signed-off-by: okhleif-IL <[email protected]> Signed-off-by: dmsuehir <[email protected]> Co-authored-by: dmsuehir <[email protected]> Co-authored-by: Omar Khleif <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Abolfazl Shahbazi <[email protected]>
* Initial chat history implementation without API and docker implementation * make copy and remove async * API functionality matching MongoDB implementation Working API functionality, update to dockerfile required, and additional checks when updating document required. * Delete temp.py * Push changes and reset repo * Async definitions working in curl calls, updated read me to ArangoDB setup * Working docker container with network * Removing need for network to be created before docker compose * Cleanup async files and backup files * code review * fix: typo * revert mongo changes --------- Co-authored-by: Anthony Mahanna <[email protected]>
* initial commit: rename arango envs * fix comment
* initial commit * fix: env * Update README.md * Revert "Update README.md" This reverts commit 8f750e4. * fix: create database * cleanup * new: chunk embedding generation * new: `cithash` dep * cleanup: `ingest_data_to_arango` * new: envs in `config` * fix: more envs * more env cleanup * fix: deprecated line * fix: graph doc * update dataprep-compose * Dockerfile update and parametrized prepare_doc_arango.py (#15) * Initial readme and prepare doc arango, with embeddings by Anthony * Adding git to Dockerfile, tested dockerfile and dockercompose. Also parametrized variables in prepare_doc_arango.py * Updating readme with adjustable parameters listed * Only printing debug statements if log flag is on * add review * review pt 2 --------- Co-authored-by: Anthony Mahanna <[email protected]> * update dataprep readme --------- Co-authored-by: Ajay Kallepalli <[email protected]>
* wip: retriever * rename: `arango` * checkpoint * cleanup * fix: env * update retriever compose * add test file * fix: config & dockerfile * fix: embedding field name * new: config variables * new: traverse graph after similarity * fix: string * add `uniqueVertices` * add filter * infra * fix: query * remove: `similarity_distance_threshold` * temp: replace `p` * cleanup * remove: `ARANGO_TRAVERSAL_MIN_DEPTH` * update max_depth * new: `fetch_neighborhoods` * fix: test * cleanup: `prepare_doc_arango.py` * move `graph` & `vector_db` instantiation * cleanup: dataprep readme * cleanup: retriever * fix: arango test scripts * Update test_retrievers_arango_langchain.sh * update `ARANGO_EMBEDDING_DIMENSION` * fix: env vars * cleanup: retriever port * new: `test_dataprep_arango_langchain` * new: retriever yaml * Changing naming convention from arangodb to arango to ensure consistency between microservices, updated dockerfile to match and removed space in port * fix: retriever name * remove: `retriever_arangodb` --------- Co-authored-by: Ajay Kallepalli <[email protected]>
* dataprep improvements * fix: readme * new: make embedding generation mandatory * fix: exception handling * add logs * new: `ARANGO_USE_GRAPH_NAME`
* retriever improvements * new: `collection_count` * new: `empty_result` object * remove: `raise` no longer required * set `LOGFLAG` to `True` * Removing config variable ARANGO_EMBED_DIMENSION, getting embed dimension automatically from the db * minor cleanup * whitespace * log cleanup --------- Co-authored-by: Ajay Kallepalli <[email protected]>
…unk overlap, and process table. CURL command will supersede environment variables (#18)
This PR tracks the in-progress & completed ArangoDB Microservices for GenAIComps
Depends on:
Status:
Development Setup for using new LangChain functionality
Depends on arangoml/langchain#1
Clone this repository
Switch to the arangodb branch
Create a virtual environment:
python -m venv .venv
source .venv/bin/activate
Note: Check out the contents in arangoml/langchain#1 to better understand the 3 different langchain classes we'll be using in this repo (ArangoGraph, ArangoGraphQAChain, and ArangoVector)
For ARM:
For AMD:
Note: This ArangoDB image is based on an ArangoDB PR that introduces vector indexing and vector similarity support via FAISS. Ask Anthony for more details.
Set your OPENAI_API_KEY environment variable (contact Anthony for access)
Run the test script to confirm LangChain is working:
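The test script itself is not shown in this description. Before running it, a small pre-flight check along the following lines can confirm that the setup steps above took effect. This is only a sketch: the function name preflight and its messages are hypothetical and not part of this repo. It assumes the virtual environment was created with python -m venv, in which case sys.prefix differs from sys.base_prefix.

```python
import os
import sys


def preflight(env=None, prefix=None, base_prefix=None):
    """Return a list of setup problems; an empty list means ready to run.

    Hypothetical helper (not part of this repo). The arguments default to
    the live process state and exist only so the check can be exercised
    without touching the real environment.
    """
    env = os.environ if env is None else env
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix

    problems = []
    # The setup asks for OPENAI_API_KEY to be set before running the test script.
    if not env.get("OPENAI_API_KEY", "").strip():
        problems.append("OPENAI_API_KEY is not set")
    # Inside a venv created by `python -m venv`, sys.prefix points at the
    # venv directory while sys.base_prefix points at the base interpreter.
    if prefix == base_prefix:
        problems.append("no virtual environment is active (expected .venv)")
    return problems


# Report anything that still needs attention in the current shell.
for problem in preflight():
    print(f"[preflight] {problem}")
```

If nothing is printed, both the API key and the virtual environment are in place for the current process.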