Feature/cleanup refactor vector collection #1432

Merged: 31 commits into dev-minor from feature/cleanup-refactor-vector-collection on Oct 21, 2024

Conversation

emrgnt-cmplxty (Contributor) commented Oct 18, 2024

Important

Refactor vector collection system by consolidating database operations into PostgresDBProvider, removing redundant classes, and updating tests.

  • Refactor and Cleanup:
    • Removed RelationalDBProvider and VectorDBProvider classes from database.py.
    • Consolidated database operations into PostgresDBProvider in postgres.py.
    • Removed vecs client and collection classes, integrating vector operations directly into PostgresDBProvider.
  • Database Operations:
    • Added SemaphoreConnectionPool for managing database connections in base.py.
    • Implemented vector operations like upsert, semantic_search, and delete directly in VectorDBMixin.
    • Updated create_index method to handle index creation for vectors.
  • API and Tests:
    • Updated API routes and handlers to use new database methods.
    • Refactored tests in test_vector_db_provider.py, test_document_db.py, and others to align with new database structure.
    • Adjusted integration tests in runner_cli.py and runner_sdk.py to reflect changes in vector operations.
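
The connection-pooling idea in the list above can be illustrated with a minimal sketch. This is not the `SemaphoreConnectionPool` from `base.py`; the class `BoundedPool`, its `get_connection` method, and the `run_queries` helper are hypothetical names, and a plain `object()` stands in for a real database connection. It only shows how an `asyncio.Semaphore` can cap the number of connections in use at once.

```python
import asyncio
from contextlib import asynccontextmanager


class BoundedPool:
    """Hypothetical stand-in for a semaphore-guarded connection pool."""

    def __init__(self, max_connections: int) -> None:
        # Construct the pool inside a running event loop so the semaphore
        # is bound to that loop on older Python versions as well.
        self._semaphore = asyncio.Semaphore(max_connections)
        self.in_use = 0  # connections currently checked out
        self.peak = 0    # highest concurrency observed so far

    @asynccontextmanager
    async def get_connection(self):
        # Callers wait here once max_connections are already checked out.
        async with self._semaphore:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
            try:
                yield object()  # placeholder for a real connection
            finally:
                self.in_use -= 1


async def run_queries(max_connections: int, n_queries: int):
    pool = BoundedPool(max_connections)

    async def one_query(i: int) -> int:
        async with pool.get_connection():
            await asyncio.sleep(0.01)  # simulate a database round trip
            return i

    results = await asyncio.gather(*(one_query(i) for i in range(n_queries)))
    return results, pool.peak, pool.in_use
```

Calling `asyncio.run(run_queries(2, 6))` runs six simulated queries while never holding more than two connections at a time.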

This description was created by Ellipsis for 1ec7e74. It will automatically update as commits are pushed.
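
As a rough illustration of the `upsert`, `semantic_search`, and `delete` operations named in the description, here is an in-memory sketch. It is an assumption-laden toy, not the `VectorDBMixin` code: the class `InMemoryVectorStore`, its method signatures, and the use of cosine similarity over Python lists are all hypothetical, whereas the PR performs these operations against Postgres.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical toy store; a dict replaces the Postgres table."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[float]] = {}

    def upsert(self, entry_id: str, vector: Sequence[float]) -> None:
        # Insert a new entry or overwrite the vector of an existing one.
        self._rows[entry_id] = list(vector)

    def semantic_search(
        self, query: Sequence[float], limit: int = 10
    ) -> List[Tuple[str, float]]:
        # Score every stored vector against the query, best match first.
        scored = [
            (entry_id, cosine_similarity(query, vector))
            for entry_id, vector in self._rows.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]

    def delete(self, entry_id: str) -> bool:
        # Report whether an entry was actually removed.
        return self._rows.pop(entry_id, None) is not None
```

A brute-force scan like this is only suitable for illustration; an index such as the one created by `create_index` is what makes search practical at scale.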

emrgnt-cmplxty and others added 20 commits October 16, 2024 15:04
* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* add option to include vectors in document chunks

* checkin

* update vector

---------

Co-authored-by: Nolan Tremelling <[email protected]>
* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* add option to include vectors in document chunks

* checkin

* update vector

* some various documentation tweaks

* some various documentation tweaks

---------

Co-authored-by: Nolan Tremelling <[email protected]>
* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq (#1412)

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* Prod fixes + enhancements (#1407)

* change default settings back to fp32

* add logging and cache triples

* up

* up

* pre-commit and cleanups

* making community summary prompt async

* up

* up

* revert prompt changes

* up

* up

* modify default

* bump test timeout due to stricter concurrency limits

* bump sleep

* rm ubuntu from windows/mac workflows

* up

* add tests

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq (#1412)

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* Prod fixes + enhancements (#1407)

* change default settings back to fp32

* add logging and cache triples

* up

* up

* pre-commit and cleanups

* making community summary prompt async

* up

* up

* revert prompt changes

* up

* up

* modify default

* bump test timeout due to stricter concurrency limits

* bump sleep

* rm ubuntu from windows/mac workflows

* modify timeouts

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq (#1412)

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* Prod fixes + enhancements (#1407)

* change default settings back to fp32

* add logging and cache triples

* up

* up

* pre-commit and cleanups

* making community summary prompt async

* up

* up

* revert prompt changes

* up

* up

* modify default

* bump test timeout due to stricter concurrency limits

* bump sleep

* rm ubuntu from windows/mac workflows

* feat: Make prompt provider methods asynchronous

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>

vercel bot commented Oct 18, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| yc_demo | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Oct 21, 2024 2:12am |
| yc-demo | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Oct 21, 2024 2:12am |

1 Skipped Deployment

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| recommendation_platform | ⬜️ Ignored (Inspect) | | | Oct 21, 2024 2:12am |

emrgnt-cmplxty marked this pull request as ready for review October 21, 2024 02:11
emrgnt-cmplxty merged commit 9e4a1cc into dev-minor Oct 21, 2024
14 of 22 checks passed
Contributor

ellipsis-dev bot left a comment

👍 Looks good to me! Reviewed everything up to 1ec7e74 in 2 minutes and 2 seconds

More details
  • Looked at 6768 lines of code in 68 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. py/tests/core/providers/logging/test_chat_logging_provider.py:63
  • Draft comment:
    The test_branches_overview function is commented out. If this is not intentional, consider uncommenting it to ensure the test is executed.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The test function test_branches_overview is commented out, which might be intentional for debugging or other reasons. However, if this is not intentional, it should be uncommented to ensure the test is executed.

Workflow ID: wflow_BIhOB5UigQ0Fd1uS



emrgnt-cmplxty added a commit that referenced this pull request Oct 23, 2024
* fix-actions (#1426)

* up

* modify

* add to github path

* Contextual Chunk Enrichment (#1433)

* add semantic chunking

* working

* precommit

* pre-commits

* Entity Deduplication (#1431)

* Modify graphrag prompt (#1421)

* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq (#1412)

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* Prod fixes + enhancements (#1407)

* change default settings back to fp32

* add logging and cache triples

* up

* up

* pre-commit and cleanups

* making community summary prompt async

* up

* up

* revert prompt changes

* up

* up

* modify default

* bump test timeout due to stricter concurrency limits

* bump sleep

* rm ubuntu from windows/mac workflows

* up

* add tests

* Feature/include vectors option document chunks (#1419)

* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* add option to include vectors in document chunks

* checkin

* update vector

---------

Co-authored-by: Nolan Tremelling <[email protected]>

* Allow env var to set the default R2R deployment for the dashboard (#1417)

* modify community_summary_prompt function and corresponding prompt

* add tests

* up

* Feature/various documentation tweaks (#1422)

* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* add option to include vectors in document chunks

* checkin

* update vector

* some various documentation tweaks

* some various documentation tweaks

---------

Co-authored-by: Nolan Tremelling <[email protected]>

* Graphrag tests (#1418)

* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq (#1412)

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* Prod fixes + enhancements (#1407)

* change default settings back to fp32

* add logging and cache triples

* up

* up

* pre-commit and cleanups

* making community summary prompt async

* up

* up

* revert prompt changes

* up

* up

* modify default

* bump test timeout due to stricter concurrency limits

* bump sleep

* rm ubuntu from windows/mac workflows

* up

* add tests

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>

* Modify graphrag tests timeouts (#1416)

* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq (#1412)

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* Prod fixes + enhancements (#1407)

* change default settings back to fp32

* add logging and cache triples

* up

* up

* pre-commit and cleanups

* making community summary prompt async

* up

* up

* revert prompt changes

* up

* up

* modify default

* bump test timeout due to stricter concurrency limits

* bump sleep

* rm ubuntu from windows/mac workflows

* modify timeouts

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>

* feat: Make prompt provider methods asynchronous (comments below) (#1415)

* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq (#1412)

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* Prod fixes + enhancements (#1407)

* change default settings back to fp32

* add logging and cache triples

* up

* up

* pre-commit and cleanups

* making community summary prompt async

* up

* up

* revert prompt changes

* up

* up

* modify default

* bump test timeout due to stricter concurrency limits

* bump sleep

* rm ubuntu from windows/mac workflows

* feat: Make prompt provider methods asynchronous

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>

* up

* up

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>

* Add routes, service level methods around conversations (#1420)

* Add routes, service level methods around conversations

* Slight refactor to match project conventions, add JS methods

* Updated JS methods

* JS docs

* Add python

* Update JS user tests

* add deduplication pipe, workflow, api, sdk, cli

* add summary workflow

* bug fixes

* pre-commit

* working

* search working

* adding dedup test files

* modify the update query

* precommit

* more testing

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>

* Refactor Python SDK for Intellisense, Thread Safety (#1430)

* Refactor Python SDK

* Fix CLI after SDK changes

* Add convo to agent

* Update conversation error handling, JS

* Remove unused, bad import

* Feature/cleanup refactor vector collection (#1432)

* Feature/include vectors option document chunks (#1419)

* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* add option to include vectors in document chunks

* checkin

* update vector

---------

Co-authored-by: Nolan Tremelling <[email protected]>

* Allow env var to set the default R2R deployment for the dashboard (#1417)

* Feature/various documentation tweaks (#1422)

* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* add option to include vectors in document chunks

* checkin

* update vector

* some various documentation tweaks

* some various documentation tweaks

---------

Co-authored-by: Nolan Tremelling <[email protected]>

* Graphrag tests (#1418)

* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq (#1412)

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* Prod fixes + enhancements (#1407)

* change default settings back to fp32

* add logging and cache triples

* up

* up

* pre-commit and cleanups

* making community summary prompt async

* up

* up

* revert prompt changes

* up

* up

* modify default

* bump test timeout due to stricter concurrency limits

* bump sleep

* rm ubuntu from windows/mac workflows

* up

* add tests

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>

* Modify graphrag tests timeouts (#1416)

* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq (#1412)

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* Prod fixes + enhancements (#1407)

* change default settings back to fp32

* add logging and cache triples

* up

* up

* pre-commit and cleanups

* making community summary prompt async

* up

* up

* revert prompt changes

* up

* up

* modify default

* bump test timeout due to stricter concurrency limits

* bump sleep

* rm ubuntu from windows/mac workflows

* modify timeouts

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>

* feat: Make prompt provider methods asynchronous (comments below) (#1415)

* Fix async JSON parsing (#1408)

* Fix async JSON parsing

* Remove score completion from js

* clean up js

* lockfile

* Feature/build custom logger (#1409)

* building a custom logger for r2r

* fix log

* maintain bkwd compat

* Feature/add kg description prompt (#1411)

* add kg desc prompt

* add kg desc prompt

* add kg desc prompt

* fix prompt name

* separate test run freq (#1412)

* separate test run freq

* task_id check fix

* add ingestion docs

* updatet

* add

* rm old prompts

* rm old prompots

* rm old prompts

* rm old prompts

* Prod fixes + enhancements (#1407)

* change default settings back to fp32

* add logging and cache triples

* up

* up

* pre-commit and cleanups

* making community summary prompt async

* up

* up

* revert prompt changes

* up

* up

* modify default

* bump test timeout due to stricter concurrency limits

* bump sleep

* rm ubuntu from windows/mac workflows

* feat: Make prompt provider methods asynchronous

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>

* bump pyproject version

* first commit

* towards slimmer vector implementation logic

* up

* iterate

* up

* checkin

* up

* work doc chunks

* working vector search

* working full text search

* remove asyncpg

* passing vector tests

* up

* merge

* rm pytest

* up

* up

* fix delete

* up

* up

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>

* Add tests + Cleanup (#1437)

* up

* add tests

* test rename to sdk

* up

* fix tests

* typo

* modify chunk enrichment prompt (#1438)

* modify prompt

* up

* Fix type error on port argument of CLI (#1439)

* finish (#1440)

* finish

* up

* fix

* fix

* up

* fix

* final cleanups

* fix naming convention

* fix schema error

* increase timeout

* split graphrag actions

* fix collection exists error

* up (#1442)

* Add error message (#1443)

* up

* sdk fix

* locally testing build

* up docs (#1445)

* checkin work (#1444)

* checkin work

* finish index functionality extension

* fix concurrency

* add alembic (#1446)

* Prompt Tuning (#1447)

* Check in

* Fix after merging dev-minor in

* Ensure to not cause int overflow with hatchet (#1454)

* Bump JS (#1456)

* Ensure to not cause int overflow with hatchet

* bump js

* improve migration implementation (#1452)

* improve migration implementation

* refine migrations to include kg

* add alembic cli

* extend documentation

* extend docs and all that

* Revert change of default behaviour of entities endpoint, docs, tests (#1455)

* change def behavior of entities + delete endpoint

* pre-commit

* add deduplication tests

* Delete graph (#1450)

* up docs

* up

* up

* rename to raw_chunks

* up

* add tests

* up

* up

* change default

* change cli

* separate out deduplication tests

* change run type in the test

* up

* up

* add test concurrency

* up

* rm concurrency groups

* rm dedup tests

* remove json

* tests

* up

* fix lock

* Update postgres.py

* Feature/merge dev minor main (#1457)

* add run without orchestration (#1448)

* add run without orchestration

* bump versions

* bump versions

* bump versions

* fix

* up

* add end points

* add run without orchestration (#1448) (#1458)

* add run without orchestration

* bump versions

* bump versions

* bump versions

* fix

* up

* sync migration changes

* Ensure that we await ingest files in ingest_files method (#1460)

* Nolan/await update files (#1461)

* Ensure that we await ingest files in ingest_files method

* Await update files as well

* Docs changes (#1462)

* up

* up

* up

* up

* fix failed find and replace (#1463)

* fix failed find and replace

* fix

* Fix JS Client for Ingest Chunks (#1464)

* Ensure that we await ingest files in ingest_files method

* Await update files as well

* Fix js client

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Nolan Tremelling <[email protected]>
emrgnt-cmplxty deleted the feature/cleanup-refactor-vector-collection branch October 25, 2024 18:20
3 participants