Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V3 graph implmentations #1593

Merged
merged 25 commits into from
Nov 15, 2024
Merged

Conversation

shreyaspimpalgaonkar
Copy link
Contributor

@shreyaspimpalgaonkar shreyaspimpalgaonkar commented Nov 15, 2024

Important

Refactor and enhance knowledge graph functionality by renaming triples to relationships, updating models, handlers, and API endpoints, and adjusting tests accordingly.

  • Behavior:
    • Renamed getTriples to getRelationships in r2rV2ClientIntegrationSuperUser.test.ts and r2rV2ClientIntegrationUser.test.ts.
    • Updated KGCreationSettings in models.tsx to use kg_relationships_extraction_prompt and max_knowledge_relationships.
    • Modified get_triples to get_relationships in kg.py.
    • Changed API endpoints in kg_router.py and graph_router.py to handle relationships instead of triples.
  • Models:
    • Replaced Triple with Relationship in __init__.py and other related files.
    • Updated CommunityReport to Community in abstractions/__init__.py.
  • Handlers:
    • Introduced PostgresRelationshipHandler and PostgresEntityHandler in kg.py.
    • Added methods for creating, updating, and deleting entities and relationships.
  • Pipes:
    • Renamed KGTriplesExtractionPipe to KGRelationshipsExtractionPipe in pipes/__init__.py.
    • Updated logic in community_summary.py and deduplication.py to handle relationships.
  • Prompts:
    • Updated graphrag_relationships_extraction_few_shot.yaml to reflect relationship extraction changes.
  • Tests:
    • Adjusted tests in test_kg_community_summary_pipe.py and test_kg_logic.py to align with new relationship handling.

This description was created by Ellipsis for 1cb66cf. It will automatically update as commits are pushed.

@shreyaspimpalgaonkar shreyaspimpalgaonkar changed the base branch from feature/api-v3 to main November 15, 2024 04:29
@NolanTrem NolanTrem changed the base branch from main to feature/api-v3 November 15, 2024 04:42
Copy link
Collaborator

@NolanTrem NolanTrem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just needs a few tweaks. Would like to take another look tomorrow before we merge

*/
@feature("getTriples")
async getTriples(
@feature("getRelationships")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the SDK methods here to getTriples, as not to break any current implementation. We can also add an @deprecated warning to this such that it warns users to switch to the v3 SDK methods

@@ -229,7 +229,7 @@ async def get_entities(
@click.option(
"--collection-id",
required=True,
help="Collection ID to retrieve triples from.",
help="Collection ID to retrieve relationships from.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the SDK, let's leave the CLI as is. I have a deprecation wrapper that we can add to these methods, and then we will point users to the new command.

WrappedKGTunePromptResponse,
)


from shared.api.models.kg.responses_v3 import (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to get ride of the validation on the v2 endpoints—it's there for legacy, and should work, but, no need to have two versions of this for legacy code.

@@ -308,9 +309,9 @@ async def get_triples(
description="Number of items to return. Use -1 to return all items.",
),
auth_user=Depends(self.service.providers.auth.auth_wrapper),
) -> WrappedKGTriplesResponse:
) -> WrappedKGRelationshipsResponse:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should modify these to WrappedRelationshipResponse and WrappedRelationshipsResponse to match the rest of the codebase

@@ -1,22 +1,33 @@
import logging
import textwrap
from typing import Optional
from typing import Optional, Union
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but just use |

@@ -110,7 +110,7 @@ async def local_search(

# entity search
search_type = "__Entity__"
async for search_result in await self.database_provider.vector_query( # type: ignore
async for search_result in await self.database_provider.graph_search( # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but lets stop passing kwargs and move to args whenever possible

vector_column_str = _decorate_vector_type(
f"({self.dimension})", self.quantization_type
)

query = f"""
CREATE TABLE IF NOT EXISTS {self._get_table_name("chunk_entity")} (
id SERIAL PRIMARY KEY,
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
sid SERIAL NOT NULL,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure this gets in the migration script

if entity_level is None:
raise ValueError("Entity level is not set")

for entity in entities:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to a generator?

if not all(entity.level == entity_level for entity in entities):
    raise ValueError("All entities must be of the same level")

py/sdk/v2/kg.py Outdated
@@ -103,39 +103,39 @@ async def get_entities(

return await self._make_request("GET", "entities", params=params) # type: ignore

async def get_triples(
async def get_relationships(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep v2 method names, it's fine to change the internal naming conventions, though

WrappedKGTriplesResponse = ResultsWrapper[KGTriplesResponse]

# KG Entities
WrappedKGEntityResponse = ResultsWrapper[KGEntitiesResponse]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but it should be singular KGEntitiesResponse and so on

@shreyaspimpalgaonkar shreyaspimpalgaonkar changed the base branch from feature/api-v3 to main November 15, 2024 19:30
@shreyaspimpalgaonkar shreyaspimpalgaonkar changed the base branch from main to feature/api-v3 November 15, 2024 21:02
@shreyaspimpalgaonkar shreyaspimpalgaonkar marked this pull request as ready for review November 15, 2024 21:02
@shreyaspimpalgaonkar shreyaspimpalgaonkar merged commit e819d2e into feature/api-v3 Nov 15, 2024
16 of 20 checks passed
@shreyaspimpalgaonkar shreyaspimpalgaonkar deleted the v3-graph-implmentations branch November 15, 2024 21:03
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 1cb66cf in 2 minutes and 51 seconds

More details
  • Looked at 10379 lines of code in 57 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. py/core/main/api/v2/kg_router.py:312
  • Draft comment:
    The parameter triple_ids should be renamed to relationship_ids to maintain consistency with the rest of the code changes.
            relationship_ids: Optional[list[str]] = Query(
  • Reason this comment was not posted:
    Marked as duplicate.
2. py/tests/core/providers/kg/test_kg_logic.py:87
  • Draft comment:
    Consider renaming triples_raw_list to relationships_raw_list for consistency with the updated terminology.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR changes the term 'triples' to 'relationships' across the codebase. However, in the test file, the variable name 'triples_raw_list' should also be updated to 'relationships_raw_list' for consistency.

Workflow ID: wflow_aL6oV2GYDiYT7UXk


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@@ -308,9 +311,9 @@ async def get_triples(
description="Number of items to return. Use -1 to return all items.",
),
auth_user=Depends(self.service.providers.auth.auth_wrapper),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter triple_ids should be renamed to relationship_ids to maintain consistency with the rest of the code changes.

Suggested change
auth_user=Depends(self.service.providers.auth.auth_wrapper),
relationship_ids: Optional[list[str]] = Query(

emrgnt-cmplxty added a commit that referenced this pull request Dec 4, 2024
* improve ci/cd runtime

* update prompt tests

* improve ci/cd runtime (#1535)

* improve ci/cd runtime

* update prompt tests

* Support Python ^3.10 (#1534)

* add azure

* up

* up

* spec out v3 api

* checkin document router

* adding chunk abstr

* add list chunks

* add chunk search

* up

* add users routes

* up

* checkin progress

* add collections annotations

* add indices

* add user

* checkin work

* up

* complete conversations CRUD

* fix type errors

* add graph router

* add graphs

* Update JS (#1563)

* Feature/add graph to v3 (#1565)

* complete simple tests, cleanup routers

* up

* Harmonize Pagination across endpoints (#1564)

* Pagination

* Add fixmes

* Fix nested deletion filter bug (#1567)

* Remove Mintlify docs (#1569)

* Nolan/list collection (#1568)

* Check in

* More

* Fix nested transactions issue in sqlite logger

* Fix update collection return type

* JS V3 (#1571)

* Sync collections JS

* More documents

* Clean up messy code

* list not List

* Users first pass

* User tests and fixmes

* More

* typo

* More prompts

* Pre-commit improvements

* Remove prints

* Cleanups on conversations

* Branches response

* Chunks

* More work on the return types

* Jest config

* Fix branch creation time

* Fix lock

* Nolan/v3 tests (#1578)

* Add deprecated command back

* Add warning

* Fix GraphRAG tests (#1579)

* More cleanup (#1580)

* More cleanup

* More

* Fix test

* More cleanups

* More cleanups

* More

* Merge main

* Python SDK V3 (#1585)

* Python SDK V3

* Fix

* First pass (#1586)

* More V3 (#1587)

* Validation errors

* Update js test

* more

* Fix sync methods on v2 sdk, add check for download files (#1588)

* More CLI (#1589)

* Print logs on failing tests (#1590)

* Print logs on failing tests

* MOre

* cleanup

* Again

* Again

* More JS testing (#1591)

* More JS testing

* Cleanup

* More refactors for tests (#1592)

* System Routes (#1594)

* Fix type errors, pass collection id (#1595)

* Hotfix: dict

* V3 graph implmentations (#1593)

* complete simple tests, cleanup routers

* up

* up

* checkin

* up

* up

* response models

* checkin

* up

* checkin

* up

* up

* up

* up

* up

* up

* v2

* up

* up

* up

* up

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* Allow passing of collection id at document ingestion (#1596)

* KG Response sync (#1597)

* fix

* Fix Prompt Override (#1599)

* Fix Prompt Override

* print

* Caching

* Fix

* Updated Graph Models, Drop SID (#1598)

* New Graph Models

* Fix

* minor tweaks

* fix summary model (#1604)

* incr progress

* Add /users/me (#1605)

* Add /users/me

* oops

* Resolve Merge Conflicts (#1607)

* Fix conflicts

* Clean up

* Nolan/conflicts (#1608)

* expose reset data to admin (#1602)

* up (#1603)

* up

* up

* wtf github is a piece of garbage

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* wrapup walkthrough

* Add delete user method, sync JS to camel case (#1609)

* V3 graph testing (#1606)

* up

* up

* up

* graph crud

* up

* community endpts

* up

* up

* up

* up

* up

* up

* up

* up

* add back routers

* up

* pre-commit

* Fix Broken V2 Graphs, Better Response Models (#1612)

* Increase test coverage

* Fix v2 graphs, better response models

* Remaining types

* Add types to Python SDK

* Typo

* update tests

* revert test change

* up

* Add types to package export (#1613)

* Graph refactor (#1611)

* up

* up

* add back routers

* up

* pre-commit

* update tests

* revert test change

* up

* simplify

* up

* add the add/remove endpoints

* up

* include routers back

* Create branch update (#1617)

* Graph refactor (#1616)

* up

* up

* add back routers

* up

* pre-commit

* update tests

* revert test change

* up

* simplify

* up

* add the add/remove endpoints

* up

* include routers back

* List collections (#1619)

* up

* up

* up

* up

* Graph refactor (#1620)

* up

* up

* add back routers

* up

* pre-commit

* update tests

* revert test change

* up

* simplify

* up

* add the add/remove endpoints

* up

* include routers back

* up

* up

* up

* up

* Nolan/update graph (#1621)

* List collections

* Update Graph JS SDK

* up

* up

* cleanup

* Graph refactor (#1622)

* up

* up

* add back routers

* up

* pre-commit

* update tests

* revert test change

* up

* simplify

* up

* add the add/remove endpoints

* up

* include routers back

* up

* up

* up

* up

* up

* up

* cleanup

* up

* up

* up

* remove unnecessary functions

* up

* up

* complete document embedding workflow

* working get command on graph

* checkin progress

* up

* add entity and relationship deletions

* no verif

* up

* up

* up

* up (#1636)

* up

* sync graph

* up

* up

* fix relationship distance calc.

* fix issue with faulty collection filter (#1637)

* Patch/alternative fix logics 2 (#1638)

* fix issue with faulty collection filter

* further refinements, like fixing limits

* up

* fix logic around include metadata and scores

* fix double collection assignment

* up

* fix communities

* working clusters

* up

* add collection extraction

* add collection extraction

* up

* prep for merge

* Patch/alternative up with nolan (#1643)

* SDK First pass

* Add feature tracking

* Typo

* Check in

* Rebase

* Add Graph tests

* Fix Agent empty message bug

* Check in JS routes

* More tests, examples

* Sync python

* Expose Entity/Relationship Params in Routes (#1640)

* Expose Entity/Relationship Params

* Descriptions

* Modify create entities

* Create relationships

* set parent_id

* Update entitiy

* Update Relationships

* Check in

* Ellipsis fixes

* More cleanup

* Start CRUD on communities

* Communities DB

* Explicit working path

* Once again

* Fail fast false

* Testing around community creation

* Delete community test

* Update community tests

* Clean up type errors, cleaner code

* More cleanup

* More

* remove chunk_entity

* Delete bad, unused methods

* More

* fixup crud

* rm pull

---------

Co-authored-by: NolanTrem <[email protected]>

* Feature/fix graph permissions (#1645)

* update docs / collections

* up

* Feature/fix auth checks (#1647)

* update docs / collections

* up

* fix super user and more

* up

* up (#1648)

* Feature/rm v2 api (#1649)

* SDK First pass

* Add feature tracking

* Typo

* Check in

* Rebase

* Add Graph tests

* Fix Agent empty message bug

* Check in JS routes

* More tests, examples

* Sync python

* Expose Entity/Relationship Params in Routes (#1640)

* Expose Entity/Relationship Params

* Descriptions

* Modify create entities

* Create relationships

* set parent_id

* Update entitiy

* Update Relationships

* Check in

* Ellipsis fixes

* More cleanup

* Start CRUD on communities

* Communities DB

* Explicit working path

* Once again

* Fail fast false

* Testing around community creation

* Delete community test

* Update community tests

* Clean up type errors, cleaner code

* More cleanup

* More

* remove chunk_entity

* Delete bad, unused methods

* More

* remove v2 api

* rm kg router

* cleanups

* fixup delete by filter

* fixup delete by filter

* fixes

* up

* up

---------

Co-authored-by: NolanTrem <[email protected]>

* Improved Data Structures (#1650)

* Check in

* Most tests fixed

* fix tables

* Once more

* Move to a single community table

* Don't modify existing migration script--keep them atomic

* Migration

* Migration, more clean up

* All but deletion working

* Up

* Feature/tweaks for prod (#1651)

* tweaks for prod

* up

* final tweaks

* Nolan/deletion (#1652)

* Check in

* Most tests fixed

* fix tables

* Once more

* Move to a single community table

* Don't modify existing migration script--keep them atomic

* Migration

* Migration, more clean up

* All but deletion working

* Up

* Fix deletion

* Working migration (#1654)

* Feature/production tweaks (#1653)

* tweaks for prod

* up

* final tweaks

* prod tweaks

* fixed

---------

Co-authored-by: NolanTrem <[email protected]>

* sort

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants