Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema description endpoint #459

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Conversation

jonels-msft
Copy link
Collaborator

Add a metadata/schema endpoint which outputs a terse natural-language description of the public schema.

@jonels-msft jonels-msft requested a review from a team as a code owner August 9, 2023 18:26
The ADS copilot will need a SQL description of the schema in the form of
a series of CREATE TABLE statements. This begins the creation of a
endpoint -- tentatively named metadata/sql -- to provide that info.

This commit creates the endpoint to return a hardcoded string. My code
is cargo-culted from the surrounding patterns and feels unnecessarily
repetitive.

To test it manually, send this payload

    Content-Length: 96

    {
	    "jsonrpc": "2.0",
	    "id": 1,
	    "method": "metadata/sql",
	    "params": {
		    "owner_uri": "foo"
	    }
    }

to stdin of `python3 ossdbtoolsservice/ossdbtoolsservice_main.py`

The result is:

    Content-Length: 66

    {"id": 1, "jsonrpc": "2.0", "result": {"metadata": "Hello world"}}
Make a simple database call in the endpoint rather than returning
a hardcoded string. Attempt to test the result.

TODO:

* Make test pass
* Create proper response in endpoint
TODO: make realistic passing integration test
The metadata/schema endpoint now describes the schema in natural
language (for use by LLMs) with the payload

  { "description": "..." }
@jonels-msft
Copy link
Collaborator Author

Thanks @nasc17, I updated the handler mock check as you suggested.

Any other suggestions?

Copy link
Contributor

@DaeunYim DaeunYim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we trying to achieve here eventually? What kind of user interaction will be added to use this feature?

# Get current connection
connection_service = self._service_provider[constants.CONNECTION_SERVICE_NAME]
connection: ServerConnection = connection_service.get_connection(owner_uri, ConnectionType.DEFAULT)
schema = SchemaMetadata(connection.cursor(), 'public')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why do we only support 'public' schema? In the SchemaMetadata, init set param schema's default value as public

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, we're using the public schema for now, but I'll let the default argument take care of that rather than specifying it again here.

)
tables = self.cur.fetchall()
cur_table = None
for table, indx in tables:
Copy link
Contributor

@DaeunYim DaeunYim Aug 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's less likely to happen here but in case of large schema with 1000+ tables, it's always better to use list append + join than string concatenation. (applies to other two methods as well)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave that a try in dc8d5d9 -- see what you think. I'm not sure whether that particular implementation really will go faster with all the list comprehensions and dictionary lookups, but curious what your opinion is.

With all the list comprehensions and dictionary lookups, it's
questionable whether the new code goes faster, but it's an idea.
@jonels-msft
Copy link
Collaborator Author

What are we trying to achieve here eventually? What kind of user interaction will be added to use this feature?

Upon connection to the database, our front-end tool will request a schema description from this endpoint. The front-end will include the description in copilot completion requests to improve the relevance of generated code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants