docs(aggregations): structured view aggregations (#87)
Showing 13 changed files with 377 additions and 18 deletions.

@@ -1,12 +1,45 @@

# Concept: IQL

Intermediate Query Language (IQL) is a simple language that serves as an abstraction layer between natural language and data source-specific query syntax, such as SQL. With db-ally's [structured views](structured_views.md), the LLM uses IQL to express complex queries in a simplified way. IQL allows developers to model operations such as filtering and aggregation on the underlying data.

## Filtering

For instance, an LLM might generate an IQL query like this when asked "Find me French candidates suitable for a senior data scientist position":

```python
from_country("France") AND senior_data_scientist_position()
```

The capabilities made available to the AI model via IQL differ between projects. Developers control these by defining special [views](structured_views.md). db-ally automatically exposes special methods defined in structured views, known as "filters", via IQL. For instance, the expression above implies that the project contains a view with the `from_country` and `senior_data_scientist_position` methods (and possibly others that the LLM did not choose to use for this particular question). Additionally, the LLM can use boolean operators (`AND`, `OR`, `NOT`) to combine individual filters into more complex expressions.
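The way boolean operators combine individual filters can be illustrated with a small, self-contained sketch. It is a toy model in plain Python rather than db-ally's implementation: the `Candidate` dataclass, the helper names `and_`, `or_` and `not_`, the sample data, and the "more than 3 years" seniority threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    name: str
    country: str
    position: str
    years_of_experience: int


# In this toy model a filter is simply a predicate over a single candidate.
Filter = Callable[[Candidate], bool]


def from_country(country: str) -> Filter:
    return lambda c: c.country == country


def senior_data_scientist_position() -> Filter:
    # The seniority threshold (more than 3 years) is an illustrative assumption.
    return lambda c: c.position == "Data Scientist" and c.years_of_experience > 3


# Hypothetical helpers standing in for the IQL operators AND, OR and NOT.
def and_(left: Filter, right: Filter) -> Filter:
    return lambda c: left(c) and right(c)


def or_(left: Filter, right: Filter) -> Filter:
    return lambda c: left(c) or right(c)


def not_(inner: Filter) -> Filter:
    return lambda c: not inner(c)


candidates: List[Candidate] = [
    Candidate("Alice", "France", "Data Scientist", 6),
    Candidate("Bob", "France", "Data Engineer", 8),
    Candidate("Chloe", "Germany", "Data Scientist", 7),
    Candidate("Denis", "France", "Data Scientist", 1),
]

# Mirrors: from_country("France") AND senior_data_scientist_position()
query = and_(from_country("France"), senior_data_scientist_position())
matches = [c.name for c in candidates if query(c)]

# Mirrors: NOT (from_country("France") AND senior_data_scientist_position())
others = [c.name for c in candidates if not_(query)(c)]

print(matches)
print(others)
```

In a real project the filters would translate into conditions on the data source query instead of being evaluated row by row in Python.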

## Aggregation

Similar to filtering, developers can define special methods in [structured views](structured_views.md) that perform aggregation. These methods are also exposed to the LLM via IQL. For example, an LLM might generate the following IQL query when asked "What's the average salary for each country?":

```python
average_salary_by_country()
```

The `average_salary_by_country` method groups candidates by country and calculates the average salary for each group.
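As a rough illustration of the grouping and averaging described here, the sketch below performs the same computation on a handful of in-memory rows. It only demonstrates the semantics; the row layout and salary figures are made up, and in db-ally the aggregation would be applied to the data source query, not computed in Python.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Made-up sample rows; a real view would read these from the data source.
rows = [
    {"country": "France", "salary": 50000},
    {"country": "France", "salary": 70000},
    {"country": "USA", "salary": 90000},
]


def average_salary_by_country(rows: List[dict]) -> Dict[str, float]:
    # Group salaries by country, then average each group.
    salaries = defaultdict(list)
    for row in rows:
        salaries[row["country"]].append(row["salary"])
    return {country: mean(values) for country, values in salaries.items()}


averages = average_salary_by_country(rows)
print(averages)
```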

The aggregation IQL call has access to the raw query, so it can perform more complex aggregations, such as grouping by several columns or applying custom functions. For example, we can ask db-ally to generate a candidate report with the following IQL query:

```python
candidate_report()
```

In this case, the `candidate_report` method is defined in a structured view, and it performs a series of aggregations and calculations to produce a report with the average salary, number of candidates, and other metrics, by country.
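To give a feel for what such a report might compute, here is a minimal sketch that runs a multi-metric, per-country query against an in-memory SQLite table. The table layout, the chosen metrics, and the sample rows are assumptions for illustration only; the source does not specify what `candidate_report` returns beyond the average salary and the number of candidates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (name TEXT, country TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?)",
    [
        ("Alice", "France", 50000.0),
        ("Bob", "France", 70000.0),
        ("Carol", "USA", 90000.0),
    ],
)

# Several aggregates computed in one pass, grouped by country.
report = conn.execute(
    """
    SELECT country,
           COUNT(*)    AS number_of_candidates,
           AVG(salary) AS average_salary,
           MAX(salary) AS highest_salary
    FROM candidates
    GROUP BY country
    ORDER BY country
    """
).fetchall()

for row in report:
    print(row)
```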

## Operation chaining

Some queries require both filtering and aggregation. For example, to calculate the average salary of a senior data scientist in the US, we first need to filter the data to include only US candidates who are senior data scientists, and then calculate the average salary. In this case, db-ally will first generate an IQL query to filter the data, and then another IQL query to calculate the average salary.

```python
from_country("USA") AND senior_data_scientist_position()
```

```python
average_salary()
```

db-ally then applies these IQL queries in sequence to build a single query plan, which it executes on the data source.
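The idea of combining a filtering step and an aggregation step into one statement can be sketched as follows. This is a simplified illustration using SQLite and hand-written SQL, not db-ally's query builder; the functions `filter_step` and `aggregation_step`, the table layout, and the seniority threshold are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def filter_step() -> Tuple[str, List[object]]:
    # Stands in for: from_country("USA") AND senior_data_scientist_position()
    # The "more than 3 years" threshold is an illustrative assumption.
    condition = "country = ? AND position = ? AND years_of_experience > ?"
    return condition, ["USA", "Data Scientist", 3]


def aggregation_step(condition: str) -> str:
    # Stands in for: average_salary(), applied on top of the filtered rows.
    return f"SELECT AVG(salary) AS average_salary FROM candidates WHERE {condition}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidates "
    "(country TEXT, position TEXT, years_of_experience INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?, ?)",
    [
        ("USA", "Data Scientist", 6, 120000.0),
        ("USA", "Data Scientist", 8, 140000.0),
        ("USA", "Data Scientist", 1, 80000.0),
        ("France", "Data Scientist", 7, 70000.0),
    ],
)

# Both steps contribute to one statement, which is executed once.
condition, params = filter_step()
sql = aggregation_step(condition)
average_salary = conn.execute(sql, params).fetchone()[0]

print(sql)
print(average_salary)
```

Only the two senior US data scientists contribute to the average, so the filter and the aggregation end up in one round trip to the database.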

@@ -0,0 +1,93 @@

# Quickstart: Aggregations

This guide is a continuation of the [Intro](./intro.md) guide. It assumes that you have already set up the views and the collection. If not, please refer to the complete Part 1 code on [GitHub](https://github.com/deepsense-ai/db-ally/blob/main/examples/intro.py){:target="_blank"}.

In this guide, we will add aggregations to our view to calculate general metrics about the candidates.

## View Definition

To add aggregations to our [structured view](../concepts/structured_views.md), we'll define new methods. These methods will allow the LLM to perform calculations and summarize data across multiple rows. Let's add three aggregation methods to our `CandidateView`:

```python
class CandidateView(SqlAlchemyBaseView):
    """
    A view for retrieving candidates from the database.
    """

    def get_select(self) -> sqlalchemy.Select:
        """
        Creates the initial SqlAlchemy select object, which will be used to build the query.
        """
        return sqlalchemy.select(Candidate)

    @decorators.view_aggregation()
    def average_years_of_experience(self) -> sqlalchemy.Select:
        """
        Calculates the average years of experience of candidates.
        """
        return self.select.with_only_columns(
            sqlalchemy.func.avg(Candidate.years_of_experience).label("average_years_of_experience")
        )

    @decorators.view_aggregation()
    def positions_per_country(self) -> sqlalchemy.Select:
        """
        Returns the number of candidates per position per country.
        """
        return (
            self.select.with_only_columns(
                sqlalchemy.func.count(Candidate.position).label("number_of_positions"),
                Candidate.position,
                Candidate.country,
            )
            .group_by(Candidate.position, Candidate.country)
            .order_by(sqlalchemy.desc("number_of_positions"))
        )

    @decorators.view_aggregation()
    def candidates_per_country(self) -> sqlalchemy.Select:
        """
        Returns the number of candidates per country.
        """
        return (
            self.select.with_only_columns(
                sqlalchemy.func.count(Candidate.id).label("number_of_candidates"),
                Candidate.country,
            )
            .group_by(Candidate.country)
        )
```

By setting up these aggregations, you enable the LLM to calculate metrics about the average years of experience, the number of candidates per position per country, and the number of candidates per country.
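To make the intent of these methods more concrete, the sketch below runs the kind of SQL that `positions_per_country` would be expected to correspond to, against a throwaway in-memory SQLite table. The SQL text, the table layout, and the sample rows are illustrative assumptions; the statement db-ally actually generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidates (id INTEGER PRIMARY KEY, position TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO candidates (position, country) VALUES (?, ?)",
    [
        ("Data Scientist", "USA"),
        ("Data Scientist", "USA"),
        ("Data Scientist", "USA"),
        ("Data Scientist", "France"),
        ("Data Scientist", "France"),
        ("Data Engineer", "France"),
    ],
)

# Count candidates per (position, country) pair, largest groups first.
positions_per_country = conn.execute(
    """
    SELECT count(position) AS number_of_positions, position, country
    FROM candidates
    GROUP BY position, country
    ORDER BY number_of_positions DESC
    """
).fetchall()

for row in positions_per_country:
    print(row)
```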

## Query Execution

Having already defined and registered the view with the collection, we can now execute the query:

```python
result = await collection.ask("What is the average years of experience of candidates?")
print(result.results)
```

This will return the average years of experience of candidates.

<details>
<summary>The expected output</summary>
```
The generated SQL query is: SELECT avg(candidates.years_of_experience) AS average_years_of_experience
FROM candidates
Number of rows: 1
{'average_years_of_experience': 4.98}
```
</details>
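Because `collection.ask` is awaited, the call has to run inside a coroutine when used from a regular script. The sketch below shows one way to do that with `asyncio.run`. To stay runnable without a database or an LLM, it uses `FakeCollection` and `FakeResult`, hypothetical stand-ins that return a canned answer copied from the expected output; in the actual guide you would use the collection created in Part 1.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FakeResult:
    # Hypothetical stand-in exposing only the `results` attribute used in the guide.
    results: List[Dict[str, Any]]


class FakeCollection:
    """Hypothetical stand-in for the collection from Part 1; returns a canned answer."""

    async def ask(self, question: str) -> FakeResult:
        return FakeResult(results=[{"average_years_of_experience": 4.98}])


async def main() -> List[Dict[str, Any]]:
    collection = FakeCollection()
    result = await collection.ask("What is the average years of experience of candidates?")
    return result.results


# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
results = asyncio.run(main())
print(results)
```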

Feel free to try other questions like: "What's the distribution of candidates across different positions and countries?" or "How many candidates are from China?".

## Full Example

Access the full example on [GitHub](https://github.com/deepsense-ai/db-ally/blob/main/examples/aggregations.py){:target="_blank"}.

## Next Steps

Explore [Quickstart Part 3: Semantic Similarity](./semantic-similarity.md) to expand on the example and learn about using semantic similarity.