Commit

Merge pull request #84 from MSUSAzureAccelerators/main
Adding Pull requests from MSUS repo
pablomarin authored Apr 23, 2024
2 parents 8664c37 + 27442a6 commit cc62123
Showing 4 changed files with 95 additions and 181 deletions.
2 changes: 1 addition & 1 deletion .devcontainer/devcontainer.json
@@ -3,7 +3,7 @@
{
"name": "Python 3",
// Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
"image": "mcr.microsoft.com/devcontainers/python:0-3.10",
"image": "mcr.microsoft.com/devcontainers/python:0-3.10-bullseye",

// Features to add to the dev container. More info: https://containers.dev/features.
// "features": {},
258 changes: 78 additions & 180 deletions 08-SQLDB_QA.ipynb
@@ -23,9 +23,11 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "c1fb79a3-4856-4721-988c-112813690a90",
"metadata": {},
"metadata": {
"metadata": {}
},
"outputs": [],
"source": [
"import os\n",
@@ -51,9 +53,11 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"id": "258a6e99-2d4f-4147-b8ee-c64c85296181",
"metadata": {},
"metadata": {
"metadata": {}
},
"outputs": [],
"source": [
"# Set the ENV variables that Langchain needs to connect to Azure OpenAI\n",
@@ -65,7 +69,27 @@
"id": "1e8e0b32-a6b5-4b1c-943d-e57b737213fa",
"metadata": {},
"source": [
"# Install MS SQL DB driver in your machine"
"# Install MS SQL DB driver in your machine\n",
"\n",
"Use `lsb_release -a` to verify OS version details"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a353df6-0966-4e43-a914-6a2856eb140a",
"metadata": {},
"outputs": [],
"source": [
"!lsb_release -a"
]
},
{
"cell_type": "markdown",
"id": "8112882c",
"metadata": {},
"source": [
"## Using AML Instance"
]
},
{
@@ -75,19 +99,43 @@
"source": [
"\n",
"You might need the driver installed in order to talk to the SQL DB, so run the below cell once. Then restart the kernel and continue<br>\n",
"[Reference](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver16&tabs=ubuntu18-install%2Calpine17-install%2Cdebian8-install%2Credhat7-13-install%2Crhel7-offline)"
"[Microsoft Learn Reference](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver16&tabs=ubuntu18-install%2Calpine17-install%2Cdebian8-install%2Credhat7-13-install%2Crhel7-offline)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"id": "65fbffc7-e149-4eb3-a4db-9f114b06f205",
"metadata": {},
"outputs": [],
"source": [
"# !sudo ./download_odbc_driver.sh"
]
},
{
"cell_type": "markdown",
"id": "357bca72",
"metadata": {},
"source": [
"## Using Dev Container\n",
"\n",
"You might need the driver installed in order to talk to the SQL DB, so run the below cell once. Then restart the kernel and continue<br>\n",
"[Microsoft Learn Reference](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver16&tabs=ubuntu18-install%2Cdebian17-install%2Cdebian8-install%2Credhat7-13-install%2Crhel7-offline#17)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59c04434",
"metadata": {
"metadata": {}
},
"outputs": [],
"source": [
"!chmod +x ./download_odbc_driver_dev_container.sh\n",
"!./download_odbc_driver_dev_container.sh"
]
},
{
"cell_type": "markdown",
"id": "35e30fa1-877d-4d3b-80b0-e17459c1e4f4",
@@ -107,19 +155,12 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"id": "26739d89-e075-4098-ab38-92cccf9f9425",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n",
"('Microsoft SQL Azure (RTM) - 12.0.2000.8 \\n\\tFeb 2 2024 04:20:23 \\n\\tCopyright (C) 2022 Microsoft Corporation\\n',)\n"
]
}
],
"metadata": {
"metadata": {}
},
"outputs": [],
"source": [
"from sqlalchemy import create_engine, text\n",
"from sqlalchemy.engine import URL\n",
@@ -156,20 +197,12 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"id": "acaf202c-33a1-4105-b506-c26f2080c1d8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(pyodbc.ProgrammingError) ('42S01', \"[42S01] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]There is already an object named 'covidtracking' in the database. (2714) (SQLExecDirectW)\")\n",
"[SQL: CREATE TABLE covidtracking (date VARCHAR(MAX), state VARCHAR(MAX), death FLOAT, deathConfirmed FLOAT, deathIncrease INT, deathProbable FLOAT, hospitalized FLOAT, hospitalizedCumulative FLOAT, hospitalizedCurrently FLOAT, hospitalizedIncrease INT, inIcuCumulative FLOAT, inIcuCurrently FLOAT, negative FLOAT, negativeIncrease INT, negativeTestsAntibody FLOAT, negativeTestsPeopleAntibody FLOAT, negativeTestsViral FLOAT, onVentilatorCumulative FLOAT, onVentilatorCurrently FLOAT, positive FLOAT, positiveCasesViral FLOAT, positiveIncrease INT, positiveScore INT, positiveTestsAntibody FLOAT, positiveTestsAntigen FLOAT, positiveTestsPeopleAntibody FLOAT, positiveTestsPeopleAntigen FLOAT, positiveTestsViral FLOAT, recovered FLOAT, totalTestEncountersViral FLOAT, totalTestEncountersViralIncrease INT, totalTestResults FLOAT, totalTestResultsIncrease INT, totalTestsAntibody FLOAT, totalTestsAntigen FLOAT, totalTestsPeopleAntibody FLOAT, totalTestsPeopleAntigen FLOAT, totalTestsPeopleViral FLOAT, totalTestsPeopleViralIncrease INT, totalTestsViral FLOAT, totalTestsViralIncrease INT)]\n",
"(Background on this error at: https://sqlalche.me/e/20/f405)\n"
]
}
],
"metadata": {
"metadata": {}
},
"outputs": [],
"source": [
"# Read CSV file into a pandas dataframe\n",
"csv_path = \"./data/all-states-history.csv\"\n",
@@ -235,7 +268,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"id": "7faef3c0-8166-4f3b-a5e3-d30acfd65fd3",
"metadata": {},
"outputs": [],
@@ -245,7 +278,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"id": "6cbe650c-9e0a-4209-9595-de13f2f1ee0a",
"metadata": {},
"outputs": [],
@@ -256,7 +289,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"id": "ae80c022-415e-40d1-b205-1744a3164d70",
"metadata": {},
"outputs": [],
@@ -299,7 +332,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"id": "2b51fb36-68b5-4770-b5f1-c042a08e0a0f",
"metadata": {},
"outputs": [],
@@ -319,129 +352,21 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"id": "21c6c6f5-4a14-403f-a1d0-fe6b0c34a563",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[QuerySQLDataBaseTool(description=\"Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.\", db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x7f477310d240>),\n",
" InfoSQLDatabaseTool(description='Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x7f477310d240>),\n",
" ListSQLDatabaseTool(db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x7f477310d240>),\n",
" QuerySQLCheckerTool(description='Use this tool to double check if your query is correct before executing it. Always use this tool before executing a query with sql_db_query!', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x7f477310d240>, llm=AzureChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7f477310e710>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7f477310dcf0>, temperature=0.5, openai_api_key=SecretStr('**********'), openai_proxy='', max_tokens=2000, azure_endpoint='https://gios-opeain-australia.openai.azure.com/', deployment_name='gpt-35-turbo-1106', openai_api_version='2023-12-01-preview', openai_api_type='azure'), llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['dialect', 'query'], template='\\n{query}\\nDouble check the {dialect} query above for common mistakes, including:\\n- Using NOT IN with NULL values\\n- Using UNION when UNION ALL should have been used\\n- Using BETWEEN for exclusive ranges\\n- Data type mismatch in predicates\\n- Properly quoting identifiers\\n- Using the correct number of arguments for functions\\n- Casting to the correct data type\\n- Using the proper columns for joins\\n\\nIf there are any of the above mistakes, rewrite the query. If there are no mistakes, just reproduce the original query.\\n\\nOutput the final SQL query only.\\n\\nSQL Query: '), llm=AzureChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7f477310e710>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7f477310dcf0>, temperature=0.5, openai_api_key=SecretStr('**********'), openai_proxy='', max_tokens=2000, azure_endpoint='https://gios-opeain-australia.openai.azure.com/', deployment_name='gpt-35-turbo-1106', openai_api_version='2023-12-01-preview', openai_api_type='azure')))]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"# As we know by now, Agents use expert/tools. Let's see which are the tools for this SQL Agent\n",
"agent_executor.tools"
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"id": "6d7bb8cf-8661-4174-8185-c64b4b20670d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new SQL Agent Executor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Invoking: `sql_db_list_tables` with `{'tool_input': ''}`\n",
"\n",
"\n",
"\u001b[0m\u001b[38;5;200m\u001b[1;3mcovidtracking\u001b[0m\u001b[32;1m\u001b[1;3m\n",
"Invoking: `sql_db_schema` with `{'table_names': 'covidtracking'}`\n",
"\n",
"\n",
"\u001b[0m\u001b[33;1m\u001b[1;3m\n",
"CREATE TABLE covidtracking (\n",
"\tdate VARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, \n",
"\tstate VARCHAR(max) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, \n",
"\tdeath FLOAT(53) NULL, \n",
"\t[deathConfirmed] FLOAT(53) NULL, \n",
"\t[deathIncrease] BIGINT NULL, \n",
"\t[deathProbable] FLOAT(53) NULL, \n",
"\thospitalized FLOAT(53) NULL, \n",
"\t[hospitalizedCumulative] FLOAT(53) NULL, \n",
"\t[hospitalizedCurrently] FLOAT(53) NULL, \n",
"\t[hospitalizedIncrease] BIGINT NULL, \n",
"\t[inIcuCumulative] FLOAT(53) NULL, \n",
"\t[inIcuCurrently] FLOAT(53) NULL, \n",
"\tnegative FLOAT(53) NULL, \n",
"\t[negativeIncrease] BIGINT NULL, \n",
"\t[negativeTestsAntibody] FLOAT(53) NULL, \n",
"\t[negativeTestsPeopleAntibody] FLOAT(53) NULL, \n",
"\t[negativeTestsViral] FLOAT(53) NULL, \n",
"\t[onVentilatorCumulative] FLOAT(53) NULL, \n",
"\t[onVentilatorCurrently] FLOAT(53) NULL, \n",
"\tpositive FLOAT(53) NULL, \n",
"\t[positiveCasesViral] FLOAT(53) NULL, \n",
"\t[positiveIncrease] BIGINT NULL, \n",
"\t[positiveScore] BIGINT NULL, \n",
"\t[positiveTestsAntibody] FLOAT(53) NULL, \n",
"\t[positiveTestsAntigen] FLOAT(53) NULL, \n",
"\t[positiveTestsPeopleAntibody] FLOAT(53) NULL, \n",
"\t[positiveTestsPeopleAntigen] FLOAT(53) NULL, \n",
"\t[positiveTestsViral] FLOAT(53) NULL, \n",
"\trecovered FLOAT(53) NULL, \n",
"\t[totalTestEncountersViral] FLOAT(53) NULL, \n",
"\t[totalTestEncountersViralIncrease] BIGINT NULL, \n",
"\t[totalTestResults] FLOAT(53) NULL, \n",
"\t[totalTestResultsIncrease] BIGINT NULL, \n",
"\t[totalTestsAntibody] FLOAT(53) NULL, \n",
"\t[totalTestsAntigen] FLOAT(53) NULL, \n",
"\t[totalTestsPeopleAntibody] FLOAT(53) NULL, \n",
"\t[totalTestsPeopleAntigen] FLOAT(53) NULL, \n",
"\t[totalTestsPeopleViral] FLOAT(53) NULL, \n",
"\t[totalTestsPeopleViralIncrease] BIGINT NULL, \n",
"\t[totalTestsViral] FLOAT(53) NULL, \n",
"\t[totalTestsViralIncrease] BIGINT NULL\n",
")\n",
"\n",
"/*\n",
"3 rows from covidtracking table:\n",
"date\tstate\tdeath\tdeathConfirmed\tdeathIncrease\tdeathProbable\thospitalized\thospitalizedCumulative\thospitalizedCurrently\thospitalizedIncrease\tinIcuCumulative\tinIcuCurrently\tnegative\tnegativeIncrease\tnegativeTestsAntibody\tnegativeTestsPeopleAntibody\tnegativeTestsViral\tonVentilatorCumulative\tonVentilatorCurrently\tpositive\tpositiveCasesViral\tpositiveIncrease\tpositiveScore\tpositiveTestsAntibody\tpositiveTestsAntigen\tpositiveTestsPeopleAntibody\tpositiveTestsPeopleAntigen\tpositiveTestsViral\trecovered\ttotalTestEncountersViral\ttotalTestEncountersViralIncrease\ttotalTestResults\ttotalTestResultsIncrease\ttotalTestsAntibody\ttotalTestsAntigen\ttotalTestsPeopleAntibody\ttotalTestsPeopleAntigen\ttotalTestsPeopleViral\ttotalTestsPeopleViralIncrease\ttotalTestsViral\ttotalTestsViralIncrease\n",
"2021-03-07\tAK\t305.0\t0.0\t0\t0.0\t1293.0\t1293.0\t33.0\t0\t0.0\t0.0\t0.0\t0\t0.0\t0.0\t1660758.0\t0.0\t2.0\t56886.0\t0.0\t0\t0\t0.0\t0.0\t0.0\t0.0\t68693.0\t0.0\t0.0\t0\t1731628.0\t0\t0.0\t0.0\t0.0\t0.0\t0.0\t0\t1731628.0\t0\n",
"2021-03-07\tAL\t10148.0\t7963.0\t-1\t2185.0\t45976.0\t45976.0\t494.0\t0\t2676.0\t0.0\t1931711.0\t2087\t0.0\t0.0\t0.0\t1515.0\t0.0\t499819.0\t392077.0\t408\t0\t0.0\t0.0\t0.0\t0.0\t0.0\t295690.0\t0.0\t0\t2323788.0\t2347\t0.0\t0.0\t119757.0\t0.0\t2323788.0\t2347\t0.0\t0\n",
"2021-03-07\tAR\t5319.0\t4308.0\t22\t1011.0\t14926.0\t14926.0\t335.0\t11\t0.0\t141.0\t2480716.0\t3267\t0.0\t0.0\t2480716.0\t1533.0\t65.0\t324818.0\t255726.0\t165\t0\t0.0\t0.0\t0.0\t81803.0\t0.0\t315517.0\t0.0\t0\t2736442.0\t3380\t0.0\t0.0\t0.0\t481311.0\t0.0\t0\t2736442.0\t3380\n",
"*/\u001b[0m\u001b[32;1m\u001b[1;3m\n",
"Invoking: `sql_db_query` with `{'query': \"SELECT SUM(hospitalizedIncrease) AS Texas_hospitalized_July_2020 FROM covidtracking WHERE state = 'TX' AND date LIKE '2020-07%'\"}`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m[(0,)]\u001b[0m\u001b[32;1m\u001b[1;3m\n",
"Invoking: `sql_db_query` with `{'query': \"SELECT SUM(hospitalizedIncrease) AS Nationwide_hospitalized_July_2020 FROM covidtracking WHERE date LIKE '2020-07%'\"}`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m[(63105,)]\u001b[0m\u001b[32;1m\u001b[1;3mThe number of patients hospitalized in Texas during July 2020 was 0.\n",
"The nationwide total of all states hospitalized during July 2020 was 63105.\n",
"\n",
"Explanation:\n",
"- To find the number of patients hospitalized in Texas during July 2020, I used the SQL query:\n",
" ```sql\n",
" SELECT SUM(hospitalizedIncrease) AS Texas_hospitalized_July_2020 FROM covidtracking WHERE state = 'TX' AND date LIKE '2020-07%'\n",
" ```\n",
" This query sums the `hospitalizedIncrease` column for Texas where the date starts with '2020-07', but it returned 0, which might indicate that there were no hospitalizations recorded for Texas in July 2020.\n",
" \n",
"- To find the nationwide total of all states hospitalized during July 2020, I used the SQL query:\n",
" ```sql\n",
" SELECT SUM(hospitalizedIncrease) AS Nationwide_hospitalized_July_2020 FROM covidtracking WHERE date LIKE '2020-07%'\n",
" ```\n",
" This query sums the `hospitalizedIncrease` column for all states where the date starts with '2020-07', resulting in a nationwide total of 63105 hospitalizations.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"outputs": [],
"source": [
"try:\n",
" response = agent_executor.invoke(QUESTION) \n",
@@ -451,37 +376,10 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"id": "f23d2135-2199-474e-ae83-455aefc9b93b",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"The number of patients hospitalized in Texas during July 2020 was 0.\n",
"The nationwide total of all states hospitalized during July 2020 was 63105.\n",
"\n",
"Explanation:\n",
"- To find the number of patients hospitalized in Texas during July 2020, I used the SQL query:\n",
" ```sql\n",
" SELECT SUM(hospitalizedIncrease) AS Texas_hospitalized_July_2020 FROM covidtracking WHERE state = 'TX' AND date LIKE '2020-07%'\n",
" ```\n",
" This query sums the `hospitalizedIncrease` column for Texas where the date starts with '2020-07', but it returned 0, which might indicate that there were no hospitalizations recorded for Texas in July 2020.\n",
" \n",
"- To find the nationwide total of all states hospitalized during July 2020, I used the SQL query:\n",
" ```sql\n",
" SELECT SUM(hospitalizedIncrease) AS Nationwide_hospitalized_July_2020 FROM covidtracking WHERE date LIKE '2020-07%'\n",
" ```\n",
" This query sums the `hospitalizedIncrease` column for all states where the date starts with '2020-07', resulting in a nationwide total of 63105 hospitalizations."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"outputs": [],
"source": [
"printmd(response[\"output\"])"
]
@@ -535,9 +433,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10 - SDK v2",
"display_name": "Python 3",
"language": "python",
"name": "python310-sdkv2"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -549,7 +447,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.12"
}
},
"nbformat": 4,
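
Most of the notebook changes in this commit reset `execution_count` to `null` and replace recorded cell output with `"outputs": []`. The commit does not say how that cleanup was done. The following is only a minimal sketch of one way to get the same effect with the Python standard library; the function name `strip_notebook_outputs` is hypothetical and is not part of this repository.

```python
import json


def strip_notebook_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs and counters cleared.

    Hypothetical helper for illustration; assumes the nbformat 4 layout
    (a top-level "cells" list) seen in the diff above.
    """
    # A JSON round trip gives a deep copy, so the caller's dict is left untouched.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None  # serialized as null in the .ipynb file
    return cleaned


if __name__ == "__main__":
    # Tiny in-memory notebook; a real one would be loaded from disk with json.load.
    sample = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": 4,
                "metadata": {},
                "outputs": [{"name": "stdout", "output_type": "stream", "text": ["ok\n"]}],
                "source": ["print('ok')"],
            },
            {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        ],
        "nbformat": 4,
    }
    print(json.dumps(strip_notebook_outputs(sample), indent=1))
```

Clearing outputs before committing keeps notebook diffs small and avoids checking in run-specific details such as endpoint names or object addresses, like those in the outputs removed above.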

0 comments on commit cc62123
