
Schema Naming Assessments for Improved LLM-Based SQL Inference

This repository contains datasets, code, and analysis described in the paper: SNAILS: Schema Naming Assessments for Improved LLM Systems.

For in-depth documentation, visit the SNAILS website.

For questions and assistance, contact the first author at [email protected].

What is SNAILS?

SNAILS is a set of NL-to-SQL-related artifacts designed to explore the effects of schema naming on LLM-based NL-to-SQL performance. It includes a set of real-world databases with associated natural language (NL):SQL gold query pairs, human-labeled naturalness classifications for the tables and columns in the database collection, a model and method for ML-based naturalness classification, and prompting strategies for improving schema identifier naturalness.

We used these artifacts to explore the true effect of schema identifier (table and column) names on NL-to-SQL performance, and we present the results in our SIGMOD 2025 paper. Our results can be reproduced with the artifacts in this repository by following the instructions described below.

What SNAILS isn't:

SNAILS is not a fully featured benchmark like Spider or BIRD that maintains a leaderboard, provides training and dev datasets, and retains a hidden test set. Instead, the SNAILS artifacts are a starting point for further research into schema-specific effects on NL-to-SQL, and they can also support the development of new NL-to-SQL methods. We encourage other researchers and developers to use our artifacts for their own research objectives, and we invite collaboration and feedback from the community.

SNAILS Artifacts

For ease of access, we make the artifacts referenced in our paper available in the following directory tree:

SNAILS\
|___SNAILS_Artifacts\
   |__databases\ # Contains .txt files with download links and instructions
   |__naturalness_classifications\ # .xlsx and .csv files with all SNAILS identifiers and naturalness scores
   |__naturalness_classifiers\ # ML Model and prompts for naturalness classification
   |__naturalness_modified_identifiers\ # crosswalks (or maps) of native to modified naturalness identifiers
   |__naturalness_modifier\ # RAG-based identifier renamer (see README.md here for more detail)
   |__nl_sql_question_query_pairs\ # 503 NL and gold query pairs for SNAILS evaluation 
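
If you want to explore the labeled data programmatically, the snippet below is a minimal sketch that loads the naturalness classification CSVs with pandas. It assumes only the directory layout shown above; the individual file names inside the folder are not assumed.

from pathlib import Path
import pandas as pd

# Load every human-labeled naturalness classification CSV into one DataFrame.
classification_dir = Path("SNAILS_Artifacts/naturalness_classifications")
frames = [pd.read_csv(f) for f in sorted(classification_dir.glob("*.csv"))]
classifications = pd.concat(frames, ignore_index=True)
print(classifications.head())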

Results Reproducibility

The workflows for generating SNAILS results are somewhat complex and require multiple steps, which we describe below in the SNAILS Project Setup section. You can reproduce the entire process, including LLM-based SQL generation, or you can use the SQL already generated by the LLMs to recreate the data behind our findings. Scroll down to that section to get started. The root folder contains Jupyter notebooks that guide the reproduction process; they should be executed in the order listed:

##### Core Findings #####
|__01-SNAILS-NL-to-SQL-Inference-and-Scoring.ipynb - SQL Inference over SNAILS collection
|__02-SNAILS-NL-to-SQL-results-analysis.ipynb      - Performance metric creation (e.g., QueryRecall, Execution Accuracy)
|__03-SNAILS-identifier-analysis.ipynb             - Identifier-focused metrics (e.g., Identifier Recall)
##### Supplementary Findings and Exploration #####
|__04-SNAILS-tokenizer-analysis.ipynb              - Tokenizes SNAILS identifiers and explores their properties
|__05-SNAILS-token-naturalness-analysis.ipynb      - Exploring the alignment of tokens to natural language
|__06-SNAILS-naturalness-comparisons.ipynb         - SNAILS vs. Spider vs. Bird vs. SchemaPile naturalness
|__07-schemapile-naturalness.ipynb                 - ETL scripts for schemapile extraction and evaluation
|__08a-codes_query_execution_and_selection.ipynb   - Augmenting CodeS process to select first correct SQL
|__08b-DINSQL-CodeS-schema-subsetting-analysis.ipynb - Evaluating schema subsets generated by CodeS and DINSQL
|__09-spider-query-analysis.ipynb                  - Performance metric creation for Spider DEV (Native and modified) inference

Before running any of these notebooks, you will need to follow the project setup steps listed below.
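
If you prefer to run the core-findings notebooks non-interactively, the following is a minimal sketch that executes them in order with jupyter nbconvert. It assumes Jupyter is installed and that the project setup steps have been completed.

import subprocess

core_notebooks = [
    "01-SNAILS-NL-to-SQL-Inference-and-Scoring.ipynb",
    "02-SNAILS-NL-to-SQL-results-analysis.ipynb",
    "03-SNAILS-identifier-analysis.ipynb",
]

for notebook in core_notebooks:
    # --execute runs all cells; --inplace writes the executed notebook back to the same file.
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook],
        check=True,
    )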

Natural View Building

One of the most pragmatic takeaways from our work on SNAILS is the idea of a natural view. Instead of adding extra information (e.g., column descriptions, example values) to a schema description in an LLM prompt, why not make the target schema itself more natural? If you're building a new schema, that's relatively easy; if you're dealing with an existing schema, it's easier said than done. For existing schemas, we offer a view-building pipeline that incorporates our classifier and schema renamer.

SNAILS Natural View Creation Workflow:

You will need to follow the steps in the README located in SNAILS_Artifacts/naturalness_modifier/data_dict_reader to get set up for your database. Once the data_dict_reader has been configured, run:

python ./classify_rename_and_build_view.py --database [your database name]

This will create a score file and a naturalness crosswalk file in ./schema_classifier_renamer_view_builder_output. Review the score file and make necessary corrections, then run:

python ./classify_rename_and_build_view.py --database [your database name] --build_view

This will create a .sql view file in the same directory. You can then run this file against your target database to generate your natural views. Note that the script assumes the presence of a db_nl schema, which you should create before executing the view-creation SQL in your database.
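
As an illustration, here is a minimal sketch of applying the generated view script from Python. It assumes a SQL Server target reachable through pyodbc; the connection string and the .sql file name are placeholders, and the batch splitting is a simplification.

import pyodbc

# Placeholder connection string; adjust driver, server, and database for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=your_database;Trusted_Connection=yes;",
    autocommit=True,
)
cursor = conn.cursor()

# The generated views target the db_nl schema, so create it first if it is missing.
cursor.execute("IF SCHEMA_ID('db_nl') IS NULL EXEC('CREATE SCHEMA db_nl')")

# Execute the generated view file; if it separates statements with GO
# (common in SQL Server scripts), each batch must be run separately.
with open("./schema_classifier_renamer_view_builder_output/your_database_views.sql") as f:
    script = f.read()
for batch in script.split("\nGO"):
    if batch.strip():
        cursor.execute(batch)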

If you want to try it out on a SNAILS database, run:

python ./classify_rename_and_build_view.py --database NYSED_SRC2022

and then

python ./classify_rename_and_build_view.py --database NYSED_SRC2022 --build_view

This will create natural view .sql files for the SNAILS NYSED database using the NYSED PDF metadata and the already-created few-shot prompt.

NL-to-SQL Experiment Quick Start

To rerun the NL-to-SQL experiments, use the notebooks and scripts listed below. This does not include the identifier naturalness classification experiments, which are described in more detail later in this document. You can also find more detailed descriptions of these notebooks in the Experiment Notebooks section below.

  1. Generate NL-to-SQL and evaluate: run end-to-end-data-prep-and-prediction.py
  2. Evaluate tokens: tokenizer_analysis.ipynb
  3. Run query-level experiments: end-to-end-prototype-schema-and-query-analysis.ipynb
  4. Run identifier-level analysis: identifier-analysis.ipynb

Gold query profiling tool

To ensure that each schema's tables and columns are used as fully as possible, we use a gold query writing tool notebook that parses each query to extract its tables and columns and compares the output to all tables and columns in the target database. It shows which tables and columns have already been used and which have not. It also plots a histogram of the clause counts within each query, giving a visual picture of the query complexity distribution for each question set. A sketch of the coverage check appears below.
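
The sketch below illustrates the coverage check the tool performs, using sqlglot as the parser; the actual notebook may use a different parsing approach, and the schema catalog and gold query here are placeholders.

import sqlglot
from sqlglot import exp

def used_identifiers(sql: str):
    """Return the table and column names referenced by a gold query."""
    tree = sqlglot.parse_one(sql)
    tables = {t.name for t in tree.find_all(exp.Table)}
    columns = {c.name for c in tree.find_all(exp.Column)}
    return tables, columns

# Placeholder schema catalog and gold query for illustration.
schema_tables = {"School", "District", "Enrollment"}
gold_sql = "SELECT s.Name FROM School AS s JOIN District AS d ON s.DistrictId = d.Id"

tables, columns = used_identifiers(gold_sql)
print("tables used:", tables)
print("tables not yet covered:", schema_tables - tables)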
