
Schema Naming Assessments for Improved LLM-Based SQL Inference

This repository contains datasets, code, and analysis described in the paper: SNAILS: Schema Naming Assessments for Improved LLM Systems.

For in-depth documentation, visit the SNAILS website.

For questions and assistance, contact the first author at [email protected].

What is SNAILS?

SNAILS is a set of NL-to-SQL-related artifacts designed to explore the effects of schema naming on LLM-based NL-to-SQL performance. It includes a set of real-world databases with associated natural language (NL):SQL gold query pairs, human-labeled naturalness classifications for the tables and columns in the database collection, a model and method for ML-based naturalness classification, and prompting strategies for improving schema identifier naturalness.

We used these artifacts to explore the true effect of schema identifier (table and column) names on NL-to-SQL performance, and we present the results in our SIGMOD 2025 paper. Our results can be reproduced with the artifacts in this repository by following the instructions described below.

What SNAILS isn't:

SNAILS is not a fully featured benchmark like Spider or BIRD that maintains a leaderboard, provides training and dev datasets, and retains a hidden test set. Instead, the SNAILS artifacts are a starting point for further research into schema-specific effects on NL-to-SQL, and they can also support the development of new NL-to-SQL methods. We encourage other researchers and developers to use our artifacts for their own research objectives, and we invite collaboration and feedback from the community.

SNAILS Artifacts

For ease of access, we make the artifacts referenced in our paper available in the following directory tree:

SNAILS\
|___SNAILS_Artifacts\
   |__databases\ # Contains .txt files with download links and instructions
   |__naturalness_classifications\ # .xlsx and .csv files with all SNAILS identifiers and naturalness scores
   |__naturalness_classifiers\ # ML Model and prompts for naturalness classification
   |__naturalness_modified_identifiers\ # crosswalks (or maps) of native to modified naturalness identifiers
   |__naturalness_modifier\ # RAG-based identifier renamer (see README.md here for more detail)
   |__nl_sql_question_query_pairs\ # 503 NL and gold query pairs for SNAILS evaluation 
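
If you want to explore the labeled data programmatically, the snippet below is a minimal sketch that loads the naturalness classification CSVs with pandas. It assumes only the directory layout shown above; the individual file names inside the folder are not assumed.

from pathlib import Path
import pandas as pd

# Load every human-labeled naturalness classification CSV into one DataFrame.
classification_dir = Path("SNAILS_Artifacts/naturalness_classifications")
frames = [pd.read_csv(f) for f in sorted(classification_dir.glob("*.csv"))]
classifications = pd.concat(frames, ignore_index=True)
print(classifications.head())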

Results Reproducibility

The workflows for generating SNAILS results are somewhat complex and require multiple steps, which we describe below in the SNAILS Project Setup section. You can reproduce the entire process, including LLM-based SQL generation, or you can use the SQL already generated by the LLMs to recreate the data behind our findings. Scroll down to that section to get started. The root folder contains Jupyter notebooks that guide the reproduction process; they should be executed in the order listed:

##### Core Findings #####
|__01-SNAILS-NL-to-SQL-Inference-and-Scoring.ipynb - SQL Inference over SNAILS collection
|__02-SNAILS-NL-to-SQL-results-analysis.ipynb      - Performance metric creation (e.g., QueryRecall, Execution Accuracy)
|__03-SNAILS-identifier-analysis.ipynb             - Identifier-focused metrics (e.g., Identifier Recall)
##### Supplementary Findings and Exploration #####
|__04-SNAILS-tokenizer-analysis.ipynb              - Tokenizes SNAILS identifiers and explores their properties
|__05-SNAILS-token-naturalness-analysis.ipynb      - Exploring the alignment of tokens to natural language
|__06-SNAILS-naturalness-comparisons.ipynb         - SNAILS vs. Spider vs. Bird vs. SchemaPile naturalness
|__07-schemapile-naturalness.ipynb                 - ETL scripts for schemapile extraction and evaluation
|__08a-codes_query_execution_and_selection.ipynb   - Augmenting CodeS process to select first correct SQL
|__08b-DINSQL-CodeS-schema-subsetting-analysis.ipynb - Evaluating schema subsets generated by CodeS and DINSQL
|__09-spider-query-analysis.ipynb                  - Performance metric creation for Spider DEV (Native and modified) inference

Before running any of these notebooks, you will need to follow the project setup steps listed below.
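
If you prefer to run the core-findings notebooks non-interactively, the following is a minimal sketch that executes them in order with jupyter nbconvert. It assumes Jupyter is installed and that the project setup steps have been completed.

import subprocess

core_notebooks = [
    "01-SNAILS-NL-to-SQL-Inference-and-Scoring.ipynb",
    "02-SNAILS-NL-to-SQL-results-analysis.ipynb",
    "03-SNAILS-identifier-analysis.ipynb",
]

for notebook in core_notebooks:
    # --execute runs all cells; --inplace writes the executed notebook back to the same file.
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook],
        check=True,
    )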

Natural View Building

One of the most pragmatic takeaways from our work on SNAILS is the idea of a natural view. Instead of adding extra information (e.g., column descriptions, example values) to a schema description in an LLM prompt, why not make the target schema itself more natural? If you're building a new schema, that's relatively easy; if you're dealing with an existing schema, it's easier said than done. For existing schemas, we offer a view-building pipeline that incorporates our classifier and schema renamer.

SNAILS Natural View Creation Workflow:

You will need to follow the steps in the README located in SNAILS_Artifacts/naturalness_modifier/data_dict_reader to get set up for your database. Once the data_dict_reader has been configured, run:

python ./classify_rename_and_build_view.py --database [your database name]

This will create a score file and a naturalness crosswalk file in ./schema_classifier_renamer_view_builder_output. Review the score file and make necessary corrections, then run:

python ./classify_rename_and_build_view.py --database [your database name] --build_view

This will create a .sql view file in the same directory. You can then run this file against your target database to generate your natural views. Note that the script assumes the presence of a db_nl schema, which you should create before executing the view-creation SQL in your database.
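
As an illustration, here is a minimal sketch of applying the generated view script from Python. It assumes a SQL Server target reachable through pyodbc; the connection string and the .sql file name are placeholders, and the batch splitting is a simplification.

import pyodbc

# Placeholder connection string; adjust driver, server, and database for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=your_database;Trusted_Connection=yes;",
    autocommit=True,
)
cursor = conn.cursor()

# The generated views target the db_nl schema, so create it first if it is missing.
cursor.execute("IF SCHEMA_ID('db_nl') IS NULL EXEC('CREATE SCHEMA db_nl')")

# Execute the generated view file; if it separates statements with GO
# (common in SQL Server scripts), each batch must be run separately.
with open("./schema_classifier_renamer_view_builder_output/your_database_views.sql") as f:
    script = f.read()
for batch in script.split("\nGO"):
    if batch.strip():
        cursor.execute(batch)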

If you want to try it out on a SNAILS database, run:

python ./classify_rename_and_build_view.py --database NYSED_SRC2022

and then

python ./classify_rename_and_build_view.py --database NYSED_SRC2022 --build_view

This will create natural view .sql files for the SNAILS NYSED database using the NYSED PDF metadata and the already-created few-shot prompt.

NL-to-SQL Experiment Quick Start

To rerun the NL-to-SQL experiments, use the notebooks and scripts listed below. This does not include the identifier naturalness classification experiments, which are described in more detail later in this document. You can also find more detailed descriptions of these notebooks in the Experiment Notebooks section below.

  1. Generate NL-to-SQL and evaluate: run end-to-end-data-prep-and-prediction.py
  2. Evaluate tokens: tokenizer_analysis.ipynb
  3. Run query-level experiments: end-to-end-prototype-schema-and-query-analysis.ipynb
  4. Run identifier-level analysis: identifier-analysis.ipynb

Gold query profiling tool

To ensure that each schema's tables and columns are used as fully as possible, we use a gold query writing tool notebook that parses each query to extract its tables and columns and compares the output to all tables and columns in the target database. It shows which tables and columns have already been used and which have not. It also plots a histogram of the clause counts within each query, giving a visual picture of the query complexity distribution for each question set. A sketch of the coverage check appears below.
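
The sketch below illustrates the coverage check the tool performs, using sqlglot as the parser; the actual notebook may use a different parsing approach, and the schema catalog and gold query here are placeholders.

import sqlglot
from sqlglot import exp

def used_identifiers(sql: str):
    """Return the table and column names referenced by a gold query."""
    tree = sqlglot.parse_one(sql)
    tables = {t.name for t in tree.find_all(exp.Table)}
    columns = {c.name for c in tree.find_all(exp.Column)}
    return tables, columns

# Placeholder schema catalog and gold query for illustration.
schema_tables = {"School", "District", "Enrollment"}
gold_sql = "SELECT s.Name FROM School AS s JOIN District AS d ON s.DistrictId = d.Id"

tables, columns = used_identifiers(gold_sql)
print("tables used:", tables)
print("tables not yet covered:", schema_tables - tables)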
