Skip to content

A simple SQLite file analizer supporting basic queries

Notifications You must be signed in to change notification settings

jpcorreia99/sqlite-viewer

Repository files navigation

SQLite viewer

progress-banner

Contents

What?

Full implementation of the codecrafters "Build Your Own SQLite" Challenge.

This challenge consists of building program capable of interpreting the SQLite file format and answering basic queries.

I highly recommend codecrafters as a way to deepend system's knowledge and practice coding in a non-surface level context.

Usage

The sqlite_viewer.sh script supports multiple commands:

  • ./sqlite_viewer.sh <path_to_db> ".dbinfo"
    • Returns basic info about the db, such as page size and number of tables
  • ./sqlite_viewer.sh <path_to_db> ".tables"
    • Returns the name of the existing db tables
  • ./sqlite_viewer.sh <path_to_db> <QUERY>
    • Executes the query and prints the returned rows

Supported queries

NOTE: As the program depends on external libraries, please run pipenv installto setup.

As the name implies, this program only supports READ queries.

This repository includes some sample databases in databases/ for exploration.

List of types of supported queries and examples:

  • Total Count queries
    • ./sqlite_viewer.sh databases/sample.db "SELECT COUNT(*) FROM apples"
  • Column selection with filtering spanning multiple pages
    • ./sqlite_viewer.sh databases/superheroes.db "SELECT id, name FROM superheroes WHERE eye_color = 'Pink Eyes'"
  • Column selection relying on indices
    • ./sqlite_viewer.sh databases/companies.db "SELECT id, name FROM companies WHERE country = 'eritrea'"
    • This query is extra performant due to the existence of an idx_companies_countryindex

Understanding SQLite

There are plenty of very good resources to understand the SQLite file format:

Note: Node and Page are used interchangeably.

I wrote some detailed ramblings in docs/notes.md, but here's the super short TLDR:

  • SQLite stores tables and indices as B+Trees.These trees contain interior and leaf nodes.

  • So in total we have

    • Table Interior nodes
      • When a table span multiple nodes, these nodes point to which leaf nodes contain rows with different Row_Ids
    • Table Leaf nodes
      • Store rows of the tables
    • Index Interior nodes
      • When indices span multiple nodes, points to which nodes are reponsible for each section of the keyspace
    • Index Leaf nodes
      • For a section of the keyspace, point which row_ids satisfy the index

    Pretty much all pages follow the same structure, whith slight variations:

    • Page header: an 8-12 bytes long sequence that indicates the page type, and how many cells (rows or pointer) are in this page
    • Cell pointer array: an array of 2-byte cells pointing to where to search, inside of the page, for the payload
    • Payload array: a variable size array of payloads. In table leaf cells these are rows, everywhere else these are pairs of values + pointers to help navigate the tree/index.

An interesting detail of SQLite is the usage of varints

Varints

(Taken from https://fly.io/blog/sqlite-internals-btree/)

This encoding is used so that we don’t use a huge 8-byte field for every integer.

The high bit is used as a flag to indicate if there are more bytes to be read and the other 7 bits are our actual data.

To represent 1,000, we start with its binary representation split into 7 bit chunks:

0000111 1101000

We add a 1 to signify mnore chunks to come and 0 for the final

10000111 01101000

SQLite goes even further and uses this type to encode not only integers but data types, as per https://www.sqlite.org/fileformat2.html#record_format

About

A simple SQLite file analizer supporting basic queries

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published