update README for SuperDB (#5367)

brimdata · Oct 25, 2024 · 56880f0 · 56880f0
1 parent 32552bc
commit 56880f0
Showing 1 changed file with 113 additions and 97 deletions.
diff --git a/README.md b/README.md
@@ -1,109 +1,125 @@
-# Zed [![Tests][tests-img]][tests] [![GoPkg][gopkg-img]][gopkg]
-
-Zed offers a new approach to data that makes it easier to manipulate and manage
-your data.
-
-With Zed's new
-[super-structured data model](https://zed.brimdata.io/docs/formats/#2-zed-a-super-structured-pattern),
-messy JSON data can easily be given the fully-typed precision of relational tables
-without giving up JSON's uncanny ability to represent eclectic data.
-
-Trying out Zed is easy: just [install](https://zed.brimdata.io/docs/#getting-started)
-the command-line tool [`zq`](https://zed.brimdata.io/docs/commands/zq/).
-
-`zq` is a lot like [`jq`](https://stedolan.github.io/jq/)
-but is built from the ground up as a search and analytics engine based
-on the [Zed data model](https://zed.brimdata.io/docs/formats/zed).
-Since Zed data is a proper superset of JSON, `zq` also works natively with JSON.
-
-While `zq` and the Zed data formats are production quality, the Zed project's
-[Zed data lake](https://zed.brimdata.io/docs/commands/zed/#1-the-lake-model)
-is a bit [earlier in development](https://zed.brimdata.io/docs/commands/zed/#status).
-
-For a non-technical user, Zed is as easy to use as web search
-while for a technical user, Zed exposes its technical underpinnings
-in a gradual slope, providing as much detail as desired,
-packaged up in the easy-to-understand
-[ZSON data format](https://zed.brimdata.io/docs/formats/zson) and
-[Zed language](https://zed.brimdata.io/docs/language).
-
-## Why?
-
-We think data is hard and it should be much, much easier.
-
-While _schemas_ are a great way to model and organize your data, they often
-[get in the way](https://github.com/brimdata/sharkfest-21#schemas-a-double-edged-sword)
-when you are just trying to store or transmit your semi-structured data.
-
-Also, why should you have to set up one system
-for search and another completely different system for historical analytics?
-And the same unified search/analytics system that works at cloud scale should run easily as
-a lightweight command-line tool on your laptop.
-
-And rather than having to set up complex ETL pipelines with brittle
-transformation logic, managing your data lake should be as easy as
-[`git`](https://git-scm.com/).
-
-Finally, we believe a lightweight data store that provides easy search and analytics
-would be a great place to store data sets for data science and
-data engineering experiments running in Python and providing easy
-integration with your favorite Python libraries.
-
-## How?
-
-Zed solves all these problems with a new foundational data format called
-[ZSON](https://zed.brimdata.io/docs/formats/zson),
-which is a superset of JSON and the relational models.
-ZSON is syntax-compatible with JSON
-but it has a comprehensive type system that you can use as little or as much as you like.
-Zed types can be used as schemas.
-
-The [Zed language](https://zed.brimdata.io/docs/language) offers a gentle learning curve,
-which spans the gamut from simple
-[keyword search](https://zed.brimdata.io/docs/language/#7-search-expressions)
-to powerful data-transformation operators like
-[lateral sub-queries](https://zed.brimdata.io/docs/language/#8-lateral-subqueries)
-and [shaping](https://zed.brimdata.io/docs/language/#9-shaping).
-
-Zed also has a cloud-based object design that was modeled after
-the `git` design pattern.  Commits to the lake are transactional
-and consistent.
-
-## Quick Start
-
-Check out the [installation page](https://zed.brimdata.io/docs/install/)
-for a quick and easy install.
-
-Detailed documentation for the entire Zed system and language
-is available on the [Zed docs site](https://zed.brimdata.io/docs).
-
-### Zui
-
-The [Zui app](https://github.com/brimdata/zui) is an Electron-based
-desktop app to explore, query, and shape data in your Zed lake.
-
-We originally developed Zui for security-oriented use cases
-(having tight integration with [Zeek](https://zeek.org/),
-[Suricata](https://suricata.io/), and
-[Wireshark](https://www.wireshark.org/)),
-but we are actively extending Zui with UX for handling generic
-data sets to support data science, data engineering, and ETL use cases.
+# SuperDB [![Tests][tests-img]][tests] [![GoPkg][gopkg-img]][gopkg]
+
+SuperDB is a new analytics database that supports relational tables and JSON 
+on an equal footing.  It shines when it comes to data wrangling where
+you need to explore or process large eclectic data sets.  It's also pretty
+decent at analytics and
+[search use cases](https://zed.brimdata.io/docs/language/search-expressions).
+
+Unlike other relational systems that do performance-fragile "schema inference" of JSON,
+SuperDB won't fall over if you throw a bunch of eclectic JSON at it.
+You can easily do
+[schema inference if you want](https://zed.brimdata.io/docs/language/operators/fuse),
+but data is ingested by default in its natural form no matter how much heterogeneity
+it might have.  And unlike systems based on the document data model,
+every value in SuperDB is strongly and dynamically typed thus providing the
+best of both worlds: the flexibility of the document model and
+the efficiency and performance of the relational model.
+
+In SuperDB's SQL dialect, there are no "JSON columns" so there isn't a "relational
+way to do things" and a different "JSON way to do things".  Instead of having
+a relational type system for structured data and completely separate JSON type
+system for semi-structured data,
+all data handled by SuperDB (e.g., JSON, CSV, Parquet files, Arrow streams, relational tables, etc) is automatically massaged into
+[super-structured data](https://zed.brimdata.io/docs/formats/#2-zed-a-super-structured-pattern)
+form.  This super-structured data is then processed by a runtime that simultaneously
+supports the statically-typed relational model and the dynamically-typed 
+JSON data model in a unified compute engine.
+
+## SuperSQL
+
+Here's a SuperSQL query that fetches some data from GitHub Archive,
+computes the set of repos touched by each user, ranks them by number of repos,
+picks the top five, and joins each user with their original `created_at` time
+from the current GitHub API:
+
+```sql
+FROM 'https://data.gharchive.org/2015-01-01-15.json.gz'
+|> SELECT union(repo.name) AS repo, actor.login AS user
+   GROUP BY user
+   ORDER BY len(repo) DESC LIMIT 5
+|> FORK (
+   => FROM f"https://api.github.com/users/${user}" 
+    |> SELECT VALUE {user:login,created_at:time(created_at)}
+   => PASS
+  )
+|> JOIN USING (user)
+```
+
+## Super JSON
+
+Super-structured data is strongly typed and "polymorphic": any value can take on any type 
+and sequences of data need not all conform to a predefined schema.  To this end,
+SuperDB extends the JSON format to support super-structured data in a format called
+[Super JSON](https://zed.brimdata.io/docs/formats/zson) where all JSON values 
+are also Super JSON values.  Similarly,
+the [Super Binary](https://zed.brimdata.io/docs/formats/zng) format is an efficient
+binary representation of Super JSON (a bit like Avro) and the
+[Super Columnar](https://zed.brimdata.io/docs/formats/vng) format is a columnar
+representation of Super JSON (a bit like Parquet).
+
+Even though SuperDB is based on these super-structured data formats, it can read and write
+most common data formats.
+
+## Try It
+
+Trying out SuperDB is super easy: just [install](https://zed.brimdata.io/docs/#getting-started)
+the command-line tool [`super`](https://zed.brimdata.io/docs/commands/zq/).
+
+Detailed documentation for the entire SuperDB system and its piped SQL syntax
+is available on the [SuperDB docs site](https://zed.brimdata.io/docs).
+
+The SuperDB query engine can run locally without a storage engine by accessing
+files, HTTP endpoints, or S3 paths using the `super` command. While
+[earlier in its development](https://zed.brimdata.io/docs/commands/zed/#status),
+SuperDB can also run on a
+[super-structured data lake](https://zed.brimdata.io/docs/commands/zed/#the-lake-model)
+using the `super db` sub-commands.
+
+## Piped Query Syntax
+
+The long-term goal for SuperDB's SQL syntax (SuperSQL) is to be Postgres-compatible and interoperate 
+with BI tools though this is currently a roadmap item.  At the same time, the project
+seeks to forge new ground on the usability of SQL for data exploration.  To this end,
+SuperSQL supports the
+[pipe query syntax](https://github.com/google/zetasql/blob/master/docs/pipe-syntax.md)
+of GoogleSQL, recently described in their
+[VLDB 2024 paper](https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/).
+
+In addition to the GoogleSQL syntax, SuperSQL includes additional pipeline 
+operators to enhance usability, e.g., for search, for traversing 
+highly nested JSON, for data shaping, etc.
+
+To facilitate real-time, data exploration use cases,
+SuperDB supports an abbreviated form of SuperSQL called
+[SuperPipe](https://zed.brimdata.io/docs/language).
+
+SuperPipe provides a large number of shortcuts when typing interactive 
+queries, e.g., implied group-by clauses, dropping keywords,
+implied keyword searches, and so forth.  Even though SuperPipe is simply
+a short-hand form SuperSQL, it sort of looks like the pipeline-style
+languages utilized in search systems.
+
+### SuperDB Desktop - Coming Soon
+
+[SuperDB Desktop](https://github.com/brimdata/zui) is an Electron-based
+desktop app to explore, query, and shape data in a SuperDB data lake.
+It combines a search experience with a SQL query and has some really slick
+design for dealing with complex and large JSON data.
+
+Unlike most JSON browsing tools, it won't slow to a crawl --- or worse crash ---
+if you load it up with ginormous JSON values.
 
 ## Contributing
 
-See the [contributing guide](CONTRIBUTING.md) on how you can help improve Zed!
+See the [contributing guide](CONTRIBUTING.md) on how you can help improve SuperDB!
 
 ## Join the Community
 
 Join our [public Slack](https://www.brimdata.io/join-slack/) workspace for announcements, Q&A, and to trade tips!
 
-## Acknowledgment
-
-We modeled this README after
-Philip O'Toole's brilliantly succinct
-[description of `rqlite`](https://github.com/rqlite/rqlite).
-
 [tests-img]: https://github.com/brimdata/super/workflows/Tests/badge.svg
 [tests]: https://github.com/brimdata/super/actions?query=workflow%3ATests
 [gopkg-img]: https://pkg.go.dev/badge/github.com/brimdata/super
 [gopkg]: https://pkg.go.dev/github.com/brimdata/super
+