Commit 3e409d8

docs: Add some details on the SQL interface
1 parent ab19c1c commit 3e409d8

1 file changed

+82
-0
lines changed

docs/implementation/sql-interface.md

@@ -0,0 +1,82 @@
# SQL Queries

**This interface is extremely experimental. There is no guarantee that this
interface will ever be brought to production use. It's solely here to help
evaluate the utility of such an interface.**

SQL queries can be issued by posting a JSON document to
`/subgraphs/sql`. The server responds with a JSON document that
contains the records matching the query.

The body of the request must contain the following keys:

* `deployment`: the hash of the deployment against which the query should
  be run
* `query`: the SQL query
* `mode`: either `info` or `data`. With a mode of `info`, only some
  information about the response is reported; with a mode of `data`, the
  query result is sent in the response

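The request format above can be sketched in a few lines of Python. This is a minimal illustration, not part of `graph-node`: the helper names `build_sql_request` and `send_sql_request`, the `base_url` parameter, and the `Content-Type` header are assumptions made for the example; only the three keys and the `/subgraphs/sql` path come from the description above.

```python
import json
import urllib.request

# The two modes named in the documentation above.
VALID_MODES = {"info", "data"}


def build_sql_request(deployment: str, query: str, mode: str = "data") -> bytes:
    """Serialize the JSON body for a POST to /subgraphs/sql (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    body = {"deployment": deployment, "query": query, "mode": mode}
    return json.dumps(body).encode("utf-8")


def send_sql_request(base_url: str, body: bytes) -> dict:
    """POST the body and parse the JSON response.

    base_url (host and port of the node) is an assumption; the documentation
    only specifies the path. The Content-Type header is likewise assumed.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/subgraphs/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Validating `mode` on the client side is a convenience only; the server remains the authority on which values it accepts.
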
The SQL query can use all the tables of the given subgraph. Table and
attribute names are snake-cased from their form in the GraphQL schema, so
that data for `SomeDailyStuff` is stored in a table `some_daily_stuff`.

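To guess the table or column name for a given GraphQL name, the snake-casing rule can be approximated as below. This is a sketch of the common convention only; the function name `to_snake_case` is made up for illustration, and the exact rules `graph-node` applies (for example to acronyms or digits) may differ.

```python
import re


def to_snake_case(name: str) -> str:
    """Approximate snake-casing of a GraphQL type or field name (assumed rule)."""
    # Insert an underscore at every boundary where a lowercase letter or
    # digit is followed by an uppercase letter, then lowercase everything.
    with_underscores = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return with_underscores.lower()


print(to_snake_case("SomeDailyStuff"))  # some_daily_stuff
print(to_snake_case("parentHash"))      # parent_hash
```
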
The query can use fairly arbitrary SQL, including aggregations and most
functions built into PostgreSQL.

## Example

For a subgraph whose schema defines an entity `Block`, the following query
```json
{
  "query": "select number, hash, parent_hash, timestamp from block order by number desc limit 2",
  "deployment": "QmSoMeThInG",
  "mode": "data"
}
```

might result in this response
```json
{
  "data": [
    {
      "hash": "\\x5f91e535ee4d328725b869dd96f4c42059e3f2728dfc452c32e5597b28ce68d6",
      "number": 5000,
      "parent_hash": "\\x82e95c1ee3a98cd0646225b5ae6afc0b0229367b992df97aeb669c898657a4bb",
      "timestamp": "2015-07-30T20:07:44+00:00"
    },
    {
      "hash": "\\x82e95c1ee3a98cd0646225b5ae6afc0b0229367b992df97aeb669c898657a4bb",
      "number": 4999,
      "parent_hash": "\\x875c9a0f8215258c3b17fd5af5127541121cca1f594515aae4fbe5a7fbef8389",
      "timestamp": "2015-07-30T20:07:36+00:00"
    }
  ]
}
```

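In the response above, binary values arrive as strings in Postgres' `\x` hex notation and timestamps as ISO 8601 strings. A client will probably want native types instead. The following is a minimal sketch of that conversion for the `block` rows shown; the helper names `decode_bytea` and `decode_block_row` are hypothetical, and the column set is taken from this one example only.

```python
from datetime import datetime


def decode_bytea(value: str) -> bytes:
    """Convert a Postgres hex-format string such as '\\x5f91...' to bytes."""
    if not value.startswith("\\x"):
        raise ValueError(f"expected a value starting with \\x, got {value!r}")
    return bytes.fromhex(value[2:])


def decode_block_row(row: dict) -> dict:
    """Decode one row of the example response (columns assumed from the example)."""
    return {
        "number": row["number"],
        "hash": decode_bytea(row["hash"]),
        "parent_hash": decode_bytea(row["parent_hash"]),
        # fromisoformat accepts the '+00:00' offset used in the example.
        "timestamp": datetime.fromisoformat(row["timestamp"]),
    }
```

After parsing the response with `json.loads`, each entry of its `data` list can be passed to `decode_block_row`. Note that the doubled backslash in the JSON text is JSON escaping; after parsing, each value starts with a single backslash followed by `x`.
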
## Limitations/Ideas/Disclaimers

Most of these are fairly easy to address:

* queries must finish within `GRAPH_SQL_STATEMENT_TIMEOUT` (unlimited by
  default)
* queries are always executed at the subgraph head. It would be easy to add
  a way to specify a block at which the query should be executed
* the interface right now pretty much exposes the raw SQL schema for a
  subgraph, though system columns like `vid` or `block_range` are made
  inaccessible.
* it is not possible to join across subgraphs, though it would be possible
  to add that. Implementing that would require some additional plumbing that
  hides the effects of sharding.
* JSON is an inefficient response format, and we should change it to
  something more efficient
* the response contains data that's pretty raw; as the example shows,
  binary data uses Postgres' notation for hex strings
* because of how broad the supported SQL is, it is pretty easy to issue
  queries that take a very long time. It will therefore not be hard to take
  down a `graph-node`, especially when no query timeout is set

Most importantly: while quite a bit of effort has been put into making this
interface safe, in particular into making sure that it's not possible to
write through this interface, there's no guarantee that this works without
bugs.
