| 1 | +# SQL Queries |
| 2 | + |
| 3 | +**This interface is extremely experimental. There is no guarantee that this |
| 4 | +interface will ever be brought to production use. It's solely here to help |
| 5 | +evaluate the utility of such an interface** |
| 6 | + |
| 7 | +SQL queries can be issued by posting a JSON document to |
| 8 | +`/subgraphs/sql`. The server responds with a JSON document that |
| 9 | +contains the records matching the query. |
| 10 | + |
| 11 | +The body of the request must contain the following keys: |
| 12 | + |
| 13 | +* `deployment`: the hash of the deployment against which the query should |
| 14 | + be run |
| 15 | +* `query`: the SQL query |
| 16 | +* `mode`: either `info` or `data`. In `info` mode, only some |
| 17 | + information about the response is reported; in `data` mode, the query |
| 18 | + result is sent in the response |
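
As a rough illustration of the request format described above, the following Python sketch builds the JSON body and posts it to `/subgraphs/sql`. The helper names `build_sql_request` and `post_sql_query`, the `base_url` argument, and the `Content-Type` header are assumptions made for this example; the only parts taken from this document are the endpoint path and the three keys.

```python
import json
import urllib.request

# The two modes named in this document.
VALID_MODES = ("info", "data")


def build_sql_request(deployment: str, query: str, mode: str = "data") -> bytes:
    """Serialize the three required keys into a JSON request body."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    body = {"deployment": deployment, "query": query, "mode": mode}
    return json.dumps(body).encode("utf-8")


def post_sql_query(base_url: str, deployment: str, query: str, mode: str = "data"):
    """POST the request to <base_url>/subgraphs/sql and parse the JSON reply.

    `base_url` is a hypothetical address of a running graph-node, and the
    Content-Type header is an assumption, not something this document states.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/subgraphs/sql",
        data=build_sql_request(deployment, query, mode),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the body construction separate from the network call makes it possible to check the payload without a running server.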
| 19 | + |
| 20 | +The SQL query can use all the tables of the given subgraph. Table and |
| 21 | +attribute names are snake-cased from their form in the GraphQL schema, so |
| 22 | +that data for `SomeDailyStuff` is stored in a table `some_daily_stuff`. |
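
To make the naming rule concrete, here is a minimal Python sketch of a snake-casing function. It is only an approximation for plain camel-cased names like the one above; the conversion that `graph-node` actually applies may treat acronyms, digits, or other edge cases differently, and `to_snake_case` is a name invented for this example.

```python
import re


def to_snake_case(name: str) -> str:
    """Approximate the GraphQL-name to SQL-name mapping for simple names.

    An underscore is inserted wherever a lowercase letter or digit is
    directly followed by an uppercase letter, then everything is lowercased.
    """
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


print(to_snake_case("SomeDailyStuff"))  # some_daily_stuff
print(to_snake_case("parentHash"))      # parent_hash
```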
| 23 | + |
| 24 | +The query can use fairly arbitrary SQL, including aggregations and most |
| 25 | +functions built into PostgreSQL. |
| 26 | + |
| 27 | +## Example |
| 28 | + |
| 29 | +For a subgraph whose schema defines an entity `Block`, the following request body |
| 30 | +```json |
| 31 | +{ |
| 32 | + "query": "select number, hash, parent_hash, timestamp from block order by number desc limit 2", |
| 33 | + "deployment": "QmSoMeThInG", |
| 34 | + "mode": "data" |
| 35 | +} |
| 36 | +``` |
| 37 | + |
| 38 | +might result in this response: |
| 39 | +```json |
| 40 | +{ |
| 41 | + "data": [ |
| 42 | + { |
| 43 | + "hash": "\\x5f91e535ee4d328725b869dd96f4c42059e3f2728dfc452c32e5597b28ce68d6", |
| 44 | + "number": 5000, |
| 45 | + "parent_hash": "\\x82e95c1ee3a98cd0646225b5ae6afc0b0229367b992df97aeb669c898657a4bb", |
| 46 | + "timestamp": "2015-07-30T20:07:44+00:00" |
| 47 | + }, |
| 48 | + { |
| 49 | + "hash": "\\x82e95c1ee3a98cd0646225b5ae6afc0b0229367b992df97aeb669c898657a4bb", |
| 50 | + "number": 4999, |
| 51 | + "parent_hash": "\\x875c9a0f8215258c3b17fd5af5127541121cca1f594515aae4fbe5a7fbef8389", |
| 52 | + "timestamp": "2015-07-30T20:07:36+00:00" |
| 53 | + } |
| 54 | + ] |
| 55 | +} |
| 56 | +``` |
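
The `hash` and `parent_hash` values in the response above are in PostgreSQL's hex notation for binary data (a `\x` prefix followed by hex digits), so a client will likely want to decode them. The Python sketch below shows one way to do that for a single record; `decode_pg_bytea` is a hypothetical helper name, and the sketch assumes that binary columns always arrive in this hex form and that timestamps arrive as ISO 8601 strings, as they do in the example.

```python
import json
from datetime import datetime


def decode_pg_bytea(value: str) -> bytes:
    """Convert a PostgreSQL hex-format string (backslash, 'x', hex digits) to bytes."""
    if not value.startswith("\\x"):
        raise ValueError(f"not a hex-format bytea string: {value!r}")
    return bytes.fromhex(value[2:])


# One record with fields copied from the example response above.
raw = r'''{
  "hash": "\\x5f91e535ee4d328725b869dd96f4c42059e3f2728dfc452c32e5597b28ce68d6",
  "number": 5000,
  "timestamp": "2015-07-30T20:07:44+00:00"
}'''

record = json.loads(raw)
block_hash = decode_pg_bytea(record["hash"])              # 32 raw bytes
timestamp = datetime.fromisoformat(record["timestamp"])   # timezone-aware datetime
```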
| 57 | + |
| 58 | +## Limitations/Ideas/Disclaimers |
| 59 | + |
| 60 | +Most of these are fairly easy to address: |
| 61 | + |
| 62 | +* queries must finish within `GRAPH_SQL_STATEMENT_TIMEOUT` (unlimited by |
| 63 | + default) |
| 64 | +* queries are always executed at the subgraph head. It would be easy to add |
| 65 | + a way to specify a block at which the query should be executed |
| 66 | +* the interface right now pretty much exposes the raw SQL schema for a |
| 67 | + subgraph, though system columns like `vid` or `block_range` are made |
| 68 | + inaccessible. |
| 69 | +* it is not possible to join across subgraphs, though it would be possible |
| 70 | + to add that. Implementing that would require some additional plumbing that |
| 71 | + hides the effects of sharding. |
| 72 | +* JSON as the response format is pretty terrible, and we should change that |
| 73 | + to something that isn't so inefficient |
| 74 | +* the response contains data that's pretty raw; as the example shows, |
| 75 | + binary data uses Postgres' notation for hex strings |
| 76 | +* because of how broad the supported SQL is, it is pretty easy to issue |
| 77 | + queries that take a very long time. It will therefore not be hard to take |
| 78 | + down a `graph-node`, especially when no query timeout is set |
| 79 | + |
| 80 | +Most importantly: while quite a bit of effort has been put into making this |
| 81 | +interface safe, in particular, making sure it's not possible to write |
| 82 | +through this interface, there's no guarantee that this works without bugs. |