From 6811e2e6de7b17b4782aee1b646f684d030f1f46 Mon Sep 17 00:00:00 2001
From: Scott Twiname <skott.twiname@gmail.com>
Date: Wed, 13 Mar 2024 11:48:55 +1300
Subject: [PATCH] Improve docs on self hosting subquery (#494)

* Improve docs on self hosting subquery

* Self hosted improvements

---------

Co-authored-by: James Bayly <james@bayly.xyz>
---
 docs/run_publish/optimisation.md | 24 +++++++++++++++++++++++-
 docs/run_publish/run.md          | 32 +++++++++++++++++++++++++++++++-
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/docs/run_publish/optimisation.md b/docs/run_publish/optimisation.md
index 3849c603773..d8ce19b71b8 100644
--- a/docs/run_publish/optimisation.md
+++ b/docs/run_publish/optimisation.md
@@ -8,9 +8,25 @@ Use `node worker threads` to move block fetching and block processing into its o
 
 You should also adjust and play around with the various arguments that control how SubQuery uses a store to improve indexing performance by making expensive database operations in bulk. In particular, you can review `--store-cache-threshold`, `--store-get-cache-size`, `--store-cache-async`, and `--store-flush-interval` - read more about these settings in our [references](./references.md#store-cache-threshold).
 
+## Ensure you are monitoring correctly
+
+We have a [monitoring guide](./monitor) for how to check the progress of indexing. You can also set up health check monitors for the individual [indexer node](./run.md#monitoring-indexer-health) and [query service](./run.md#monitoring-query-service-health).
+
+You should also monitor the system resources available such as CPU, Memory and Disk usage, as well as external RPC endpoints that your project connects to. Don't forget to monitor database capacity to ensure you don't run out of disk space and corrupt your store.
+
 ## DDOS Mitigation
 
-SubQuery runs well behind an API gateway or a DDOS mitigation service. For any public project that is run in a production configuration, setting up a gateway, web application firewall, or some other protected endpoint is recommended.
+SubQuery runs well behind an API gateway or a DDOS mitigation service. For any public project that is run in a production configuration, setting up a gateway, web application firewall, or some other protected endpoint is recommended (see more in [Security](#security)).
+
+Add rate limits or other request restrictions using your API gateway.
+
+## Security
+
+If you're using Docker, ensure the Docker API is not externally accessible. Don't run services as the root user; our Docker images already avoid this.
+
+Use an API gateway like [nginx](https://nginx.org/en/) or [kong](https://konghq.com/) to provide SSL for your query service and ensure you only expose necessary ports.
+
+Only the query service needs to be public, and it should be exposed through the API gateway.
 
 ## Request Caching
 
@@ -22,6 +38,8 @@ In our own managed service, we've been able to run a number of SubQuery projects
 
 The next step our team will usually carry out is split the database into a read-write replica architecture. One database instance is the writer (that the @subql/node service connects to), while the other is the reader (that the @subql/query service connects to). We will do this before splitting up projects into different databases as it generally makes a huge improvement to database I/O.
 
+Make sure you back up your database regularly, ideally with scheduled automation.
+
 ## Run Multiple Query Services
 
 SubQuery is designed so that you can run multiple query services behind a load balancer for redundancy and performance. Just note that unless you have multiple read replicas of the database, you're performance will quickly become db constrained.
@@ -37,3 +55,7 @@ GraphQL is extremely powerful, but one of the downsides is that it allows users
 - `--unsafe` is a flag that enables some advanced features like [GraphQL aggregations](./query/aggregate.md), these may have performance impacts, [read more here](./references.md#unsafe-query-service)
 
 You should also consider reading this excellent guide from Apollo on how they recommend you secure your [GraphQL API from malicious queries](https://www.apollographql.com/blog/graphql/security/securing-your-graphql-api-from-malicious-queries/).
+
+## Deal with Exceptions and Outages
+
+Ensure services restart automatically, whether the service itself stops or the whole system goes down. If you use Docker, this will be managed for you; if you run directly on Linux, [systemd](https://systemd.io/) is a good option.
diff --git a/docs/run_publish/run.md b/docs/run_publish/run.md
index 228a586231d..01242293ab3 100644
--- a/docs/run_publish/run.md
+++ b/docs/run_publish/run.md
@@ -5,6 +5,12 @@ Don't want to worry about running your own SubQuery infrastructure? SubQuery pro
 
 **There are two ways to run a project locally, [using Docker](#using-docker) or running the individual components using NodeJS ([indexer node service](#running-an-indexer-subqlnode) and [query service](#running-the-query-service)).**
 
+::: tip Location is everything
+
+Run the services geographically close to one another and where you think most requests will come from. Running the node or query service far away from the DB will massively decrease performance.
+
+:::
+
 ## Using Docker
 
 An alternative solution is to run a **Docker Container**, defined by the `docker-compose.yml` file. For a new project that has been just initialised you won't need to change anything here.
@@ -271,7 +277,7 @@ When the indexer first indexes the chain, fetching single blocks will significan
 SubQuery uses Node.js, by default this will use 4GB of memory. If you are running into memory issues or wish to get the most performance out of indexing you can increase the memory that will be used by setting the following environment variable `export NODE_OPTIONS=--max_old_space_size=<memory-in-MB>`. It's best to make sure this only applies to the node and not the query service.
 :::
 
-#### Check your node health
+#### Monitoring Indexer Health
 
 There are 2 endpoints that you can use to check and monitor the health of a running SubQuery node.
 
@@ -322,6 +328,8 @@ If an incorrect URL is used, a 404 not found error will be returned.
 }
 ```
 
+You should also [regularly monitor your query service health](#monitoring-query-service-health).
+
 #### Debug your project
 
 Use the [node inspector](https://nodejs.org/en/docs/guides/debugging-getting-started/) to run the following command.
@@ -365,3 +373,25 @@ subql-query --name <project_name> --playground
 Make sure the project name is the same as the project name when you [initialize the project](../quickstart/quickstart.md#_2-initialise-the-subquery-starter-project). Also, check the environment variables are correct.
 
 After running the subql-query service successfully, open your browser and head to `http://localhost:3000`. You should see a GraphQL playground showing in the Explorer and the schema that is ready to query.
+
+::: warning
+
+The query service will fail to start if the node has not yet created the DB schema for your project. If you are automating the startup of your project, ensure that the node service always starts first and is healthy before the query service starts. You can see an example of how we do this in the default `docker-compose.yml`.
+
+:::
+
+### Monitoring Query Service Health
+
+Unlike the indexer node, the query service has no dedicated health check route. Instead, you can make a simple GraphQL query, such as fetching the metadata:
+
+```shell
+curl 'http://localhost:3000' -X POST -H 'Content-Type: application/json' --data-raw '{"query":"{\n  _metadata {\n    chain\n    lastProcessedHeight\n    lastProcessedTimestamp\n  }\n}"}'
+```
+
+## Recommendations for Self Hosting in a Production Environment
+
+If you wish to self-host SubQuery in production, there are many other things to consider. These can vary greatly depending on how you choose to run SubQuery, so while we might find it hard to support your team, we hope to point you in the right direction.
+
+We recommend that you are familiar with running web services in production. If this sounds like too much work, the [SubQuery Managed Service](https://managedservice.subquery.network) provides all of this functionality for you.
+
+**You will want to review [Running High Performance SubQuery Infrastructure](./optimisation.md).**