Skip to content

Latest commit

 

History

History
633 lines (423 loc) · 35.1 KB

README.md

File metadata and controls

633 lines (423 loc) · 35.1 KB

Bedsheets

Badge: Github Actions Badge: Docker Image Badge: Node Version Badge: License Badge: Github Stars

bedsheets is a Node.js proxy that lets you turn Google Sheets into a quick and dirty RESTful database. It's intended to be:

  • your simplest database, if you need something more complex look at our suggested alternatives
  • no-maintenance, so that you can deploy it once and forget about things
  • self-hosted, so that you don't have to risk unreliable or insecure 3rd-party services (plus, you can usually host it for free on most cloud providers!)

Google Sheets provides several great database features: built-in version control ⧉, collaborative management ⧉, a great table browser ⧉, powerful data processing functions ⧉, and basic visualization tools ⧉. By following simple conventions Bedsheets lets you introduce table schemas see docs and expose them over HTTP(S) with REST endpoints and JSON payloads.

Bedsheets is not trying to compete with real databases. Instead, it's trying to provide a database with a lower barrier to entry for small projects that you think "wow, it would be nice if this had a database, but I don't want to go through the hassle of deploying/configuring/maintaining the infrastructure for one". For example, @awendland uses it to store temperature data recorded by a Raspberry Pi.

Demo

Screen-recording showing a browser open to a Google Sheet on the left and a terminal open on the right. A curl command is entered into the terminal to perform a GET operation, which returns the data from the Google Sheet in a JSON format. A second curl command is run to POST new data to the sheet, and the Google Sheet on the left updates in real time. A third curl command is run to GET the Google Sheet's data, which includes the newly added data. Then, the Google Sheet is edited to change a column labeled "name" to now be "first_name". The GET curl command is run again, and this time the JSON payload is keyed with "first_name" instead of "name".

The latest version of Bedsheets is auto-deployed to https://demo-1-3gkwpsop5a-uc.a.run.app. A demo spreadsheet 1ajoVZn1zhg3HCF4cRpIZOBFRkNWsfXUC9rwVX_YQ70U is configured with a sample schema (it'll reset every 20 minutes).

Take a look at the spreadsheet and then run the following command (or view in ReqBin) to see Bedsheets in action:

curl -X GET "https://demo-1-3gkwpsop5a-uc.a.run.app/1ajoVZn1zhg3HCF4cRpIZOBFRkNWsfXUC9rwVX_YQ70U/Playground"

The spreadsheet is publicly editable, so feel free to modify it and test out how the schema system works!

Table of Contents

Usage

This assumes you've already deployed Bedsheets (see instructions). A single Bedsheets deployment can work with as many Google Sheets as you want. You can also test things out using the demo deployment.

Brief terminology (see more terminology here):

  • Spreadsheets map to Databases.
  • Sheets (the tabs on the bottom of the spreadsheet) map to Tables.

RESTful API

Bedsheets exposes database CRUD operations are exposed over HTTP(S) in a RESTful manner. HTTP Methods are used to indicate intent (such as GET meaning "retrieve data" and POST meaning "append new data"). All response payloads are serialized as JSON. Error codes are used to indicate issues, such as a bad table name returning 404 (see more information about errors).

Spreadsheet IDs

Each database (ie. spreadsheet) is represented by a spreadsheet ID. These can be retrieved by looking at the portion of the spreadsheet URL after https://docs.google.com/spreadsheets/d/ and before the next /. For example, https://docs.google.com/spreadsheets/d/1Hhd74jl8Mrg5r8ZJAg-Loljd92LLeC0_LepA6NBuUbUA/edit#gid=1747294294 has the ID 1Hhd74jl8Mrg5r8ZJAg-Loljd92LLeC0_LepA6NBuUbUA.

expand to see example screenshot

Google Sheets - Share Link

GET /{spreadsheet_id}/{sheet_name}

Retrieve entries from the {sheet_name} table. Entries will be returned as an array of objects.

Additional parameters:

These should be provided as query parameters ⧉.

  • offset={positive_integer} - Only return results after the first offset results in the table. Defaults to 0.

  • limit={positive_integer} - Constrain the response to only have up to limit entries. Defaults to infinity.

Example:

For a spreadsheet with the following table and data:

Sheet1

name favorite_food age
Rachel Broccoli 23
Shriank Pizza 19

The command:

curl "$DEMO_HOST/$DEMO_SPREADSHEET/Sheet1"

Would return:

[
  {
    "name": "Rachel",
    "favorite_food": "Broccoli",
    "age": 23
  },
  {
    "name": "Shriank",
    "favorite_food": "Pizza",
    "age": 19
  }
]

While the command:

curl "$DEMO_HOST/$DEMO_SPREADSHEET/Sheet1?offset=1"

Would return:

[
  {
    "name": "Shriank",
    "favorite_food": "Pizza",
    "age": 19
  }
]

GET /{spreadsheet_id}/{sheet_name}/describe

Retrieve information about the {sheet_name} table. This method is primarily intended for debugging, so that you can see how Bedsheets is interpreting your sheet configuration.

Example:

For a spreadsheet with the following table and data:

Sheet1

name favorite_food age
Rachel Broccoli 23
Shriank Pizza 19

The command:

curl "$DEMO_HOST/$DEMO_SPREADSHEET/Sheet1/describe"

Would return:

{
  "headers": ["name", "favorite_food", "age"]
}

POST /{spreadsheet_id}/{sheet_name}

Append new entries to the {sheet_name} table. All entries must conform to the table schema otherwise the entire request will reject.

Additional parameters:

These should be provided as query parameters.

  • strict={boolean} - If enabled, request entries must have all keys specified by the table schema and may not have any extra. Defaults to true to improve developer experience when first experimenting with Bedsheets.

Example:

For a spreadsheet with the following table and data:

Sheet1

name age
Rachel 23
Shriank 19

The command:

curl -X POST "$DEMO_HOST/$DEMO_SPREADSHEET/Sheet1" --data '[{"name": "Nancy", "age": 35}]"

Would return:

{
  "updatedRange": "Sheet1!A4:B4",
  "updatedRowCount": 1
}

and the table would be updated to look like:

Sheet1

name age
Rachel 23
Shriank 19
Nancy 35

While the command:

curl -X POST "$DEMO_HOST/$DEMO_SPREADSHEET/Sheet1" --data '[{"name": "Nancy", "weight": 10}]"

Would return:

{
  "sheet": "Sheet1",
  "malformedEntries": [
    {
      "value": { "name": "Nancy", "weight": 10 },
      "index": 0,
      "fields": {
        "missing": ["age"],
        "extra": ["weight"]
      }
    }
  ]
}

Errors

Missing Spreadsheet (404)

Returned when the provided spreadsheet id does not correspond to a valid spreadsheet.

{
  "spreadsheetId": "some-bad-id"
}
Missing Sheet (404)

Returned when the provided sheet name does not correspond to a valid sheet.

{
  "sheet": "NotARealSheet"
}
Misconfigured Sheet (502)

Returned when the requested sheet has an invalid schema (see instructions for sheet schema). A reason will be included:

  • "NO_HEADERS" occurs when the first row of the sheet is empty
  • "DUPLICATE_HEADERS" occurs when the first row contains the same value at least twice
{
  "sheet": "Sheet1",
  "reason": "NO_HEADERS"
}
Too Many Requests (429)

Returned when the Google Sheets API request quota has been exceeded. See the FAQ for a discussion on these limits.

Bad Data (400)

Returned when the request payload doesn't match the sheet schema.

{
  "sheet": "Sheet1",
  "malformedEntries": [
    {
      "value": { "name": "Patrick", "weight": 10 },
      "index": 0,
      "fields": {
        "missing": ["age"],
        "extra": ["weight"]
      }
    }
  ]
}
Unknown (400)

Any other errors thrown by the underlying googleapis interface will be logged to stderr and will either return the original status code or 400 if no status code was present. They will include the original error message in the response payload to assist with immediate debugging without exposing too much sensitive information.

CORS

The server will respond to requests with an Origin header by reflecting the value back in Access-Control-Allow-Origin. It will similarly reflect values for Access-Control-Allow-Methods and Access-Control-Allow-Headers. OPTIONS requests will be returned with 204 status codes, unless they request invalid URLs (in which case they will be handled like normal).

This should enable browser-based access to Bedsheet's APIs. To disable CORS support pass DISABLE_CORS=true as an environment variable.

Sheets Configuration

Sheet Schema

A simple schema can be enforced for the database (ie. spreadsheet, see terminology).

To create a new table, add a new sheet to the spreadsheet. The sheet's name will be used as the table name.

The table's schema is set by the values in the first row of the sheet. These values define the table's columns. Schemas are parsed during each request, so any changes are reflected immediately in subsequent requests.

The schemas do not support data validation besides ensuring that all columns are present. To treat a column as optional, set the payload value to null or an emptry string "".

Manage Access

To enable Bedsheets to access a spreadsheet you need to invite the Service Account (see instructions) to the spreadsheet. If you add the Service Account as an Editor on a sheet Bedsheets will have write access to, if you add it as a Viewer then Bedsheets will only have have read access.

You invite the Service Account like you'd invite any other user, via the share menu. The Service Account will have an email address in the form ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com.

expand to see example screenshots

Google Sheets with an open share menu containing the service accounts email address

Deploy

The goal of Bedsheets is to be as easy to deploy as possible. The hardest part is preparing the Service Account (step 1.C.) that'll be used to interact with Google Sheets. The actual server deployment should be straightforward thanks to Docker and new serverless offerings.

1. Create a Service Account

1.A. Create a Google Cloud Project

This project will be used to grant access to the Google Sheets API. If you already have a Google Cloud Project, then you can skip this step.

Step 1. Navigate to the Google Cloud Console ⧉ (expand to see a screenshot)

001 - Console - Dashboard

Step 2. Create a new project

002 - Console - New Project

003 - Console - New Project - Created

004 - Console - Project - Dashboard

1.B. Enable the Google Sheets API

The Google Sheets API is disabled by default in Google Cloud Projects so you must enable it before Bedsheets can use it.

Step 1. Navigate to the Google Sheets entry in the API Library (link ⧉)

105 - Console - Library Menu

106 - Console - Library - Search

Step 2. Click "Enable"

107 - Console - Library - Google Sheets

108 - Console - Library - Google Sheets - Enabled

1.C. Create a Service Account

As described by Google Cloud > Security & Identity Products > Service Accounts ⧉:

A service account is a special kind of account used by an application or a virtual machine (VM) instance, not a person. Applications use service accounts to make authorized API calls.

Bedsheets uses a Service Account to perform it's operations. This service account is identified by an email address in the form ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com. You can grant it access to specific spreadsheets by inviting it via its email.

For security, the Service Account will be granted the "Service Account User" ⧉ role, which only allows the Service Account to perform standard user operations (such as editing a Google Sheet it was invited to).

Step 1. Create a new Service Account (link)

200 - Console - Project - Credentials

201 - Console - Project - Credentials - Create Dropdown

202 - Console - Create Service Account

Step 2. Apply the "Service Account User" role

205 - Console - Create Service Account - Role - Service Account User

Step 3. Download the Service Account login credentials as JSON

206 - Console - Create Service Account - Key

207 - Console - Create Service Account - Key - Json

If you are using Google Cloud Run or Cloud Functions then you can skip the credentials download step and instead assign the Service Account directly to the execution environment which Bedsheets will automatically adopt (see the section on Cloud Providers for more information).

2. Deploy to a Cloud Provider

Bedsheets has been packaged as a Docker image for easy deployment (see docker for general instructions).

Google Cloud Run (Preferred)

Follow the instructions at Google Cloud Run: Pre-Built Deploys ⧉ and use the following parameters:

expand to see parameters

These parameters are biased towards reducing costs.

Parameter Recommended Value Commentary
Authentication Allow unauthenticated invocations this enables standard HTTP REST requests w/ IAM credentials
Container image URL gcr.io/bedsheets/rest-server:latest
Container port 8080 the value doesn't matter, since the server will bind to whatever port is in the $PORT env var
Container command leave blank the Docker image's default command will be used
Container arguments leave blank same as above
Service account Set to the account you created in C. Create Service Account the server will automatically use the ambient credentials from this service account
Capacity 80 the server can handle multiple requests, so this value can be set to the max to avoid spinning up extra instances or rejecting requests
Request timeout 300 most requests finish in <1, even when 1000s of rows are being returned
CPU allocated 1 the server isn't performing much processing, so a low value is fine
Memory allocated 256 MiB the server uses about 100 MiB of memory when handling a single request
Maximum number of instances 1 if more than Capacity requests come in concurrently, then they will be rejected if autoscaling is 1, which is likely fine since Google Sheets rate limits at 100 requests per 100 seconds anyways

As of June 2020, Google Cloud Run provides 180,000 vCPU-seconds per month for free (we're focused only on this metric since both the memory and request constraints in the free plan are more generous than this CPU restriction). Since most requests in Bedsheets take ~1 second, this translates to 100,000s requests for free each month!

Google Cloud Functions

TODO: A new adapter needs to be written to support the pre-parsed request model adopted by most FaaS providers.

AWS Fargate

TODO: @bedsheets/rest-server should support the Fargate request model. Needs to be tested. A template should be written so that users can easily get started without needing to draft their own ECS deployment configuration from scratch.

AWS Lambda

TODO: See Google Cloud Functions.

Heroku

TODO: @bedsheets/rest-server should support the Heroku request model. Needs to be tested.

General Deployment Info

Deployment Parameters

Bedsheets has several deployment parameters which can be provided as environment variables.

  • PORT - Which port should Bedsheets bind to? Defaults to 3141.
  • LOG_LEVEL - How verbose should logging be? Options are NONE | ERROR | WARN | INFO | TRACE. Defaults to INFO.

Google Sheets API Credentials

Google Sheets API credentials (ie. the credentials for the Service Account) can be supplied to Bedsheets several ways.

  • Runtime Context - Recommended (when available) - If Bedsheets is being run on Google Cloud Platform then the Service Account can be attached to the runtime environment. Bedsheets will automatically adopt the appropriate credentials from the runtime.
  • Environment Variables - Recommended - To provide credentials to Bedsheets outside of GCP you should use the environment variables GOOGLE_AUTH_CLIENT_EMAIL set to the Service Accounts email and GOOGLE_AUTH_PRIVATE_KEY set to the PEM-encoded private key from the credentials JSON. If set, these values will override other forms of authentication.
  • Credentials File - Not Recommended - Bedsheets will use the information from a Service Account's credentials JSON file if the path to the file is set in the environment variable GOOGLE_APPLICATION_CREDENTIALS and the file is accessible to the Bedsheets process.

Docker

Bedsheets is available as a Docker image on Docker Hub (entry ⧉) under the label bedsheets/rest-server. Master is auto-built and tagged as latest, and git tags are also built and pushed.

To run Bedsheets locally, use:

docker run -e GOOGLE_AUTH_CLIENT_EMAIL=$SA_EMAIL GOOGLE_AUTH_PRIVATE_KEY=$SA_PRIVATE_KEY -p 3141:3141 bedsheets/rest-server

Comparison to Traditional DBs

Terminology

Bedsheets Postgres Description
Spreadsheet ID Database Name / dbname ⧉ The instance/server being accessed
Sheet Table The model/entity being retrieved
Editor/Viewer Read/Write access The user's data permissions
Service Account User The user/credentials used to access the DB

FAQ

What's a Spreadsheet ID or A1 Notation?

See Google's Sheet API Concepts ⧉ for an overview of these Google Sheets (and, in the case of A1 Notation, general spreadsheet) concepts.

How are dates stored in Google Sheets?

See Google's Sheet API Concepts: Date & Time ⧉ section for an overview of how spreadsheets store date & time.

Why am I getting a 403 error when trying to access my sheet?

There are several trouble-shooting steps to work through:

  1. Is the Service Account added as an Editor to the Google Sheet?
  2. Is the spreadsheetId pointing to the correct sheet? Check by replacing https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit with the spreadsheetId that you're providing to Bedsheets and make sure that the correct Google Sheet is appearing.
  3. Does the Service Account have the Service Account User role?

How much data can I fit in a Google Sheet?

According to GSuiteTips ⧉ and SpreadSheetPoint ⧉, Google Sheets supports a maximum of 5 million cells per spreadsheet, up to 200 tabs, and up to 18,278 columns. A default Sheet will have 26 columns while most use cases only need <10, so it might be possible to extend the number of rows you can have by deleting any empty columns (# rows = # max cells / # columns). I haven't tested these limits yet (TODO test these limits!). Apparently, an update can only add 40,000 new rows at a time as well.

How many requests can I make per second?

The Google Sheets API quota will be the bottleneck when trying to make many requests. By default, the Google Sheets API permits 100 requests per 100 seconds ⧉ per service account, with reads and writes tracked separately. This may be increasable to 500 requests per 100 seconds by submitting a support request (see this stackoverflow answer ⧉).

v1.0 Blockers

The following items are currently considered blockers before a v1.0 release will be declared.

  1. Support for PUT (ie. update) operations.
  2. Support for DELETE operations.
  3. An adapter enabling FaaS deployments (eg. AWS Lambda, Google Cloud Functions).
  4. Robust validation of input (consider io-ts ⧉).
  5. Standardized, friendly errors for invalid inputs.
  6. Determine if basic authentication should be added (a per-deployment secret key? per-spreadsheet?).
  7. Consider adding a GraphQL abstraction in addition to REST.

Suggested Alternatives

See awesome-serverless's list of databases ⧉ for a more extensive list of low-maintenance DBs.

Contributing

Project Structure

This project uses lerna with yarn workspaces to manage a variety of packages. To kick things off, run yarn in the repo root. To build all packages, run yarn build:all. To execute all tests, run yarn test:all.

Currently, the project is composed of the following packages:

  • @bedsheets/google-sheets-dal - This is a data access layer for Google Sheets that abstracts the sheets_v4.Sheets API in the googleapis package to provide a simpler object array interface for working with Google Sheets.
  • @bedsheets/rest-server - This is an HTTP server that exposes the operations from @bedsheets/google-sheets-dal in a RESTful way (eg. translating the append operation into POST requests).
  • @bedsheets/test-helpers - This is an internal module which provides shared infrastructure for writing integration tests, such as a tool for seeding spreadsheets for testing.

Intraproject Package Dependencies

Yarn Workspaces enables packages to depend directly on each other. These packages still need to be compiled before they can be used though, since they are still consumed through node_modules with each package.json defining how they should be imported (i.e. consuming them intra-project is the same as an end-user consuming them from the npm registry).

Dependency Management

Any dependency that isn't needed for end-user package functionality (such as testing, compilation, or other developer niceties) should be installed as a dev dependency (yarn add -D package_name).

Each package should declare all packages that it needs, it should not rely on their ambient availability from the root package.json. If a dev dependency is being used by multiple packages, it should be installed via the root package.json (using yarn add -D package_name -W) and the packages that require it should specify the version as * (e.g. see the typescript declaration under devDependencies in each packages/*/package.json).

Tests

Tests should be stored under a tests folder in each package's folder (e.g. ./packages/google-sheets-dal/tests/). Tests should be further subdivided into unit and integration, where unit tests do NOT interact with the internet or other services on the host, and integration tests do rely on external services. Tests should be written in Typescript, and will be run using jest and ts-jest.

Environment Compatibility

Node

To ensure that these packages can run across all targeted Node versions @types/node should be set to the oldest supported version.

Furthermore, CI should run against the oldest supported version (with the assumption that Node had no breaking changes since).

Typescript

TODO: figure out how to test .d.ts files against older Typescript versions to ensure that they use syntax that isn't too new.

CI

Github Actions is used to provide continuous integration services. See .github/workflows for details on what's being executed. All CI must pass before PRs will be merged.

Non-Goals

Bedsheets is aiming to be a maximally simple database for quick/hacky/hobby projects. Therefore, it has several non-goals to avoid feature creep which may impair these core targets:

  • Configurability - Besides defining table columns Bedsheets is not trying to support any other type of configuration. Bedsheets should not support additional payload validation or column types.
  • Optimized performance - Bedsheets should not attempt to cache responses from the Google Sheets API in order to reduce latency (such as saving schema to improve append operations). Furthermore, the Bedsheets source code should be easy to read so it's acceptable to use a more beginner friendly approach instead of a more performant one when doing things such as data transformations.