bedsheets
is a Node.js proxy that lets you turn Google Sheets into a quick and dirty RESTful database. It's intended to be:
- your simplest database, if you need something more complex look at our suggested alternatives
- no-maintenance, so that you can deploy it once and forget about things
- self-hosted, so that you don't have to risk unreliable or insecure 3rd-party services (plus, you can usually host it for free on most cloud providers!)
Google Sheets provides several great database features: built-in version control ⧉, collaborative management ⧉, a great table browser ⧉, powerful data processing functions ⧉, and basic visualization tools ⧉. By following simple conventions Bedsheets lets you introduce table schemas see docs and expose them over HTTP(S) with REST endpoints and JSON payloads.
Bedsheets is not trying to compete with real databases. Instead, it's trying to provide a database with a lower barrier to entry for small projects that you think "wow, it would be nice if this had a database, but I don't want to go through the hassle of deploying/configuring/maintaining the infrastructure for one". For example, @awendland uses it to store temperature data recorded by a Raspberry Pi.
The latest version of Bedsheets is auto-deployed to https://demo-1-3gkwpsop5a-uc.a.run.app. A demo spreadsheet 1ajoVZn1zhg3HCF4cRpIZOBFRkNWsfXUC9rwVX_YQ70U is configured with a sample schema (it'll reset every 20 minutes).
Take a look at the spreadsheet and then run the following command (or view in ReqBin) to see Bedsheets in action:
curl -X GET "https://demo-1-3gkwpsop5a-uc.a.run.app/1ajoVZn1zhg3HCF4cRpIZOBFRkNWsfXUC9rwVX_YQ70U/Playground"
The spreadsheet is publicly editable, so feel free to modify it and test out how the schema system works!
- Usage
- Deploy
- Comparison to Traditional DBs
- FAQ
- v1.0 Blockers
- Suggested Alternatives
- Contributing
This assumes you've already deployed Bedsheets (see instructions). A single Bedsheets deployment can work with as many Google Sheets as you want. You can also test things out using the demo deployment.
Brief terminology (see more terminology here):
- Spreadsheets map to Databases.
- Sheets (the tabs on the bottom of the spreadsheet) map to Tables.
Bedsheets exposes database CRUD operations are exposed over HTTP(S) in a RESTful manner. HTTP Methods are used to indicate intent (such as GET
meaning "retrieve data" and POST
meaning "append new data"). All response payloads are serialized as JSON. Error codes are used to indicate issues, such as a bad table name returning 404
(see more information about errors).
Each database (ie. spreadsheet) is represented by a spreadsheet ID. These can be retrieved by looking at the portion of the spreadsheet URL after https://docs.google.com/spreadsheets/d/
and before the next /
. For example, https://docs.google.com/spreadsheets/d/1Hhd74jl8Mrg5r8ZJAg-Loljd92LLeC0_LepA6NBuUbUA/edit#gid=1747294294
has the ID 1Hhd74jl8Mrg5r8ZJAg-Loljd92LLeC0_LepA6NBuUbUA
.
Retrieve entries from the {sheet_name}
table. Entries will be returned as an array of objects.
Additional parameters:
These should be provided as query parameters ⧉.
-
offset={positive_integer}
- Only return results after the firstoffset
results in the table. Defaults to0
. -
limit={positive_integer}
- Constrain the response to only have up tolimit
entries. Defaults toinfinity
.
Example:
For a spreadsheet with the following table and data:
Sheet1
name favorite_food age Rachel Broccoli 23 Shriank Pizza 19
The command:
curl "$DEMO_HOST/$DEMO_SPREADSHEET/Sheet1"
Would return:
[
{
"name": "Rachel",
"favorite_food": "Broccoli",
"age": 23
},
{
"name": "Shriank",
"favorite_food": "Pizza",
"age": 19
}
]
While the command:
curl "$DEMO_HOST/$DEMO_SPREADSHEET/Sheet1?offset=1"
Would return:
[
{
"name": "Shriank",
"favorite_food": "Pizza",
"age": 19
}
]
Retrieve information about the {sheet_name}
table. This method is primarily intended for debugging, so that you can see how Bedsheets is interpreting your sheet configuration.
Example:
For a spreadsheet with the following table and data:
Sheet1
name favorite_food age Rachel Broccoli 23 Shriank Pizza 19
The command:
curl "$DEMO_HOST/$DEMO_SPREADSHEET/Sheet1/describe"
Would return:
{
"headers": ["name", "favorite_food", "age"]
}
Append new entries to the {sheet_name}
table. All entries must conform to the table schema otherwise the entire request will reject.
Additional parameters:
These should be provided as query parameters.
strict={boolean}
- If enabled, request entries must have all keys specified by the table schema and may not have any extra. Defaults totrue
to improve developer experience when first experimenting with Bedsheets.
Example:
For a spreadsheet with the following table and data:
Sheet1
name age Rachel 23 Shriank 19
The command:
curl -X POST "$DEMO_HOST/$DEMO_SPREADSHEET/Sheet1" --data '[{"name": "Nancy", "age": 35}]"
Would return:
{
"updatedRange": "Sheet1!A4:B4",
"updatedRowCount": 1
}
and the table would be updated to look like:
Sheet1
name age Rachel 23 Shriank 19 Nancy 35
While the command:
curl -X POST "$DEMO_HOST/$DEMO_SPREADSHEET/Sheet1" --data '[{"name": "Nancy", "weight": 10}]"
Would return:
{
"sheet": "Sheet1",
"malformedEntries": [
{
"value": { "name": "Nancy", "weight": 10 },
"index": 0,
"fields": {
"missing": ["age"],
"extra": ["weight"]
}
}
]
}
Returned when the provided spreadsheet id does not correspond to a valid spreadsheet.
{
"spreadsheetId": "some-bad-id"
}
Returned when the provided sheet name does not correspond to a valid sheet.
{
"sheet": "NotARealSheet"
}
Returned when the requested sheet has an invalid schema (see instructions for sheet schema). A reason will be included:
- "NO_HEADERS" occurs when the first row of the sheet is empty
- "DUPLICATE_HEADERS" occurs when the first row contains the same value at least twice
{
"sheet": "Sheet1",
"reason": "NO_HEADERS"
}
Returned when the Google Sheets API request quota has been exceeded. See the FAQ for a discussion on these limits.
Returned when the request payload doesn't match the sheet schema.
{
"sheet": "Sheet1",
"malformedEntries": [
{
"value": { "name": "Patrick", "weight": 10 },
"index": 0,
"fields": {
"missing": ["age"],
"extra": ["weight"]
}
}
]
}
Any other errors thrown by the underlying googleapis
interface will be logged to stderr
and will either return the original status code or 400
if no status code was present. They will include the original error message
in the response payload to assist with immediate debugging without exposing too much sensitive information.
The server will respond to requests with an Origin
header by reflecting the value back in Access-Control-Allow-Origin
. It will similarly reflect values for Access-Control-Allow-Methods
and Access-Control-Allow-Headers
. OPTIONS
requests will be returned with 204
status codes, unless they request invalid URLs (in which case they will be handled like normal).
This should enable browser-based access to Bedsheet's APIs. To disable CORS support pass DISABLE_CORS=true
as an environment variable.
A simple schema can be enforced for the database (ie. spreadsheet, see terminology).
To create a new table, add a new sheet to the spreadsheet ⧉. The sheet's name will be used as the table name.
The table's schema is set by the values in the first row of the sheet. These values define the table's columns. Schemas are parsed during each request, so any changes are reflected immediately in subsequent requests.
The schemas do not support data validation besides ensuring that all columns are present. To treat a column as optional, set the payload value to null
or an emptry string ""
.
To enable Bedsheets to access a spreadsheet you need to invite the Service Account (see instructions) to the spreadsheet. If you add the Service Account as an Editor on a sheet Bedsheets will have write access to, if you add it as a Viewer then Bedsheets will only have have read access.
You invite the Service Account like you'd invite any other user, via the share menu. The Service Account will have an email address in the form ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com
.
The goal of Bedsheets is to be as easy to deploy as possible. The hardest part is preparing the Service Account (step 1.C.) that'll be used to interact with Google Sheets. The actual server deployment should be straightforward thanks to Docker and new serverless offerings.
This project will be used to grant access to the Google Sheets API. If you already have a Google Cloud Project, then you can skip this step.
Step 1. Navigate to the Google Cloud Console ⧉ (expand to see a screenshot)
The Google Sheets API is disabled by default in Google Cloud Projects so you must enable it before Bedsheets can use it.
As described by Google Cloud > Security & Identity Products > Service Accounts ⧉:
A service account is a special kind of account used by an application or a virtual machine (VM) instance, not a person. Applications use service accounts to make authorized API calls.
Bedsheets uses a Service Account to perform it's operations. This service account is identified by an email address in the form ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com
. You can grant it access to specific spreadsheets by inviting it via its email.
For security, the Service Account will be granted the "Service Account User" ⧉ role, which only allows the Service Account to perform standard user operations (such as editing a Google Sheet it was invited to).
If you are using Google Cloud Run or Cloud Functions then you can skip the credentials download step and instead assign the Service Account directly to the execution environment which Bedsheets will automatically adopt (see the section on Cloud Providers for more information).
Bedsheets has been packaged as a Docker image for easy deployment (see docker for general instructions).
Follow the instructions at Google Cloud Run: Pre-Built Deploys ⧉ and use the following parameters:
expand to see parameters
These parameters are biased towards reducing costs.
Parameter | Recommended Value | Commentary |
---|---|---|
Authentication | Allow unauthenticated invocations | this enables standard HTTP REST requests w/ IAM credentials |
Container image URL | gcr.io/bedsheets/rest-server:latest |
|
Container port | 8080 | the value doesn't matter, since the server will bind to whatever port is in the $PORT env var |
Container command | leave blank | the Docker image's default command will be used |
Container arguments | leave blank | same as above |
Service account | Set to the account you created in C. Create Service Account | the server will automatically use the ambient credentials from this service account |
Capacity | 80 | the server can handle multiple requests, so this value can be set to the max to avoid spinning up extra instances or rejecting requests |
Request timeout | 300 | most requests finish in <1, even when 1000s of rows are being returned |
CPU allocated | 1 | the server isn't performing much processing, so a low value is fine |
Memory allocated | 256 MiB | the server uses about 100 MiB of memory when handling a single request |
Maximum number of instances | 1 | if more than Capacity requests come in concurrently, then they will be rejected if autoscaling is 1, which is likely fine since Google Sheets rate limits at 100 requests per 100 seconds anyways |
As of June 2020, Google Cloud Run provides 180,000 vCPU-seconds per month for free (we're focused only on this metric since both the memory and request constraints in the free plan are more generous than this CPU restriction). Since most requests in Bedsheets take ~1 second, this translates to 100,000s requests for free each month!
TODO: A new adapter needs to be written to support the pre-parsed request model adopted by most FaaS providers.
TODO: @bedsheets/rest-server
should support the Fargate request model. Needs to be tested. A template should be written so that users can easily get started without needing to draft their own ECS deployment configuration from scratch.
TODO: See Google Cloud Functions.
TODO: @bedsheets/rest-server
should support the Heroku request model. Needs to be tested.
Bedsheets has several deployment parameters which can be provided as environment variables.
PORT
- Which port should Bedsheets bind to? Defaults to3141
.LOG_LEVEL
- How verbose should logging be? Options areNONE | ERROR | WARN | INFO | TRACE
. Defaults toINFO
.
Google Sheets API credentials (ie. the credentials for the Service Account) can be supplied to Bedsheets several ways.
- Runtime Context - Recommended (when available) - If Bedsheets is being run on Google Cloud Platform then the Service Account can be attached to the runtime environment. Bedsheets will automatically adopt the appropriate credentials from the runtime.
- Environment Variables - Recommended - To provide credentials to Bedsheets outside of GCP you should use the environment variables
GOOGLE_AUTH_CLIENT_EMAIL
set to the Service Accounts email andGOOGLE_AUTH_PRIVATE_KEY
set to the PEM-encoded private key from the credentials JSON. If set, these values will override other forms of authentication. - Credentials File - Not Recommended - Bedsheets will use the information from a Service Account's credentials JSON file if the path to the file is set in the environment variable
GOOGLE_APPLICATION_CREDENTIALS
and the file is accessible to the Bedsheets process.
Bedsheets is available as a Docker image on Docker Hub (entry ⧉) under the label bedsheets/rest-server
. Master is auto-built and tagged as latest
, and git tags are also built and pushed.
To run Bedsheets locally, use:
docker run -e GOOGLE_AUTH_CLIENT_EMAIL=$SA_EMAIL GOOGLE_AUTH_PRIVATE_KEY=$SA_PRIVATE_KEY -p 3141:3141 bedsheets/rest-server
Bedsheets | Postgres | Description |
---|---|---|
Spreadsheet ID | Database Name / dbname ⧉ | The instance/server being accessed |
Sheet | Table | The model/entity being retrieved |
Editor/Viewer | Read/Write access | The user's data permissions |
Service Account | User | The user/credentials used to access the DB |
See Google's Sheet API Concepts ⧉ for an overview of these Google Sheets (and, in the case of A1 Notation, general spreadsheet) concepts.
See Google's Sheet API Concepts: Date & Time ⧉ section for an overview of how spreadsheets store date & time.
There are several trouble-shooting steps to work through:
- Is the Service Account added as an Editor to the Google Sheet?
- Is the
spreadsheetId
pointing to the correct sheet? Check by replacinghttps://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit
with thespreadsheetId
that you're providing to Bedsheets and make sure that the correct Google Sheet is appearing. - Does the Service Account have the
Service Account User
role?
According to GSuiteTips ⧉ and SpreadSheetPoint ⧉, Google Sheets supports a maximum of 5 million cells per spreadsheet, up to 200 tabs, and up to 18,278 columns. A default Sheet will have 26 columns while most use cases only need <10, so it might be possible to extend the number of rows you can have by deleting any empty columns (# rows = # max cells / # columns). I haven't tested these limits yet (TODO test these limits!). Apparently, an update can only add 40,000 new rows at a time as well.
The Google Sheets API quota will be the bottleneck when trying to make many requests. By default, the Google Sheets API permits 100 requests per 100 seconds ⧉ per service account, with reads and writes tracked separately. This may be increasable to 500 requests per 100 seconds by submitting a support request (see this stackoverflow answer ⧉).
The following items are currently considered blockers before a v1.0 release will be declared.
- Support for
PUT
(ie. update) operations. - Support for
DELETE
operations. - An adapter enabling FaaS deployments (eg. AWS Lambda, Google Cloud Functions).
- Robust validation of input (consider io-ts ⧉).
- Standardized, friendly errors for invalid inputs.
- Determine if basic authentication should be added (a per-deployment secret key? per-spreadsheet?).
- Consider adding a GraphQL abstraction in addition to REST.
See awesome-serverless's list of databases ⧉ for a more extensive list of low-maintenance DBs.
This project uses lerna
with yarn workspaces
to manage a variety of packages. To kick things off, run yarn
in the repo root. To build all packages, run yarn build:all
. To execute all tests, run yarn test:all
.
Currently, the project is composed of the following packages:
@bedsheets/google-sheets-dal
- This is a data access layer for Google Sheets that abstracts thesheets_v4.Sheets
API in thegoogleapis
package to provide a simpler object array interface for working with Google Sheets.@bedsheets/rest-server
- This is an HTTP server that exposes the operations from@bedsheets/google-sheets-dal
in a RESTful way (eg. translating theappend
operation intoPOST
requests).@bedsheets/test-helpers
- This is an internal module which provides shared infrastructure for writing integration tests, such as a tool for seeding spreadsheets for testing.
Yarn Workspaces enables packages to depend directly on each other. These packages still need to be compiled before they can be used though, since they are still consumed through node_modules
with each package.json
defining how they should be imported (i.e. consuming them intra-project is the same as an end-user consuming them from the npm registry).
Any dependency that isn't needed for end-user package functionality (such as testing, compilation, or other developer niceties) should be installed as a dev dependency (yarn add -D package_name
).
Each package should declare all packages that it needs, it should not rely on their ambient availability from the root package.json
. If a dev dependency is being used by multiple packages, it should be installed via the root package.json
(using yarn add -D package_name -W
) and the packages that require it should specify the version as *
(e.g. see the typescript
declaration under devDependencies
in each packages/*/package.json
).
Tests should be stored under a tests
folder in each package's folder (e.g. ./packages/google-sheets-dal/tests/
). Tests should be further subdivided into unit
and integration
, where unit tests do NOT interact with the internet or other services on the host, and integration tests do rely on external services. Tests should be written in Typescript, and will be run using jest
and ts-jest
.
To ensure that these packages can run across all targeted Node versions @types/node
should be set to the oldest supported version.
Furthermore, CI should run against the oldest supported version (with the assumption that Node had no breaking changes since).
TODO: figure out how to test .d.ts
files against older Typescript versions to ensure that they use syntax that isn't too new.
Github Actions is used to provide continuous integration services. See .github/workflows
for details on what's being executed. All CI must pass before PRs will be merged.
Bedsheets is aiming to be a maximally simple database for quick/hacky/hobby projects. Therefore, it has several non-goals to avoid feature creep which may impair these core targets:
- Configurability - Besides defining table columns Bedsheets is not trying to support any other type of configuration. Bedsheets should not support additional payload validation or column types.
- Optimized performance - Bedsheets should not attempt to cache responses from the Google Sheets API in order to reduce latency (such as saving schema to improve append operations). Furthermore, the Bedsheets source code should be easy to read so it's acceptable to use a more beginner friendly approach instead of a more performant one when doing things such as data transformations.