From 27667530f1218d1b9d24a97d5fedbd119992c0e7 Mon Sep 17 00:00:00 2001 From: jterry64 Date: Thu, 26 Sep 2024 16:55:40 -0700 Subject: [PATCH] Add architecture section --- README.md | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index d05d575..8668da2 100644 --- a/README.md +++ b/README.md @@ -347,7 +347,19 @@ Response: } ``` -### Endpoints +### Architecture + +To support fast on-the-fly analytics for both small and large areas, we use a serverless AWS Lambda-based architecture. Lambdas allow us to scale up massive parallel processing very quickly, since requests usually come sporadically and may require very variable workloads. + +image + +We have three different lambas. + +**tiled-raster-analysis**: This is the entrypoint function. This function checks the size of the geometry, and splits it up into many chunks if the geometry is large. The chunk size depends on the resolution: 1.25x1.25 degrees for 30m resolution data, and 0.5x0.5 degrees for 10m resolution data. It then invokes a lambda function for each chunk, and waits for the results to be written to a DynamoDB table. Each lambda invocation has 256 KB payload limit, so it may compress or simplify the geometry chunks if the geometry is too complicated. Once all the results are in, it will aggregate the results and return them to the client. + +**fanout**: This lambda is optional. One of the bottlenecks for this architecture is the lambda invocations themselves. If a geometry requires 100+ processing lambdas, it can take some time to invoke it all from one lambda. If more than 10 lambdas need to be invoked, the `tiled-raster-analysis` function will actually split the lambdas into groups of 10, and send each to a `fanout` lambda. All the fanout lambda does is invoke those 10 lambdas. This massively speeds up the invocation time for huge amounts of lambdas. + +**raster-analysis**: This is the core analysis function. This processes each geometry chunk, reads all necessary rasters, runs the query execution, and writes the results out to DynamoDB. Each of these currently runs with 3 GB of RAM. ### Assumptions and limitations