try github-pages
c4lm committed Dec 20, 2023
1 parent b84b9af commit efd1e10
Showing 188 changed files with 2,303 additions and 3,353 deletions.
Binary file added docs/assets/images/favicon.png
205 changes: 205 additions & 0 deletions docs/css/custom.css
Original file line number Diff line number Diff line change
@@ -0,0 +1,205 @@
:root {
/*--main-text-color: #212121;*/
--md-primary-fg-color: #1976d2;
--brand-blue: #1976d2;
--brand-dark-blue: #242A36;
--caption-color: #4f4f4f;
--brand-lt-blue: #f0f5fb;
--brand-gray: rgb(118, 118, 118);
--brand-lt-gray: rgb(203,204,207);
--brand-red: #e50914;
}

/* Grid */
.row {
display: flex;
flex-direction: row;
}
.col-4 {
flex: 0 0 33.3333333333%;
max-width: 33.3333333333%;
}
.col-6 {
flex: 0 0 50%;
max-width: 50%;
}



/* Navbar */
.md-header {
background-color: white !important;
color: var(--brand-dark-blue);
}
.md-header__title {
visibility: hidden;
}
.md-logo img{
height: 38px !important;
}
.home {
margin-bottom: -1.2rem !important;
}
.md-search__form {
transition: none !important;
}
.md-search__input:hover {
background-color: #00000042 !important;
}
.md-search__input.focus-visible:hover {
background-color: #fff !important;
}

/* Fonts */
body {
color: var(--brand-dark-blue);
font-family: "Roboto", sans-serif !important;
font-weight: 400 !important;
}

.md-content h1 {
font-family: "Inter", sans-serif !important;
color: var(--brand-dark-blue) !important;
font-size: 32px !important;
font-weight: 700 !important;
}

.md-content h2 {
font-family: "Inter", sans-serif !important;
color: var(--brand-dark-blue) !important;
font-size: 24px !important;
font-weight: 700 !important;
}

.md-content h3 {
font-family: "Roboto", sans-serif !important;
color: var(--brand-dark-blue) !important;
font-size: 20px !important;
font-weight: 500 !important;
}

.md-content h4 {
font-family: "Roboto", sans-serif !important;
color: var(--brand-dark-blue) !important;
font-size: 18px !important;
font-weight: 400 !important;
}

.btn {
font-family: "Roboto", sans-serif;
font-size: 14px;
border-radius: 0.25rem;
}
.btn-primary {
background: #1976D2;
border: none;
color: white !important;
}

.hero {
padding-top: 100px;
padding-bottom: 100px;
}

.hero .heading {
font-size: 56px;
font-weight: 900;
line-height: 68px;
}

.hero .btn {
font-size: 16px;
padding: 10px 20px;
}

.hero .illustration {
margin-left: 35px;
}


.bullets .heading, .module .heading {
font-family: "Inter", sans-serif;
font-size: 26px;
font-weight: 700;
}
.bullets .row {
margin-bottom: 60px;
}
.bullets .caption {
padding-top: 10px;
padding-right: 30px;
}
.icon {
height: 25px !important;
margin-right: 5px;
vertical-align: -3px;
}

.caption {
font-weight: 400;
font-size: 17px;
line-height: 24px;
color: var(--caption-color);
}

.module {
margin-top: 80px;
margin-bottom: 80px;
padding-top: 50px;
padding-bottom: 50px;
}

.module .caption {
padding-top: 10px;
padding-right: 80px;
}
.module .screenshot {
width: 600px;
height: 337px;
box-shadow:inset 0 1px 0 rgba(255,255,255,.6), 0 22px 70px 4px rgba(0,0,0,0.56), 0 0 0 1px rgba(0, 0, 0, 0.0);
border-radius: 5px;
background-size: cover;
}

/* Footer */
.md-copyright__highlight {
background-image: url('/img/netflix-oss.png');
background-size: contain;
background-repeat: no-repeat;
color: rgba(0,0,0,0);
height: 60px;
}

/* Comparison block */
.compare {
background-color: var(--brand-lt-blue);
padding-top: 80px;
padding-bottom: 80px;
margin: 0px -1000px;
text-align: center;
}
.compare .container {
max-width: 61rem;
margin-left: auto;
margin-right: auto;
}

.compare .heading {
margin-bottom: 30px;
margin-top: 0px;
}
.compare .bubble {
background: #fff;
border-radius: 10px;
padding: 30px;
height: 100%;
}

.compare .caption {
font-size: 15px;
line-height: 22px;
}

.compare .row {
margin: 0 0.8rem;
}
Binary file added docs/devguide/architecture/PollTimeoutSeconds.png
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
@@ -10,19 +10,19 @@ A graph is "a collection of vertices (or point) and edges (or lines) that indica

By this definition, this is a graph - just not exactly correct in the context of DAGs:

<p align="center"><img src="/img/pirate_graph.gif" alt="pirate vs global warming graph" width="500" style={{paddingBottom: 40, paddingTop: 40}} /></p>
![pirate vs global warming graph](pirate_graph.gif)

But in the context of workflows, we're thinking of a graph more like this:

<p align="center"><img src="/img/regular_graph.png" alt="a regular graph (source: wikipedia)" width="500" style={{paddingBottom: 40, paddingTop: 40}} /></p>
![a regular graph (source: wikipedia)](regular_graph.png)

Imagine each vertex as a microservice, and the lines are how the microservices are connected together. However, this graph is not a directed graph - as there is no direction given to each connection.

### Directed

A directed graph means that there is a direction to each connection. For example, this graph is directed:

<p align="center"><img src="/img/directed_graph.png" alt="directed graph" width="500" style={{paddingBottom: 40, paddingTop: 40}} /></p>
![directed graph](directed_graph.png)

Each arrow has a direction: point "N" can proceed directly to "B", but "B" cannot proceed to "N" in the opposite direction.

@@ -34,13 +34,13 @@ So a Directed Acyclic Graph is a set of vertices where the connections are direc

Since a Conductor workflow is a series of vertices that can connect in only a specific direction and cannot loop, a Conductor workflow is thus a directed acyclic graph:

<p align="center"><img src="/img/dag_workflow2.png" alt="Conductor Dag" width="300" style={{paddingBottom: 40, paddingTop: 40}} /></p>
![Conductor Dag](dag_workflow.png)
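
To make the "directed and cannot loop" property concrete, here is a minimal sketch that checks whether a set of directed connections forms a DAG, using Kahn's topological-sort algorithm. The `is_dag` function and the adjacency-map input format are illustrative assumptions, not part of Conductor.

```python
from collections import deque


def is_dag(edges: dict[str, list[str]]) -> bool:
    """Return True if the directed graph has no cycles (Kahn's algorithm).

    `edges` maps each vertex to the vertices it points to.
    Hypothetical helper for illustration; not a Conductor API.
    """
    # Collect every vertex, including ones that only appear as targets.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)

    # Count incoming connections for each vertex.
    indegree = {node: 0 for node in nodes}
    for targets in edges.values():
        for target in targets:
            indegree[target] += 1

    # Repeatedly remove vertices that have no remaining incoming connections.
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for target in edges.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    # If a cycle exists, its vertices are never removed.
    return visited == len(nodes)
```

For example, `is_dag({"N": ["B"], "B": []})` holds, while adding a connection from "B" back to "N" introduces a cycle and the check fails.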

### Can a workflow have loops and still be a DAG?

Yes. For example, Conductor workflows have Do-While loops:

<p align="center"><img src="/img/dag_workflow.png" alt="Conductor Dag" width="300" style={{paddingBottom: 40, paddingTop: 40}} /></p>
![Conductor Dag](dag_workflow2.png)

This is still a DAG, because the loop is just shorthand for running the tasks inside the loop over and over again. For example, if the 2nd loop in the above image is run 3 times, the workflow path will be:

File renamed without changes
@@ -1,15 +1,15 @@
# Overview
# Architecture Overview

![Architecture diagram](/img/conductor-architecture.png)
![Architecture diagram](conductor-architecture.png)

The API and storage layers are pluggable and provide the ability to work with different backends and queue service providers.

## Runtime Model
Conductor follows an RPC-based communication model in which workers run on a separate machine from the server. Workers communicate with the server over HTTP-based endpoints and employ a polling model for managing work queues.

![Runtime Model of Conductor](/img/overview.png)
![Runtime Model of Conductor](overview.png)

**Notes**
## Notes

* Workers are remote systems that communicate over HTTP with the Conductor servers.
* Task Queues are used to schedule tasks for workers. We use [dyno-queues][1] internally, but it can easily be swapped with SQS or a similar pub-sub mechanism.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
Original file line number Diff line number Diff line change
@@ -1,47 +1,59 @@
# Task Lifecycle

## Task state transitions

The figure below depicts the state transitions that a task can go through within a workflow execution.

![Task_States](/img/task_states.png)
![Task States](task_states.png)

## Retries and Failure Scenarios

### Task failure and retries
Retries for failed task executions of each task can be configured independently. retryCount, retryDelaySeconds and retryLogic can be used to configure the retry mechanism.

![Task Failure](/img/TaskFailure.png)
Retries for failed task executions of each task can be configured independently. `retryCount`, `retryDelaySeconds` and `retryLogic` can be used to configure the retry mechanism.

![Task Failure](TaskFailure.png)

1. Worker (W1) polls for task T1 from the Conductor server and receives the task.
2. Upon processing this task, the worker determines that the task execution is a failure and reports this to the server with FAILED status after 10 seconds.
3. The server will persist this FAILED execution of T1. A new execution of task T1 will be created and scheduled to be polled. This task will be available to be polled after 5 (retryDelaySeconds) seconds.
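
The timing in the steps above can be sketched as a small calculation. This is an illustrative sketch only: `can_retry` and `retry_available_at` are hypothetical helpers, not Conductor APIs, and the sketch assumes a fixed delay between retries. The 10 and 5 second values come from the example.

```python
def can_retry(retries_used: int, retry_count: int) -> bool:
    """A failed task is retried only while configured retries remain.

    Hypothetical helper; `retry_count` mirrors the `retryCount` setting.
    """
    return retries_used < retry_count


def retry_available_at(failed_at_seconds: float, retry_delay_seconds: float) -> float:
    """Time at which the new execution becomes visible to polling workers.

    Assumes a fixed delay, with `retry_delay_seconds` mirroring `retryDelaySeconds`.
    """
    return failed_at_seconds + retry_delay_seconds


# T1 is reported FAILED at 10 seconds with a retry delay of 5 seconds.
if can_retry(retries_used=0, retry_count=3):
    print(retry_available_at(10, 5))
```

With these inputs the retried T1 becomes pollable 15 seconds after the original poll.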

### Poll Timeout Seconds

Poll timeout is the maximum amount of time within which a worker must poll a task; otherwise the task is marked as `TIMED_OUT`.

![Task Poll Timeout](PollTimeoutSeconds.png)

In the figure above, task T1 does not get polled by the worker within 60 seconds, so Conductor marks it as `TIMED_OUT`.
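
The poll timeout check can be sketched as follows. `is_poll_timed_out` is a hypothetical helper, not a Conductor API. Treating a value of 0 as "no poll timeout" and treating the boundary (exactly 60 seconds) as timed out are both assumptions of this sketch.

```python
def is_poll_timed_out(scheduled_at: float, now: float, poll_timeout_seconds: float) -> bool:
    """Return True if a task has waited too long without being polled.

    Hypothetical helper. Assumes 0 disables the check and that the
    timeout fires once the full interval has elapsed.
    """
    if poll_timeout_seconds <= 0:
        return False  # assumption: 0 means no poll timeout is configured
    return (now - scheduled_at) >= poll_timeout_seconds


# T1 is scheduled at 0 seconds and still not polled at 60 seconds.
print(is_poll_timed_out(scheduled_at=0, now=60, poll_timeout_seconds=60))
```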

### Timeout seconds
Timeout is the maximum amount of time that the task must reach a terminal state in, else the task will be marked as TIMED_OUT.

![Task Timeout](/img/TimeoutSeconds.png)
Timeout is the maximum amount of time within which the task must reach a terminal state; otherwise it will be marked as `TIMED_OUT`.

![Task Timeout](TimeoutSeconds.png)

**0 seconds** -> Worker polls for task T1 from the Conductor server and receives the task. T1 is put into IN_PROGRESS status by the server.
Worker starts processing the task but is unable to process the task at this time. Worker updates the server with T1 set to IN_PROGRESS status and a callback of 9 seconds.
**0 seconds** -> Worker polls for task T1 from the Conductor server and receives the task. T1 is put into `IN_PROGRESS` status by the server.
Worker starts processing the task but is unable to process the task at this time. Worker updates the server with T1 set to `IN_PROGRESS` status and a callback of 9 seconds.
Server puts T1 back in the queue but makes it invisible and the worker continues to poll for the task but does not receive T1 for 9 seconds.

**9,18 seconds** -> Worker receives T1 from the server and is still unable to process the task and updates the server with a callback of 9 seconds.

**27 seconds** -> Worker polls and receives task T1 from the server and is now able to process this task.

**30 seconds** (T1 timeout) -> Server marks T1 as TIMED_OUT because it is not in a terminal state after first being moved to IN_PROGRESS status. Server schedules a new task based on the retry count.

**32 seconds** -> Worker completes processing of T1 and updates the server with COMPLETED status. Server will ignore this update since T1 has already been moved to a terminal status (TIMED_OUT).
**30 seconds** (T1 timeout) -> Server marks T1 as `TIMED_OUT` because it is not in a terminal state after first being moved to `IN_PROGRESS` status. Server schedules a new task based on the retry count.

**32 seconds** -> Worker completes processing of T1 and updates the server with `COMPLETED` status. Server will ignore this update since T1 has already been moved to a terminal status (`TIMED_OUT`).
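
The timeline above can be replayed with a small state sketch. This is not Conductor's implementation: the `Task` class, its method names, and the initial `SCHEDULED` status are assumptions for illustration, and only the statuses named in this section are treated as terminal.

```python
# Assumption: only the terminal statuses mentioned in this section are modeled.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "TIMED_OUT"}


class Task:
    """Hypothetical model of a task with a `timeoutSeconds`-style limit."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self.status = "SCHEDULED"
        self.started_at = None  # set when first moved to IN_PROGRESS

    def poll(self, now: float) -> None:
        """Worker receives the task; the timeout clock starts on the first poll."""
        if self.status in TERMINAL_STATUSES:
            return
        if self.started_at is None:
            self.started_at = now
        self.status = "IN_PROGRESS"

    def check_timeout(self, now: float) -> None:
        """Server-side check: time out a task that is not yet terminal."""
        if self.status in TERMINAL_STATUSES or self.started_at is None:
            return
        if now - self.started_at >= self.timeout_seconds:
            self.status = "TIMED_OUT"

    def update(self, new_status: str, now: float) -> bool:
        """Apply a worker update; return False if the update is ignored."""
        self.check_timeout(now)
        if self.status in TERMINAL_STATUSES:
            return False
        self.status = new_status
        return True


task = Task(timeout_seconds=30)
task.poll(0)                                 # 0 seconds: T1 moves to IN_PROGRESS
for callback_time in (9, 18, 27):            # worker receives T1 again after each callback
    task.poll(callback_time)
task.check_timeout(30)                       # 30 seconds: server marks T1 as TIMED_OUT
accepted = task.update("COMPLETED", now=32)  # 32 seconds: late update is ignored
print(task.status, accepted)
```

The key design point is that the timeout is measured from the first move to `IN_PROGRESS`, so repeated callbacks do not extend it.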

### Response timeout seconds

Response timeout is the time within which the worker must respond to the server with an update for the task, else the task will be marked as TIMED_OUT.

![Response Timeout](/img/ResponseTimeoutSeconds.png)
![Response Timeout](ResponseTimeoutSeconds.png)

**0 seconds** -> Worker polls for the task T1 from the Conductor server and receives the task. T1 is put into IN_PROGRESS status by the server.
**0 seconds** -> Worker polls for the task T1 from the Conductor server and receives the task. T1 is put into `IN_PROGRESS` status by the server.

Worker starts processing the task but the worker instance dies during this execution.

**20 seconds** (T1 responseTimeout) -> Server marks T1 as TIMED_OUT since the task has not been updated by the worker within the configured responseTimeoutSeconds (20). A new instance of task T1 is scheduled as per the retry configuration.
**20 seconds** (T1 responseTimeout) -> Server marks T1 as `TIMED_OUT` since the task has not been updated by the worker within the configured responseTimeoutSeconds (20). A new instance of task T1 is scheduled as per the retry configuration.

**25 seconds** -> The retried instance of T1 is available to be polled by the worker, after the retryDelaySeconds (5) has elapsed.
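
The two timestamps in this scenario follow from simple arithmetic, sketched below. `response_timeout_schedule` is a hypothetical helper, not a Conductor API, and it assumes a fixed retry delay.

```python
def response_timeout_schedule(
    last_update_at: float,
    response_timeout_seconds: float,
    retry_delay_seconds: float,
) -> tuple[float, float]:
    """Return (timed_out_at, retry_available_at) for a silent worker.

    Hypothetical helper; parameters mirror `responseTimeoutSeconds`
    and `retryDelaySeconds`.
    """
    timed_out_at = last_update_at + response_timeout_seconds
    retry_available_at = timed_out_at + retry_delay_seconds
    return timed_out_at, retry_available_at


# Last update at 0 seconds, responseTimeoutSeconds of 20, retryDelaySeconds of 5.
print(response_timeout_schedule(0, 20, 5))
```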
@@ -7,13 +7,13 @@ The proto models are auto-generated at compile time using this ProtoGen library.


### Cassandra Persistence
The Cassandra persistence layer currently provides a partial implementation of the ExecutionDAO that supports all the CRUD operations for tasks and workflow execution. The data modelling is done in a denormalized manner and stored in two tables. The workflows table houses all the information for a workflow execution including all its tasks and is the source of truth for all the information regarding a workflow and its tasks. The task_lookup table, as the name suggests stores a lookup of taskIds to workflowId. This table facilitates the fast retrieval of task data given a taskId.
The Cassandra persistence layer currently provides a partial implementation of the ExecutionDAO that supports all the CRUD operations for tasks and workflow execution. The data modelling is done in a denormalized manner and stored in two tables. The "workflows" table houses all the information for a workflow execution including all its tasks and is the source of truth for all the information regarding a workflow and its tasks. The "task_lookup" table, as the name suggests, stores a lookup of taskIds to workflowId. This table facilitates the fast retrieval of task data given a taskId.
All the datastore operations used during the critical execution path of a workflow are currently implemented. A few of the operational abilities of the ExecutionDAO are yet to be implemented. This module also does not provide implementations for QueueDAO, PollDataDAO and RateLimitingDAO. We envision using the Cassandra DAO with an external queue implementation, since implementing a queuing recipe on top of Cassandra is an anti-pattern that we want to stay away from.


### External Payload Storage
This feature is implemented so that the externalization of payloads is fully transparent and automated for the user. Conductor operators can configure the usage of this feature, which is completely abstracted and hidden from the user, thereby allowing the operators full control over the barrier limits. Currently, only AWS S3 is supported as a storage system; however, as with all other Conductor components, this is pluggable and can be extended to enable any other object store to be used as an external payload storage system.
The externalization of payloads is enforced using two kinds of [barriers](/externalpayloadstorage.html). Soft barriers are used when the payload size is warranted enough to be stored as part of workflow execution. These payloads will be stored in external storage and used during execution. Hard barriers are enforced to safeguard against voluminous data, and such payloads are rejected and the workflow execution is failed.
The externalization of payloads is enforced using two kinds of [barriers](../../documentation/advanced/externalpayloadstorage.md). Soft barriers are used when the payload size is warranted enough to be stored as part of workflow execution. These payloads will be stored in external storage and used during execution. Hard barriers are enforced to safeguard against voluminous data, and such payloads are rejected and the workflow execution is failed.
The payload size is evaluated in the client before being sent over the wire to the server. If the payload size exceeds the configured soft limit, the client makes a request to the server for the location at which the payload is to be stored. In the case where S3 is used, the server returns a signed URL for the location and the client uploads the payload using this signed URL. The relative path to the payload object is then stored in the workflow/task metadata. The server can then download the payload from this path and use it as needed during execution. This allows the server to control access to the S3 bucket, thereby making the user applications where the worker processes are run completely agnostic of the permissions needed to access this location.
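
The soft and hard barrier decision described above can be sketched as a client-side check. This is a simplified illustration: `PayloadAction`, `classify_payload`, and the limit values are hypothetical, not Conductor's client API or its defaults.

```python
from enum import Enum


class PayloadAction(Enum):
    INLINE = "send the payload with the request"
    EXTERNALIZE = "upload to external storage and send the path"
    REJECT = "reject the payload and fail the execution"


def classify_payload(size_kb: int, soft_limit_kb: int, hard_limit_kb: int) -> PayloadAction:
    """Decide how a payload is handled based on the two barriers.

    Hypothetical helper; assumes soft_limit_kb <= hard_limit_kb.
    """
    if size_kb > hard_limit_kb:
        return PayloadAction.REJECT       # hard barrier: guard against voluminous data
    if size_kb > soft_limit_kb:
        return PayloadAction.EXTERNALIZE  # soft barrier: store in external storage
    return PayloadAction.INLINE


# Illustrative limits only; real limits are configured by the Conductor operator.
for size in (10, 500, 20_000):
    print(size, classify_payload(size, soft_limit_kb=100, hard_limit_kb=10_000).name)
```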


2 changes: 2 additions & 0 deletions docs/docs/bestpractices.md → docs/devguide/bestpractices.md
@@ -1,3 +1,5 @@
# Best Practices

## Response Timeout
- Configure the responseTimeoutSeconds of each task to be > 0.
- Should be less than or equal to timeoutSeconds.
17 changes: 17 additions & 0 deletions docs/devguide/concepts/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Basic Concepts
Conductor allows you to build a complex application using simple and granular tasks that do not
need to be aware of or keep track of the state of your application's execution flow. Conductor keeps track of the state,
calls tasks in the right order (sequentially or in parallel, as defined by you), retries calls if needed, handles failure
scenarios gracefully, and outputs the final result.


![Workflow screenshot](../../home/devex.png)

Leveraging workflows in Conductor enables developers to truly focus on their core mission - building their application
code in the languages of their choice. Conductor does the heavy lifting associated with ensuring high
reliability, transactional consistency, and long durability of their workflows. Simply put, wherever your application's
components live and whichever languages they were written in, you can build a workflow in Conductor to orchestrate their
execution in a reliable & scalable manner.

[Workflows](workflows.md) and [Tasks](tasks.md) are the two key concepts that underlie the Conductor system.
