motivation.tex

\section{Background \& Motivation}
\label{sec:motivation}

\newcommand{\cmark}{\ding{51}}%
\newcommand{\xmark}{\ding{55}}%
\begin{figure}
\begin{tikzpicture}
    
\draw [stealth-](-0.6\textwidth-20,1.5) --  node[midway,fill=none, align=left, anchor=north]{\textit{T}}(-\textwidth-60, 1.5);
\node (c1) [
    minimum size=0.2cm,
    xshift=-\textwidth-40,
    align=center,
    yshift=3cm,
    fill=none
] {$C_i:$};

\node (c2) [
    minimum size=0.2cm,
    xshift=-\textwidth-40,
    align=center,
    yshift=1.9cm,
    fill=none
] {$C_j:$};

% o1
\draw [-](-\textwidth,3) --  node[midway,fill=white, align=center]{\textit{$o_1$}}(-\textwidth+60, 3);
\draw [-](-\textwidth,3-0.2) --  node[midway,fill=none, align=left, anchor=south]{\textit{inv}}(-\textwidth,3+0.2);
\draw [-](-\textwidth+60, 3-0.2) --  node[midway,fill=none, align=left, anchor=south]{\textit{resp}}(-\textwidth+60, 3+0.2);

% o2
\draw [-](-\textwidth+100,2) --  node[midway,fill=white, align=center]{\textit{$o_2$}}(-\textwidth+160, 2);
\draw [-](-\textwidth+100,2-0.2) --  node[midway,fill=none, align=left, anchor=south]{\textit{inv}}(-\textwidth+100,2+0.2);
\draw [-](-\textwidth+160, 2-0.2) --  node[midway,fill=none, align=left, anchor=south]{\textit{resp}}(-\textwidth+160, 2+0.2);


%bottom half
\draw [stealth-](-0.6\textwidth-20,-2.3) --  node[midway,fill=none, align=left, anchor=north]{\textit{T}}(-\textwidth-60, -2.3);
\node (c1) [
    minimum size=0.2cm,
    xshift=-\textwidth-40,
    align=center,
    yshift=0cm,
    fill=none
] {$C_1:$};

\node (c2) [
    minimum size=0.2cm,
    xshift=-\textwidth-40,
    align=center,
    yshift=-0.6cm,
    fill=none
] {$C_2:$};

\node (c3) [
    minimum size=0.2cm,
    xshift=-\textwidth-40,
    align=center,
    yshift=-1.2cm,
    fill=none
] {$C_3:$};

\node (c4) [
    minimum size=0.2cm,
    xshift=-\textwidth-40,
    align=center,
    yshift=-1.8cm,
    fill=none
] {$C_4:$};


% o1
\draw [-](-\textwidth,0) --  node[midway,fill=white, align=center]{\textit{$W_1(x=1)$}}(-\textwidth+60, 0);
\draw [-](-\textwidth,0-0.2) --  node[midway,fill=none, align=left, anchor=south]{}(-\textwidth,0+0.2);
\draw [-](-\textwidth+60, 0-0.2) --  node[midway,fill=none, align=left, anchor=south]{}(-\textwidth+60, 0+0.2);
% o2
\draw [-](-\textwidth-20,-0.6) --  node[midway,fill=white, align=center]{\textit{$W_2(x=2)$}}(-\textwidth+80, -0.6);
\draw [-](-\textwidth-20,-0.6-0.2) --  node[midway,fill=none, align=left, anchor=south]{}(-\textwidth-20,-0.6+0.2);
\draw [-](-\textwidth+80, -0.6-0.2) --  node[midway,fill=none, align=left, anchor=south]{}(-\textwidth+80, -0.6+0.2);
% o3
\draw [-](-\textwidth+10,-1.2) --  node[midway,fill=white, align=center]{\textit{$W_3(x=3)$}}(-\textwidth+70, -1.2);
\draw [-](-\textwidth+10,-1.2-0.2) --  node[midway,fill=none, align=left, anchor=south]{}(-\textwidth+10,-1.2+0.2);
\draw [-](-\textwidth+70, -1.2-0.2) --  node[midway,fill=none, align=left, anchor=south]{}(-\textwidth+70, -1.2+0.2);

% o4
\draw [-](-\textwidth+110,-1.8) --  node[midway,fill=white, align=center]{\textit{$R(x)=?$}}(-\textwidth+170, -1.8);
\draw [-](-\textwidth+110,-1.8-0.2) --  node[midway,fill=none, align=left, anchor=south]{}(-\textwidth+110,-1.8+0.2);
\draw [-](-\textwidth+170, -1.8-0.2) --  node[midway,fill=none, align=left, anchor=south]{}(-\textwidth+170, -1.8+0.2);
\end{tikzpicture}

\centering
\caption{The 3 write operations are all concurrent with respect to one another, because it is not the case that any of the writes has a response event before any of the other two have an invocation event. Any interleaving of these operations would satisfy a correct total ordering under linearizeability. The read operation, however, is ordered in real time after the 3 writes, so it must return a value for $x$ consistent with all 3 writes taking place, ie it cannot return the initial value for $x$ before the writes took place. It could return 1, 2, or 3, demonstrating it is difficult to reason about the ordering of concurrent operations. }
\label{fig:lin_basics}
\end{figure}


This section provides the necessary background on distributed applications, consistency models, Linearizability, and the needs and behaviors of applications that work atop Linearizability.
% It then explains why naive parellelization of operations to a Linearizable system is error-prone and fragile.

% \begin{figure*}[t!]
% \begin{subfigure}[t]
% \begin{minted}[escapeinside=||]{php}
% PostSequential
% |\colorbox{green}{(1) \$postid = \$r->incr("next\_post\_id");}|
% |\colorbox{yellow}{(2) \$r->hmset("post:\$postid","user\_id",\$User['id'],"body",\$status);}|
% |\colorbox{green}{(3) \$followers = \$r->zrange("followers:".\$User['id'],0,-1);}|
% |\colorbox{yellow}{(4) foreach(\$followers as \$fid) { \$r->lpush("posts:\$fid",\$postid); }}|
% (5) # Push the post on the timeline, and trim the timeline to the newest 1000 elements.
% |\colorbox{yellow}{(6) \$r->lpush("timeline",\$postid);}|
% |\colorbox{yellow}{(7) \$r->ltrim("timeline",0,1000);}|
% \end{minted}
% \end{subfigure}

% \begin{subfigure}[t]
% \begin{minted}

% PostIOCTransformed
% |\colorbox{green}{(1) [\$postid, \$followers] = IOCIssue([\$r->incr("next\_post\_id"), \$r->zrange("followers:".\$User['id'],0,-1)]);}|
% (2) $ops = array();
% (3) array_push($ops, $r>hmset("post:$postid","user_id",$User['id'],"body",$status));
% (4) foreach($followers as $fid) { array_push($ops, $r->lpush("posts:$fid",$postid)); }
% (5) # Push the post on the timeline, and trim the timeline to the newest 1000 elements.
% (6) array_push($ops, $r->lpush("timeline",$postid));
% (7) array_push($ops, $r->ltrim("timeline",0,1000));
% |\colorbox{yellow}{(8) IOCIssue(\$ops);}|
% \end{minted}
% \end{subfigure}
% \caption{The Retwis post function creates a new post and adds a reference to it in each of the timelines of a the poster's followers, as well as a global timeline. It also truncates the global timeline to the 1000 most recent posts. As written, each operation must complete before initiating the next, resulting in many round trips to the datastore. Most of these operations have no data- or control-flow dependencies on each other. However, they cannot be naively parallelized, as many of them \emph{must} occur in a particular order. For example, the post id must appear in any timelines until the post is created.}
% \label{fig:retwis-post2}
% \end{figure*}


\begin{figure*}[t!]
    \centering
    \begin{subfigure}[b]{\textwidth}
        \begin{minted}[escapeinside=||]{php}
|\colorbox{green}{(1) \$postid = \$r->incr("next\_post\_id");}|
|\colorbox{yellow}{(2) \$r->hmset("post:\$postid","user\_id",\$User['id'],"body",\$status);}|
|\colorbox{green}{(3) \$followers = \$r->zrange("followers:".\$User['id'],0,-1);}|
|\colorbox{yellow}{(4) foreach(\$followers as \$fid) { \$r->lpush("posts:\$fid",\$postid); }}|
(5) # Push the post on the timeline, and trim the timeline to the newest 1000 elements.
|\colorbox{yellow}{(6) \$r->lpush("timeline",\$postid);}|
|\colorbox{yellow}{(7) \$r->ltrim("timeline",0,1000);}|
\end{minted}
        \caption{PostSequential}
        \label{fig:retwissequential}
    \end{subfigure}%
    
    \begin{subfigure}[b]{\textwidth}
        \begin{minted}[escapeinside=||]{php}
|\colorbox{pink}{(1) \$runtime = new \\parallel\\Runtime;}|
(2) $future1 = $runtime->run($r->incr("next\_post\_id"));
(3) $future2 = $runtime->run($r->zrange("followers:".\$User['id'],0,-1));
|\colorbox{green}{(4) \$postid = \$future1->value();}|
(5) $runtime->run($r->hmset("post:$postid","user_id",$User['id'],"body",$status));
|\colorbox{green}{(6) \$followers = \$future2->value();}|
(7) $runtime->run(foreach($followers as $fid) { $r->lpush("posts:$fid",$postid); })
(8) # Push the post on the timeline, and trim the timeline to the newest 1000 elements.
(9) $runtime->run($r->lpush("timeline",$postid));
(10) $future3 = $runtime->run($r->ltrim("timeline",0,1000));
|\colorbox{yellow}{(11) \$future3->value();}|

\end{minted}
        \caption{PostIOCTransformed}
        \label{fig:retwistransform}
    \end{subfigure}
    \caption{The Retwis post function creates a new post and adds a reference to it in each of the timelines of a the poster's followers, as well as a global timeline. It also truncates the global timeline to the 1000 most recent posts. As written, each operation must complete before initiating the next, resulting in many round trips to the datastore. Most of these operations have no data- or control-flow dependencies on each other. However, they cannot be naively parallelized, as many of them \emph{must} occur in a particular order. For example, the post id must appear in any timelines until the post is created.}
    \label{fig:retwis-post2}
\end{figure*}

\paragraph{Background and Terminology.}
We consider here the common model of an application that stores its shared state in a separate backend system.
Users interact with the application by sending \textit{requests}, e.g., a HTTP request, to one of many concurrent \textit{application processes} that implement the application logic. Application processes are stateless and process requests on behalf of many users. Application processes send \textit{operations} to a backend \textit{system} that provides the concurrent application processes with access to shared, persistent state. This shared state is composed of the \textit{objects} on which the operations are invoked.

\ak{cut this paragraph?} An application's processing of a user's request will result in some number of operations to the backend system, which we refer to as the request's \textit{fanout}. Our goal is to minimize the \textit{end-to-end latency} of an application processing a user's request. For instance, the \texttt{PostSequential} request shown in Figure ~\ref{fig:retwis-post2} has a fanout of 5 more than the number of followers of the user posting. Our goal is to minimize the latency of the \texttt{post} function, which would overall speed up the Retwis application.

A system's \textit{consistency model} provides guarantees about the observable return values for a set of operations over the objects the back-end stores.
Programmers rely on a system's consistency model while building their applications. By defining the system's set of allowed behaviors and constraints on the return values, a consistency model permits programmers to ignore the system's implementation and instead focus on ensuring their applications are correct.

\newcommand{\myp}[1]{\underline{\quad$p_{#1}$\quad}}
% \begin{figure*}
% \begin{minipage}{0.7\linewidth}
% \begin{subfigure}[t]{.1\linewidth}
% \begin{center}
% \begin{tabular}{ l }
%  \myp{0} \\
%  w($a$=1) \\  
%  r($a$)     
% \end{tabular}
% \caption{}
% \label{subfig:one_cli}
% \end{center}
% \end{subfigure}\hfill%
% \begin{subfigure}[t]{.175\linewidth}
% \begin{center}
% \begin{tabular}{ l l }
%  \myp{1} & \myp{2} \\
%  w($a=1$) & r($b$) \\  
%  w($b=1$) & r($a$)    
% \end{tabular}
% \caption{}
% \label{subfig:wa_wb}
% \end{center}
% \end{subfigure}\hfill%
% \begin{subfigure}[t]{.45\linewidth}
% \begin{center}
% \begin{tabular}{ l l }
%  \underline{\hspace{9ex}$p_3$\hspace{9ex}} &
%  \underline{\hspace{9ex}$p_4$\hspace{9ex}} \\
%  test\_and\_set($c==0?1$) & test\_and\_set($c==0?2$) \\  
%  test\_and\_set($d==0?1$) & test\_and\_set($d==0?2$)   
% \end{tabular}
% \caption{}
% \label{subfig:test_and_set}
% \end{center}
% \end{subfigure}\\
% \begin{subfigure}[t]{.175\linewidth}
% \begin{center}
% \begin{tabular}{ l l }
%  \myp{5} & \myp{6} \\
%  rmw($e+=2$) & rmw($f+=2$) \\  
%  rmw($f*=2$) & rmw($e*=2$)    
% \end{tabular}
% \caption{}
% \label{subfig:rmws}
% \end{center}
% \end{subfigure}\hfill%
% \begin{subfigure}[t]{.25\linewidth}
% \begin{center}
% \begin{tabular}{ l l l }
%  \myp{7} & \myp{8} & \myp{9} \\
%  w($e=1$) & w($f=1$) & w($g=1$) \\  
%  r($g$) & r($e$)    & \textcolor{gray}{r($f$)}
% \end{tabular}
% \caption{}
% \label{subfig:read_breaks}
% \end{center}
% \end{subfigure}\hfill%
% % \begin{subfigure}[t]{.2\linewidth}
% % \begin{center}
% % \begin{tabular}{ l l }
% %  \myp{8} & \myp{9} \\
% %  w(h=1) & r(h) \\  
% %  \textcolor{gray}{if(r(i) > 2):} & r(j) \\
% %  \quad{}w(j=1) & \\
% %  \textcolor{gray}{else:} & \\
% %  \quad{}\textcolor{gray}{w(j=1)}
% % \end{tabular}
% % \caption{}
% % \label{subfig:optimize_breaks}
% % \end{center}
% % \end{subfigure}
% % ~
% \begin{subfigure}[t]{.2\linewidth}
% \begin{center}
% \begin{tabular}{ l l }
%  \myp{{10}} & \myp{{11}} \\
%  w($ka=1$) & r($kb$) \\  
%  w($kb=1$) & r($ka$)    
% \end{tabular}
% \caption{}
% \label{subfig:resharding_breaks}
% \end{center}
% \end{subfigure}
% \end{minipage}
% \hfill\begin{minipage}{.175\linewidth}
% \vspace{6.5ex}
% \begin{subfigure}[t]{.175\linewidth}
% \begin{center}
% \begin{tabular}{ l l }
%  \myp{{12}} & \myp{{13}} \\
%  w($m=1$) & \\  
%  w($n=1$) & \\
%         & r($n$)\\
%         & r($m$)\\
% \end{tabular}
% \caption{}%*{\hspace{10ex}(g)}
% \label{subfig:suffix_closed_breaks}
% \end{center}
% \end{subfigure}
% \end{minipage}
% \caption{Examples that show how naive parallelization is error prone. Each example shows operations from application processes with their issue order given by their vertical positions. The initial value for all state is 0. W, r, rmw mean write, read, and read-modify-write, respectively. In each example naive parallelization can return values that are not possible for Linearizability. (a--d) are unsafe with naive parallelization.  (e) is safe without the gray operation but unsafe with it. (f) is safe for some implementations of Linearizability but not others (see text). (g) is unsafe despite no concurrency between $p_{12}$ and $p_{13}$ because naive parallelization does not provide suffix-closed failures. All examples would need to use single-dispatch to be safe under Linearizability. All examples
% are safe with multi-dispatch under \mdl{}.}
% \label{fig:naive_breaks_stuff}
% \end{figure*}

\paragraph{Linearizability is a Nearly Ideal Consistency Model.}
% \textit{Linearizability} is a `strong' consistency model which guarantees that return values for operations to a system are equivalent to some sequential total-order of operations that respects real-time precedence---all processes observe the same order of operations, and if one operation finishes before another begins, the former is ordered before the later~\cite{herlihy1987linearizability, herlihy1990linearizability}.
% Its strong guarantees make it simple for application programmers to reason about. It reduces the behavior of a complex, highly concurrent distributed system to the equivalent of a single machine that processes operations one at time in the order they arrive over the network.

Linearizability is a `strong' consistency model with strong guarantees that make it simple
for application programmers to reason about. 
Specifically, it guarantees that all return values reflect some legal total order of the operations taking place over the objects in the back-end, and that this order respects \textit{real-time} precedence. These \textit{real-time} guarantees mean if one operation finishes before another begins, the former must be ordered before the latter~\cite{herlihy1987linearizability, herlihy1990linearizability}. For example, in figure ~\ref{fig:lin_basics}, $o_1$ is ordered in real time before $o_2$ because the response of $o_1$ happens in real time before the invocation of $o_2$. Therefore, a linearizeable back-end guarantees the return value of $o_2$ is consistent with an execution where $o_1$ takes effect before it.

Linearizability is one of a handful of dominant consistency models and is the consistency model provided by many fundamental protocols---e.g., Paxos~\cite{lamport1998paxos}, RAFT~\cite{ongaro2014raft}, PBFT~\cite{castro1999pbft}---and the many foundational systems that implement them---e.g., Chubby~\cite{burrows2006chubby}, ZippyDB~\cite{zippyblog}, etcd~\cite{etcd}. It is a popular choice because it reduces the behavior of a complex, highly concurrent distributed system to the equivalent of a single machine that processes operations one at time in the order they arrive over the network, where globally, all processes observe the same order of operations.
%ABD~\cite{abd}, primary-backup~\cite{primary-backup}, etc.\@---
% (ensured by the sequential ordering) 
% (ensured by the real-time ordering)


\paragraph{Concurrency}
While linearizeable systems provide understandable guarantees about operations with real-time relationships, they provide no ordering guarantees on concurrent operations, which by definition have no real-time relationships. This is because linearizability provides \textit{some} legal total order of operations (that all the clients will see reflected in their return values), but it is difficult to reason about what specific total order that will be. Figure ~\ref{fig:lin_basics} shows an example of 3 concurrent write operations all dispatched to the same object. In this case, the operations could take effect in any order, all would be correct under linearizability. 

When concurrent operations come from separate, independent application processes, linearizability's no ordering guarantees is quite suitable for system behavior. As shown in figure ~\ref{fig:lin_basics}, if the 3 clients issuing the concurrent writes are 3 application processes unaware of the other writes sent to the system, then allowing all possible return values for $R(x)$ is appropriate. In fact, it reduces complexity so that application developers don't have to reason about potentially concurrent operations sent to the system from other applications they don't care about. 

In contrast, when concurrent operations come from a single application process, linearizability's no ordering guarantees can lead to unexpected and unwanted application behavior.  Consider \texttt{PostSequential} in Figure ~\ref{fig:retwissequential} which shows the \texttt{post} function from Retwis where each operation is issued sequentially. A new version that naively introduces concurrency could issue lines 1 and 3 (highlighted in green) concurrently and then, after those return, issue lines 2, 4, 6, 7 (yellow) concurrently. Note, the green batch of lines must return sequentially before the yellow batch, due to the data dependencies on the \texttt{\$postid} and \texttt{\$followers} variables. With no ordering guarantees, it is possible that line 4 takes effect before line 2; in this case, if a follower happens to look at their timeline after the post is referenced but before it is created, then they would see an object for the post but no post content. Similarly, if line 6 is ordered before line 4, then the creator of the post might see similar strange behavior. Worse yet, if the post creation fails (line 2), then the users could see this strange object appear and subsequently disappear in the future when some failure recovery scheme removes it from their timelines. These types of behaviors are not standard for most applications and developers should want to prevent them from occurring.

% \begin{figure}

% \begin{tabular}{ |c|c|c| } 
% \hline
%  & Concurrent & sequential \\
% \hline
% $n$ processes & \cmark & \cmark \\ 
% 1 process & \xmark & \cmark \\ 
% \hline
% \end{tabular}
%     \centering
%     \caption{Caption}
%     \label{fig:cases}
% \end{figure}

% You have to globally reason about things. Are these two things allowed to be ordered independently? That depends….. If thse two pieces of data aren’t touched.. But you don’t know that by just looking at the function in isolation.
It can be difficult for a programmer to tell whether an application process requires ordering guarantees or even what the right set is.
% The ordering guarantees of a correct concurrent \texttt{post} function are line 2 << line 4 << line 4 << line 7. 
While a developer might write a concurrent application function that looks correct in isolation, such as the modified version of \texttt{PostSequential}, they actually must analyze concurrent application code \textit{globally} (system-wide) for new potential behaviors created by a rich set of possible operation interleavings.
Moreover, the need for ordering guarantees changes and is fragile to changes in the underlying system. While it is possible in some cases for programmers to continuously reason about complex interactions between their code, the system, and failures (where the time and expertise are available), it is not a general solution and violates the fundamental purpose of strong consistency models, which is to make the behavior of the system easy to reason about. 

% When ordering guarantees across separate application processes are needed, then synchronization mechanisms external to the back-end system can be used to meet specific needs ~\ref{} (thinking message passing? barriers? distributed locks?). 
Individual processes can easily reclaim ordering guarantees when interacting with linearizable systems by issuing operations \textit{sequentially}. \textit{Sequentially} issued operations are operations with real-time ordering between them to ensure they take effect in sequential order over a linearizable back-end. If $C_i == C_j$ in figure ~\ref{fig:lin_basics}, then that individual processes would effectively be using real-time ordering to impose ordering guarantees. Sequentially issued operations impose sequential ordering across all operations issued by a process, making it simple to write correct applications. For example, the original version of \texttt{PostSequential} avoids all the issues that arise with the concurrent version.

\paragraph{Performance}
While there is a strong case for individual processes to avoid concurrency to prevent strange program behavior caused by no ordering guarantees, sequential application processes can incur large latency overheads, especially as their application-level requests fanout into multiple system-level operations. For example, consider an application request with fanout 100, where each operation takes 5\,ms. A single application process which issues operations sequentially results in a latency of 500\,ms compared to the concurrent process with a latency of 5\,ms.
Modern applications can easily have fanouts of 10, 100, or even 1000~\cite{dean2013tail}. For instance, Meta reports that processing a single Web request results in thousands of system operations~\cite{ajoux2015challenges}. 

For performance reasons, application processes will issue operations that seemingly don't need ordering guarantees \textit{concurrently} rather than \textit{sequentially}, either in multiple threads or via asynchronous libraries for operations. However, as shown, using concurrency can lead to incorrect application behavior. Linearizability thus forces developers to either sacrifice performance by writing sequential programs or it introduces developer burden if programmers want to write safe concurrent programs. We now introduce \mdl{}, a new consistency model that adds ordering guarantees for single-processes exhibiting concurrent behavior
while also redeeming the performance benefits from concurrency.

\ak{in the next section we just kinda jump right to invocation order... meanwhile in this section we only discussed ordering guarantees. should we motivate invocation order specifically???}
% \wl{Should we talk about the situation where single-dispatch is sufficient, i.e., fanout 1 (or 0), and places where the fanout is the same as the depth of data dependencies. The FB stat of 1000s of operations with sequential chains at most dozens deep suggests that fanout is the dominant factor here, so we should add that tidbit if we do.}