forked from CakeML/regression
Taken from CakeML/cakeml@7a6c12f
Showing 7 changed files with 1,391 additions and 0 deletions.
Automated regression test infrastructure for CakeML.

[api.sml](api.sml):
Implements the server-side regression-test API as a CGI program.
The API is for workers to view and manipulate the job queues.

[design.txt](design.txt):
Notes on the design of the automated regression test infrastructure.

[poll.sml](poll.sml):
Implements automatic refreshing of the job queues.
If there are new jobs on GitHub, they will be added to the waiting queue.
If there are stale jobs, they will be removed.

[regressionLib.sml](regressionLib.sml):
Code shared between the pieces of the regression test suite.

[worker.sml](worker.sml):
Worker that claims and runs regression test jobs.
TODO: Change (text) API endpoint to regression.cgi/api/

TODO: Write HTML interface for status output
Thus regression.cgi/{,active,waiting,stopped,etc.} will return a nicely
presented webpage (with links) for viewing the current status of the queues.

TODO: Write help/usage interface for worker
Make the program more self-documenting.

TODO: Make server send email to [email protected] when a job is stopped

TODO: Update GitHub commit status for jobs in the system
Probably this should be done by the server or poller.
When a job is claimed: mark its commit as being tested.
When a job is stopped: mark its commit as passed/failed.

TODO: Use GitHub webhooks or similar to avoid polling for changes
Thus the poller won't be run in polling mode, but instead will be the target
of a webhook that GitHub calls to update our queues.

TODO: Add interface for disowning jobs (1)
Currently, you can manually stop a job that is supposedly active and claimed
by a worker that has gone rogue or died. It might be nice to support this
programmatically as an option to the worker, so it can make the appropriate
API calls.

TODO: Make poller stop jobs that have been running too long

TODO: Sort the id lists returned by {active,waiting,stopped}?
Prioritise pull requests?
Pull requests could be assigned lower ids by default.

TODO: Be smarter about assigning job ids?
Currently, we only assign new ids that are greater than all existing ones.
Would it be nice to fill in gaps instead? Maybe... When the system is running
well, every job will be either properly stopped or superseded, so the only
way for there to be a gap is when a waiting job gets superseded.

Gaps can be filled manually by renaming the highest waiting job into the
first gap. Indeed, that strategy could be automated (rather than assigning
job ids correctly in the first place). But assigning correctly is probably
just as easy.
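
The current max-plus-one scheme amounts to something like the following
sketch. The real next_job_id lives in regressionLib.sml and may be written
differently; here each queue is assumed to be a function returning its
current list of ids, as in api.sml.

```sml
(* Sketch of max-plus-one id assignment; the actual next_job_id in
   regressionLib.sml may differ. Each queue is a function returning
   its current list of job ids. *)
fun next_job_id (queues : (unit -> int list) list) =
  1 + List.foldl (fn (q, acc) => List.foldl Int.max acc (q ())) 0 queues
```

Starting the fold at 0 means the first job ever assigned gets id 1, and
ids only grow, which is exactly why gaps left by superseded jobs are never
reused.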

TODO: Replace calls to the OS shell with SML Basis functionality
In regressionLib.sml there is a call to OS.Process.system that passes quite a
lot of work off to the shell: setting environment variables, executing a
subprocess, and redirecting standard streams. This could all be implemented
with the SML Basis, in particular using the Posix structure, instead. Doing
so might be more reliable?
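
A sketch of what the Posix-based version might look like, assuming the
shell invocation's job is roughly "set environment variables, redirect
output, run a subprocess". All names here are illustrative; this is not
the actual regressionLib code.

```sml
(* Illustrative sketch, not the actual regressionLib code: run cmd with
   args, with extra "NAME=value" environment entries, and stdout/stderr
   redirected to a log file, using only the Posix structure. *)
fun run_logged (cmd, args, extra_env, logfile) =
  case Posix.Process.fork () of
    NONE => (* child: redirect both output streams, then exec *)
      let
        val fd = Posix.FileSys.creat (logfile, Posix.FileSys.S.irwxu)
        val () = Posix.IO.dup2 {old = fd, new = Posix.FileSys.stdout}
        val () = Posix.IO.dup2 {old = fd, new = Posix.FileSys.stderr}
      in
        Posix.Process.exece (cmd, cmd :: args,
                             extra_env @ Posix.ProcEnv.environ ())
      end
  | SOME pid => (* parent: wait for the child and return its status *)
      #2 (Posix.Process.waitpid (Posix.Process.W_CHILD pid, []))
```

This avoids shell quoting entirely, which is the main reliability gain
the TODO is after.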

TODO: Add interface for disowning jobs (2)?
After "disowning" (and stopping) a job, another similar job will eventually
be added to the queue if necessary. But would it be nice to instead be able
to re-use the existing job number? I.e., do not stop the job after disowning
but rather return it to the waiting queue? This would require a new API
action.

(*
server = api.sml
poller = poll.sml
worker = worker.sml
*)
(*
  Implements the server-side regression-test API as a CGI program.
  The API is for workers to view and manipulate the job queues.
*)
use "regressionLib.sml";

open regressionLib

val text_response_header = "Content-Type:text/plain\n\n"

fun text_response s =
  let
    val () = TextIO.output(TextIO.stdOut, text_response_header)
    val () = TextIO.output(TextIO.stdOut, s)
  in () end

fun cgi_die ls =
  (List.app (fn s => TextIO.output(TextIO.stdOut, s))
     (text_response_header::"Error:\n"::ls);
   TextIO.output(TextIO.stdOut,"\n");
   OS.Process.exit OS.Process.success;
   raise (Fail "impossible"))

fun cgi_assert b ls = if b then () else cgi_die ls

val waiting = read_list cgi_die "waiting"
val active = read_list cgi_die "active"
val stopped = read_list cgi_die "stopped"

fun job id =
  let
    val f = Int.toString id
    val q = queue_of_job cgi_die f
  in
    OS.Path.concat(q,f)
  end

fun claim id name =
  let
    val f = Int.toString id
    val old = OS.Path.concat("waiting",f)
    val new = OS.Path.concat("active",f)
    val () =
      if OS.FileSys.access(old,[OS.FileSys.A_READ]) then
        if OS.FileSys.access(new,[OS.FileSys.A_READ]) then
          cgi_die ["job ",f," is both waiting and active"]
        else OS.FileSys.rename{old = old, new = new}
      else cgi_die ["job ",f," is not waiting to be claimed"]
    val out = TextIO.openAppend new
  in
    print_claimed out (name,Date.fromTimeUniv(Time.now())) before TextIO.closeOut out
  end

fun append id line =
  let
    val f = Int.toString id
    val p = OS.Path.concat("active",f)
    val out = TextIO.openAppend p
              handle e as IO.Io _ => (cgi_die ["job ",f," is not active: cannot append"]; raise e)
  in
    print_log_entry out (Date.fromTimeUniv(Time.now()),line) before TextIO.closeOut out
  end
fun log id data =
  let
    val f = Int.toString id
    val p = OS.Path.concat("active",f)
    val out = TextIO.openAppend p
              handle e as IO.Io _ => (cgi_die ["job ",f," is not active: cannot log"]; raise e)
  in
    TextIO.output(out,data) before TextIO.closeOut out
  end

fun stop id =
  let
    val f = Int.toString id
    val old = OS.Path.concat("active",f)
    val new = OS.Path.concat("stopped",f)
    val () =
      if OS.FileSys.access(old,[OS.FileSys.A_READ]) then
        if OS.FileSys.access(new,[OS.FileSys.A_READ]) then
          cgi_die ["job ",f," is both active and stopped"]
        else OS.FileSys.rename{old = old, new = new}
      else cgi_die ["job ",f," is not active: cannot stop"]
  in
    () (* TODO: send email *)
  end

fun retry id =
  let
    val f = Int.toString id
    val old = OS.Path.concat("stopped",f)
    val () = cgi_assert (OS.FileSys.access(old,[OS.FileSys.A_READ]))
               ["job ",f," is not stopped: cannot retry"]
    val id = next_job_id [waiting,active,stopped]
    val new = OS.Path.concat("waiting",Int.toString id)
    val inp = TextIO.openIn old
    val out = TextIO.openOut new
    fun loop last =
      case TextIO.inputLine inp of
        NONE => cgi_die ["stopped job ",f," has invalid file format"]
      | SOME line =>
          (TextIO.output(out,line);
           if last then () else
           loop (String.isPrefix "HOL: " line))
    val () = loop false
    val () = TextIO.closeOut out
    val () = TextIO.closeIn inp
  in id end
datatype request_api = Get of api | Post of id * string

fun get_api () =
  case (OS.Process.getEnv "PATH_INFO",
        OS.Process.getEnv "REQUEST_METHOD") of
    (SOME path_info, SOME "GET")
      => Option.map Get (api_from_string path_info (OS.Process.getEnv "QUERY_STRING"))
  | (SOME path_info, SOME "POST")
      => (case String.tokens (equal #"/") path_info of
            ["log",n] =>
              (Option.mapPartial
                 (fn len =>
                   Option.compose
                     ((fn id => Post(id,TextIO.inputN(TextIO.stdIn,len))),
                      id_from_string) n)
                 (Option.composePartial(Int.fromString,OS.Process.getEnv) "CONTENT_LENGTH"))
          | _ => NONE)
  | _ => NONE

local
  fun id_list ids = String.concatWith " " (List.map Int.toString ids)
in
  fun dispatch api =
    text_response (
      case api of
        Waiting => id_list (waiting())
      | Active => id_list (active())
      | Stopped => id_list (stopped())
      | Job id => file_to_string (job id)
      | Claim(id,name) => (claim id name; claim_response)
      | Append(id,line) => (append id line; append_response)
      | Stop id => (stop id; stop_response)
      | Retry id => String.concat["retried as job ",Int.toString(retry id),"\n"]
    ) handle e => cgi_die [exnMessage e]
end

fun dispatch_log id data =
  text_response (log id data; log_response)
  handle e => cgi_die [exnMessage e]

fun dispatch_req (Get api) = dispatch api
  | dispatch_req (Post (id,data)) = dispatch_log id data

fun main () =
  let
    val () = ensure_queue_dirs cgi_die
  in
    case get_api () of
      NONE => cgi_die ["bad usage"]
    | SOME req =>
        let
          val fd = acquire_lock ()
        in
          dispatch_req req before
          Posix.IO.close fd
        end
  end
Notes on the design of the automated regression test infrastructure.

Server API:

waiting:
  returns space-separated list of ids of waiting jobs

active:
  returns space-separated list of ids of active jobs

stopped:
  returns space-separated list of ids of stopped jobs

job id:
  returns information on job <id>
  including:
    - commits (including pull request integration, if any)
    - (worker) name and time started (if any)
    - output so far

claim id name:
  worker <name> claims job <id>.
  <name> is in the query string.
  returns "claimed"
  fails if <id> is not currently waiting

append id line:
  record <line> as additional output for job <id> with a timestamp.
  <line> is in the query string.
  returns "appended"
  fails if <id> is not currently active

log id data:
  append <data> as additional output for job <id>.
  <data> is POST data.
  returns "logged"
  fails if <id> is not currently active

stop id:
  mark job <id> as stopped
  returns "stopped"
  sends email with output
  fails if <id> is not currently active

retry id:
  create a new job with the same commits as job <id>
  returns "retried as job <new id>"
  fails if <id> is not currently stopped
  fails if there is already an active or waiting job for the same commits

all failures return text starting with "Error:"
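
As an illustration, a worker could drive these endpoints by shelling out
to curl from SML. The base URL, the exact path shapes, and the use of curl
are assumptions for the sake of the example, not taken from the
implementation.

```sml
(* Illustrative only: the base URL, path shapes, and use of curl are
   assumptions, not the actual worker.sml code. *)
val base = "https://cakeml.org/regression.cgi"

fun api_get path =
  OS.Process.system (String.concat ["curl -s ", base, path])

(* list waiting jobs, claim one, then stop it *)
val _ = api_get "/waiting"
val _ = api_get "/claim/42?name=worker1"  (* <name> goes in the query string *)
val _ = api_get "/stop/42"
```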

each job is on exactly one list: waiting, active, stopped
if a job changes list, it can only move to the right

Poller automatic behaviours:

1. add a job to the waiting list:
   - create a new job id
   - the commits for the job satisfy the following:
     - they are the current commits (on GitHub) for a particular target
     - there are no other jobs with the same commits

2. remove a job from the waiting list:
   - if it does not have the current commits (on GitHub) for any target, or
   - if there are other active or waiting jobs for the same commits
     (this should never happen -- not checking it currently)
     (we don't count duplicate stopped jobs since they might have been retried)
   - the removed job's id number could be re-used (but is that worthwhile?)
   - the race with workers trying to obtain this job is handled by the global
     lock on queues: either the job is claimed before it can be removed, or
     removed before it can be claimed.

3. move a job from active to stopped:
   - if the time since it started is too long
   - adds a note, "timed out", to the output
   - does stop API actions

Targets:

CakeML:
  - branch "master"
  - each open, mergeable pull request
    in this case, there are two commits:
      - the pull request commit
      - the master commit to merge into
    but they move together (as captured by the state of the PR on GitHub)
HOL:
  - branch "master"
---- OLD STUFF ----

CakeML Regression Test

Organised as two programs:

Queue Manager (Server)
  - 1 instance, runs on cakeml.org
  - Tracks commits on GitHub over the GitHub API
  - Manages a queue of jobs
  - Allocates jobs to workers as they are available
  - Records which jobs have run or are running
  - Stores all its state in the filesystem/database:
    can be killed and restarted without problems.

Worker (Client)
  - Runs regression test(s) as designated by the queue manager
  - Can send the status (generated output) of any currently running job
  - Interrupts the currently running job if requested
  - May reuse a HOL build if the desired commit has already been built
  - Assumes Poly/ML and git are installed
  - Is almost stateless: either it's running a job, or it's ready for a job.
    The state of HOL builds is kept in the filesystem.

Scheduling is based on:
  - "targets" identified by branch name or pull request number
    in the CakeML repository.
  - "snapshots" representing a specific combination of commits (in all
    relevant repositories, e.g., CakeML and HOL).
  - "jobs" representing a snapshot and a job state.
  - "workers" representing instances of the worker program, each of
    which has a worker state.
Each snapshot will correspond to some target.
A target may have many corresponding snapshots.
The possible job states are as follows:
  - queued
  - active (running, has a corresponding worker)
  - finished
  - dead (cancelled or superseded)
The possible worker states are as follows:
  - testing (running)
  - ready

Jobs can be queued, superseded, allocated (made active and assigned a
worker), or cancelled manually.

A job will be automatically queued if:
  - The job's snapshot has the most recent commits (on GitHub) for its target, and
  - No active or finished job has the same snapshot

A job will be automatically superseded if:
  - Either of the conditions for queueing above is false, and
  - The job was not queued manually

A job will be automatically allocated if:
  - It is currently queued, and
  - A worker is ready

The current policy for targets is defined as follows:
  - each open pull request is a target
  - the branch "master" is a target
  - the branch "master" in HOL is assumed