This repository has been archived by the owner on Apr 10, 2019. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathrts.tex
66 lines (59 loc) · 2.58 KB
/
rts.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
\chapter{Runtime System Improvements}
\label{chap:rts}
\status{This chapter is finished!}
Early in the project
we tested two manually parallelised programs:
a raytracer and a mandelbrot image generator.
Both programs have a single significant loop
whose iterations are independent of one another.
We expect that a good automatic parallelisation system will parallelise this
loop as it is the best place to introduce parallelism.
When we parallelised this loop manually,
we did not get the speedups that we expected.
Therefore,
we chose to address the performance problems
before we worked on automatic parallelism.
Throughout this chapter
we continue to use these two benchmarks, along with a naive Fibonacci
program (Page~\pageref{page:fibs}).
These benchmarks are not diverse and they all create a lot of
AND-parallelism,
most of which is independent.
We use these benchmarks deliberately to test that our runtime system can
handle large amounts of parallelism efficiently.
In this chapter we investigate and correct these performance problems.
We start with the garbage collector in Section~\ref{sec:rts_gc};
we analyse the collector's effects on performance and tune its parameters
to improve performance.
In Section~\ref{sec:rts_original_scheduling} we describe how the existing runtime
system schedules sparks,
and provide background material for
Section~\ref{sec:rts_original_scheduling_performance},
which benchmarks the runtime system and describes two significant problems with
spark scheduling.
We address one of these problems by introducing work stealing in
Section~\ref{sec:rts_work_stealing}.
Then in Section~\ref{sec:rts_reorder} we reorder conjuncts in independent
parallel conjunctions to work around the second spark scheduling problem.
Finally, in Section~\ref{sec:rts_work_stealing2} we make further
improvements to
work stealing and change the data structures and algorithms used to manage
idle engines,
including how idle engines look for work, sleep and are woken up.
\input{rts_gc}
\input{rts_original_scheduling}
\input{rts_original_scheduling_performance}
\input{rts_work_stealing}
\input{rts_reorder}
%\input{rts_thread_pinning}
\input{rts_work_stealing2}
%\section{Proposed scheduling tweaks}
%\label{sec:proposed_tweaks}
%\status{Not written, May move to TS chapter}
%
%I really think that this section will move to the \tscope chapter,
%it will have more in common with that chapter and more data will be
%available.
%Secondly, \tscope can be used with micro-benchmarks to measure the
%average costs of certain operations in the RTS.
%I will not write it until at least the rest of this chapter is finished.