diff --git a/README.md b/README.md
index 8d6d955aa..185761704 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ compiler for Standard ML which implements support for
nested (fork-join) parallelism. MPL generates executables with
excellent multicore performance, utilizing a novel approach to
memory management based on the theory of disentanglement
-[[1](#rmab16),[2](#gwraf18),[3](#wyfa20),[4](#awa21),[5](#waa22)].
+[[1](#rmab16),[2](#gwraf18),[3](#wyfa20),[4](#awa21),[5](#waa22),[6](#awa23)].
MPL is research software and is being actively developed.
@@ -25,7 +25,7 @@ $ docker run -it shwestrick/mpl /bin/bash
...# examples/bin/primes @mpl procs 4 --
```
-If you want to try out MPL by writing and compiling your own code, we recommend
+To write and compile your own code, we recommend
mounting a local directory inside the container. For example, here's how you
can use MPL to compile and run your own `main.mlb` in the current directory.
(To mount some other directory, replace `$(pwd -P)` with a different path.)
@@ -38,46 +38,36 @@ $ docker run -it -v $(pwd -P):/root/mycode shwestrick/mpl /bin/bash
...# ./main @mpl procs 4 --
```
+## Benchmark Suite
-## Build and Install (from source)
+The [Parallel ML benchmark suite](https://github.com/MPLLang/parallel-ml-bench)
+provides many examples of sophisticated parallel algorithms and
+applications in MPL, as well as cross-language performance comparisons with
+C++, Go, Java,
+and multicore OCaml.
-### Requirements
+## Libraries
-MPL has only been tested on Linux with x86-64. The following software is
-required.
- * [GCC](http://gcc.gnu.org)
- * [GMP](http://gmplib.org) (GNU Multiple Precision arithmetic library)
- * [GNU Make](http://savannah.gnu.org/projects/make), [GNU Bash](http://www.gnu.org/software/bash/)
- * binutils (`ar`, `ranlib`, `strip`, ...)
- * miscellaneous Unix utilities (`diff`, `find`, `grep`, `gzip`, `patch`, `sed`, `tar`, `xargs`, ...)
- * Standard ML compiler and tools:
- - Recommended: [MLton](http://mlton.org) (`mlton`, `mllex`, and `mlyacc`). Pre-built binary packages for MLton can be installed via an OS package manager or (for select platforms) obtained from http://mlton.org.
- - Supported but not recommended: [SML/NJ](http://www.smlnj.org) (`sml`, `ml-lex`, `ml-yacc`).
+We recommend using the [smlpkg](https://github.com/diku-dk/smlpkg) package
+manager. MPL supports the full SML language, so existing libraries for
+SML can be used.
-### Instructions
+In addition, here are a few libraries that make use of MPL for parallelism:
+ * [`github.com/MPLLang/mpllib`](https://github.com/MPLLang/mpllib): implements
+ a variety of data structures (sequences, sets, dictionaries, graphs, matrices, meshes,
+ images, etc.) and parallel algorithms (map, reduce, scan, filter, sorting,
+ search, tokenization, graph processing, computational geometry, etc.). Also
+ includes basic utilities (e.g. parsing command-line arguments) and
+ benchmarking infrastructure.
+ * [`github.com/shwestrick/sml-audio`](https://github.com/shwestrick/sml-audio):
+ a library for audio processing with I/O support for `.wav` files.
-The following builds the compiler at `build/bin/mpl`.
-```
-$ make all
-```
-
-After building, MPL can then be installed to `/usr/local`:
-```
-$ make install
-```
-or to a custom directory with the `PREFIX` option:
-```
-$ make PREFIX=/opt/mpl install
-```
## Parallel and Concurrent Extensions
MPL extends SML with a number of primitives for parallelism and concurrency.
Take a look at `examples/` to see these primitives in action.
-**Note**: Before writing any of your own code, make sure to read the section
-"Disentanglement" below.
-
### The `ForkJoin` Structure
```
val par: (unit -> 'a) * (unit -> 'b) -> 'a * 'b
@@ -163,8 +153,6 @@ by default.
* `-debug true -debug-runtime true -keep g` For debugging, keeps the generated
C files and uses the debug version of the runtime (with assertions enabled).
The resulting executable is somewhat peruse-able with tools like `gdb`.
-* `-detect-entanglement true` enables the dynamic entanglement detector.
-See below for more information.
For example:
```
@@ -198,60 +186,13 @@ argument `bar` using 4 pinned processors.
$ foo @mpl procs 4 set-affinity -- bar
```
-## Disentanglement
-
-Currently, MPL only supports programs that are **disentangled**, which
-(roughly speaking) is the property that concurrent threads remain oblivious
-to each other's allocations [[3](#wyfa20)].
-
-Here are a number of different ways to guarantee that your code is
-disentangled.
-- (Option 1) Use only purely functional data (no `ref`s or `array`s). This is
-the simplest but most restrictive approach.
-- (Option 2) If using mutable data, use only non-pointer data. MPL guarantees
-that simple types (`int`, `word`, `char`, `real`, etc.) are never
-indirected through a
-pointer, so for example it is safe to use `int array`. Other types such as
-`int list array` and `int array array` should be avoided. This approach
-is very easy to check and is surprisingly general. Data races are fine!
-- (Option 3) Make sure that your program is race-free. This can be
-tricky to check but allows you to use any type of data. Many of our example
-programs are race-free.
-
-## Entanglement Detection
-
-Whenever a thread acquires a reference
-to an object allocated concurrently by some other thread, then we say that
-the two threads are **entangled**. This is a violation of disentanglement,
-which MPL currently does not allow.
-
-MPL has a built-in dynamic entanglement detector which is enabled by default.
-The entanglement detector monitors individual reads and writes during execution;
-if entanglement is found, the program will terminate with an error message.
-
-The entanglement detector is both "sound" and "complete": there are neither
-false negatives nor false positives. In other words, the detector always raises
-an alarm when entanglement occurs, and never raises an alarm otherwise. Note
-however that entanglement (and therefore also entanglement detection) can
-be execution-dependent: if your program is non-deterministic (e.g. racy),
-then entanglement may or may not occur depending on the outcome of a race
-condition. Similarly, entanglement could be input-dependent.
-
-Entanglement detection is highly optimized, and typically has negligible
-overhead (see [[5](#waa22)]). It can be disabled at compile-time by passing
-`-detect-entanglement false`; however, we recommend against doing so. MPL
-relies on entanglement detection to ensure memory safety. We recommend leaving
-entanglement detection enabled at all times.
## Bugs and Known Issues
### Basis Library
-In general, the basis library has not yet been thoroughly scrubbed, and many
-functions may not be safe for parallelism
+The basis library is inherited from (sequential) SML. It has not yet been
+thoroughly scrubbed, and some functions may not be safe for parallelism
([#41](https://github.com/MPLLang/mpl/issues/41)).
-Some known issues:
-* `Int.toString` is racy when called in parallel.
-* `Real.fromString` may throw an error when called in parallel.
### Garbage Collection
* ([#115](https://github.com/MPLLang/mpl/issues/115)) The GC is currently
@@ -274,6 +215,61 @@ unsupported, including (but not limited to):
* `Weak`
* `World`
+
+## Build and Install (from source)
+
+### Requirements
+
+MPL has only been tested on Linux with x86-64. The following software is
+required.
+ * [GCC](http://gcc.gnu.org)
+ * [GMP](http://gmplib.org) (GNU Multiple Precision arithmetic library)
+ * [GNU Make](http://savannah.gnu.org/projects/make), [GNU Bash](http://www.gnu.org/software/bash/)
+ * binutils (`ar`, `ranlib`, `strip`, ...)
+ * miscellaneous Unix utilities (`diff`, `find`, `grep`, `gzip`, `patch`, `sed`, `tar`, `xargs`, ...)
+ * Standard ML compiler and tools:
+ - Recommended: [MLton](http://mlton.org) (`mlton`, `mllex`, and `mlyacc`). Pre-built binary packages for MLton can be installed via an OS package manager or (for select platforms) obtained from http://mlton.org.
+ - Supported but not recommended: [SML/NJ](http://www.smlnj.org) (`sml`, `ml-lex`, `ml-yacc`).
+ * (If using [`mpl-switch`](https://github.com/mpllang/mpl-switch)): Python 3 and `git`.
+
+### Installation with `mpl-switch`
+
+The [`mpl-switch`](https://github.com/mpllang/mpl-switch) utility makes it
+easy to install multiple versions of MPL on the same system and switch
+between them. After setting up `mpl-switch`, you can install MPL as follows:
+```
+$ mpl-switch install v0.4
+$ mpl-switch select v0.4
+```
+
+You can use any commit hash or tag name from the MPL repo to pick a
+particular version of MPL. Installed versions are stored in `~/.mpl/`; this
+folder is safe to delete at any moment, as it can always be regenerated. To
+see what versions of MPL are currently installed, do:
+```
+$ mpl-switch list
+```
+
+### Manual Instructions
+
+Alternatively, you can manually build `mpl` by cloning this repo and then
+performing the following steps.
+
+**Build the executable**. This produces an executable at `build/bin/mpl`:
+```
+$ make
+```
+
+**Put it where you want it**. After building, MPL can then be installed to
+`/usr/local`:
+```
+$ make install
+```
+or to a custom directory with the `PREFIX` option:
+```
+$ make PREFIX=/opt/mpl install
+```
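+
+After installing, you can compile a program by passing its `.mlb` file to
+`mpl`, and then run the resulting executable. For example, assuming `mpl` is
+on your `PATH` and you have a `main.mlb` (as in the Docker example above):
+```
+$ mpl main.mlb
+$ ./main @mpl procs 4 --
+```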
+
## References
[1]
@@ -300,3 +296,8 @@ POPL 2021.
[Entanglement Detection with Near-Zero Cost](http://www.cs.cmu.edu/~swestric/22/icfp-detect.pdf).
Sam Westrick, Jatin Arora, and Umut A. Acar.
ICFP 2022.
+
+[6]
+[Efficient Parallel Functional Programming with Effects](https://www.cs.cmu.edu/~swestric/23/epfpe.pdf).
+Jatin Arora, Sam Westrick, and Umut A. Acar.
+PLDI 2023.
diff --git a/basis-library/mlton/thread.sig b/basis-library/mlton/thread.sig
index 90b9caaaa..9b25da5b9 100644
--- a/basis-library/mlton/thread.sig
+++ b/basis-library/mlton/thread.sig
@@ -42,6 +42,8 @@ signature MLTON_THREAD =
structure HierarchicalHeap :
sig
type thread = Basic.t
+ type clear_set
+ type finished_clear_set_grain
(* The level (depth) of a thread's heap in the hierarchy. *)
val getDepth : thread -> int
@@ -69,6 +71,16 @@ signature MLTON_THREAD =
(* Move all chunks at the current depth up one level. *)
val promoteChunks : thread -> unit
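+      (* Entanglement-suspect bookkeeping: a clear set is a batch of suspect
+       * entries taken from a thread at a given depth. Its chunks can be
+       * processed in grains (possibly in parallel), and each finished grain
+       * is committed back, after which the clear set is deleted. *)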
+ val clearSuspectsAtDepth: thread * int -> unit
+ val numSuspectsAtDepth: thread * int -> int
+ val takeClearSetAtDepth: thread * int -> clear_set
+ val numChunksInClearSet: clear_set -> int
+ val processClearSetGrain: clear_set * int * int -> finished_clear_set_grain
+ val commitFinishedClearSetGrain: thread * finished_clear_set_grain -> unit
+ val deleteClearSet: clear_set -> unit
+
+ val updateBytesPinnedEntangledWatermark: unit -> unit
+
(* "put a new thread in the hierarchy *)
val moveNewThreadToDepth : thread * int -> unit
diff --git a/basis-library/mlton/thread.sml b/basis-library/mlton/thread.sml
index 4b3b98659..1e7bf8c5c 100644
--- a/basis-library/mlton/thread.sml
+++ b/basis-library/mlton/thread.sml
@@ -73,6 +73,9 @@ struct
type thread = Basic.t
type t = MLtonPointer.t
+ type clear_set = MLtonPointer.t
+ type finished_clear_set_grain = MLtonPointer.t
+
fun forceLeftHeap (myId, t) = Prim.forceLeftHeap(Word32.fromInt myId, t)
fun forceNewChunk () = Prim.forceNewChunk (gcState ())
fun registerCont (kl, kr, k, t) = Prim.registerCont(kl, kr, k, t)
@@ -90,6 +93,30 @@ struct
Prim.moveNewThreadToDepth (t, Word32.fromInt d)
fun checkFinishedCCReadyToJoin () =
Prim.checkFinishedCCReadyToJoin (gcState ())
+
+ fun clearSuspectsAtDepth (t, d) =
+ Prim.clearSuspectsAtDepth (gcState (), t, Word32.fromInt d)
+
+ fun numSuspectsAtDepth (t, d) =
+ Word64.toInt (Prim.numSuspectsAtDepth (gcState (), t, Word32.fromInt d))
+
+ fun takeClearSetAtDepth (t, d) =
+ Prim.takeClearSetAtDepth (gcState (), t, Word32.fromInt d)
+
+ fun numChunksInClearSet c =
+ Word64.toInt (Prim.numChunksInClearSet (gcState (), c))
+
+ fun processClearSetGrain (c, start, stop) =
+ Prim.processClearSetGrain (gcState (), c, Word64.fromInt start, Word64.fromInt stop)
+
+ fun commitFinishedClearSetGrain (t, fcsg) =
+ Prim.commitFinishedClearSetGrain (gcState (), t, fcsg)
+
+ fun deleteClearSet c =
+ Prim.deleteClearSet (gcState (), c)
+
+ fun updateBytesPinnedEntangledWatermark () =
+ Prim.updateBytesPinnedEntangledWatermark (gcState ())
end
structure Disentanglement =
diff --git a/basis-library/mpl/gc.sig b/basis-library/mpl/gc.sig
index c1c596aa4..1b4c5cabb 100644
--- a/basis-library/mpl/gc.sig
+++ b/basis-library/mpl/gc.sig
@@ -17,12 +17,15 @@ sig
*)
val numberDisentanglementChecks: unit -> IntInf.int
- (* How many times entanglement has been detected at a read barrier.
- *)
- val numberEntanglementsDetected: unit -> IntInf.int
+  (* How many times entanglement has been detected. *)
+ val numberEntanglements: unit -> IntInf.int
+
+ val approxRaceFactor: unit -> Real32.real
val numberSuspectsMarked: unit -> IntInf.int
val numberSuspectsCleared: unit -> IntInf.int
+ val bytesPinnedEntangled: unit -> IntInf.int
+ val bytesPinnedEntangledWatermark: unit -> IntInf.int
val getControlMaxCCDepth: unit -> int
@@ -43,6 +46,8 @@ sig
val localBytesReclaimed: unit -> IntInf.int
val localBytesReclaimedOfProc: int -> IntInf.int
+ val bytesInScopeForLocal: unit -> IntInf.int
+
val numLocalGCs: unit -> IntInf.int
val numLocalGCsOfProc: int -> IntInf.int
@@ -52,21 +57,28 @@ sig
val promoTime: unit -> Time.time
val promoTimeOfProc: int -> Time.time
+  val numCCs: unit -> IntInf.int
+  val numCCsOfProc: int -> IntInf.int
+
+ val ccBytesReclaimed: unit -> IntInf.int
+ val ccBytesReclaimedOfProc: int -> IntInf.int
+
+ val bytesInScopeForCC: unit -> IntInf.int
+
+ val ccTime: unit -> Time.time
+ val ccTimeOfProc: int -> Time.time
+
+ (* DEPRECATED *)
val rootBytesReclaimed: unit -> IntInf.int
val rootBytesReclaimedOfProc: int -> IntInf.int
-
val internalBytesReclaimed: unit -> IntInf.int
val internalBytesReclaimedOfProc: int -> IntInf.int
-
val numRootCCs: unit -> IntInf.int
val numRootCCsOfProc: int -> IntInf.int
-
val numInternalCCs: unit -> IntInf.int
val numInternalCCsOfProc: int -> IntInf.int
-
val rootCCTime: unit -> Time.time
val rootCCTimeOfProc: int -> Time.time
-
val internalCCTime: unit -> Time.time
val internalCCTimeOfProc: int -> Time.time
end
diff --git a/basis-library/mpl/gc.sml b/basis-library/mpl/gc.sml
index 83f0a9d48..5f2417e04 100644
--- a/basis-library/mpl/gc.sml
+++ b/basis-library/mpl/gc.sml
@@ -24,24 +24,27 @@ struct
GC.getCumulativeStatisticsBytesAllocatedOfProc (gcState (), Word32.fromInt p)
fun getCumulativeStatisticsLocalBytesReclaimedOfProc p =
GC.getCumulativeStatisticsLocalBytesReclaimedOfProc (gcState (), Word32.fromInt p)
- fun getNumRootCCsOfProc p =
- GC.getNumRootCCsOfProc (gcState (), Word32.fromInt p)
- fun getNumInternalCCsOfProc p =
- GC.getNumInternalCCsOfProc (gcState (), Word32.fromInt p)
- fun getRootCCMillisecondsOfProc p =
- GC.getRootCCMillisecondsOfProc (gcState (), Word32.fromInt p)
- fun getInternalCCMillisecondsOfProc p =
- GC.getInternalCCMillisecondsOfProc (gcState (), Word32.fromInt p)
- fun getRootCCBytesReclaimedOfProc p =
- GC.getRootCCBytesReclaimedOfProc (gcState (), Word32.fromInt p)
- fun getInternalCCBytesReclaimedOfProc p =
- GC.getInternalCCBytesReclaimedOfProc (gcState (), Word32.fromInt p)
+ fun getNumCCsOfProc p =
+ GC.getNumCCsOfProc (gcState (), Word32.fromInt p)
+ fun getCCMillisecondsOfProc p =
+ GC.getCCMillisecondsOfProc (gcState (), Word32.fromInt p)
+ fun getCCBytesReclaimedOfProc p =
+ GC.getCCBytesReclaimedOfProc (gcState (), Word32.fromInt p)
+
+ fun bytesInScopeForLocal () =
+ C_UIntmax.toLargeInt (GC.bytesInScopeForLocal (gcState ()))
+
+ fun bytesInScopeForCC () =
+ C_UIntmax.toLargeInt (GC.bytesInScopeForCC (gcState ()))
fun numberDisentanglementChecks () =
C_UIntmax.toLargeInt (GC.numberDisentanglementChecks (gcState ()))
- fun numberEntanglementsDetected () =
- C_UIntmax.toLargeInt (GC.numberEntanglementsDetected (gcState ()))
+ fun numberEntanglements () =
+ C_UIntmax.toLargeInt (GC.numberEntanglements (gcState ()))
+
+ fun approxRaceFactor () =
+ (GC.approxRaceFactor (gcState ()))
fun getControlMaxCCDepth () =
Word32.toInt (GC.getControlMaxCCDepth (gcState ()))
@@ -51,6 +54,12 @@ struct
fun numberSuspectsCleared () =
C_UIntmax.toLargeInt (GC.numberSuspectsCleared (gcState ()))
+
+ fun bytesPinnedEntangled () =
+ C_UIntmax.toLargeInt (GC.bytesPinnedEntangled (gcState ()))
+
+ fun bytesPinnedEntangledWatermark () =
+ C_UIntmax.toLargeInt (GC.bytesPinnedEntangledWatermark (gcState ()))
end
exception NotYetImplemented of string
@@ -92,34 +101,19 @@ struct
; millisecondsToTime (getPromoMillisecondsOfProc p)
)
- fun numRootCCsOfProc p =
- ( checkProcNum p
- ; C_UIntmax.toLargeInt (getNumRootCCsOfProc p)
- )
-
- fun numInternalCCsOfProc p =
- ( checkProcNum p
- ; C_UIntmax.toLargeInt (getNumInternalCCsOfProc p)
- )
-
- fun rootCCTimeOfProc p =
+ fun numCCsOfProc p =
( checkProcNum p
- ; millisecondsToTime (getRootCCMillisecondsOfProc p)
+ ; C_UIntmax.toLargeInt (getNumCCsOfProc p)
)
- fun internalCCTimeOfProc p =
+ fun ccTimeOfProc p =
( checkProcNum p
- ; millisecondsToTime (getInternalCCMillisecondsOfProc p)
+ ; millisecondsToTime (getCCMillisecondsOfProc p)
)
- fun rootBytesReclaimedOfProc p =
+ fun ccBytesReclaimedOfProc p =
( checkProcNum p
- ; C_UIntmax.toLargeInt (getRootCCBytesReclaimedOfProc p)
- )
-
- fun internalBytesReclaimedOfProc p =
- ( checkProcNum p
- ; C_UIntmax.toLargeInt (getInternalCCBytesReclaimedOfProc p)
+ ; C_UIntmax.toLargeInt (getCCBytesReclaimedOfProc p)
)
fun sumAllProcs (f: 'a * 'a -> 'a) (perProc: int -> 'a) =
@@ -148,28 +142,39 @@ struct
fun promoTime () =
millisecondsToTime (sumAllProcs C_UIntmax.+ getPromoMillisecondsOfProc)
- fun numRootCCs () =
- C_UIntmax.toLargeInt
- (sumAllProcs C_UIntmax.+ getNumRootCCsOfProc)
-
- fun numInternalCCs () =
+ fun numCCs () =
C_UIntmax.toLargeInt
- (sumAllProcs C_UIntmax.+ getNumInternalCCsOfProc)
-
- fun rootCCTime () =
- millisecondsToTime
- (sumAllProcs C_UIntmax.+ getRootCCMillisecondsOfProc)
+ (sumAllProcs C_UIntmax.+ getNumCCsOfProc)
- fun internalCCTime () =
+ fun ccTime () =
millisecondsToTime
- (sumAllProcs C_UIntmax.+ getInternalCCMillisecondsOfProc)
-
- fun rootBytesReclaimed () =
- C_UIntmax.toLargeInt
- (sumAllProcs C_UIntmax.+ getRootCCBytesReclaimedOfProc)
+ (sumAllProcs C_UIntmax.+ getCCMillisecondsOfProc)
- fun internalBytesReclaimed () =
+ fun ccBytesReclaimed () =
C_UIntmax.toLargeInt
- (sumAllProcs C_UIntmax.+ getInternalCCBytesReclaimedOfProc)
+        (sumAllProcs C_UIntmax.+ getCCBytesReclaimedOfProc)
+
+
+ (* ======================================================================
+ * DEPRECATED
+ *)
+
+ exception Deprecated of string
+
+ fun d name (_: 'a) : 'b =
+ raise Deprecated ("MPL.GC." ^ name)
+
+ val rootBytesReclaimed = d "rootBytesReclaimed"
+ val rootBytesReclaimedOfProc = d "rootBytesReclaimedOfProc"
+ val internalBytesReclaimed = d "internalBytesReclaimed"
+ val internalBytesReclaimedOfProc = d "internalBytesReclaimedOfProc"
+ val numRootCCs = d "numRootCCs"
+ val numRootCCsOfProc = d "numRootCCsOfProc"
+ val numInternalCCs = d "numInternalCCs"
+ val numInternalCCsOfProc = d "numInternalCCsOfProc"
+ val rootCCTime = d "rootCCTime"
+ val rootCCTimeOfProc = d "rootCCTimeOfProc"
+ val internalCCTime = d "internalCCTime"
+ val internalCCTimeOfProc = d "internalCCTimeOfProc"
end
diff --git a/basis-library/primitive/prim-mlton.sml b/basis-library/primitive/prim-mlton.sml
index e6d7021d6..54376834c 100644
--- a/basis-library/primitive/prim-mlton.sml
+++ b/basis-library/primitive/prim-mlton.sml
@@ -159,20 +159,29 @@ structure GC =
val getCumulativeStatisticsLocalBytesReclaimedOfProc = _import
"GC_getCumulativeStatisticsLocalBytesReclaimedOfProc" runtime private: GCState.t * Word32.word -> C_UIntmax.t;
- val getNumRootCCsOfProc = _import "GC_getNumRootCCsOfProc" runtime private: GCState.t * Word32.word -> C_UIntmax.t;
- val getNumInternalCCsOfProc = _import "GC_getNumInternalCCsOfProc" runtime private: GCState.t * Word32.word -> C_UIntmax.t;
- val getRootCCMillisecondsOfProc = _import "GC_getRootCCMillisecondsOfProc" runtime private: GCState.t * Word32.word -> C_UIntmax.t;
- val getInternalCCMillisecondsOfProc = _import "GC_getInternalCCMillisecondsOfProc" runtime private: GCState.t * Word32.word -> C_UIntmax.t;
- val getRootCCBytesReclaimedOfProc = _import "GC_getRootCCBytesReclaimedOfProc" runtime private: GCState.t * Word32.word -> C_UIntmax.t;
- val getInternalCCBytesReclaimedOfProc = _import "GC_getInternalCCBytesReclaimedOfProc" runtime private: GCState.t * Word32.word -> C_UIntmax.t;
+ val bytesInScopeForLocal =
+ _import "GC_bytesInScopeForLocal" runtime private:
+ GCState.t -> C_UIntmax.t;
+
+ val bytesInScopeForCC =
+ _import "GC_bytesInScopeForCC" runtime private:
+ GCState.t -> C_UIntmax.t;
+
+ val getNumCCsOfProc = _import "GC_getNumCCsOfProc" runtime private: GCState.t * Word32.word -> C_UIntmax.t;
+ val getCCMillisecondsOfProc = _import "GC_getCCMillisecondsOfProc" runtime private: GCState.t * Word32.word -> C_UIntmax.t;
+ val getCCBytesReclaimedOfProc = _import "GC_getCCBytesReclaimedOfProc" runtime private: GCState.t * Word32.word -> C_UIntmax.t;
val numberDisentanglementChecks = _import "GC_numDisentanglementChecks" runtime private: GCState.t -> C_UIntmax.t;
+ val numberEntanglements = _import "GC_numEntanglements" runtime private: GCState.t -> C_UIntmax.t;
- val numberEntanglementsDetected = _import "GC_numEntanglementsDetected" runtime private: GCState.t -> C_UIntmax.t;
+ val approxRaceFactor = _import "GC_approxRaceFactor" runtime private: GCState.t -> Real32.real;
val numberSuspectsMarked = _import "GC_numSuspectsMarked" runtime private: GCState.t -> C_UIntmax.t;
val numberSuspectsCleared = _import "GC_numSuspectsCleared" runtime private: GCState.t -> C_UIntmax.t;
+
+ val bytesPinnedEntangled = _import "GC_bytesPinnedEntangled" runtime private: GCState.t -> C_UIntmax.t;
+ val bytesPinnedEntangledWatermark = _import "GC_bytesPinnedEntangledWatermark" runtime private: GCState.t -> C_UIntmax.t;
end
structure HM =
@@ -368,6 +377,24 @@ structure Thread =
val setMinLocalCollectionDepth = _import "GC_HH_setMinLocalCollectionDepth" runtime private: thread * Word32.word -> unit;
val mergeThreads = _import "GC_HH_mergeThreads" runtime private: thread * thread -> unit;
val promoteChunks = _import "GC_HH_promoteChunks" runtime private: thread -> unit;
+ val clearSuspectsAtDepth = _import "GC_HH_clearSuspectsAtDepth" runtime private:
+ GCState.t * thread * Word32.word -> unit;
+ val numSuspectsAtDepth = _import "GC_HH_numSuspectsAtDepth" runtime private:
+ GCState.t * thread * Word32.word -> Word64.word;
+ val takeClearSetAtDepth = _import "GC_HH_takeClearSetAtDepth" runtime private:
+ GCState.t * thread * Word32.word -> Pointer.t;
+ val numChunksInClearSet = _import "GC_HH_numChunksInClearSet" runtime private:
+ GCState.t * Pointer.t -> Word64.word;
+ val processClearSetGrain = _import "GC_HH_processClearSetGrain" runtime private:
+ GCState.t * Pointer.t * Word64.word * Word64.word -> Pointer.t;
+ val commitFinishedClearSetGrain = _import "GC_HH_commitFinishedClearSetGrain" runtime private:
+ GCState.t * thread * Pointer.t -> unit;
+ val deleteClearSet = _import "GC_HH_deleteClearSet" runtime private:
+ GCState.t * Pointer.t -> unit;
+
+ val updateBytesPinnedEntangledWatermark =
+ _import "GC_updateBytesPinnedEntangledWatermark" runtime private:
+ GCState.t -> unit;
val decheckFork = _import "GC_HH_decheckFork" runtime private:
GCState.t * Word64.word ref * Word64.word ref -> unit;
diff --git a/basis-library/schedulers/shh/CumulativePerProcTimer.sml b/basis-library/schedulers/shh/CumulativePerProcTimer.sml
new file mode 100644
index 000000000..4f0202981
--- /dev/null
+++ b/basis-library/schedulers/shh/CumulativePerProcTimer.sml
@@ -0,0 +1,73 @@
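+(* A cumulative per-processor timer: each processor starts, ticks, and stops
+ * its own timer independently, and `cumulative` returns the sum of all
+ * per-processor totals. *)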
+functor CumulativePerProcTimer(val timerName: string):
+sig
+ val start: unit -> unit
+ val tick: unit -> unit (* Essentially the same as (stop();start()) *)
+ val stop: unit -> unit
+
+ val isStarted: unit -> bool
+
+ val cumulative: unit -> Time.time
+end =
+struct
+
+ val numP = MLton.Parallel.numberOfProcessors
+ fun myId () = MLton.Parallel.processorNumber ()
+
+ fun die strfn =
+ ( print (Int.toString (myId ()) ^ ": " ^ strfn ())
+ ; OS.Process.exit OS.Process.failure
+ )
+
+ val totals = Array.array (numP, Time.zeroTime)
+ val starts = Array.array (numP, Time.zeroTime)
+ val isRunning = Array.array (numP, false)
+
+ fun isStarted () =
+ Array.sub (isRunning, myId())
+
+ fun start () =
+ let
+ val p = myId()
+ in
+ if Array.sub (isRunning, p) then
+ die (fn _ => "timer \"" ^ timerName ^ "\": start after start")
+ else
+ ( Array.update (isRunning, p, true)
+ ; Array.update (starts, p, Time.now ())
+ )
+ end
+
+ fun tick () =
+ let
+ val p = myId()
+ val tnow = Time.now ()
+ val delta = Time.- (tnow, Array.sub (starts, p))
+ in
+ if not (Array.sub (isRunning, p)) then
+ die (fn _ => "timer \"" ^ timerName ^ "\": tick while stopped")
+ else
+ ( Array.update (totals, p, Time.+ (Array.sub (totals, p), delta))
+ ; Array.update (starts, p, tnow)
+ )
+ end
+
+ fun stop () =
+ let
+ val p = myId()
+ val tnow = Time.now ()
+ val delta = Time.- (tnow, Array.sub (starts, p))
+ in
+ if not (Array.sub (isRunning, p)) then
+ die (fn _ => "timer \"" ^ timerName ^ "\": stop while stopped")
+ else
+ ( Array.update (isRunning, p, false)
+ ; Array.update (totals, p, Time.+ (Array.sub (totals, p), delta))
+ )
+ end
+
+ fun cumulative () =
+ ( if isStarted () then tick () else ()
+ ; Array.foldl Time.+ Time.zeroTime totals
+ )
+
+end
\ No newline at end of file
diff --git a/basis-library/schedulers/shh/DummyTimer.sml b/basis-library/schedulers/shh/DummyTimer.sml
new file mode 100644
index 000000000..ff2f17b7c
--- /dev/null
+++ b/basis-library/schedulers/shh/DummyTimer.sml
@@ -0,0 +1,19 @@
+functor DummyTimer(val timerName: string):
+sig
+ val start: unit -> unit
+ val tick: unit -> unit (* Essentially the same as (stop();start()) *)
+ val stop: unit -> unit
+
+ val isStarted: unit -> bool
+
+ val cumulative: unit -> Time.time
+end =
+struct
+
+ fun isStarted () = false
+ fun start () = ()
+ fun tick () = ()
+ fun stop () = ()
+ fun cumulative () = Time.zeroTime
+
+end
\ No newline at end of file
diff --git a/basis-library/schedulers/shh/FORK_JOIN.sig b/basis-library/schedulers/shh/FORK_JOIN.sig
index ddc867801..c733a0fa7 100644
--- a/basis-library/schedulers/shh/FORK_JOIN.sig
+++ b/basis-library/schedulers/shh/FORK_JOIN.sig
@@ -8,9 +8,7 @@ sig
(* synonym for par *)
val fork: (unit -> 'a) * (unit -> 'b) -> 'a * 'b
- (* other scheduler hooks *)
- val communicate: unit -> unit
- val getIdleTime: int -> Time.time
-
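+  (* Cumulative idle/work time, summed over all worker processors. *)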
+ val idleTimeSoFar: unit -> Time.time
+ val workTimeSoFar: unit -> Time.time
val maxForkDepthSoFar: unit -> int
end
diff --git a/basis-library/schedulers/shh/Scheduler.sml b/basis-library/schedulers/shh/Scheduler.sml
index 5c3179d05..57517c441 100644
--- a/basis-library/schedulers/shh/Scheduler.sml
+++ b/basis-library/schedulers/shh/Scheduler.sml
@@ -100,33 +100,11 @@ struct
fun dbgmsg' _ = ()
(* ========================================================================
- * IDLENESS TRACKING
+ * TIMERS
*)
- val idleTotals = Array.array (P, Time.zeroTime)
- fun getIdleTime p = arraySub (idleTotals, p)
- fun updateIdleTime (p, deltaTime) =
- arrayUpdate (idleTotals, p, Time.+ (getIdleTime p, deltaTime))
-
-(*
- val timerGrain = 256
- fun startTimer myId = (myId, 0, Time.now ())
- fun tickTimer (p, count, t) =
- if count < timerGrain then (p, count+1, t) else
- let
- val t' = Time.now ()
- val diff = Time.- (t', t)
- val _ = updateIdleTime (p, diff)
- in
- (p, 0, t')
- end
- fun stopTimer (p, _, t) =
- (tickTimer (p, timerGrain, t); ())
-*)
-
- fun startTimer _ = ()
- fun tickTimer _ = ()
- fun stopTimer _ = ()
+ structure IdleTimer = CumulativePerProcTimer(val timerName = "idle")
+ structure WorkTimer = CumulativePerProcTimer(val timerName = "work")
(** ========================================================================
* MAXIMUM FORK DEPTHS
@@ -218,8 +196,6 @@ struct
Queue.tryPopTop queue
end
- fun communicate () = ()
-
fun push x =
let
val myId = myWorkerId ()
@@ -273,9 +249,6 @@ struct
Finished x => x
| Raised e => raise e
- val communicate = communicate
- val getIdleTime = getIdleTime
-
(* Must be called from a "user" thread, which has an associated HH *)
fun parfork thread depth (f : unit -> 'a, g : unit -> 'b) =
let
@@ -339,6 +312,8 @@ struct
( HH.promoteChunks thread
; HH.setDepth (thread, depth)
; DE.decheckJoin (tidLeft, tidRight)
+ ; maybeParClearSuspectsAtDepth (thread, depth)
+ ; if depth <> 1 then () else HH.updateBytesPinnedEntangledWatermark ()
(* ; dbgmsg' (fn _ => "join fast at depth " ^ Int.toString depth) *)
(* ; HH.forceNewChunk () *)
; let
@@ -362,6 +337,8 @@ struct
HH.setDepth (thread, depth);
DE.decheckJoin (tidLeft, tidRight);
setQueueDepth (myWorkerId ()) depth;
+ maybeParClearSuspectsAtDepth (thread, depth);
+ if depth <> 1 then () else HH.updateBytesPinnedEntangledWatermark ();
(* dbgmsg' (fn _ => "join slow at depth " ^ Int.toString depth); *)
case HM.refDerefNoBarrier rightSideResult of
NONE => die (fn _ => "scheduler bug: join failed: missing result")
@@ -374,8 +351,85 @@ struct
(extractResult fr, extractResult gr)
end
+
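+  (* simpleParFork is a pared-down copy of parfork, specialized to tasks
+   * that return unit. It is used (via simpleFork below) for the scheduler's
+   * own internal parallelism, e.g. clearing suspect sets, and never forks
+   * a CC at its depth. *)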
+ and simpleParFork thread depth (f: unit -> unit, g: unit -> unit) : unit =
+ let
+ val rightSideThread = ref (NONE: Thread.t option)
+ val rightSideResult = ref (NONE: unit result option)
+ val incounter = ref 2
+
+ val (tidLeft, tidRight) = DE.decheckFork ()
+
+ fun g' () =
+ let
+ val () = DE.copySyncDepthsFromThread (thread, depth+1)
+ val () = DE.decheckSetTid tidRight
+ val gr = result g
+ val t = Thread.current ()
+ in
+ rightSideThread := SOME t;
+ rightSideResult := SOME gr;
+ if decrementHitsZero incounter then
+ ( setQueueDepth (myWorkerId ()) (depth+1)
+ ; threadSwitch thread
+ )
+ else
+ returnToSched ()
+ end
+ val _ = push (NormalTask g')
+ val _ = HH.setDepth (thread, depth + 1)
+ (* NOTE: off-by-one on purpose. Runtime depths start at 1. *)
+ val _ = recordForkDepth depth
+
+ val _ = DE.decheckSetTid tidLeft
+ val fr = result f
+ val tidLeft = DE.decheckGetTid thread
- fun forkGC thread depth (f : unit -> 'a, g : unit -> 'b) =
+ val gr =
+ if popDiscard () then
+ ( HH.promoteChunks thread
+ ; HH.setDepth (thread, depth)
+ ; DE.decheckJoin (tidLeft, tidRight)
+ ; maybeParClearSuspectsAtDepth (thread, depth)
+ ; if depth <> 1 then () else HH.updateBytesPinnedEntangledWatermark ()
+ (* ; dbgmsg' (fn _ => "join fast at depth " ^ Int.toString depth) *)
+ (* ; HH.forceNewChunk () *)
+ ; let
+ val gr = result g
+ in
+ (* (gr, DE.decheckGetTid thread) *)
+ gr
+ end
+ )
+ else
+ ( clear () (* this should be safe after popDiscard fails? *)
+ ; if decrementHitsZero incounter then () else returnToSched ()
+ ; case HM.refDerefNoBarrier rightSideThread of
+ NONE => die (fn _ => "scheduler bug: join failed")
+ | SOME t =>
+ let
+ val tidRight = DE.decheckGetTid t
+ in
+ HH.mergeThreads (thread, t);
+ HH.promoteChunks thread;
+ HH.setDepth (thread, depth);
+ DE.decheckJoin (tidLeft, tidRight);
+ setQueueDepth (myWorkerId ()) depth;
+ maybeParClearSuspectsAtDepth (thread, depth);
+ if depth <> 1 then () else HH.updateBytesPinnedEntangledWatermark ();
+ (* dbgmsg' (fn _ => "join slow at depth " ^ Int.toString depth); *)
+ case HM.refDerefNoBarrier rightSideResult of
+ NONE => die (fn _ => "scheduler bug: join failed: missing result")
+ | SOME gr => gr
+ end
+ )
+ in
+ (extractResult fr, extractResult gr);
+ ()
+ end
+
+
+ and forkGC thread depth (f : unit -> 'a, g : unit -> 'b) =
let
val heapId = ref (HH.getRoot thread)
val gcTaskTuple = (thread, heapId)
@@ -398,7 +452,8 @@ struct
val _ = HH.setDepth (thread, depth + 1)
val _ = HH.forceLeftHeap(myWorkerId(), thread)
(* val _ = dbgmsg' (fn _ => "fork CC at depth " ^ Int.toString depth) *)
- val result = fork' {ccOkayAtThisDepth=false} (f, g)
+ val res =
+ result (fn () => fork' {ccOkayAtThisDepth=false} (f, g))
val _ =
if popDiscard() then
@@ -416,9 +471,11 @@ struct
val _ = HH.promoteChunks thread
val _ = HH.setDepth (thread, depth)
+ val _ = maybeParClearSuspectsAtDepth (thread, depth)
+ val _ = if depth <> 1 then () else HH.updateBytesPinnedEntangledWatermark ()
(* val _ = dbgmsg' (fn _ => "join CC at depth " ^ Int.toString depth) *)
in
- result
+ extractResult res
end
end
@@ -437,7 +494,55 @@ struct
(f (), g ())
end
- fun fork (f, g) = fork' {ccOkayAtThisDepth=true} (f, g)
+ and fork (f, g) = fork' {ccOkayAtThisDepth=true} (f, g)
+
+ and simpleFork (f, g) =
+ let
+ val thread = Thread.current ()
+ val depth = HH.getDepth thread
+ in
+ (* if ccOkayAtThisDepth andalso depth = 1 then *)
+ if depth < Queue.capacity andalso depthOkayForDECheck depth then
+ simpleParFork thread depth (f, g)
+ else
+ (* don't let us hit an error, just sequentialize instead *)
+ (f (); g ())
+ end
+
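+  (* Clear the entanglement suspects recorded for thread t at depth d. Small
+   * suspect sets are cleared directly; larger ones are converted to a clear
+   * set, processed in fixed-size grains in parallel via simpleFork, then
+   * committed and deleted. *)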
+ and maybeParClearSuspectsAtDepth (t, d) =
+ if HH.numSuspectsAtDepth (t, d) <= 10000 then
+ HH.clearSuspectsAtDepth (t, d)
+ else
+ let
+ val cs = HH.takeClearSetAtDepth (t, d)
+ val count = HH.numChunksInClearSet cs
+ val grainSize = 20
+ val numGrains = 1 + (count-1) div grainSize
+ val results = ArrayExtra.alloc numGrains
+ fun start i = i*grainSize
+ fun stop i = Int.min (grainSize + start i, count)
+
+ fun processLoop i j =
+ if j-i = 1 then
+ Array.update (results, i, HH.processClearSetGrain (cs, start i, stop i))
+ else
+ let
+ val mid = i + (j-i) div 2
+ in
+ simpleFork (fn _ => processLoop i mid, fn _ => processLoop mid j)
+ end
+
+ fun commitLoop i =
+ if i >= numGrains then () else
+ ( HH.commitFinishedClearSetGrain (t, Array.sub (results, i))
+ ; commitLoop (i+1)
+ )
+ in
+ processLoop 0 numGrains;
+ commitLoop 0;
+ HH.deleteClearSet cs;
+ maybeParClearSuspectsAtDepth (t, d) (* need to go again, just in case *)
+ end
end
(* ========================================================================
@@ -473,22 +578,25 @@ struct
in if other < myId then other else other+1
end
- fun request idleTimer =
+ fun stealLoop () =
let
- fun loop tries it =
+ fun loop tries =
if tries = P * 100 then
- (OS.Process.sleep (Time.fromNanoseconds (LargeInt.fromInt (P * 100)));
- loop 0 (tickTimer idleTimer))
+ ( IdleTimer.tick ()
+ ; OS.Process.sleep (Time.fromNanoseconds (LargeInt.fromInt (P * 100)))
+ ; loop 0 )
else
let
val friend = randomOtherId ()
in
case trySteal friend of
- NONE => loop (tries+1) (tickTimer idleTimer)
- | SOME (task, depth) => (task, depth, tickTimer idleTimer)
+ NONE => loop (tries+1)
+ | SOME (task, depth) => (task, depth)
end
+
+ val result = loop 0
in
- loop 0 idleTimer
+ result
end
(* ------------------------------------------------------------------- *)
@@ -499,33 +607,44 @@ struct
| SOME (thread, hh) =>
( (*dbgmsg' (fn _ => "back in sched; found GC task")
;*) setGCTask myId NONE
+ ; IdleTimer.stop ()
+ ; WorkTimer.start ()
; HH.collectThreadRoot (thread, !hh)
; if popDiscard () then
- ( (*dbgmsg' (fn _ => "resume task thread")
- ;*) threadSwitch thread
+ ( threadSwitch thread
+ ; WorkTimer.stop ()
+ ; IdleTimer.start ()
; afterReturnToSched ()
)
else
- ()
+ ( WorkTimer.stop ()
+ ; IdleTimer.start ()
+ )
)
fun acquireWork () : unit =
let
- val idleTimer = startTimer myId
- val (task, depth, idleTimer') = request idleTimer
- val _ = stopTimer idleTimer'
+ val (task, depth) = stealLoop ()
in
case task of
GCTask (thread, hh) =>
- ( HH.collectThreadRoot (thread, !hh)
+ ( IdleTimer.stop ()
+ ; WorkTimer.start ()
+ ; HH.collectThreadRoot (thread, !hh)
+ ; WorkTimer.stop ()
+ ; IdleTimer.start ()
; acquireWork ()
)
| Continuation (thread, depth) =>
( (*dbgmsg' (fn _ => "stole continuation (" ^ Int.toString depth ^ ")")
; dbgmsg' (fn _ => "resume task thread")
;*) Queue.setDepth myQueue depth
+ ; IdleTimer.stop ()
+ ; WorkTimer.start ()
; threadSwitch thread
+ ; WorkTimer.stop ()
+ ; IdleTimer.start ()
; afterReturnToSched ()
; Queue.setDepth myQueue 1
; acquireWork ()
@@ -541,7 +660,11 @@ struct
HH.setDepth (taskThread, depth+1);
setTaskBox myId t;
(* dbgmsg' (fn _ => "switch to new task thread"); *)
+ IdleTimer.stop ();
+ WorkTimer.start ();
threadSwitch taskThread;
+ WorkTimer.stop ();
+ IdleTimer.start ();
afterReturnToSched ();
Queue.setDepth myQueue 1;
acquireWork ()
@@ -560,6 +683,7 @@ struct
let
val (_, acquireWork) = setupSchedLoop ()
in
+ IdleTimer.start ();
acquireWork ();
die (fn _ => "scheduler bug: scheduler exited acquire-work loop")
end
@@ -570,7 +694,7 @@ struct
if HH.getDepth originalThread = 0 then ()
else die (fn _ => "scheduler bug: root depth <> 0")
val _ = HH.setDepth (originalThread, 1)
- val _ = HH.forceLeftHeap (myWorkerId (), originalThread)
+ val _ = HH.forceLeftHeap (myWorkerId(), originalThread)
(* implicitly attaches worker child heaps *)
val _ = MLton.Parallel.initializeProcessors ()
@@ -596,7 +720,10 @@ struct
let
val (afterReturnToSched, acquireWork) = setupSchedLoop ()
in
+ WorkTimer.start ();
threadSwitch originalThread;
+ WorkTimer.stop ();
+ IdleTimer.start ();
afterReturnToSched ();
setQueueDepth (myWorkerId ()) 1;
acquireWork ();
@@ -635,5 +762,7 @@ struct
ArrayExtra.Raw.unsafeToArray a
end
+ val idleTimeSoFar = Scheduler.IdleTimer.cumulative
+ val workTimeSoFar = Scheduler.WorkTimer.cumulative
val maxForkDepthSoFar = Scheduler.maxForkDepthSoFar
-end
+end
\ No newline at end of file
diff --git a/basis-library/schedulers/shh/sources.mlb b/basis-library/schedulers/shh/sources.mlb
index 100700dcd..8fb08583e 100644
--- a/basis-library/schedulers/shh/sources.mlb
+++ b/basis-library/schedulers/shh/sources.mlb
@@ -22,6 +22,8 @@ local
FORK_JOIN.sig
SimpleRandom.sml
queue/DequeABP.sml
+ DummyTimer.sml
+ CumulativePerProcTimer.sml
Scheduler.sml
in
structure ForkJoin
diff --git a/basis-library/util/one.sml b/basis-library/util/one.sml
index a4b11da1b..f04b82dba 100644
--- a/basis-library/util/one.sml
+++ b/basis-library/util/one.sml
@@ -1,5 +1,6 @@
(* Copyright (C) 2006-2006 Henry Cejtin, Matthew Fluet, Suresh
* Jagannathan, and Stephen Weeks.
+ * Copyright (C) 2023 Sam Westrick.
*
* MLton is released under a HPND-style license.
* See the file MLton-LICENSE for details.
@@ -13,28 +14,44 @@ structure One:
val use: 'a t * ('a -> 'b) -> 'b
end =
struct
+
+ (* SAM_NOTE: using Word8 instead of bool here to work around the
+ * compilation bug with primitive compareAndSwap... (The compilation
+ * passes splitTypes1 and splitTypes2 cause a crash when compareAndSwap
+ * is used on certain data types, including bool.)
+ *
+ * Here I use 0w0 for false, and 0w1 for true
+ *
+ * When we fix compilation for compareAndSwap, we can switch back
+ * to using bool.
+ *)
+
datatype 'a t = T of {more: unit -> 'a,
static: 'a,
- staticIsInUse: bool ref}
+ staticIsInUse: Primitive.Word8.word ref}
fun make f = T {more = f,
static = f (),
- staticIsInUse = ref false}
+ staticIsInUse = ref 0w0}
+
+ val cas = Primitive.MLton.Parallel.compareAndSwap
fun use (T {more, static, staticIsInUse}, f) =
let
val () = Primitive.MLton.Thread.atomicBegin ()
- val b = ! staticIsInUse
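+      (* claimed is true iff this call atomically flipped staticIsInUse from
+       * 0w0 (free) to 0w1 (in use); otherwise we fall back to more (). *)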
+ val claimed =
+ (!staticIsInUse) = 0w0
+ andalso
+ 0w0 = cas (staticIsInUse, 0w0, 0w1)
val d =
- if b then
+ if not claimed then
(Primitive.MLton.Thread.atomicEnd ();
more ())
else
- (staticIsInUse := true;
- Primitive.MLton.Thread.atomicEnd ();
+ (Primitive.MLton.Thread.atomicEnd ();
static)
in
DynamicWind.wind (fn () => f d,
- fn () => if b then () else staticIsInUse := false)
+ fn () => if claimed then staticIsInUse := 0w0 else ())
end
end
diff --git a/include/c-chunk.h b/include/c-chunk.h
index 9f7df8cec..27641c1b8 100644
--- a/include/c-chunk.h
+++ b/include/c-chunk.h
@@ -56,7 +56,7 @@
extern void Assignable_writeBarrier(CPointer, Objptr, Objptr*, Objptr);
extern Objptr Assignable_readBarrier(CPointer, Objptr, Objptr*);
-extern void Assignable_decheckObjptr(Objptr);
+extern Objptr Assignable_decheckObjptr(Objptr, Objptr);
static inline
Real64 ArrayR64_cas(Real64* a, Word64 i, Real64 x, Real64 y) {
@@ -65,6 +65,13 @@ Real64 ArrayR64_cas(Real64* a, Word64 i, Real64 x, Real64 y) {
return *((Real64*)&result);
}
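+/* CAS on a Real32 array element: the bits are reinterpreted as Word32 so
+ * that __sync_val_compare_and_swap can be used (mirroring ArrayR64_cas above). */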
+static inline
+Real32 ArrayR32_cas(Real32* a, Word64 i, Real32 x, Real32 y) {
+ Word32 result =
+ __sync_val_compare_and_swap(((Word32*)a) + i, *((Word32*)&x), *((Word32*)&y));
+ return *((Real32*)&result);
+}
+
#define RefW8_cas(r, x, y) __sync_val_compare_and_swap((Word8*)(r), (x), (y))
#define RefW16_cas(r, x, y) __sync_val_compare_and_swap((Word16*)(r), (x), (y))
#define RefW32_cas(r, x, y) __sync_val_compare_and_swap((Word32*)(r), (x), (y))
@@ -78,9 +85,8 @@ Real64 ArrayR64_cas(Real64* a, Word64 i, Real64 x, Real64 y) {
static inline
Objptr RefP_cas(Objptr* r, Objptr x, Objptr y) {
- Objptr result = __sync_val_compare_and_swap(r, x, y);
- Assignable_decheckObjptr(result);
- return result;
+ Objptr res = __sync_val_compare_and_swap(r, x, y);
+ return Assignable_decheckObjptr(r, res);
}
#define ArrayW8_cas(a, i, x, y) __sync_val_compare_and_swap(((Word8*)(a)) + (i), (x), (y))
@@ -88,7 +94,7 @@ Objptr RefP_cas(Objptr* r, Objptr x, Objptr y) {
#define ArrayW32_cas(a, i, x, y) __sync_val_compare_and_swap(((Word32*)(a)) + (i), (x), (y))
#define ArrayW64_cas(a, i, x, y) __sync_val_compare_and_swap(((Word64*)(a)) + (i), (x), (y))
-#define ArrayR32_cas(a, i, x, y) __sync_val_compare_and_swap(((Real32*)(a)) + (i), (x), (y))
+// #define ArrayR32_cas(a, i, x, y) __sync_val_compare_and_swap(((Real32*)(a)) + (i), (x), (y))
// #define ArrayR64_cas(a, i, x, y) __sync_val_compare_and_swap(((Real64*)(a)) + (i), (x), (y))
// #define ArrayP_cas(a, i, x, y) __sync_val_compare_and_swap(((Objptr*)(a)) + (i), (x), (y))
@@ -96,9 +102,8 @@ Objptr RefP_cas(Objptr* r, Objptr x, Objptr y) {
static inline
Objptr ArrayP_cas(Objptr* a, Word64 i, Objptr x, Objptr y) {
- Objptr result = __sync_val_compare_and_swap(a + i, x, y);
- Assignable_decheckObjptr(result);
- return result;
+ Objptr res = __sync_val_compare_and_swap(a + i, x, y);
+ return Assignable_decheckObjptr(a, res);
}
static inline void GC_writeBarrier(CPointer s, Objptr obj, CPointer dst, Objptr src) {
diff --git a/mlton/ssa/simplify.fun b/mlton/ssa/simplify.fun
index 4646565b3..7ba3d4710 100644
--- a/mlton/ssa/simplify.fun
+++ b/mlton/ssa/simplify.fun
@@ -55,7 +55,10 @@ val ssaPassesDefault =
{name = "localFlatten1", doit = LocalFlatten.transform, execute = true} ::
{name = "constantPropagation", doit = ConstantPropagation.transform, execute = true} ::
{name = "duplicateGlobals1", doit = DuplicateGlobals.transform, execute = false} ::
- {name = "splitTypes1", doit = SplitTypes.transform, execute = true} ::
+ (* SAM_NOTE: disabling splitTypes1 because it does not yet support primitive
+ * polymorphic CAS. We should update the pass and then re-enable.
+ *)
+ {name = "splitTypes1", doit = SplitTypes.transform, execute = false} ::
(* useless should run
* - after constant propagation because constant propagation makes
* slots of tuples that are constant useless
@@ -67,7 +70,10 @@ val ssaPassesDefault =
{name = "loopUnroll1", doit = LoopUnroll.transform, execute = false} ::
{name = "removeUnused2", doit = RemoveUnused.transform, execute = true} ::
{name = "duplicateGlobals2", doit = DuplicateGlobals.transform, execute = true} ::
- {name = "splitTypes2", doit = SplitTypes.transform, execute = true} ::
+ (* SAM_NOTE: disabling splitTypes2 because it does not yet support primitive
+ * polymorphic CAS. We should update the pass and then re-enable.
+ *)
+ {name = "splitTypes2", doit = SplitTypes.transform, execute = false} ::
{name = "simplifyTypes", doit = SimplifyTypes.transform, execute = true} ::
(* polyEqual should run
* - after types are simplified so that many equals are turned into eqs
diff --git a/runtime/gc.c b/runtime/gc.c
index e4508c1bd..df3fe1100 100644
--- a/runtime/gc.c
+++ b/runtime/gc.c
@@ -75,6 +75,8 @@ extern C_Pthread_Key_t gcstate_key;
#include "gc/heap.c"
#include "gc/hierarchical-heap.c"
#include "gc/hierarchical-heap-collection.c"
+#include "gc/ebr.c"
+#include "gc/entangled-ebr.c"
#include "gc/hierarchical-heap-ebr.c"
#include "gc/init-world.c"
#include "gc/init.c"
@@ -92,8 +94,10 @@ extern C_Pthread_Key_t gcstate_key;
#include "gc/pin.c"
#include "gc/pointer.c"
#include "gc/profiling.c"
+#include "gc/concurrent-list.c"
#include "gc/remembered-set.c"
#include "gc/rusage.c"
+#include "gc/sampler.c"
#include "gc/sequence-allocate.c"
#include "gc/sequence.c"
#include "gc/share.c"
diff --git a/runtime/gc.h b/runtime/gc.h
index a5ca9aa5c..47a23a8a5 100644
--- a/runtime/gc.h
+++ b/runtime/gc.h
@@ -28,6 +28,7 @@ typedef GC_state GCState_t;
#include "gc/debug.h"
#include "gc/logger.h"
+#include "gc/sampler.h"
#include "gc/block-allocator.h"
#include "gc/tls-objects.h"
@@ -84,12 +85,15 @@ typedef GC_state GCState_t;
#include "gc/processor.h"
#include "gc/pin.h"
#include "gc/hierarchical-heap.h"
+#include "gc/ebr.h"
#include "gc/hierarchical-heap-ebr.h"
+#include "gc/entangled-ebr.h"
#include "gc/hierarchical-heap-collection.h"
#include "gc/entanglement-suspects.h"
#include "gc/local-scope.h"
#include "gc/local-heap.h"
#include "gc/assign.h"
+#include "gc/concurrent-list.h"
#include "gc/remembered-set.h"
#include "gc/gap.h"
// #include "gc/deferred-promote.h"
diff --git a/runtime/gc/assign.c b/runtime/gc/assign.c
index 0a118ba26..1d5560a59 100644
--- a/runtime/gc/assign.c
+++ b/runtime/gc/assign.c
@@ -6,59 +6,95 @@
* MLton is released under a HPND-style license.
* See the file MLton-LICENSE for details.
*/
-void Assignable_decheckObjptr(objptr op)
+#ifdef DETECT_ENTANGLEMENT
+
+objptr Assignable_decheckObjptr(objptr dst, objptr src)
{
GC_state s = pthread_getspecific(gcstate_key);
s->cumulativeStatistics->numDisentanglementChecks++;
- decheckRead(s, op);
+ objptr new_src = src;
+ pointer dstp = objptrToPointer(dst, NULL);
+ HM_HierarchicalHeap dstHH = HM_getLevelHead(HM_getChunkOf(dstp));
+
+ if (!isObjptr(src) || HM_HH_getDepth(dstHH) == 0 || !ES_contains(NULL, dst))
+ {
+ return src;
+ }
+
+ // HM_EBR_leaveQuiescentState(s);
+ if (!decheck(s, src))
+ {
+ assert (isMutable(s, dstp));
+ s->cumulativeStatistics->numEntanglements++;
+ new_src = manage_entangled(s, src, getThreadCurrent(s)->decheckState);
+ assert (isPinned(new_src));
+ }
+ // HM_EBR_enterQuiescentState(s);
+ assert (!hasFwdPtr(objptrToPointer(new_src, NULL)));
+ return new_src;
}
objptr Assignable_readBarrier(
GC_state s,
- ARG_USED_FOR_ASSERT objptr obj,
+ objptr obj,
objptr *field)
{
+// can't rely on the obj header because it may be forwarded.
-#if ASSERT
- assert(isObjptr(obj));
- // check that field is actually inside this object
+ s->cumulativeStatistics->numDisentanglementChecks++;
+ objptr ptr = *field;
pointer objp = objptrToPointer(obj, NULL);
- GC_header header = getHeader(objp);
- GC_objectTypeTag tag;
- uint16_t bytesNonObjptrs;
- uint16_t numObjptrs;
- bool hasIdentity;
- splitHeader(s, header, &tag, &hasIdentity, &bytesNonObjptrs, &numObjptrs);
- pointer objend = objp;
- if (!hasIdentity) {
- DIE("read barrier: attempting to read immutable object "FMTOBJPTR, obj);
- }
- if (NORMAL_TAG == tag) {
- objend += bytesNonObjptrs + (numObjptrs * OBJPTR_SIZE);
- }
- else if (SEQUENCE_TAG == tag) {
- size_t dataBytes = getSequenceLength(objp) * (bytesNonObjptrs + (numObjptrs * OBJPTR_SIZE));
- objend += alignWithExtra (s, dataBytes, GC_SEQUENCE_METADATA_SIZE);
+ HM_HierarchicalHeap objHH = HM_getLevelHead(HM_getChunkOf(objp));
+ if (!isObjptr(ptr) || HM_HH_getDepth(objHH) == 0 || !ES_contains(NULL, obj))
+ {
+ return ptr;
}
- else {
- DIE("read barrier: cannot handle tag %u", tag);
+ // HM_EBR_leaveQuiescentState(s);
+ if (!decheck(s, ptr))
+ {
+ assert (ES_contains(NULL, obj));
+ // assert (isMutable(s, obj));
+ // if (!ES_contains(NULL, obj))
+ // {
+ // if (!decheck(s, obj)) {
+ // assert (false);
+ // }
+ // assert(isPinned(ptr));
+ // assert(!hasFwdPtr(ptr));
+ // assert(pinType(getHeader(ptr)) == PIN_ANY);
+ // }
+ s->cumulativeStatistics->numEntanglements++;
+ ptr = manage_entangled(s, ptr, getThreadCurrent(s)->decheckState);
}
- pointer fieldp = (pointer)field;
- ASSERTPRINT(
- objp <= fieldp && fieldp + OBJPTR_SIZE <= objend,
- "read barrier: objptr field %p outside object "FMTOBJPTR" of size %zu",
- (void*)field,
- obj,
- (size_t)(objend - objp));
-#endif
- assert(ES_contains(NULL, obj));
- s->cumulativeStatistics->numDisentanglementChecks++;
- objptr ptr = *field;
- decheckRead(s, ptr);
+ // HM_EBR_enterQuiescentState(s);
+ assert (!hasFwdPtr(objptrToPointer(ptr, NULL)));
return ptr;
}
+#else
+
+objptr Assignable_decheckObjptr(objptr dst, objptr src) {
+ (void) dst;
+ return src;
+}
+
+objptr Assignable_readBarrier(
+ GC_state s,
+ objptr obj,
+ objptr *field) {
+ (void)s;
+ (void)obj;
+ return *field;
+}
+#endif
+
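+/* Quick sufficient check, used before falling back to the full decheck:
+ * objects at depth <= 1, or in the current thread's own hierarchical heap,
+ * can be accessed without further checking. */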
+static inline bool decheck_opt_fast (GC_state s, pointer p) {
+ HM_HierarchicalHeap hh = HM_getLevelHead(HM_getChunkOf(p));
+ return (hh->depth <= 1) || hh == getThreadCurrent(s)->hierarchicalHeap;
+}
+
+
void Assignable_writeBarrier(
GC_state s,
objptr dst,
@@ -67,39 +103,43 @@ void Assignable_writeBarrier(
{
assert(isObjptr(dst));
pointer dstp = objptrToPointer(dst, NULL);
+ pointer srcp = objptrToPointer(src, NULL);
-#if ASSERT
- // check that field is actually inside this object
- GC_header header = getHeader(dstp);
- GC_objectTypeTag tag;
- uint16_t bytesNonObjptrs;
- uint16_t numObjptrs;
- bool hasIdentity;
- splitHeader(s, header, &tag, &hasIdentity, &bytesNonObjptrs, &numObjptrs);
- pointer objend = dstp;
- if (!hasIdentity) {
- DIE("write barrier: attempting to modify immutable object "FMTOBJPTR, dst);
- }
- if (NORMAL_TAG == tag) {
- objend += bytesNonObjptrs + (numObjptrs * OBJPTR_SIZE);
- }
- else if (SEQUENCE_TAG == tag) {
- size_t dataBytes = getSequenceLength(dstp) * (bytesNonObjptrs + (numObjptrs * OBJPTR_SIZE));
- objend += alignWithExtra (s, dataBytes, GC_SEQUENCE_METADATA_SIZE);
- }
- else {
- DIE("write barrier: cannot handle tag %u", tag);
- }
- pointer fieldp = (pointer)field;
- ASSERTPRINT(
- dstp <= fieldp && fieldp + OBJPTR_SIZE <= objend,
- "write barrier: objptr field %p outside object "FMTOBJPTR" of size %zu",
- (void*)field,
- dst,
- (size_t)(objend - dstp));
-#endif
-
- HM_HierarchicalHeap dstHH = HM_getLevelHeadPathCompress(HM_getChunkOf(dstp));
+ assert (!hasFwdPtr(dstp));
+ assert (!isObjptr(src) || !hasFwdPtr(srcp));
+
+// #if ASSERT
+// // check that field is actually inside this object
+// GC_header header = getHeader(dstp);
+// GC_objectTypeTag tag;
+// uint16_t bytesNonObjptrs;
+// uint16_t numObjptrs;
+// bool hasIdentity;
+// splitHeader(s, header, &tag, &hasIdentity, &bytesNonObjptrs, &numObjptrs);
+// pointer objend = dstp;
+// if (!hasIdentity) {
+// DIE("write barrier: attempting to modify immutable object "FMTOBJPTR, dst);
+// }
+// if (NORMAL_TAG == tag) {
+// objend += bytesNonObjptrs + (numObjptrs * OBJPTR_SIZE);
+// }
+// else if (SEQUENCE_TAG == tag) {
+// size_t dataBytes = getSequenceLength(dstp) * (bytesNonObjptrs + (numObjptrs * OBJPTR_SIZE));
+// objend += alignWithExtra (s, dataBytes, GC_SEQUENCE_METADATA_SIZE);
+// }
+// else {
+// DIE("write barrier: cannot handle tag %u", tag);
+// }
+// pointer fieldp = (pointer)field;
+// ASSERTPRINT(
+// dstp <= fieldp && fieldp + OBJPTR_SIZE <= objend,
+// "write barrier: objptr field %p outside object "FMTOBJPTR" of size %zu",
+// (void*)field,
+// dst,
+// (size_t)(objend - dstp));
+// #endif
+
+ HM_HierarchicalHeap dstHH = HM_getLevelHead(HM_getChunkOf(dstp));
objptr readVal = *field;
if (dstHH->depth >= 1 && isObjptr(readVal) && s->wsQueueTop!=BOGUS_OBJPTR) {
@@ -120,13 +160,126 @@ void Assignable_writeBarrier(
}
/* deque down-pointers are handled separately during collection. */
- if (dst == s->wsQueue)
+ if (dst == s->wsQueue) {
return;
+ }
- struct HM_remembered remElem_ = {.object = src, .from = dst};
- HM_remembered remElem = &remElem_;
+ HM_HierarchicalHeap srcHH = HM_getLevelHead(HM_getChunkOf(srcp));
+ if (srcHH == dstHH) {
+ /* internal pointers are always traced */
+ return;
+ }
+
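+  /* src_de / dst_de record whether src / dst pass the disentanglement check
+   * (fast path first, then the full decheck). */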
+ uint32_t dd = dstHH->depth;
+ bool src_de = decheck_opt_fast(s, srcp) || decheck(s, src);
+ if (src_de) {
+ bool dst_de = decheck_opt_fast(s, dstp) || decheck(s, dst);
+ if (dst_de) {
+ uint32_t sd = srcHH->depth;
+ /* up pointer (snapshotted by the closure)
+ * or internal (within a chain) pointer to a snapshotted heap
+ */
+ if(dd > sd ||
+ ((HM_HH_getConcurrentPack(srcHH)->ccstate != CC_UNREG) &&
+ dd == sd))
+ {
+ return;
+ }
+
+ uint32_t unpinDepth = dd;
+ bool success = pinObject(s, src, unpinDepth, PIN_DOWN);
+
+ if (success || dd == unpinDepthOf(src))
+ {
+ struct HM_remembered remElem_ = {.object = src, .from = dst};
+ HM_remembered remElem = &remElem_;
+
+ HM_HierarchicalHeap shh = HM_HH_getHeapAtDepth(s, getThreadCurrent(s), sd);
+ assert(NULL != shh);
+ assert(HM_HH_getConcurrentPack(shh)->ccstate == CC_UNREG);
+
+ HM_HH_rememberAtLevel(shh, remElem, false);
+ LOG(LM_HH_PROMOTION, LL_INFO,
+ "remembered downptr %" PRIu32 "->%" PRIu32 " from " FMTOBJPTR " to " FMTOBJPTR,
+ dstHH->depth, srcHH->depth,
+ dst, src);
+ }
+
+ if (dd > 0 && !ES_contains(NULL, dst)) {
+ /*if dst is not a suspect, it must be disentangled*/
+ // if (!dst_de) {
+ // printf("problematix: %p \n", dst);
+ // DIE("done");
+ // }
+ // assert (dst_de);
+ HM_HierarchicalHeap dhh = HM_HH_getHeapAtDepth(s, getThreadCurrent(s), dd);
+ ES_add(s, HM_HH_getSuspects(dhh), dst);
+ }
+ }
+ else if(dstHH->depth != 0) {
+ // traverseAndCheck(s, &dst, dst, NULL);
+
+ // SAM_NOTE: TODO: do we count this one??
+ s->cumulativeStatistics->numEntanglements++;
+ manage_entangled (s, src, HM_getChunkOf(dstp)->decheckState);
+ }
+
+
+ // if (!dst_de) {
+ // assert (ES_contains(NULL, dst));
+ // }
+
+ /* Depth comparisons make sense only when src && dst are on the same root-to-leaf path,
+   * checking this may be expensive, so we approximate here.
+   * If both dst_de && src_de hold, they are on the same path;
+   * otherwise, we assume they are on different paths.
+ */
+
+
+
+
+ // /* otherwise pin*/
+ // // bool primary_down_ptr = dst_de && dd < sd && (HM_HH_getConcurrentPack(dstHH)->ccstate == CC_UNREG);
+ // = dst_de ? dd :
+ // (uint32_t)lcaHeapDepth(HM_getChunkOf(srcp)->decheckState,
+ // HM_getChunkOf(dstp)->decheckState);
+ // enum PinType pt = dst_de ? PIN_DOWN : PIN_ANY;
+
+ // bool success = pinTemp(s, src, unpinDepth, pt);
+ // if (success || (dst_de && dd == unpinDepthOf (src))) {
+ // objptr fromObj = pt == PIN_DOWN ? dst : BOGUS_OBJPTR;
+ // struct HM_remembered remElem_ = {.object = src, .from = fromObj};
+ // HM_remembered remElem = &remElem_;
+
+ // HM_HierarchicalHeap shh = HM_HH_getHeapAtDepth(s, getThreadCurrent(s), sd);
+ // assert(NULL != shh);
+ // assert(HM_HH_getConcurrentPack(shh)->ccstate == CC_UNREG);
+
+ // HM_HH_rememberAtLevel(shh, remElem, false);
+ // LOG(LM_HH_PROMOTION, LL_INFO,
+ // "remembered downptr %"PRIu32"->%"PRIu32" from "FMTOBJPTR" to "FMTOBJPTR,
+ // dstHH->depth, srcHH->depth,
+ // dst, src);
+ // }
+
+ // /*add dst to the suspect set*/
+ // if (dd > 0 && dst_de && !ES_contains(NULL, dst)) {
+ // /*if dst is not a suspect, it must be disentangled*/
+ // // if (!dst_de) {
+ // // printf("problematix: %p \n", dst);
+ // // DIE("done");
+ // // }
+ // // assert (dst_de);
+ // HM_HierarchicalHeap dhh = HM_HH_getHeapAtDepth(s, getThreadCurrent(s), dd);
+ // ES_add(s, HM_HH_getSuspects(dhh), dst);
+ // }
+ } else {
+ // assert (isPinned(src));
+ // assert (!hasFwdPtr(srcp));
+ // assert (pinType(getHeader(srcp)) == PIN_ANY);
+ traverseAndCheck(s, &src, src, NULL);
+ }
- pointer srcp = objptrToPointer(src, NULL);
#if 0
/** This is disabled for now. In the future we will come back to
@@ -157,36 +310,35 @@ void Assignable_writeBarrier(
if (pinObject(src, (uint32_t)unpinDepth)) {
/** Just remember it at some arbitrary place... */
- HM_rememberAtLevel(getThreadCurrent(s)->hierarchicalHeap, remElem);
+ HM_HH_rememberAtLevel(getThreadCurrent(s)->hierarchicalHeap, remElem);
}
return;
}
#endif
- HM_HierarchicalHeap srcHH = HM_getLevelHeadPathCompress(HM_getChunkOf(srcp));
/* Up-pointer. */
- if (dstHH->depth > srcHH->depth)
- return;
+ // if (dstHH->depth > srcHH->depth)
+ // return;
/* Internal pointer. It's safe to ignore an internal pointer if:
* 1. it's contained entirely within one subheap, or
* 2. the pointed-to object (src) lives in an already snapshotted subregion
*/
- if ( (dstHH == srcHH) ||
- (dstHH->depth == srcHH->depth &&
- HM_HH_getConcurrentPack(srcHH)->ccstate != CC_UNREG) ) {
- // assert(...);
- // if (dstHH != srcHH) {
- // printf(
- // "ignore internal pointer "FMTPTR" --> "FMTPTR". dstHH == srcHH? %d\n",
- // (uintptr_t)dstp,
- // (uintptr_t)srcp,
- // srcHH == dstHH);
- // }
- return;
- }
+ // if ( (dstHH == srcHH) ||
+ // (dstHH->depth == srcHH->depth &&
+ // HM_HH_getConcurrentPack(srcHH)->ccstate != CC_UNREG) ) {
+ // // assert(...);
+ // // if (dstHH != srcHH) {
+ // // printf(
+ // // "ignore internal pointer "FMTPTR" --> "FMTPTR". dstHH == srcHH? %d\n",
+ // // (uintptr_t)dstp,
+ // // (uintptr_t)srcp,
+ // // srcHH == dstHH);
+ // // }
+ // return;
+ // }
/** Otherwise, its a down-pointer, so
* (i) make dst a suspect for entanglement, i.e., mark the suspect bit of dst's header
* (see pin.h for header-layout).
@@ -197,42 +349,46 @@ void Assignable_writeBarrier(
*/
/* make dst a suspect for entanglement */
- uint32_t dd = dstHH->depth;
- GC_thread thread = getThreadCurrent(s);
- if (dd > 0 && !ES_contains(NULL, dst)) {
- HM_HierarchicalHeap dhh = HM_HH_getHeapAtDepth(s, thread, dd);
- ES_add(s, HM_HH_getSuspects(dhh), dst);
- }
+ // uint32_t dd = dstHH->depth;
+ // if (dd > 0 && !ES_contains(NULL, dst)) {
+ // HM_HierarchicalHeap dhh = HM_HH_getHeapAtDepth(s, thread, dd);
+ // ES_add(s, HM_HH_getSuspects(dhh), dst);
+ // }
+
+ // if (decheck(s, src)) {
+ // uint32_t sd = srcHH->depth;
+ // bool dst_de = decheck(s, dst);
+ // assert (dd <= sd);
+ // bool true_down_ptr = dd < sd && (HM_HH_getConcurrentPack(dstHH)->ccstate == CC_UNREG) && dst_de;
+ // // bool unpinDepth = dst_de ? dd : lcaDepth(srcHH->tid, dstHH->tid);
+ // /* treat a pointer from a chained heap as a cross pointer */
+ // bool success = pinObject(src, dd, true_down_ptr ? PIN_DOWN : PIN_ANY);
+ // if (success)
+ // {
+ // HM_HierarchicalHeap shh = HM_HH_getHeapAtDepth(s, thread, sd);
+ // assert(NULL != shh);
+ // assert(HM_HH_getConcurrentPack(shh)->ccstate == CC_UNREG);
+ // HM_HH_rememberAtLevel(shh, remElem, false);
+
+ // LOG(LM_HH_PROMOTION, LL_INFO,
+ // "remembered downptr %"PRIu32"->%"PRIu32" from "FMTOBJPTR" to "FMTOBJPTR,
+ // dstHH->depth, srcHH->depth,
+ // dst, src);
+ // }
+ // if (!dst_de)
+ // {
+ // DIE("HAVE TO HANDLE ENTANGLED WRITES SEPARATELY");
+ // // HM_HierarchicalHeap lcaHeap = HM_HH_getHeapAtDepth(s, thread, unpinDepth);
+ // // ES_add(s, HM_HH_getSuspects(lcaHeap), ptr);
+ // }
+ // }
+
+
+ // // any concurrent pin can only decrease unpinDepth
+ // assert(unpinDepth <= dd);
+
+ // bool maybe_across_chain = write_de && (dd == unpinDepth) && (dd == sd);
- bool success = pinObject(src, dd);
-
- // any concurrent pin can only decrease unpinDepth
- uint32_t unpinDepth = unpinDepthOf(src);
- assert(unpinDepth <= dd);
-
- if (success || dd == unpinDepth)
- {
- uint32_t sd = srcHH->depth;
-#if 0
- /** Fix a silly issue where, when we are dealing with entanglement, the
- * lower object is actually deeper than the current thread (which is
- * possible because of entanglement! the thread is peeking inside of
- * some other thread's heaps, and the other thread might be deeper).
- */
- if (d > thread->currentDepth && s->controls->manageEntanglement)
- d = thread->currentDepth;
-#endif
-
- HM_HierarchicalHeap shh = HM_HH_getHeapAtDepth(s, thread, sd);
- assert(NULL != shh);
- assert(HM_HH_getConcurrentPack(shh)->ccstate == CC_UNREG);
- HM_rememberAtLevel(shh, remElem);
-
- LOG(LM_HH_PROMOTION, LL_INFO,
- "remembered downptr %"PRIu32"->%"PRIu32" from "FMTOBJPTR" to "FMTOBJPTR,
- dstHH->depth, srcHH->depth,
- dst, src);
- }
/* SAM_NOTE: TODO: track bytes allocated here in
* thread->bytesAllocatedSinceLast...? */
@@ -242,8 +398,9 @@ void Assignable_writeBarrier(
// Assignable_writeBarrier(s, dst, field, src, false);
// // *field = src;
// }
-// void Assignable_casBarrier (GC_state s, objptr dst, objptr* field, objptr src) {
-// Assignable_writeBarrier(s, dst, field, src, true);
+// void Assignable_casBarrier (objptr dst, objptr* field, objptr src) {
+// GC_state s = pthread_getspecific(gcstate_key);
+// Assignable_writeBarrier(s, dst, field, src);
// // cas(field, (*field), dst); //return?
// }
diff --git a/runtime/gc/assign.h b/runtime/gc/assign.h
index 7807e10a3..73fc5805a 100644
--- a/runtime/gc/assign.h
+++ b/runtime/gc/assign.h
@@ -22,7 +22,7 @@ PRIVATE objptr Assignable_readBarrier(
GC_state s, objptr dst, objptr* field
);
-PRIVATE void Assignable_decheckObjptr(objptr op);
+PRIVATE objptr Assignable_decheckObjptr(objptr dst, objptr src);
#endif /* MLTON_GC_INTERNAL_BASIS */
diff --git a/runtime/gc/block-allocator.c b/runtime/gc/block-allocator.c
index 20821cdce..abc97d9a4 100644
--- a/runtime/gc/block-allocator.c
+++ b/runtime/gc/block-allocator.c
@@ -134,9 +134,13 @@ static void initBlockAllocator(GC_state s, BlockAllocator ball) {
ball->completelyEmptyGroup.firstSuperBlock = NULL;
- ball->numBlocks = 0;
- ball->numBlocksInUse = 0;
ball->firstFreedByOther = NULL;
+ ball->numBlocksMapped = 0;
+ ball->numBlocksReleased = 0;
+ for (enum BlockPurpose p = 0; p < NUM_BLOCK_PURPOSES; p++) {
+ ball->numBlocksAllocated[p] = 0;
+ ball->numBlocksFreed[p] = 0;
+ }
for (size_t i = 0; i < numMegaBlockSizeClasses; i++) {
ball->megaBlockSizeClass[i].firstMegaBlock = NULL;
@@ -239,14 +243,15 @@ static void mmapNewSuperBlocks(
prependSuperBlock(getFullnessGroup(s, ball, 0, COMPLETELY_EMPTY), sb);
}
- ball->numBlocks += count*(SUPERBLOCK_SIZE(s));
+ ball->numBlocksMapped += count*(SUPERBLOCK_SIZE(s));
}
static Blocks allocateInSuperBlock(
GC_state s,
SuperBlock sb,
- int sizeClass)
+ int sizeClass,
+ enum BlockPurpose purpose)
{
if ((size_t)sb->numBlocksFree == SUPERBLOCK_SIZE(s)) {
// It's completely empty! We can reuse.
@@ -262,10 +267,11 @@ static Blocks allocateInSuperBlock(
sb->numBlocksFree -= (1 << sb->sizeClass);
assert(sb->owner != NULL);
- sb->owner->numBlocksInUse += (1 << sb->sizeClass);
+ sb->owner->numBlocksAllocated[purpose] += (1 << sb->sizeClass);
result->container = sb;
result->numBlocks = 1 << sb->sizeClass;
+ result->purpose = purpose;
return result;
}
@@ -276,11 +282,12 @@ static Blocks allocateInSuperBlock(
sb->numBlocksFree -= (1 << sb->sizeClass);
assert(sb->owner != NULL);
- sb->owner->numBlocksInUse += (1 << sb->sizeClass);
+ sb->owner->numBlocksAllocated[purpose] += (1 << sb->sizeClass);
Blocks bs = (Blocks)result;
bs->container = sb;
bs->numBlocks = (1 << sb->sizeClass);
+ bs->purpose = purpose;
return bs;
}
@@ -300,16 +307,19 @@ static void deallocateInSuperBlock(
sb->firstFree = block;
sb->numBlocksFree += (1 << sb->sizeClass);
+ enum BlockPurpose purpose = block->purpose;
+ assert( purpose < NUM_BLOCK_PURPOSES );
+
assert(sb->owner != NULL);
- assert(sb->owner->numBlocksInUse >= ((size_t)1 << sb->sizeClass));
- sb->owner->numBlocksInUse -= (1 << sb->sizeClass);
+ sb->owner->numBlocksFreed[purpose] += (1 << sb->sizeClass);
}
static Blocks tryAllocateAndAdjustSuperBlocks(
GC_state s,
BlockAllocator ball,
- int class)
+ int class,
+ enum BlockPurpose purpose)
{
SuperBlockList targetList = NULL;
@@ -337,7 +347,7 @@ static Blocks tryAllocateAndAdjustSuperBlocks(
SuperBlock sb = targetList->firstSuperBlock;
assert( sb != NULL );
enum FullnessGroup fg = fullness(s, sb);
- Blocks result = allocateInSuperBlock(s, sb, class);
+ Blocks result = allocateInSuperBlock(s, sb, class, purpose);
enum FullnessGroup newfg = fullness(s, sb);
if (fg != newfg) {
@@ -389,7 +399,7 @@ static void clearOutOtherFrees(GC_state s) {
}
if (numFreed > 400) {
- LOG(LM_CHUNK_POOL, LL_INFO,
+ LOG(LM_CHUNK_POOL, LL_DEBUG,
"number of freed blocks: %zu",
numFreed);
}
@@ -398,14 +408,22 @@ static void clearOutOtherFrees(GC_state s) {
static void freeMegaBlock(GC_state s, MegaBlock mb, size_t sizeClass) {
BlockAllocator global = s->blockAllocatorGlobal;
+ size_t nb = mb->numBlocks;
+ enum BlockPurpose purpose = mb->purpose;
+
+ LOG(LM_CHUNK_POOL, LL_INFO,
+ "Freeing megablock of %zu blocks",
+ nb);
if (sizeClass >= s->controls->megablockThreshold) {
- size_t nb = mb->numBlocks;
GC_release((pointer)mb, s->controls->blockSize * mb->numBlocks);
LOG(LM_CHUNK_POOL, LL_INFO,
"Released large allocation of %zu blocks (unmap threshold: %zu)",
nb,
(size_t)1 << (s->controls->megablockThreshold - 1));
+
+ __sync_fetch_and_add(&(global->numBlocksFreed[purpose]), nb);
+ __sync_fetch_and_add(&(global->numBlocksReleased), nb);
return;
}
@@ -415,6 +433,8 @@ static void freeMegaBlock(GC_state s, MegaBlock mb, size_t sizeClass) {
mb->nextMegaBlock = global->megaBlockSizeClass[mbClass].firstMegaBlock;
global->megaBlockSizeClass[mbClass].firstMegaBlock = mb;
pthread_mutex_unlock(&(global->megaBlockLock));
+
+ __sync_fetch_and_add(&(global->numBlocksFreed[purpose]), nb);
return;
}
@@ -422,7 +442,8 @@ static void freeMegaBlock(GC_state s, MegaBlock mb, size_t sizeClass) {
static MegaBlock tryFindMegaBlock(
GC_state s,
size_t numBlocksNeeded,
- size_t sizeClass)
+ size_t sizeClass,
+ enum BlockPurpose purpose)
{
BlockAllocator global = s->blockAllocatorGlobal;
assert(sizeClass >= s->controls->superblockThreshold);
@@ -452,6 +473,8 @@ static MegaBlock tryFindMegaBlock(
mb->nextMegaBlock = NULL;
pthread_mutex_unlock(&(global->megaBlockLock));
+ __sync_fetch_and_add(&(global->numBlocksAllocated[purpose]), mb->numBlocks);
+
LOG(LM_CHUNK_POOL, LL_INFO,
"inspected %zu, satisfied large alloc of %zu blocks using megablock of %zu",
count,
@@ -468,7 +491,7 @@ static MegaBlock tryFindMegaBlock(
}
-static MegaBlock mmapNewMegaBlock(GC_state s, size_t numBlocks)
+static MegaBlock mmapNewMegaBlock(GC_state s, size_t numBlocks, enum BlockPurpose purpose)
{
pointer start = GC_mmapAnon(NULL, s->controls->blockSize * numBlocks);
if (MAP_FAILED == start) {
@@ -478,14 +501,24 @@ static MegaBlock mmapNewMegaBlock(GC_state s, size_t numBlocks)
DIE("whoops, mmap didn't align by the block-size.");
}
+ BlockAllocator global = s->blockAllocatorGlobal;
+ __sync_fetch_and_add(&(global->numBlocksMapped), numBlocks);
+ __sync_fetch_and_add(&(global->numBlocksAllocated[purpose]), numBlocks);
+
MegaBlock mb = (MegaBlock)start;
mb->numBlocks = numBlocks;
mb->nextMegaBlock = NULL;
+ mb->purpose = purpose;
+
+ LOG(LM_CHUNK_POOL, LL_INFO,
+ "mmap'ed new megablock of size %zu",
+ numBlocks);
+
return mb;
}
-Blocks allocateBlocks(GC_state s, size_t numBlocks) {
+Blocks allocateBlocksWithPurpose(GC_state s, size_t numBlocks, enum BlockPurpose purpose) {
BlockAllocator local = s->blockAllocatorLocal;
assertBlockAllocatorOkay(s, local);
@@ -497,10 +530,10 @@ Blocks allocateBlocks(GC_state s, size_t numBlocks) {
* fails, we're a bit screwed.
*/
- MegaBlock mb = tryFindMegaBlock(s, numBlocks, class);
+ MegaBlock mb = tryFindMegaBlock(s, numBlocks, class, purpose);
if (NULL == mb)
- mb = mmapNewMegaBlock(s, numBlocks);
+ mb = mmapNewMegaBlock(s, numBlocks, purpose);
if (NULL == mb)
DIE("ran out of space!");
@@ -510,6 +543,7 @@ Blocks allocateBlocks(GC_state s, size_t numBlocks) {
Blocks bs = (Blocks)mb;
bs->container = NULL;
bs->numBlocks = actualNumBlocks;
+ bs->purpose = purpose;
return bs;
}
@@ -517,31 +551,39 @@ Blocks allocateBlocks(GC_state s, size_t numBlocks) {
assertBlockAllocatorOkay(s, local);
/** Look in local first. */
- Blocks result = tryAllocateAndAdjustSuperBlocks(s, local, class);
+ Blocks result = tryAllocateAndAdjustSuperBlocks(s, local, class, purpose);
if (result != NULL) {
assertBlockAllocatorOkay(s, local);
+ assert( result->purpose == purpose );
return result;
}
/** If both local fails, we need to mmap new superchunks. */
mmapNewSuperBlocks(s, local);
- result = tryAllocateAndAdjustSuperBlocks(s, local, class);
+ result = tryAllocateAndAdjustSuperBlocks(s, local, class, purpose);
if (result == NULL) {
DIE("Ran out of space for new superblocks!");
}
assertBlockAllocatorOkay(s, local);
+ assert( result->purpose == purpose );
return result;
}
+Blocks allocateBlocks(GC_state s, size_t numBlocks) {
+ return allocateBlocksWithPurpose(s, numBlocks, BLOCK_FOR_UNKNOWN_PURPOSE);
+}
+
+
void freeBlocks(GC_state s, Blocks bs, writeFreedBlockInfoFnClosure f) {
BlockAllocator local = s->blockAllocatorLocal;
assertBlockAllocatorOkay(s, local);
size_t numBlocks = bs->numBlocks;
SuperBlock sb = bs->container;
+ enum BlockPurpose purpose = bs->purpose;
pointer blockStart = (pointer)bs;
#if ASSERT
@@ -586,6 +628,7 @@ void freeBlocks(GC_state s, Blocks bs, writeFreedBlockInfoFnClosure f) {
MegaBlock mb = (MegaBlock)blockStart;
mb->numBlocks = numBlocks;
mb->nextMegaBlock = NULL;
+ mb->purpose = purpose;
freeMegaBlock(s, mb, sizeClass);
return;
}
@@ -594,6 +637,7 @@ void freeBlocks(GC_state s, Blocks bs, writeFreedBlockInfoFnClosure f) {
FreeBlock elem = (FreeBlock)blockStart;
elem->container = sb;
+ elem->purpose = purpose;
BlockAllocator owner = sb->owner;
assert(owner != NULL);
assert(sb->sizeClass == computeSizeClass(numBlocks));
@@ -613,4 +657,166 @@ void freeBlocks(GC_state s, Blocks bs, writeFreedBlockInfoFnClosure f) {
}
}
+
+void queryCurrentBlockUsage(
+ GC_state s,
+ size_t *numBlocksMapped,
+ size_t *numGlobalBlocksMapped,
+ size_t *numBlocksReleased,
+ size_t *numGlobalBlocksReleased,
+ size_t *numBlocksAllocated,
+ size_t *numBlocksFreed)
+{
+ *numBlocksMapped = 0;
+ *numGlobalBlocksMapped = 0;
+ *numBlocksReleased = 0;
+ *numGlobalBlocksReleased = 0;
+ for (enum BlockPurpose p = 0; p < NUM_BLOCK_PURPOSES; p++) {
+ numBlocksAllocated[p] = 0;
+ numBlocksFreed[p] = 0;
+ }
+
+ // query local allocators
+ for (uint32_t i = 0; i < s->numberOfProcs; i++) {
+ BlockAllocator ball = s->procStates[i].blockAllocatorLocal;
+ *numBlocksMapped += ball->numBlocksMapped;
+ *numBlocksReleased += ball->numBlocksReleased;
+ for (enum BlockPurpose p = 0; p < NUM_BLOCK_PURPOSES; p++) {
+ numBlocksAllocated[p] += ball->numBlocksAllocated[p];
+ numBlocksFreed[p] += ball->numBlocksFreed[p];
+ }
+ }
+
+ // query global allocator
+ BlockAllocator global = s->blockAllocatorGlobal;
+ *numBlocksMapped += global->numBlocksMapped;
+ *numBlocksReleased += global->numBlocksReleased;
+ for (enum BlockPurpose p = 0; p < NUM_BLOCK_PURPOSES; p++) {
+ numBlocksAllocated[p] += global->numBlocksAllocated[p];
+ numBlocksFreed[p] += global->numBlocksFreed[p];
+ }
+
+ *numGlobalBlocksMapped += global->numBlocksMapped;
+ *numGlobalBlocksReleased += global->numBlocksReleased;
+}
+
+
+void logCurrentBlockUsage(
+ GC_state s,
+ struct timespec *now,
+ __attribute__((unused)) void *env)
+{
+ size_t mapped;
+ size_t globalMapped;
+ size_t released;
+ size_t globalReleased;
+ size_t allocated[NUM_BLOCK_PURPOSES];
+ size_t freed[NUM_BLOCK_PURPOSES];
+ queryCurrentBlockUsage(
+ s,
+ &mapped,
+ &globalMapped,
+ &released,
+ &globalReleased,
+ (size_t*)allocated,
+ (size_t*)freed
+ );
+
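+  // Counters are maintained per-allocator and sampled here without
+  // synchronization, so a difference may be transiently negative; clamp to zero.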
+ size_t inUse[NUM_BLOCK_PURPOSES];
+ for (enum BlockPurpose p = 0; p < NUM_BLOCK_PURPOSES; p++) {
+ if (freed[p] > allocated[p]) {
+ inUse[p] = 0;
+ }
+ else {
+ inUse[p] = allocated[p] - freed[p];
+ }
+ }
+
+ size_t count = mapped-released;
+ if (released > mapped) count = 0;
+
+ size_t globalCount = globalMapped - globalReleased;
+ if (globalReleased > globalMapped) globalCount = 0;
+
+ LOG(LM_BLOCK_ALLOCATOR, LL_INFO,
+ "block-allocator(%zu.%.9zu)\n"
+ " currently mapped %zu (= %zu - %zu)\n"
+ " currently mapped (global) %zu (= %zu - %zu)\n"
+ " BLOCK_FOR_HEAP_CHUNK %zu (%zu%%) (= %zu - %zu)\n"
+ " BLOCK_FOR_REMEMBERED_SET %zu (%zu%%) (= %zu - %zu)\n"
+ " BLOCK_FOR_FORGOTTEN_SET %zu (%zu%%) (= %zu - %zu)\n"
+ " BLOCK_FOR_HH_ALLOCATOR %zu (%zu%%) (= %zu - %zu)\n"
+ " BLOCK_FOR_UF_ALLOCATOR %zu (%zu%%) (= %zu - %zu)\n"
+ " BLOCK_FOR_GC_WORKLIST %zu (%zu%%) (= %zu - %zu)\n"
+ " BLOCK_FOR_SUSPECTS %zu (%zu%%) (= %zu - %zu)\n"
+ " BLOCK_FOR_EBR %zu (%zu%%) (= %zu - %zu)\n"
+ " BLOCK_FOR_UNKNOWN_PURPOSE %zu (%zu%%) (= %zu - %zu)\n",
+ now->tv_sec,
+ now->tv_nsec,
+ count,
+ mapped,
+ released,
+
+ globalCount,
+ globalMapped,
+ globalReleased,
+
+ inUse[BLOCK_FOR_HEAP_CHUNK],
+ (size_t)(100.0 * (double)inUse[BLOCK_FOR_HEAP_CHUNK] / (double)count),
+ allocated[BLOCK_FOR_HEAP_CHUNK],
+ freed[BLOCK_FOR_HEAP_CHUNK],
+
+ inUse[BLOCK_FOR_REMEMBERED_SET],
+ (size_t)(100.0 * (double)inUse[BLOCK_FOR_REMEMBERED_SET] / (double)count),
+ allocated[BLOCK_FOR_REMEMBERED_SET],
+ freed[BLOCK_FOR_REMEMBERED_SET],
+
+ inUse[BLOCK_FOR_FORGOTTEN_SET],
+ (size_t)(100.0 * (double)inUse[BLOCK_FOR_FORGOTTEN_SET] / (double)count),
+ allocated[BLOCK_FOR_FORGOTTEN_SET],
+ freed[BLOCK_FOR_FORGOTTEN_SET],
+
+ inUse[BLOCK_FOR_HH_ALLOCATOR],
+ (size_t)(100.0 * (double)inUse[BLOCK_FOR_HH_ALLOCATOR] / (double)count),
+ allocated[BLOCK_FOR_HH_ALLOCATOR],
+ freed[BLOCK_FOR_HH_ALLOCATOR],
+
+ inUse[BLOCK_FOR_UF_ALLOCATOR],
+ (size_t)(100.0 * (double)inUse[BLOCK_FOR_UF_ALLOCATOR] / (double)count),
+ allocated[BLOCK_FOR_UF_ALLOCATOR],
+ freed[BLOCK_FOR_UF_ALLOCATOR],
+
+ inUse[BLOCK_FOR_GC_WORKLIST],
+ (size_t)(100.0 * (double)inUse[BLOCK_FOR_GC_WORKLIST] / (double)count),
+ allocated[BLOCK_FOR_GC_WORKLIST],
+ freed[BLOCK_FOR_GC_WORKLIST],
+
+ inUse[BLOCK_FOR_SUSPECTS],
+ (size_t)(100.0 * (double)inUse[BLOCK_FOR_SUSPECTS] / (double)count),
+ allocated[BLOCK_FOR_SUSPECTS],
+ freed[BLOCK_FOR_SUSPECTS],
+
+ inUse[BLOCK_FOR_EBR],
+ (size_t)(100.0 * (double)inUse[BLOCK_FOR_EBR] / (double)count),
+ allocated[BLOCK_FOR_EBR],
+ freed[BLOCK_FOR_EBR],
+
+ inUse[BLOCK_FOR_UNKNOWN_PURPOSE],
+ (size_t)(100.0 * (double)inUse[BLOCK_FOR_UNKNOWN_PURPOSE] / (double)count),
+ allocated[BLOCK_FOR_UNKNOWN_PURPOSE],
+ freed[BLOCK_FOR_UNKNOWN_PURPOSE]);
+}
+
+Sampler newBlockUsageSampler(GC_state s) {
+ struct SamplerClosure func;
+ func.fun = logCurrentBlockUsage;
+ func.env = NULL;
+
+ struct timespec desiredInterval = s->controls->blockUsageSampleInterval;
+ Sampler result = malloc(sizeof(struct Sampler));
+ initSampler(s, result, &func, &desiredInterval);
+
+ return result;
+}
+
#endif
diff --git a/runtime/gc/block-allocator.h b/runtime/gc/block-allocator.h
index 76d277ebd..47b55a2dc 100644
--- a/runtime/gc/block-allocator.h
+++ b/runtime/gc/block-allocator.h
@@ -16,6 +16,20 @@
#if (defined (MLTON_GC_INTERNAL_TYPES))
+enum BlockPurpose {
+ BLOCK_FOR_HEAP_CHUNK,
+ BLOCK_FOR_REMEMBERED_SET,
+ BLOCK_FOR_FORGOTTEN_SET,
+ BLOCK_FOR_HH_ALLOCATOR,
+ BLOCK_FOR_UF_ALLOCATOR,
+ BLOCK_FOR_GC_WORKLIST,
+ BLOCK_FOR_SUSPECTS,
+ BLOCK_FOR_EBR,
+ BLOCK_FOR_UNKNOWN_PURPOSE,
+ NUM_BLOCK_PURPOSES /** Hack to know statically how many there are. Make sure
+ * this comes last in the list. */
+};
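+
+/* Each block group is tagged with the purpose it was allocated for, and each
+ * allocator keeps per-purpose allocated/freed counters; the block-usage log
+ * (see logCurrentBlockUsage) reports usage broken down by purpose. */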
+
/** This is used for debugging, to write info about freed blocks when
* s->controls->debugKeepFreedBlocks is enabled.
*
@@ -48,6 +62,7 @@ typedef struct DebugKeptFreeBlock {
typedef struct FreeBlock {
struct FreeBlock *nextFree;
struct SuperBlock *container;
+ enum BlockPurpose purpose;
} *FreeBlock;
@@ -103,6 +118,7 @@ typedef struct SuperBlockList {
typedef struct MegaBlock {
struct MegaBlock *nextMegaBlock;
size_t numBlocks;
+ enum BlockPurpose purpose;
} *MegaBlock;
@@ -124,11 +140,10 @@ enum FullnessGroup {
typedef struct BlockAllocator {
- /** These are used for local to decide to move things to global, but
- * are ignored in global.
- */
- size_t numBlocks;
- size_t numBlocksInUse;
+ size_t numBlocksMapped;
+ size_t numBlocksReleased;
+ size_t numBlocksAllocated[NUM_BLOCK_PURPOSES];
+ size_t numBlocksFreed[NUM_BLOCK_PURPOSES];
/** There are 3 fullness groups in each size class:
* 0 is completely full, i.e. no free blocks available
@@ -160,6 +175,7 @@ typedef struct BlockAllocator {
typedef struct Blocks {
SuperBlock container;
size_t numBlocks;
+ enum BlockPurpose purpose;
} *Blocks;
#else
@@ -179,9 +195,30 @@ void initLocalBlockAllocator(GC_state s, BlockAllocator globalAllocator);
/** Get a pointer to the start of some number of free contiguous blocks. */
Blocks allocateBlocks(GC_state s, size_t numBlocks);
+Blocks allocateBlocksWithPurpose(GC_state s, size_t numBlocks, enum BlockPurpose purpose);
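+/* Usage sketch (assuming `s` is the current GC_state): request blocks tagged
+ * for a specific purpose, e.g.
+ *
+ *   Blocks bs = allocateBlocksWithPurpose(s, numBlocks, BLOCK_FOR_HEAP_CHUNK);
+ *
+ * The purpose travels with the block group (bs->purpose) and is used to
+ * account the blocks as freed when they are returned. */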
+
/** Free a group of contiguous blocks. */
void freeBlocks(GC_state s, Blocks bs, writeFreedBlockInfoFnClosure f);
+
+/** Populate:
+ *   *numBlocksMapped := cumulative number of blocks mmap'ed (all allocators)
+ *   *numGlobalBlocksMapped := cumulative number of blocks mmap'ed by the global allocator
+ *   *numBlocksReleased := cumulative number of blocks released back to the OS
+ *   *numGlobalBlocksReleased := cumulative number of blocks released by the global allocator
+ *   blocksAllocated[p] := cumulative number of blocks allocated for purpose `p`
+ *   blocksFreed[p] := cumulative number of blocks freed for purpose `p`
+ *
+ * The `blocksAllocated` and `blocksFreed` arrays must have length `NUM_BLOCK_PURPOSES`.
+ */
+void queryCurrentBlockUsage(
+ GC_state s,
+ size_t *numBlocksMapped,
+ size_t *numGlobalBlocksMapped,
+ size_t *numBlocksReleased,
+ size_t *numGlobalBlocksReleased,
+ size_t *blocksAllocated,
+ size_t *blocksFreed);
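+
+/* Example (sketch) of taking a usage snapshot, as logCurrentBlockUsage does:
+ *
+ *   size_t mapped, gMapped, released, gReleased;
+ *   size_t allocated[NUM_BLOCK_PURPOSES];
+ *   size_t freed[NUM_BLOCK_PURPOSES];
+ *   queryCurrentBlockUsage(s, &mapped, &gMapped, &released, &gReleased,
+ *                          allocated, freed);
+ *   // blocks currently in use for purpose p: allocated[p] - freed[p]
+ */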
+
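+/* Creates a sampler that periodically logs block usage via
+ * logCurrentBlockUsage; the interval comes from
+ * s->controls->blockUsageSampleInterval. */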
+Sampler newBlockUsageSampler(GC_state s);
+
#endif
#endif // BLOCK_ALLOCATOR_H_
diff --git a/runtime/gc/cc-work-list.c b/runtime/gc/cc-work-list.c
index 1724fa8b9..18705b6be 100644
--- a/runtime/gc/cc-work-list.c
+++ b/runtime/gc/cc-work-list.c
@@ -10,16 +10,18 @@ void CC_workList_init(
HM_chunkList c = &(w->storage);
HM_initChunkList(c);
// arbitrary, just need an initial chunk
- w->currentChunk = HM_allocateChunk(c, sizeof(struct CC_workList_elem));
+ w->currentChunk = HM_allocateChunkWithPurpose(
+ c,
+ sizeof(struct CC_workList_elem),
+ BLOCK_FOR_GC_WORKLIST);
}
void CC_workList_free(
__attribute__((unused)) GC_state s,
CC_workList w)
{
-
HM_chunkList c = &(w->storage);
- HM_freeChunksInListWithInfo(s, c, NULL);
+ HM_freeChunksInListWithInfo(s, c, NULL, BLOCK_FOR_GC_WORKLIST);
w->currentChunk = NULL;
}
@@ -137,7 +139,11 @@ void CC_workList_push(
if (chunk->nextChunk != NULL) {
chunk = chunk->nextChunk; // this will be an empty chunk
} else {
- chunk = HM_allocateChunk(list, elemSize);
+ // chunk = HM_allocateChunk(list, elemSize);
+ chunk = HM_allocateChunkWithPurpose(
+ list,
+ elemSize,
+ BLOCK_FOR_GC_WORKLIST);
}
w->currentChunk = chunk;
}
@@ -298,7 +304,7 @@ objptr* CC_workList_pop(
if (NULL != chunk->nextChunk) {
HM_chunk nextChunk = chunk->nextChunk;
HM_unlinkChunk(list, nextChunk);
- HM_freeChunk(s, nextChunk);
+ HM_freeChunkWithInfo(s, nextChunk, NULL, BLOCK_FOR_GC_WORKLIST);
}
assert(NULL == chunk->nextChunk);
diff --git a/runtime/gc/cc-work-list.h b/runtime/gc/cc-work-list.h
index c30eff34a..0af8209ce 100644
--- a/runtime/gc/cc-work-list.h
+++ b/runtime/gc/cc-work-list.h
@@ -51,6 +51,8 @@ void CC_workList_push(GC_state s, CC_workList w, objptr op);
* Returns NULL if work list is empty */
objptr* CC_workList_pop(GC_state s, CC_workList w);
+void CC_workList_free(GC_state s, CC_workList w);
+
#endif /* MLTON_GC_INTERNAL_FUNCS */
#endif /* CC_WORK_LIST_H */
diff --git a/runtime/gc/chunk.c b/runtime/gc/chunk.c
index 1d0589831..09c078bc4 100644
--- a/runtime/gc/chunk.c
+++ b/runtime/gc/chunk.c
@@ -107,7 +107,7 @@ HM_chunk HM_initializeChunk(pointer start, pointer end) {
chunk->mightContainMultipleObjects = TRUE;
chunk->tmpHeap = NULL;
chunk->decheckState = DECHECK_BOGUS_TID;
- chunk->disentangledDepth = INT32_MAX;
+ chunk->retireChunk = FALSE;
chunk->magic = CHUNK_MAGIC;
#if ASSERT
@@ -119,11 +119,11 @@ HM_chunk HM_initializeChunk(pointer start, pointer end) {
}
-HM_chunk HM_getFreeChunk(GC_state s, size_t bytesRequested) {
+HM_chunk HM_getFreeChunkWithPurpose(GC_state s, size_t bytesRequested, enum BlockPurpose purpose) {
size_t chunkWidth =
align(bytesRequested + sizeof(struct HM_chunk), HM_BLOCK_SIZE);
size_t numBlocks = chunkWidth / HM_BLOCK_SIZE;
- Blocks start = allocateBlocks(s, numBlocks);
+ Blocks start = allocateBlocksWithPurpose(s, numBlocks, purpose);
SuperBlock container = start->container;
numBlocks = start->numBlocks;
HM_chunk result =
@@ -134,6 +134,11 @@ HM_chunk HM_getFreeChunk(GC_state s, size_t bytesRequested) {
}
+HM_chunk HM_getFreeChunk(GC_state s, size_t bytesRequested) {
+ return HM_getFreeChunkWithPurpose(s, bytesRequested, BLOCK_FOR_UNKNOWN_PURPOSE);
+}
+
+
struct writeChunkInfoArgs {
writeFreedBlockInfoFn fun;
void* env;
@@ -174,7 +179,8 @@ void writeChunkInfo(
void HM_freeChunkWithInfo(
GC_state s,
HM_chunk chunk,
- writeFreedBlockInfoFnClosure f)
+ writeFreedBlockInfoFnClosure f,
+ enum BlockPurpose purpose)
{
struct writeChunkInfoArgs args;
@@ -199,34 +205,40 @@ void HM_freeChunkWithInfo(
Blocks bs = (Blocks)chunk;
bs->numBlocks = numBlocks;
bs->container = container;
+ bs->purpose = purpose;
freeBlocks(s, bs, &c);
}
void HM_freeChunk(GC_state s, HM_chunk chunk) {
- HM_freeChunkWithInfo(s, chunk, NULL);
+ HM_freeChunkWithInfo(s, chunk, NULL, BLOCK_FOR_UNKNOWN_PURPOSE);
}
void HM_freeChunksInListWithInfo(
GC_state s,
HM_chunkList list,
- writeFreedBlockInfoFnClosure f)
+ writeFreedBlockInfoFnClosure f,
+ enum BlockPurpose purpose)
{
HM_chunk chunk = list->firstChunk;
while (chunk != NULL) {
HM_chunk next = chunk->nextChunk;
- HM_freeChunkWithInfo(s, chunk, f);
+ HM_freeChunkWithInfo(s, chunk, f, purpose);
chunk = next;
}
HM_initChunkList(list);
}
void HM_freeChunksInList(GC_state s, HM_chunkList list) {
- HM_freeChunksInListWithInfo(s, list, NULL);
+ HM_freeChunksInListWithInfo(s, list, NULL, BLOCK_FOR_UNKNOWN_PURPOSE);
}
-HM_chunk HM_allocateChunk(HM_chunkList list, size_t bytesRequested) {
+HM_chunk HM_allocateChunkWithPurpose(
+ HM_chunkList list,
+ size_t bytesRequested,
+ enum BlockPurpose purpose)
+{
GC_state s = pthread_getspecific(gcstate_key);
- HM_chunk chunk = HM_getFreeChunk(s, bytesRequested);
+ HM_chunk chunk = HM_getFreeChunkWithPurpose(s, bytesRequested, purpose);
if (NULL == chunk) {
DIE("Out of memory. Unable to allocate chunk of size %zu.",
@@ -245,6 +257,12 @@ HM_chunk HM_allocateChunk(HM_chunkList list, size_t bytesRequested) {
return chunk;
}
+
+HM_chunk HM_allocateChunk(HM_chunkList list, size_t bytesRequested) {
+ return HM_allocateChunkWithPurpose(list, bytesRequested, BLOCK_FOR_UNKNOWN_PURPOSE);
+}
+
+
void HM_initChunkList(HM_chunkList list) {
list->firstChunk = NULL;
list->lastChunk = NULL;
@@ -475,10 +493,10 @@ size_t HM_getChunkListUsedSize(HM_chunkList list) {
return list->usedSize;
}
-pointer HM_storeInchunkList(HM_chunkList chunkList, void* p, size_t objSize) {
+pointer HM_storeInChunkListWithPurpose(HM_chunkList chunkList, void* p, size_t objSize, enum BlockPurpose purpose) {
HM_chunk chunk = HM_getChunkListLastChunk(chunkList);
if (NULL == chunk || HM_getChunkSizePastFrontier(chunk) < objSize) {
- chunk = HM_allocateChunk(chunkList, objSize);
+ chunk = HM_allocateChunkWithPurpose(chunkList, objSize, purpose);
}
assert(NULL != chunk);
@@ -493,6 +511,10 @@ pointer HM_storeInchunkList(HM_chunkList chunkList, void* p, size_t objSize) {
return frontier;
}
+pointer HM_storeInChunkList(HM_chunkList chunkList, void* p, size_t objSize) {
+ return HM_storeInChunkListWithPurpose(chunkList, p, objSize, BLOCK_FOR_UNKNOWN_PURPOSE);
+}
+
HM_HierarchicalHeap HM_getLevelHead(HM_chunk chunk) {
assert(chunk != NULL);
assert(chunk->levelHead != NULL);
@@ -624,8 +646,8 @@ void HM_assertChunkListInvariants(HM_chunkList chunkList) {
chunk = chunk->nextChunk;
}
assert(chunkList->lastChunk == chunk);
- assert(chunkList->size == size);
- assert(chunkList->usedSize == usedSize);
+ // assert(chunkList->size == size);
+ // assert(chunkList->usedSize == usedSize);
}
#else
void HM_assertChunkListInvariants(HM_chunkList chunkList) {
diff --git a/runtime/gc/chunk.h b/runtime/gc/chunk.h
index ee3cfbf62..6335b45ea 100644
--- a/runtime/gc/chunk.h
+++ b/runtime/gc/chunk.h
@@ -69,7 +69,7 @@ struct HM_chunk {
/** set during entanglement when in "safe" mode, to help temporarily disable
* local GCs while the entanglement persists.
*/
- int32_t disentangledDepth;
+ bool retireChunk;
bool mightContainMultipleObjects;
void* tmpHeap;
@@ -156,14 +156,15 @@ HM_chunk HM_getFreeChunk(GC_state s, size_t bytesRequested);
* chunk->limit - chunk->frontier <= bytesRequested
* Returns NULL if unable to find space for such a chunk. */
HM_chunk HM_allocateChunk(HM_chunkList list, size_t bytesRequested);
+HM_chunk HM_allocateChunkWithPurpose(HM_chunkList list, size_t bytesRequested, enum BlockPurpose purpose);
void HM_initChunkList(HM_chunkList list);
void HM_freeChunk(GC_state s, HM_chunk chunk);
void HM_freeChunksInList(GC_state s, HM_chunkList list);
-void HM_freeChunkWithInfo(GC_state s, HM_chunk chunk, writeFreedBlockInfoFnClosure f);
-void HM_freeChunksInListWithInfo(GC_state s, HM_chunkList list, writeFreedBlockInfoFnClosure f);
+void HM_freeChunkWithInfo(GC_state s, HM_chunk chunk, writeFreedBlockInfoFnClosure f, enum BlockPurpose purpose);
+void HM_freeChunksInListWithInfo(GC_state s, HM_chunkList list, writeFreedBlockInfoFnClosure f, enum BlockPurpose purpose);
// void HM_deleteChunks(GC_state s, HM_chunkList deleteList);
void HM_appendChunkList(HM_chunkList destinationChunkList, HM_chunkList chunkList);
@@ -265,7 +266,8 @@ pointer HM_shiftChunkStart(HM_chunk chunk, size_t bytes);
pointer HM_getChunkStartGap(HM_chunk chunk);
/* store the object pointed by p at the end of list and return the address */
-pointer HM_storeInchunkList(HM_chunkList chunkList, void* p, size_t objSize);
+pointer HM_storeInChunkList(HM_chunkList chunkList, void* p, size_t objSize);
+pointer HM_storeInChunkListWithPurpose(HM_chunkList chunkList, void* p, size_t objSize, enum BlockPurpose purpose);
/**
diff --git a/runtime/gc/concurrent-collection.c b/runtime/gc/concurrent-collection.c
index 13987d1c7..f7133f87e 100644
--- a/runtime/gc/concurrent-collection.c
+++ b/runtime/gc/concurrent-collection.c
@@ -325,6 +325,7 @@ void tryMarkAndAddToWorkList(
if (!CC_isPointerMarked(p)) {
markObj(p);
args->bytesSaved += sizeofObject(s, p);
+ args->numObjectsMarked++;
assert(CC_isPointerMarked(p));
CC_workList_push(s, &(args->worklist), op);
}
@@ -416,7 +417,9 @@ void forwardPtrChunk (GC_state s, objptr *opp, void* rawArgs) {
void forwardPinned(GC_state s, HM_remembered remElem, void* rawArgs) {
objptr src = remElem->object;
tryMarkAndMarkLoop(s, &src, src, rawArgs);
- tryMarkAndMarkLoop(s, &(remElem->from), remElem->from, rawArgs);
+ if (remElem->from != BOGUS_OBJPTR) {
+ tryMarkAndMarkLoop(s, &(remElem->from), remElem->from, rawArgs);
+ }
#if 0
#if ASSERT
@@ -496,10 +499,12 @@ void unmarkPinned(
{
objptr src = remElem->object;
assert(!(HM_getChunkOf(objptrToPointer(src, NULL))->pinnedDuringCollection));
+ tryUnmarkAndUnmarkLoop(s, &src, src, rawArgs);
+ if (remElem->from != BOGUS_OBJPTR) {
+ tryUnmarkAndUnmarkLoop(s, &(remElem->from), remElem->from, rawArgs);
+ }
// unmarkPtrChunk(s, &src, rawArgs);
// unmarkPtrChunk(s, &(remElem->from), rawArgs);
- tryUnmarkAndUnmarkLoop(s, &src, src, rawArgs);
- tryUnmarkAndUnmarkLoop(s, &(remElem->from), remElem->from, rawArgs);
#if 0
#if ASSERT
@@ -536,6 +541,7 @@ void forceForward(GC_state s, objptr *opp, void* rawArgs) {
markObj(p);
assert(CC_isPointerMarked(p));
args->bytesSaved += sizeofObject(s, p);
+ args->numObjectsMarked++;
}
CC_workList_push(s, &(args->worklist), op);
@@ -648,19 +654,22 @@ void CC_collectAtRoot(pointer threadp, pointer hhp) {
#endif
size_t beforeSize = HM_getChunkListSize(HM_HH_getChunkList(heap));
- size_t live = CC_collectWithRoots(s, heap, thread);
+ size_t live = 0;
+ size_t numObjectsMarked = 0;
+ CC_collectWithRoots(s, heap, thread, &live, &numObjectsMarked);
size_t afterSize = HM_getChunkListSize(HM_HH_getChunkList(heap));
size_t diff = beforeSize > afterSize ? beforeSize - afterSize : 0;
LOG(LM_CC_COLLECTION, LL_INFO,
- "finished at depth %u. before: %zu after: %zu (-%.01lf%%) live: %zu (%.01lf%% fragmented)",
+ "finished at depth %u. before: %zu after: %zu (-%.01lf%%) live: %zu (%.01lf%% fragmented) objects: %zu",
heap->depth,
beforeSize,
afterSize,
100.0 * ((double)diff / (double)beforeSize),
live,
- 100.0 * (1.0 - (double)live / (double)afterSize));
+ 100.0 * (1.0 - (double)live / (double)afterSize),
+ numObjectsMarked);
// HM_HH_getConcurrentPack(heap)->ccstate = CC_UNREG;
__atomic_store_n(&(HM_HH_getConcurrentPack(heap)->ccstate), CC_DONE, __ATOMIC_SEQ_CST);
@@ -694,7 +703,7 @@ void CC_collectAtPublicLevel(GC_state s, GC_thread thread, uint32_t depth) {
// collect only if the heap is above a threshold size
if (HM_getChunkListSize(&(heap->chunkList)) >= 2 * HM_BLOCK_SIZE) {
assert(getThreadCurrent(s) == thread);
- CC_collectWithRoots(s, heap, thread);
+ CC_collectWithRoots(s, heap, thread, NULL, NULL);
}
// Mark that collection is complete
@@ -705,18 +714,17 @@ void CC_collectAtPublicLevel(GC_state s, GC_thread thread, uint32_t depth) {
/* ========================================================================= */
struct CC_tryUnpinOrKeepPinnedArgs {
- HM_chunkList newRemSet;
+ HM_remSet newRemSet;
HM_HierarchicalHeap tgtHeap;
void* fromSpaceMarker;
void* toSpaceMarker;
};
-
void CC_tryUnpinOrKeepPinned(
- __attribute__((unused)) GC_state s,
- HM_remembered remElem,
- void* rawArgs)
+ __attribute__((unused)) GC_state s,
+ HM_remembered remElem,
+ void *rawArgs)
{
struct CC_tryUnpinOrKeepPinnedArgs* args =
(struct CC_tryUnpinOrKeepPinnedArgs *)rawArgs;
@@ -751,7 +759,7 @@ void CC_tryUnpinOrKeepPinned(
* entry. It will be merged and handled properly later.
*/
- HM_remember(args->newRemSet, remElem);
+ HM_remember(args->newRemSet, remElem, false);
return;
}
@@ -759,7 +767,40 @@ void CC_tryUnpinOrKeepPinned(
assert(chunk->tmpHeap == args->fromSpaceMarker);
assert(HM_getLevelHead(chunk) == args->tgtHeap);
- HM_chunk fromChunk = HM_getChunkOf(objptrToPointer(remElem->from, NULL));
+ // pointer p = objptrToPointer(remElem->object, NULL);
+ // GC_header header = getHeader(p);
+ // enum PinType pt = pinType(header);
+ // uint32_t unpinDepth = unpinDepthOf(remElem->object);
+
+ // assert (pt != PIN_NONE);
+
+ if (remElem->from != BOGUS_OBJPTR)
+ {
+
+ // uint32_t opDepth = HM_HH_getDepth(args->tgtHeap);
+ // if (unpinDepth > opDepth && pt == PIN_ANY) {
+ // tryPinDec(remElem->object, opDepth);
+ // }
+
+ HM_chunk fromChunk = HM_getChunkOf(objptrToPointer(remElem->from, NULL));
+ assert(fromChunk->tmpHeap != args->toSpaceMarker);
+
+ if (fromChunk->tmpHeap == args->fromSpaceMarker) {
+ assert(isChunkInList(fromChunk, HM_HH_getChunkList(args->tgtHeap)));
+ return;
+ }
+
+ /* otherwise, object stays pinned, and we have to keep this remembered
+ * entry into the toSpace. */
+ } else {
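+    /* No `from` object recorded. If the object is still PIN_ANY and its unpin
+     * depth is at least as deep as this heap, the entry can be dropped;
+     * otherwise fall through and keep it in the new remembered set below. */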
+ GC_header header = getHeader(objptrToPointer(remElem->object, NULL));
+ if (pinType(header) == PIN_ANY &&
+ unpinDepthOfH(header) >= HM_HH_getDepth(args->tgtHeap)) {
+ return;
+ }
+ }
+
+ HM_remember(args->newRemSet, remElem, false);
/** SAM_NOTE: The goal of the following was to filter remset entries
* to only keep the "shallowest" entries. But this is really tricky,
@@ -772,7 +813,6 @@ void CC_tryUnpinOrKeepPinned(
*/
#if 0
uint32_t unpinDepth = unpinDepthOf(op);
- uint32_t opDepth = HM_HH_getDepth(args->tgtHeap);
uint32_t fromDepth = HM_HH_getDepth(HM_getLevelHead(fromChunk));
if (fromDepth > unpinDepth) {
/** Can forget any down-pointer that came from shallower than the
@@ -783,20 +823,6 @@ void CC_tryUnpinOrKeepPinned(
assert(opDepth < unpinDepth || fromDepth == unpinDepth);
#endif
- assert(fromChunk->tmpHeap != args->toSpaceMarker);
-
- if (fromChunk->tmpHeap == args->fromSpaceMarker)
- {
- // fromChunk is in-scope of CC. Don't need to keep this remembered entry.
- assert(isChunkInList(fromChunk, HM_HH_getChunkList(args->tgtHeap)));
- return;
- }
-
- /* otherwise, object stays pinned, and we have to keep this remembered
- * entry into the toSpace. */
-
- HM_remember(args->newRemSet, remElem);
-
assert(isChunkInList(chunk, HM_HH_getChunkList(args->tgtHeap)));
assert(HM_getLevelHead(chunk) == args->tgtHeap);
}
@@ -809,9 +835,9 @@ void CC_filterPinned(
void* fromSpaceMarker,
void* toSpaceMarker)
{
- HM_chunkList oldRemSet = HM_HH_getRemSet(hh);
- struct HM_chunkList newRemSet;
- HM_initChunkList(&newRemSet);
+ HM_remSet oldRemSet = HM_HH_getRemSet(hh);
+ struct HM_remSet newRemSet;
+ HM_initRemSet(&newRemSet);
LOG(LM_CC_COLLECTION, LL_INFO,
"num pinned initially: %zu",
@@ -832,7 +858,7 @@ void CC_filterPinned(
/** Save "valid" entries to newRemSet, throw away old entries, and store
* valid entries back into the main remembered set.
*/
- HM_foreachRemembered(s, oldRemSet, &closure);
+ HM_foreachRemembered(s, oldRemSet, &closure, false);
struct CC_chunkInfo info =
{.initialDepth = initialDepth,
@@ -842,13 +868,17 @@ void CC_filterPinned(
struct writeFreedBlockInfoFnClosure infoc =
{.fun = CC_writeFreeChunkInfo, .env = &info};
- HM_freeChunksInListWithInfo(s, oldRemSet, &infoc);
- *oldRemSet = newRemSet; // this moves all data into remset of hh
+ // HM_freeRemSetWithInfo(s, oldRemSet, &infoc);
+  // this reinitializes the private remset
+ HM_freeChunksInListWithInfo(s, &(oldRemSet->private), &infoc, BLOCK_FOR_REMEMBERED_SET);
+ assert (newRemSet.public.firstChunk == NULL);
+ // this moves all data into remset of hh
+ HM_appendRemSet(oldRemSet, &newRemSet);
- assert(HM_HH_getRemSet(hh)->firstChunk == newRemSet.firstChunk);
- assert(HM_HH_getRemSet(hh)->lastChunk == newRemSet.lastChunk);
- assert(HM_HH_getRemSet(hh)->size == newRemSet.size);
- assert(HM_HH_getRemSet(hh)->usedSize == newRemSet.usedSize);
+ // assert(HM_HH_getRemSet(hh)->firstChunk == newRemSet.firstChunk);
+ // assert(HM_HH_getRemSet(hh)->lastChunk == newRemSet.lastChunk);
+ // assert(HM_HH_getRemSet(hh)->size == newRemSet.size);
+ // assert(HM_HH_getRemSet(hh)->usedSize == newRemSet.usedSize);
LOG(LM_CC_COLLECTION, LL_INFO,
"num pinned after filter: %zu",
@@ -883,10 +913,12 @@ void CC_filterDownPointers(GC_state s, HM_chunkList x, HM_HierarchicalHeap hh){
#endif
-size_t CC_collectWithRoots(
+void CC_collectWithRoots(
GC_state s,
HM_HierarchicalHeap targetHH,
- __attribute__((unused)) GC_thread thread)
+ __attribute__((unused)) GC_thread thread,
+ size_t *outputBytesSaved,
+ size_t *outputNumObjectsMarked)
{
getStackCurrent(s)->used = sizeofGCStateCurrentStackUsed(s);
getThreadCurrent(s)->exnStack = s->exnStack;
@@ -910,9 +942,6 @@ size_t CC_collectWithRoots(
// chunks in which all objects are garbage. Before exiting, chunks in
// origList are added to the free list.
- bool isConcurrent = (HM_HH_getDepth(targetHH) == 1);
- // assert(isConcurrent);
-
uint32_t initialDepth = HM_HH_getDepth(targetHH);
struct HM_chunkList _repList;
@@ -927,7 +956,8 @@ size_t CC_collectWithRoots(
.repList = repList,
.toHead = (void*)repList,
.fromHead = (void*) &(origList),
- .bytesSaved = 0
+ .bytesSaved = 0,
+ .numObjectsMarked = 0
};
CC_workList_init(s, &(lists.worklist));
@@ -965,7 +995,7 @@ size_t CC_collectWithRoots(
struct HM_foreachDownptrClosure forwardPinnedClosure =
{.fun = forwardPinned, .env = (void*)&lists};
- HM_foreachRemembered(s, HM_HH_getRemSet(targetHH), &forwardPinnedClosure);
+ HM_foreachRemembered(s, HM_HH_getRemSet(targetHH), &forwardPinnedClosure, false);
// forward closures, stack and deque?
forceForward(s, &(cp->snapLeft), &lists);
@@ -1025,7 +1055,7 @@ size_t CC_collectWithRoots(
struct HM_foreachDownptrClosure unmarkPinnedClosure =
{.fun = unmarkPinned, .env = &lists};
- HM_foreachRemembered(s, HM_HH_getRemSet(targetHH), &unmarkPinnedClosure);
+ HM_foreachRemembered(s, HM_HH_getRemSet(targetHH), &unmarkPinnedClosure, false);
forceUnmark(s, &(cp->snapLeft), &lists);
forceUnmark(s, &(cp->snapRight), &lists);
@@ -1039,7 +1069,7 @@ size_t CC_collectWithRoots(
forEachObjptrInCCStackBag(s, removedFromCCBag, tryUnmarkAndUnmarkLoop, &lists);
unmarkLoop(s, &lists);
- HM_freeChunksInList(s, removedFromCCBag);
+ HM_freeChunksInListWithInfo(s, removedFromCCBag, NULL, BLOCK_FOR_FORGOTTEN_SET);
assert(CC_workList_isEmpty(s, &(lists.worklist)));
CC_workList_free(s, &(lists.worklist));
@@ -1126,8 +1156,8 @@ size_t CC_collectWithRoots(
/** SAM_NOTE: TODO: deleteList no longer needed, because
* block allocator handles that.
*/
- HM_freeChunksInListWithInfo(s, origList, &infoc);
- HM_freeChunksInListWithInfo(s, deleteList, &infoc);
+ HM_freeChunksInListWithInfo(s, origList, &infoc, BLOCK_FOR_HEAP_CHUNK);
+ HM_freeChunksInListWithInfo(s, deleteList, &infoc, BLOCK_FOR_HEAP_CHUNK);
for(HM_chunk chunk = repList->firstChunk;
chunk!=NULL; chunk = chunk->nextChunk) {
@@ -1141,9 +1171,11 @@ size_t CC_collectWithRoots(
assert(!(stackChunk->mightContainMultipleObjects));
assert(HM_HH_getChunkList(HM_getLevelHead(stackChunk)) == origList);
assert(isChunkInList(stackChunk, origList));
+ assert(bytesSaved >= HM_getChunkUsedSize(stackChunk));
+ bytesSaved -= HM_getChunkUsedSize(stackChunk);
HM_unlinkChunk(origList, stackChunk);
info.freedType = CC_FREED_STACK_CHUNK;
- HM_freeChunkWithInfo(s, stackChunk, &infoc);
+ HM_freeChunkWithInfo(s, stackChunk, &infoc, BLOCK_FOR_HEAP_CHUNK);
info.freedType = CC_FREED_NORMAL_CHUNK;
cp->stack = BOGUS_OBJPTR;
@@ -1164,19 +1196,22 @@ size_t CC_collectWithRoots(
timespec_now(&stopTime);
timespec_sub(&stopTime, &startTime);
-
- if (isConcurrent) {
- timespec_add(&(s->cumulativeStatistics->timeRootCC), &stopTime);
- s->cumulativeStatistics->numRootCCs++;
- s->cumulativeStatistics->bytesReclaimedByRootCC += bytesScanned-bytesSaved;
- } else {
- timespec_add(&(s->cumulativeStatistics->timeInternalCC), &stopTime);
- s->cumulativeStatistics->numInternalCCs++;
- s->cumulativeStatistics->bytesReclaimedByInternalCC += bytesScanned-bytesSaved;
+ timespec_add(&(s->cumulativeStatistics->timeCC), &stopTime);
+ s->cumulativeStatistics->numCCs++;
+ assert(bytesScanned >= bytesSaved);
+ uintmax_t bytesReclaimed = bytesScanned-bytesSaved;
+ s->cumulativeStatistics->bytesInScopeForCC += bytesScanned;
+ s->cumulativeStatistics->bytesReclaimedByCC += bytesReclaimed;
+
+ if (outputBytesSaved != NULL) {
+ *outputBytesSaved = lists.bytesSaved;
}
- return lists.bytesSaved;
-
+ if (outputNumObjectsMarked != NULL) {
+ *outputNumObjectsMarked = lists.numObjectsMarked;
+ }
+
+ return;
}
#endif
diff --git a/runtime/gc/concurrent-collection.h b/runtime/gc/concurrent-collection.h
index a52dd757b..db0bfc561 100644
--- a/runtime/gc/concurrent-collection.h
+++ b/runtime/gc/concurrent-collection.h
@@ -28,6 +28,7 @@ typedef struct ConcurrentCollectArgs {
void* toHead;
void* fromHead;
size_t bytesSaved;
+ size_t numObjectsMarked;
} ConcurrentCollectArgs;
@@ -89,8 +90,13 @@ PRIVATE void GC_updateObjectHeader(GC_state s, pointer p, GC_header newHeader);
// in the chunk is live then the whole chunk is. However, tracing is at the granularity of objects.
// Objects in chunks that are preserved may point to chunks that are not. But such objects aren't
// reachable.
-size_t CC_collectWithRoots(GC_state s, struct HM_HierarchicalHeap * targetHH, GC_thread thread);
-
+void CC_collectWithRoots(
+ GC_state s,
+ struct HM_HierarchicalHeap * targetHH,
+ GC_thread thread,
+ size_t *bytesSaved,
+ size_t *numObjectsMarked);
+
void CC_collectAtPublicLevel(GC_state s, GC_thread thread, uint32_t depth);
void CC_addToStack(GC_state s, ConcurrentPackage cp, pointer p);
void CC_initStack(GC_state s, ConcurrentPackage cp);
diff --git a/runtime/gc/concurrent-list.c b/runtime/gc/concurrent-list.c
new file mode 100644
index 000000000..e34c165ad
--- /dev/null
+++ b/runtime/gc/concurrent-list.c
@@ -0,0 +1,224 @@
+void CC_initConcList(CC_concList concList) {
+ concList->firstChunk = NULL;
+ concList->lastChunk = NULL;
+ pthread_mutex_init(&(concList->mutex), NULL);
+}
+
+
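+/* Append a fresh chunk to the concurrent list, but only if `lastChunk` is
+ * still the tail. The tail is re-checked under the mutex; if another thread
+ * already appended a chunk, the newly allocated one is returned to the block
+ * allocator and the caller retries. */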
+void allocateChunkInConcList(
+ CC_concList concList,
+ size_t objSize,
+ HM_chunk lastChunk,
+ enum BlockPurpose purpose)
+{
+ GC_state s = pthread_getspecific(gcstate_key);
+
+ // pthread_mutex_lock(&concList->mutex);
+ if(concList->lastChunk != lastChunk) {
+ // pthread_mutex_unlock(&concList->mutex);
+ return;
+ }
+
+ HM_chunk chunk = HM_getFreeChunkWithPurpose(s, objSize, purpose);
+
+ if (NULL == chunk)
+ {
+ DIE("Out of memory. Unable to allocate chunk of size %zu.",
+ objSize);
+ }
+
+ assert(chunk->frontier == HM_getChunkStart(chunk));
+ assert(chunk->mightContainMultipleObjects);
+ assert((size_t)(chunk->limit - chunk->frontier) >= objSize);
+ assert(chunk != NULL);
+
+ chunk->prevChunk = lastChunk;
+ // if (lastChunk != NULL)
+ // {
+ // lastChunk->nextChunk = chunk;
+ // }
+ // concList->lastChunk = chunk;
+
+ memset((void *)HM_getChunkStart(chunk), '\0', HM_getChunkLimit(chunk) - HM_getChunkStart(chunk));
+
+ bool success = false;
+ if(concList->lastChunk == lastChunk) {
+ pthread_mutex_lock(&concList->mutex);
+ if (concList->lastChunk == lastChunk) {
+ if (concList->firstChunk == NULL) {
+ concList->firstChunk = chunk;
+ }
+ if(lastChunk != NULL) {
+ lastChunk->nextChunk = chunk;
+ }
+ concList->lastChunk = chunk;
+ success = true;
+ }
+ pthread_mutex_unlock(&concList->mutex);
+ }
+ if (!success) {
+ HM_freeChunkWithInfo(s, chunk, NULL, purpose);
+ }
+
+ // if (!__sync_bool_compare_and_swap(&(concList->lastChunk), lastChunk, chunk)) {
+ // HM_freeChunk(s, chunk);
+ // } else if (lastChunk != NULL) {
+ // lastChunk->nextChunk = chunk;
+ // }
+}
+
+
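+/* Lock-free store into the concurrent list: read the tail chunk, reserve
+ * `objSize` bytes by CAS-bumping its frontier, then copy the object in.
+ * If the tail is missing or full, a new chunk is appended via
+ * allocateChunkInConcList; if the CAS loses a race, the loop simply retries.
+ * Returns the address of the stored copy. */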
+pointer CC_storeInConcListWithPurpose(CC_concList concList, void* p, size_t objSize, enum BlockPurpose purpose){
+ assert(concList != NULL);
+ // pthread_mutex_lock(&concList->mutex);
+ while(TRUE) {
+ HM_chunk chunk = concList->lastChunk;
+ if (NULL == chunk) {
+ allocateChunkInConcList(concList, objSize, chunk, purpose);
+ continue;
+ }
+ else {
+ pointer frontier = HM_getChunkFrontier(chunk);
+ size_t sizePast = (size_t) (chunk->limit - frontier);
+ if (sizePast < objSize) {
+ allocateChunkInConcList(concList, objSize, chunk, purpose);
+ continue;
+ }
+
+ pointer new_frontier = frontier + objSize;
+ bool success = __sync_bool_compare_and_swap(&(chunk->frontier), frontier, new_frontier);
+ if (success)
+ {
+ memcpy(frontier, p, objSize);
+ // pthread_mutex_unlock(&concList->mutex);
+ return frontier;
+ }
+ }
+ }
+ // pthread_mutex_unlock(&concList->mutex);
+  DIE("should never reach here");
+ return NULL;
+}
+
+
+// void CC_foreachObjInList(CC_concList concList, size_t objSize, HM_foreachObjClosure f) {
+
+// struct HM_chunkList _chunkList;
+// HM_chunkList chunkList = &(_chunkList);
+// chunkList->firstChunk = concList->firstChunk;
+// pthread_mutex_lock(&concList->mutex);
+// chunkList->lastChunk = concList->lastChunk;
+// concList->firstChunk = NULL;
+// concList->lastChunk = NULL;
+// pthread_mutex_unlock(&concList->mutex);
+// HM_foreachObjInChunkList(chunkList, objSize, f);
+// }
+
+// void CC_foreachRemInConc(GC_state s, CC_concList concList, struct HM_foreachDownptrClosure* f) {
+// struct HM_chunkList _store;
+// HM_chunkList store = &(_store);
+// HM_initChunkList(store);
+
+// while (TRUE) {
+// HM_chunk firstChunk = concList->firstChunk;
+// if (firstChunk == NULL) {
+// break;
+// }
+
+// pthread_mutex_lock(&concList->mutex);
+// HM_chunk lastChunk = concList->lastChunk;
+// concList->firstChunk = NULL;
+// concList->lastChunk = NULL;
+// pthread_mutex_unlock(&concList->mutex);
+
+// assert(firstChunk != NULL);
+// assert(lastChunk != NULL);
+// HM_chunk chunk = firstChunk;
+// while (chunk != NULL) {
+// pointer p = HM_getChunkStart(chunk);
+// pointer frontier = HM_getChunkFrontier(chunk);
+// while (p < frontier)
+// {
+// f->fun(s, (HM_remembered)p, f->env);
+// p += sizeof(struct HM_remembered);
+// }
+// chunk = chunk->nextChunk;
+// }
+
+// if (store->firstChunk != NULL) {
+// store->lastChunk->nextChunk = firstChunk;
+// firstChunk->prevChunk = store->lastChunk;
+// store->lastChunk = lastChunk;
+// }
+// else {
+// store->firstChunk = firstChunk;
+// store->lastChunk = lastChunk;
+// }
+// }
+
+// /*add the chunks back to the list*/
+// pthread_mutex_lock(&concList->mutex);
+// if (concList->firstChunk != NULL) {
+// concList->firstChunk->prevChunk = store->lastChunk;
+// store->lastChunk->nextChunk = concList->firstChunk;
+// concList->firstChunk = store->firstChunk;
+// }
+// else {
+// concList->firstChunk = store->firstChunk;
+// concList->lastChunk = store->lastChunk;
+// }
+// pthread_mutex_unlock(&concList->mutex);
+
+// }
+
+void CC_popAsChunkList(CC_concList concList, HM_chunkList chunkList) {
+ pthread_mutex_lock(&concList->mutex);
+ chunkList->firstChunk = concList->firstChunk;
+ concList->firstChunk = NULL;
+ chunkList->lastChunk = concList->lastChunk;
+ concList->lastChunk = NULL;
+ pthread_mutex_unlock(&concList->mutex);
+}
+
+HM_chunk CC_getLastChunk (CC_concList concList) {
+ HM_chunk c;
+ pthread_mutex_lock(&concList->mutex);
+ c = concList->lastChunk;
+ pthread_mutex_unlock(&concList->mutex);
+ return c;
+}
+
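+/* Splice the chunks of concList2 onto the end of concList1: concList2 is
+ * detached under its own mutex, then linked onto concList1's tail under
+ * concList1's mutex. The previous tail of concList1 is flagged `retireChunk`. */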
+void CC_appendConcList(CC_concList concList1, CC_concList concList2) {
+
+ HM_chunk firstChunk, lastChunk;
+ pthread_mutex_lock(&concList2->mutex);
+ firstChunk = concList2->firstChunk;
+ lastChunk = concList2->lastChunk;
+ concList2->firstChunk = NULL;
+ concList2->lastChunk = NULL;
+ pthread_mutex_unlock(&concList2->mutex);
+
+ if (firstChunk == NULL || lastChunk == NULL) {
+ return;
+ }
+
+
+ pthread_mutex_lock(&concList1->mutex);
+ if (concList1->lastChunk == NULL) {
+ concList1->firstChunk = firstChunk;
+ concList1->lastChunk = lastChunk;
+ }
+ else {
+ concList1->lastChunk->nextChunk = firstChunk;
+ concList1->lastChunk->retireChunk = true;
+ firstChunk->prevChunk = concList1->lastChunk;
+ concList1->lastChunk = lastChunk;
+ }
+ pthread_mutex_unlock(&concList1->mutex);
+}
+
+void CC_freeChunksInConcListWithInfo(GC_state s, CC_concList concList, void *info, enum BlockPurpose purpose) {
+ struct HM_chunkList _chunkList;
+ CC_popAsChunkList(concList, &(_chunkList));
+ HM_freeChunksInListWithInfo(s, &(_chunkList), info, purpose);
+}
\ No newline at end of file
diff --git a/runtime/gc/concurrent-list.h b/runtime/gc/concurrent-list.h
new file mode 100644
index 000000000..dbe042abd
--- /dev/null
+++ b/runtime/gc/concurrent-list.h
@@ -0,0 +1,40 @@
+/* Copyright (C) 2018-2019 Sam Westrick
+ * Copyright (C) 2015 Ram Raghunathan.
+ *
+ * MLton is released under a HPND-style license.
+ * See the file MLton-LICENSE for details.
+ */
+
+
+#ifndef CC_LIST_H
+#define CC_LIST_H
+
+struct CC_concList;
+typedef struct CC_concList * CC_concList;
+
+#if (defined (MLTON_GC_INTERNAL_TYPES))
+
+struct CC_concList {
+ HM_chunk firstChunk;
+ HM_chunk lastChunk;
+ pthread_mutex_t mutex;
+};
+
+#endif /* MLTON_GC_INTERNAL_TYPES */
+
+#if (defined (MLTON_GC_INTERNAL_FUNCS))
+
+void CC_initConcList(CC_concList concList);
+pointer CC_storeInConcListWithPurpose(CC_concList concList, void* p, size_t objSize, enum BlockPurpose purpose);
+
+// void CC_foreachObjInList(CC_concList concList, size_t objSize, HM_foreachObjClosure f);
+// void CC_foreachRemInConc(GC_state s, CC_concList concList, struct HM_foreachDownptrClosure* f);
+void CC_popAsChunkList(CC_concList concList, HM_chunkList chunkList);
+
+HM_chunk CC_getLastChunk (CC_concList concList);
+void CC_freeChunksInConcListWithInfo(GC_state s, CC_concList concList, void *info, enum BlockPurpose purpose);
+void CC_appendConcList(CC_concList concList1, CC_concList concList2);
+
+#endif /* MLTON_GC_INTERNAL_FUNCS */
+
+#endif /* CC_LIST_H */
diff --git a/runtime/gc/concurrent-stack.c b/runtime/gc/concurrent-stack.c
index 325ed4025..15c4535c5 100644
--- a/runtime/gc/concurrent-stack.c
+++ b/runtime/gc/concurrent-stack.c
@@ -71,7 +71,7 @@ bool CC_stack_data_push(CC_stack_data* stack, void* datum){
stack->storage[stack->size++] = datum;
#endif
- HM_storeInchunkList(&(stack->storage), &(datum), sizeof(datum));
+ HM_storeInChunkListWithPurpose(&(stack->storage), &(datum), sizeof(datum), BLOCK_FOR_FORGOTTEN_SET);
pthread_mutex_unlock(&stack->mutex);
return TRUE;
}
@@ -141,7 +141,7 @@ void CC_stack_data_free(CC_stack_data* stack){
#endif
void CC_stack_data_free(GC_state s, CC_stack_data* stack) {
- HM_freeChunksInList(s, &(stack->storage));
+ HM_freeChunksInListWithInfo(s, &(stack->storage), NULL, BLOCK_FOR_FORGOTTEN_SET);
}
void CC_stack_free(GC_state s, CC_stack* stack) {
@@ -154,7 +154,7 @@ void CC_stack_free(GC_state s, CC_stack* stack) {
void CC_stack_data_clear(GC_state s, CC_stack_data* stack){
pthread_mutex_lock(&stack->mutex);
- HM_freeChunksInList(s, &(stack->storage));
+ HM_freeChunksInListWithInfo(s, &(stack->storage), NULL, BLOCK_FOR_FORGOTTEN_SET);
pthread_mutex_unlock(&stack->mutex);
}
diff --git a/runtime/gc/controls.h b/runtime/gc/controls.h
index e9ff7f394..75af66944 100644
--- a/runtime/gc/controls.h
+++ b/runtime/gc/controls.h
@@ -64,6 +64,7 @@ struct GC_controls {
size_t allocBlocksMinSize;
size_t superblockThreshold; // upper bound on size-class of a superblock
size_t megablockThreshold; // upper bound on size-class of a megablock (unmap above this threshold)
+ struct timespec blockUsageSampleInterval;
float emptinessFraction;
bool debugKeepFreeBlocks;
bool manageEntanglement;
diff --git a/runtime/gc/decheck.c b/runtime/gc/decheck.c
index 288ee4220..14556e204 100644
--- a/runtime/gc/decheck.c
+++ b/runtime/gc/decheck.c
@@ -34,18 +34,18 @@ bool GC_HH_decheckMaxDepth(ARG_USED_FOR_DETECT_ENTANGLEMENT objptr resultRef) {
#ifdef DETECT_ENTANGLEMENT
void decheckInit(GC_state s) {
#if ASSERT
- if (mmap(SYNCH_DEPTHS_BASE, SYNCH_DEPTHS_LEN, PROT_WRITE,
- MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, 0, 0) == MAP_FAILED) {
- perror("mmap error");
- exit(-1);
- }
- memset(synch_depths, 0, MAX_PATHS * sizeof(uint32_t));
- synch_depths[1] = 0;
+ if (mmap(SYNCH_DEPTHS_BASE, SYNCH_DEPTHS_LEN, PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, 0, 0) == MAP_FAILED) {
+ perror("mmap error");
+ exit(-1);
+ }
+ memset(synch_depths, 0, MAX_PATHS * sizeof(uint32_t));
+ synch_depths[1] = 0;
#endif
- GC_thread thread = getThreadCurrent(s);
- thread->decheckState.internal.path = 1;
- thread->decheckState.internal.depth = 0;
+ GC_thread thread = getThreadCurrent(s);
+ thread->decheckState.internal.path = 1;
+ thread->decheckState.internal.depth = 0;
}
#else
inline void decheckInit(GC_state s) {
@@ -55,16 +55,16 @@ inline void decheckInit(GC_state s) {
#endif
static inline unsigned int tree_depth(decheck_tid_t tid) {
- return tid.internal.depth & 0x1f;
+ return tid.internal.depth & 0x1f;
}
static inline unsigned int dag_depth(decheck_tid_t tid) {
- return tid.internal.depth >> 5;
+ return tid.internal.depth >> 5;
}
static inline uint32_t norm_path(decheck_tid_t tid) {
- unsigned int td = tree_depth(tid);
- return tid.internal.path & ((1 << (td+1)) - 1);
+ unsigned int td = tree_depth(tid);
+ return tid.internal.path & ((1 << (td+1)) - 1);
}
@@ -89,31 +89,31 @@ static inline uint32_t decheckGetSyncDepth(GC_thread thread, uint32_t pathLen) {
* refs just to pass values by destination-passing through the FFI.
*/
void GC_HH_decheckFork(GC_state s, uint64_t *left, uint64_t *right) {
- GC_thread thread = getThreadCurrent(s);
- decheck_tid_t tid = thread->decheckState;
- assert(tid.bits != DECHECK_BOGUS_BITS);
- unsigned int h = tree_depth(tid);
- assert(h < MAX_FORK_DEPTH);
-
- decheck_tid_t t1;
- t1.internal.path = (tid.internal.path & ~(1 << h)) | (1 << (h+1));
- t1.internal.depth = tid.internal.depth + (1 << 5) + 1;
- *left = t1.bits;
-
- decheck_tid_t t2;
- t2.internal.path = (tid.internal.path | (1 << h)) | (1 << (h+1));
- t2.internal.depth = tid.internal.depth + (1 << 5) + 1;
- *right = t2.bits;
-
- assert(tree_depth(t1) == tree_depth(tid)+1);
- assert(tree_depth(t2) == tree_depth(tid)+1);
- assert(dag_depth(t1) == dag_depth(tid)+1);
- assert(dag_depth(t2) == dag_depth(tid)+1);
- assert((norm_path(t1) ^ norm_path(t2)) == (uint32_t)(1 << h));
+ GC_thread thread = getThreadCurrent(s);
+ decheck_tid_t tid = thread->decheckState;
+ assert(tid.bits != DECHECK_BOGUS_BITS);
+ unsigned int h = tree_depth(tid);
+ assert(h < MAX_FORK_DEPTH);
+
+ decheck_tid_t t1;
+ t1.internal.path = (tid.internal.path & ~(1 << h)) | (1 << (h+1));
+ t1.internal.depth = tid.internal.depth + (1 << 5) + 1;
+ *left = t1.bits;
+
+ decheck_tid_t t2;
+ t2.internal.path = (tid.internal.path | (1 << h)) | (1 << (h+1));
+ t2.internal.depth = tid.internal.depth + (1 << 5) + 1;
+ *right = t2.bits;
+
+ assert(tree_depth(t1) == tree_depth(tid)+1);
+ assert(tree_depth(t2) == tree_depth(tid)+1);
+ assert(dag_depth(t1) == dag_depth(tid)+1);
+ assert(dag_depth(t2) == dag_depth(tid)+1);
+ assert((norm_path(t1) ^ norm_path(t2)) == (uint32_t)(1 << h));
#if ASSERT
- synch_depths[norm_path(t1)] = dag_depth(t1);
- synch_depths[norm_path(t2)] = dag_depth(t2);
+ synch_depths[norm_path(t1)] = dag_depth(t1);
+ synch_depths[norm_path(t2)] = dag_depth(t2);
#endif
}
#else
@@ -127,19 +127,27 @@ void GC_HH_decheckFork(GC_state s, uint64_t *left, uint64_t *right) {
#ifdef DETECT_ENTANGLEMENT
+
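+/* If the chunk has no decheck state yet (bogus tid), stamp it with `tid`.
+ * Used below so the chunks holding the current thread object and its stack
+ * pick up the thread's tid when it is set. */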
+void setStateIfBogus(HM_chunk chunk, decheck_tid_t tid) {
+ if ((chunk->decheckState).bits == DECHECK_BOGUS_BITS)
+ {
+ chunk->decheckState = tid;
+ }
+}
+
void GC_HH_decheckSetTid(GC_state s, uint64_t bits) {
- decheck_tid_t tid;
- tid.bits = bits;
+ decheck_tid_t tid;
+ tid.bits = bits;
+
+ GC_thread thread = getThreadCurrent(s);
+ thread->decheckState = tid;
- GC_thread thread = getThreadCurrent(s);
- thread->decheckState = tid;
+ setStateIfBogus(HM_getChunkOf((pointer)thread), tid);
+ setStateIfBogus(HM_getChunkOf((pointer)thread->stack), tid);
-// #if ASSERT
-// synch_depths[norm_path(tid)] = dag_depth(tid);
-// #endif
- decheckSetSyncDepth(thread, tree_depth(tid), dag_depth(tid));
+ decheckSetSyncDepth(thread, tree_depth(tid), dag_depth(tid));
- assert(decheckGetSyncDepth(thread, tree_depth(tid)) == synch_depths[norm_path(tid)]);
+ assert(decheckGetSyncDepth(thread, tree_depth(tid)) == synch_depths[norm_path(tid)]);
}
#else
void GC_HH_decheckSetTid(GC_state s, uint64_t bits) {
@@ -165,32 +173,32 @@ uint64_t GC_HH_decheckGetTid(GC_state s, objptr threadp) {
#ifdef DETECT_ENTANGLEMENT
void GC_HH_decheckJoin(GC_state s, uint64_t left, uint64_t right) {
- decheck_tid_t t1;
- t1.bits = left;
- decheck_tid_t t2;
- t2.bits = right;
-
- assert(tree_depth(t1) == tree_depth(t2));
- assert(tree_depth(t1) >= 1);
-
- GC_thread thread = getThreadCurrent(s);
- unsigned int td = tree_depth(t1) - 1;
- unsigned int dd = MAX(dag_depth(t1), dag_depth(t2)) + 1;
- assert(dag_depth(t1) == synch_depths[norm_path(t1)]);
- assert(dag_depth(t2) == synch_depths[norm_path(t2)]);
- decheck_tid_t tid;
- tid.internal.path = t1.internal.path | (1 << td);
- tid.internal.depth = (dd << 5) + td;
- thread->decheckState = tid;
-
- assert(tree_depth(tid) == tree_depth(t1)-1);
+ decheck_tid_t t1;
+ t1.bits = left;
+ decheck_tid_t t2;
+ t2.bits = right;
+
+ assert(tree_depth(t1) == tree_depth(t2));
+ assert(tree_depth(t1) >= 1);
+
+ GC_thread thread = getThreadCurrent(s);
+ unsigned int td = tree_depth(t1) - 1;
+ unsigned int dd = MAX(dag_depth(t1), dag_depth(t2)) + 1;
+ assert(dag_depth(t1) == synch_depths[norm_path(t1)]);
+ assert(dag_depth(t2) == synch_depths[norm_path(t2)]);
+ decheck_tid_t tid;
+ tid.internal.path = t1.internal.path | (1 << td);
+ tid.internal.depth = (dd << 5) + td;
+ thread->decheckState = tid;
+
+ assert(tree_depth(tid) == tree_depth(t1)-1);
#if ASSERT
- synch_depths[norm_path(tid)] = dd;
+ synch_depths[norm_path(tid)] = dd;
#endif
- decheckSetSyncDepth(thread, tree_depth(tid), dd);
+ decheckSetSyncDepth(thread, tree_depth(tid), dd);
- assert(decheckGetSyncDepth(thread, tree_depth(tid)) == synch_depths[norm_path(tid)]);
+ assert(decheckGetSyncDepth(thread, tree_depth(tid)) == synch_depths[norm_path(tid)]);
}
#else
void GC_HH_decheckJoin(GC_state s, uint64_t left, uint64_t right) {
@@ -262,13 +270,15 @@ int lcaHeapDepth(decheck_tid_t t1, decheck_tid_t t2)
uint32_t p1mask = (1 << tree_depth(t1)) - 1;
uint32_t p2 = norm_path(t2);
uint32_t p2mask = (1 << tree_depth(t2)) - 1;
- assert(p1 != p2);
uint32_t shared_mask = p1mask & p2mask;
uint32_t shared_upper_bit = shared_mask+1;
uint32_t x = ((p1 ^ p2) & shared_mask) | shared_upper_bit;
uint32_t lca_bit = x & -x;
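+  // x & -x isolates the lowest set bit of x (two's-complement trick), e.g.
+  // x = 0b0110100 gives lca_bit = 0b0000100; its index is the length of the
+  // shared path prefix (cf. the lcaLen assertion below).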
// uint32_t lca_mask = lca_bit-1;
int llen = bitIndex(lca_bit);
+ if (p1 == p2) {
+ return tree_depth(t1) + 1;
+ }
assert(llen == lcaLen(p1, p2));
return llen+1;
}
@@ -307,69 +317,284 @@ bool decheckIsOrdered(GC_thread thread, decheck_tid_t t1) {
}
#endif
-
#ifdef DETECT_ENTANGLEMENT
-void decheckRead(GC_state s, objptr ptr) {
- GC_thread thread = getThreadCurrent(s);
- if (thread == NULL)
- return;
- decheck_tid_t tid = thread->decheckState;
- if (tid.bits == DECHECK_BOGUS_BITS)
- return;
- if (!isObjptr(ptr))
- return;
- HM_chunk chunk = HM_getChunkOf(objptrToPointer(ptr, NULL));
- if (chunk == NULL)
- return;
- decheck_tid_t allocator = chunk->decheckState;
- if (allocator.bits == DECHECK_BOGUS_BITS)
- return;
- if (decheckIsOrdered(thread, allocator))
- return;
-
- /** If we get here, there is entanglement. Next is how to handle it. */
-
- assert(!s->controls->manageEntanglement);
- if (!s->controls->manageEntanglement) {
- printf("Entanglement detected: object at %p\n", (void *) ptr);
- printf("Allocator tree depth: %d\n", tree_depth(allocator));
- printf("Allocator path: 0x%x\n", allocator.internal.path);
- printf("Allocator dag depth: %d\n", dag_depth(allocator));
- printf("Reader tree depth: %d\n", tree_depth(tid));
- printf("Reader path: 0x%x\n", tid.internal.path);
- printf("Reader dag depth: %d\n", dag_depth(tid));
- exit(-1);
+
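+/* traverseAndCheck (debug builds only): assert that every object reachable
+ * from op is pinned with PIN_ANY, carries no forwarding pointer, and, if
+ * mutable, is present in the entanglement-suspect set. */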
+#if ASSERT
+void traverseAndCheck(
+ GC_state s,
+ __attribute__((unused)) objptr *opp,
+ objptr op,
+ __attribute__((unused)) void *rawArgs)
+{
+ GC_header header = getHeader(objptrToPointer(op, NULL));
+ pointer p = objptrToPointer (op, NULL);
+ assert (pinType(header) == PIN_ANY);
+ assert (!isFwdHeader(header));
+ if (isMutableH(s, header)) {
+ assert (ES_contains(NULL, op));
+ }
+ else {
+ struct GC_foreachObjptrClosure echeckClosure =
+ {.fun = traverseAndCheck, .env = NULL};
+ foreachObjptrInObject(s, p, &trueObjptrPredicateClosure, &echeckClosure, FALSE);
+ }
+}
+#else
+inline void traverseAndCheck(
+ __attribute__((unused)) GC_state s,
+ __attribute__((unused)) objptr *opp,
+ __attribute__((unused)) objptr op,
+ __attribute__((unused)) void *rawArgs)
+{
+ return;
+}
+#endif
+
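+/* Follow the chain of forwarding pointers to the object's current copy;
+ * "racy" presumably because a concurrent collection may be installing these
+ * forwarding pointers while we read them. */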
+static inline objptr getRacyFwdPtr(pointer p) {
+ while (isFwdHeader(getHeader(p))) {
+ p = objptrToPointer(getFwdPtr(p), NULL);
+ }
+ return pointerToObjptr(p, NULL);
+}
+
+void make_entangled(
+ GC_state s,
+ objptr *opp,
+ objptr ptr,
+ void *rawArgs)
+{
+
+ struct ManageEntangledArgs* mea = (struct ManageEntangledArgs*) rawArgs;
+
+ HM_chunk chunk = HM_getChunkOf(objptrToPointer(ptr, NULL));
+ // if (!decheckIsOrdered(mea->root, allocator)) {
+ // // while managing entanglement, we stay ordered wrt the root of the entanglement
+ // return;
+ // }
+
+ pointer p_ptr = objptrToPointer(ptr, NULL);
+ GC_header header = getRacyHeader(p_ptr);
+ assert(!isFwdHeader(header));
+ bool mutable = isMutableH(s, header);
+ bool headerChange = false, pinChange = false;
+ // unpin depth according to the caller
+ uint32_t unpinDepth = mea->unpinDepth;
+
+ objptr new_ptr;
+ if (pinType(header) != PIN_ANY || unpinDepthOfH(header) > unpinDepth)
+ {
+ bool addToRemSet = mea->firstCall;
+ if (mutable) {
+ new_ptr = pinObjectInfo(s, ptr, unpinDepth, PIN_ANY, &headerChange, &pinChange);
+ }
+ else
+ {
+ mea->firstCall = false;
+ struct GC_foreachObjptrClosure emanageClosure =
+ {.fun = make_entangled, .env = rawArgs};
+      // the unpinDepth of reachable objects may be smaller.
+ mea->unpinDepth = pinType(header) == PIN_NONE ? unpinDepth : min(unpinDepth, unpinDepthOfH(header));
+ foreachObjptrInObject(s, p_ptr, &trueObjptrPredicateClosure, &emanageClosure, FALSE);
+ new_ptr = pinObjectInfo(s, ptr, unpinDepth, PIN_ANY, &headerChange, &pinChange);
+ assert(pinType(getHeader(objptrToPointer(new_ptr, NULL))) == PIN_ANY);
+ }
+ if (pinChange && addToRemSet)
+ {
+ struct HM_remembered remElem_ = {.object = new_ptr, .from = BOGUS_OBJPTR};
+ HM_HH_rememberAtLevel(HM_getLevelHead(chunk), &(remElem_), true);
+ assert (HM_HH_getDepth(HM_getLevelHead(chunk)) != 1);
}
+ }
+ else {
+ new_ptr = getRacyFwdPtr(p_ptr);
+ }
+ mea->unpinDepth = unpinDepth;
+
+ assert(!hasFwdPtr(objptrToPointer(new_ptr, NULL)));
+ assert(isPinned(new_ptr));
+
+ if (ptr != new_ptr) {
+ // Help LGC move along--because this reader might traverse this pointer
+ // and it shouldn't see the forwarded one
+ assert(hasFwdPtr(objptrToPointer(ptr, NULL)));
+ *opp = new_ptr;
+ }
+
+ if (mutable && !ES_contains(NULL, new_ptr)) {
+ HM_HierarchicalHeap lcaHeap = HM_HH_getHeapAtDepth(s, getThreadCurrent(s), unpinDepth);
+ ES_add(s, HM_HH_getSuspects(lcaHeap), new_ptr);
+ assert(ES_contains(NULL, new_ptr));
+ }
+
+ traverseAndCheck(s, &new_ptr, new_ptr, NULL);
+
+ assert (!mutable || ES_contains(NULL, new_ptr));
+}
+
+objptr manage_entangled(
+ GC_state s,
+ objptr ptr,
+ decheck_tid_t reader)
+{
+
+ // GC_thread thread = getThreadCurrent(s);
+ // decheck_tid_t tid = thread->decheckState;
+ HM_chunk chunk = HM_getChunkOf(objptrToPointer(ptr, NULL));
+ decheck_tid_t allocator = chunk->decheckState;
+
+ if (!s->controls->manageEntanglement && false)
+ {
+ printf("Entanglement detected: object at %p\n", (void *)ptr);
+ printf("Allocator tree depth: %d\n", tree_depth(allocator));
+ printf("Allocator path: 0x%x\n", allocator.internal.path);
+ printf("Allocator dag depth: %d\n", dag_depth(allocator));
+ printf("Reader tree depth: %d\n", tree_depth(allocator));
+ printf("Reader path: 0x%x\n", allocator.internal.path);
+ printf("Reader dag depth: %d\n", dag_depth(allocator));
+ exit(-1);
+ }
+
+ uint32_t unpinDepth = lcaHeapDepth(reader, allocator);
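+  /* The unpin depth is the depth of the least-common-ancestor heap of the
+   * reader and the allocator (editorial gloss: pinning at that depth keeps
+   * the object reachable from both sides of the entanglement). */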
+ GC_header header = getHeader(objptrToPointer (ptr, NULL));
+
+
+ uint32_t current_ud = unpinDepthOfH(header);
+ enum PinType current_pt = pinType(header);
+ bool manage = isFwdHeader(header) ||
+ current_pt != PIN_ANY ||
+ current_ud > unpinDepth;
+
+ if (current_pt != PIN_NONE && current_ud == 0)
+ {
+ return ptr;
+ }
+
+ if (manage) {
+ uint32_t newUnpinDepth = current_pt == PIN_NONE ? unpinDepth : min(current_ud, unpinDepth);
+ struct ManageEntangledArgs mea = {
+ .reader = reader,
+ .root = allocator,
+ .unpinDepth = newUnpinDepth,
+ .firstCall = !(current_pt == PIN_DOWN && current_ud == 1)
+ };
+ make_entangled(s, &ptr, ptr, (void*) &mea);
+ }
+ else {
+ if (isMutableH(s, header) && !ES_contains(NULL, ptr)) {
+ HM_HierarchicalHeap lcaHeap = HM_HH_getHeapAtDepth(s, getThreadCurrent(s), unpinDepth);
+ ES_add(s, HM_HH_getSuspects(lcaHeap), ptr);
+ assert(ES_contains(NULL, ptr));
+ }
+ traverseAndCheck(s, &ptr, ptr, NULL);
+ }
+
+
+ traverseAndCheck(s, &ptr, ptr, NULL);
+ return ptr;
+ // GC_header header = getRacyHeader(objptrToPointer(ptr, NULL));
+ // bool mutable = isMutableH(s, header);
+ // bool headerChange = false, pinChange = false;
+ // objptr new_ptr = ptr;
+ // if (pinType(header) != PIN_ANY || unpinDepthOfH(header) > unpinDepth)
+ // {
+ // if (mutable)
+ // {
+ // new_ptr = pinObjectInfo(ptr, unpinDepth, PIN_ANY, &headerChange, &pinChange);
+ // if (!ES_contains(NULL, new_ptr)) {
+ // HM_HierarchicalHeap lcaHeap = HM_HH_getHeapAtDepth(s, thread, unpinDepth);
+ // ES_add(s, HM_HH_getSuspects(lcaHeap), new_ptr);
+ // }
+ // }
+ // else
+ // {
+ // struct GC_foreachObjptrClosure emanageClosure =
+ // {.fun = manage_entangled, .env = NULL};
+ // foreachObjptrInObject(s, ptr, &trueObjptrPredicateClosure, &emanageClosure, FALSE);
+ // new_ptr = pinObjectInfo(ptr, unpinDepth, PIN_ANY, &headerChange, &pinChange);
+ // }
+ // if (pinChange)
+ // {
+ // struct HM_remembered remElem_ = {.object = new_ptr, .from = BOGUS_OBJPTR};
+ // HM_HH_rememberAtLevel(HM_getLevelHeadPathCompress(chunk), &(remElem_), true);
+ // }
+ // }
+ // else
+ // {
+ // if (!mutable)
+ // {
+ // traverseAndCheck(s, &new_ptr, new_ptr, NULL);
+ // }
+ // }
+
+ // traverseAndCheck(s, &new_ptr, new_ptr, NULL);
+ // return new_ptr;
+}
+
+#else
+objptr manage_entangled(GC_state s, objptr ptr, decheck_tid_t reader) {
+ (void)s;
+ (void)ptr;
+ (void)reader;
+ return ptr;
+}
+#endif
+
+#ifdef DETECT_ENTANGLEMENT
+
+bool decheck(GC_state s, objptr ptr) {
+ if (!s->controls->manageEntanglement) {
+ return true;
+ }
+ GC_thread thread = getThreadCurrent(s);
+ if (thread == NULL)
+ return true;
+ decheck_tid_t tid = thread->decheckState;
+ if (tid.bits == DECHECK_BOGUS_BITS)
+ return true;
+ if (!isObjptr(ptr))
+ return true;
+ HM_chunk chunk = HM_getChunkOf(objptrToPointer(ptr, NULL));
+ if (chunk == NULL)
+ return true;
+ decheck_tid_t allocator = chunk->decheckState;
+ if (allocator.bits == DECHECK_BOGUS_BITS) {
+ // assert (false);
+ return true;
+ }
+ if (decheckIsOrdered(thread, allocator))
+ return true;
+
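+  /* Reaching here means the access is entangled: the allocating task is not
+   * an ancestor of (ordered before) the reading task. */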
+ return false;
#if 0
- s->cumulativeStatistics->numEntanglementsDetected++;
-
- /** set the chunk's disentangled depth. This synchronizes with GC, if there
- * is GC happening by the owner of this chunk.
- */
- int32_t newDD = lcaHeapDepth(thread->decheckState, allocator);
- assert(newDD >= 1);
- while (TRUE) {
- int32_t oldDD = atomicLoadS32(&(chunk->disentangledDepth));
-
- /** Negative means it's frozen for GC. Wait until it's unfrozen... */
- while (oldDD < 0) {
- pthread_yield();
- oldDD = atomicLoadS32(&(chunk->disentangledDepth));
- }
-
- /** And then attempt to update. */
- if (newDD >= oldDD ||
- __sync_bool_compare_and_swap(&(chunk->disentangledDepth), oldDD, newDD))
- break;
+
+ /** set the chunk's disentangled depth. This synchronizes with GC, if there
+ * is GC happening by the owner of this chunk.
+ */
+ int32_t newDD = lcaHeapDepth(thread->decheckState, allocator);
+ assert(newDD >= 1);
+ while (TRUE) {
+ int32_t oldDD = atomicLoadS32(&(chunk->disentangledDepth));
+
+ /** Negative means it's frozen for GC. Wait until it's unfrozen... */
+ while (oldDD < 0) {
+ pthread_yield();
+ oldDD = atomicLoadS32(&(chunk->disentangledDepth));
}
+
+ /** And then attempt to update. */
+ if (newDD >= oldDD ||
+ __sync_bool_compare_and_swap(&(chunk->disentangledDepth), oldDD, newDD))
+ break;
+ }
#endif
}
#else
-void decheckRead(GC_state s, objptr ptr) {
+bool decheck(GC_state s, objptr ptr)
+{
(void)s;
(void)ptr;
- return;
+ return true;
}
#endif
@@ -400,11 +625,27 @@ void GC_HH_copySyncDepthsFromThread(GC_state s, objptr victimThread, uint32_t st
*/
memcpy(to, from, DECHECK_DEPTHS_LEN * sizeof(uint32_t));
}
+
#else
-void GC_HH_copySyncDepthsFromThread(GC_state s, objptr victimThread, uint32_t stealDepth) {
+void GC_HH_copySyncDepthsFromThread(GC_state s, objptr victimThread, uint32_t stealDepth)
+{
(void)s;
(void)victimThread;
(void)stealDepth;
return;
}
#endif
+
+// returns true if the object is unpinned.
+bool disentangleObject(GC_state s, objptr op, uint32_t opDepth) {
+ if (isPinned(op) && unpinDepthOf(op) >= opDepth) {
+ bool success = tryUnpinWithDepth(op, opDepth);
+ if (success && ES_contains(NULL, op)) {
+ ES_unmark(s, op);
+ return true;
+ }
+ return false;
+ }
+ return true;
+}
+
diff --git a/runtime/gc/decheck.h b/runtime/gc/decheck.h
index 92e26a3d1..406b9c207 100644
--- a/runtime/gc/decheck.h
+++ b/runtime/gc/decheck.h
@@ -20,6 +20,14 @@ typedef union {
uint64_t bits;
} decheck_tid_t;
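+
+/* Editorial summary of how decheck.c uses these fields: 'reader' is the tid
+ * of the task performing the entangled access, 'root' the tid recorded in the
+ * chunk of the root entangled object, 'unpinDepth' the depth at which reached
+ * objects are pinned, and 'firstCall' marks the root object (the one that is
+ * added to the remembered set). */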
+struct ManageEntangledArgs
+{
+ decheck_tid_t reader;
+ decheck_tid_t root;
+ uint32_t unpinDepth;
+ bool firstCall;
+};
+
#define DECHECK_BOGUS_BITS ((uint64_t)0)
#define DECHECK_BOGUS_TID ((decheck_tid_t){ .bits = DECHECK_BOGUS_BITS })
@@ -40,10 +48,12 @@ PRIVATE bool GC_HH_decheckMaxDepth(objptr resultRef);
#if (defined (MLTON_GC_INTERNAL_FUNCS))
void decheckInit(GC_state s);
-void decheckRead(GC_state s, objptr ptr);
+bool decheck(GC_state s, objptr ptr);
bool decheckIsOrdered(GC_thread thread, decheck_tid_t t1);
int lcaHeapDepth(decheck_tid_t t1, decheck_tid_t t2);
-
+bool disentangleObject(GC_state s, objptr op, uint32_t opDepth);
+objptr manage_entangled(GC_state s, objptr ptr, decheck_tid_t reader);
+void traverseAndCheck(GC_state s, objptr *opp ,objptr op, void *rawArgs);
#endif /* (defined (MLTON_GC_INTERNAL_FUNCS)) */
#endif /* _DECHECK_H_ */
diff --git a/runtime/gc/ebr.c b/runtime/gc/ebr.c
new file mode 100644
index 000000000..ec7847371
--- /dev/null
+++ b/runtime/gc/ebr.c
@@ -0,0 +1,170 @@
+/* Copyright (C) 2021 Sam Westrick
+ * Copyright (C) 2022 Jatin Arora
+ *
+ * MLton is released under a HPND-style license.
+ * See the file MLton-LICENSE for details.
+ */
+
+#if (defined(MLTON_GC_INTERNAL_FUNCS))
+
+/** Helpers for packing/unpacking announcements. DEBRA packs epochs with a
+ * "quiescent" bit, the idea being that processors should set the bit during
+ * quiescent periods (between operations) and have it unset otherwise (i.e.
+ * during an operation). Being precise about quiescent periods in this way
+ * is helpful for reclamation, because in order to advance the epoch, all we
+ * need to know is that every processor has been in a quiescent period since
+ * the beginning of the last epoch.
+ *
+ * But note that updating the quiescent bits is only efficient if we can
+ * amortize the cost of the setting/unsetting the bit with other nearby
+ * operations. If we assumed that the typical state for each processor
+ * is quiescent and then paid for non-quiescent periods, this would
+ * be WAY too expensive. In our case, processors are USUALLY NON-QUIESCENT,
+ * due to depth queries at the write-barrier.
+ *
+ * So
+ */
+#define PACK(epoch, qbit) ((((size_t)(epoch)) << 1) | ((qbit)&1))
+#define UNPACK_EPOCH(announcement) ((announcement) >> 1)
+#define UNPACK_QBIT(announcement) ((announcement)&1)
+#define SET_Q_TRUE(announcement) ((announcement) | (size_t)1)
+#define SET_Q_FALSE(announcement) ((announcement) & (~(size_t)1))
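+
+/* Example: PACK(5, 1) == (5 << 1) | 1 == 11; UNPACK_EPOCH(11) == 5;
+ * UNPACK_QBIT(11) == 1; SET_Q_TRUE(10) == 11; SET_Q_FALSE(11) == 10. */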
+
+#define ANNOUNCEMENT_PADDING 16
+
+static inline size_t getAnnouncement(EBR_shared ebr, uint32_t pid)
+{
+ return ebr->announce[ANNOUNCEMENT_PADDING * pid];
+}
+
+static inline void setAnnouncement(EBR_shared ebr, uint32_t pid, size_t ann)
+{
+ ebr->announce[ANNOUNCEMENT_PADDING * pid] = ann;
+}
+
+void EBR_enterQuiescentState(GC_state s, EBR_shared ebr)
+{
+ uint32_t mypid = s->procNumber;
+ setAnnouncement(ebr, mypid, SET_Q_TRUE(getAnnouncement(ebr, mypid)));
+}
+
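+/* Rotate to the next of the three limbo bags and reclaim its contents.
+ * (Editorial note: a bag is only reused after two epoch advances, and an
+ * epoch only advances once every processor has either announced the current
+ * epoch or is quiescent, so nothing in the reclaimed bag can still be
+ * referenced; this is the standard epoch-based-reclamation argument.) */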
+static void rotateAndReclaim(GC_state s, EBR_shared ebr)
+{
+ uint32_t mypid = s->procNumber;
+
+ int limboIdx = (ebr->local[mypid].limboIdx + 1) % 3;
+ ebr->local[mypid].limboIdx = limboIdx;
+ HM_chunkList limboBag = &(ebr->local[mypid].limboBags[limboIdx]);
+
+ // Free all HH records in the limbo bag.
+ for (HM_chunk chunk = HM_getChunkListFirstChunk(limboBag);
+ NULL != chunk;
+ chunk = chunk->nextChunk)
+ {
+ for (pointer p = HM_getChunkStart(chunk);
+ p < HM_getChunkFrontier(chunk);
+ p += sizeof(void *))
+ {
+ ebr->freeFun(s, *(void **)p);
+ // HM_UnionFindNode hufp = *(HM_UnionFindNode *)p;
+ // assert(hufp->payload != NULL);
+ // freeFixedSize(getHHAllocator(s), hufp->payload);
+ // freeFixedSize(getUFAllocator(s), hufp);
+ }
+ }
+
+ HM_freeChunksInListWithInfo(s, limboBag, NULL, BLOCK_FOR_EBR);
+ HM_initChunkList(limboBag); // clear it out
+}
+
+EBR_shared EBR_new(GC_state s, EBR_freeRetiredObj freeFun)
+{
+ EBR_shared ebr = malloc(sizeof(struct EBR_shared));
+
+ ebr->epoch = 0;
+ ebr->announce =
+ malloc(s->numberOfProcs * ANNOUNCEMENT_PADDING * sizeof(size_t));
+ ebr->local =
+ malloc(s->numberOfProcs * sizeof(struct EBR_local));
+ ebr->freeFun = freeFun;
+
+ for (uint32_t i = 0; i < s->numberOfProcs; i++)
+ {
+ // Everyone starts by announcing epoch = 0 and is non-quiescent
+ setAnnouncement(ebr, i, PACK(0, 0));
+ ebr->local[i].limboIdx = 0;
+ ebr->local[i].checkNext = 0;
+ for (int j = 0; j < 3; j++)
+ HM_initChunkList(&(ebr->local[i].limboBags[j]));
+ }
+ return ebr;
+}
+
+void EBR_leaveQuiescentState(GC_state s, EBR_shared ebr)
+{
+ uint32_t mypid = s->procNumber;
+ uint32_t numProcs = s->numberOfProcs;
+
+ size_t globalEpoch = ebr->epoch;
+ size_t myann = getAnnouncement(ebr, mypid);
+ size_t myEpoch = UNPACK_EPOCH(myann);
+ assert(globalEpoch >= myEpoch);
+
+ if (myEpoch != globalEpoch)
+ {
+ ebr->local[mypid].checkNext = 0;
+ /** Advance into the current epoch. To do so, we need to clear the limbo
+ * bag of the epoch we're moving into.
+ */
+ rotateAndReclaim(s, ebr);
+ }
+  // TODO: factor this out into a helper that takes the number of otherann
+  // reads to perform per call.
+ uint32_t otherpid = (ebr->local[mypid].checkNext) % numProcs;
+ size_t otherann = getAnnouncement(ebr, otherpid);
+ if (UNPACK_EPOCH(otherann) == globalEpoch || UNPACK_QBIT(otherann))
+ {
+ uint32_t c = ++ebr->local[mypid].checkNext;
+ if (c >= numProcs)
+ {
+ __sync_val_compare_and_swap(&(ebr->epoch), globalEpoch, globalEpoch + 1);
+ }
+ }
+
+ setAnnouncement(ebr, mypid, PACK(globalEpoch, 0));
+}
+
+void EBR_retire(GC_state s, EBR_shared ebr, void *ptr)
+{
+ uint32_t mypid = s->procNumber;
+ int limboIdx = ebr->local[mypid].limboIdx;
+ HM_chunkList limboBag = &(ebr->local[mypid].limboBags[limboIdx]);
+ HM_chunk chunk = HM_getChunkListLastChunk(limboBag);
+
+ // fast path: bump frontier in chunk
+
+ if (NULL != chunk &&
+ HM_getChunkSizePastFrontier(chunk) >= sizeof(void *))
+ {
+ pointer p = HM_getChunkFrontier(chunk);
+ *(void **) p = ptr;
+ HM_updateChunkFrontierInList(limboBag, chunk, p + sizeof(void *));
+ return;
+ }
+
+ // slow path: allocate new chunk
+
+ chunk = HM_allocateChunkWithPurpose(
+ limboBag,
+ sizeof(void *),
+ BLOCK_FOR_EBR);
+
+ assert(NULL != chunk &&
+ HM_getChunkSizePastFrontier(chunk) >= sizeof(void *));
+
+ pointer p = HM_getChunkFrontier(chunk);
+ *(void **) p = ptr;
+ HM_updateChunkFrontierInList(limboBag, chunk, p + sizeof(void *));
+ return;
+}
+
+#endif // MLTON_GC_INTERNAL_FUNCS
diff --git a/runtime/gc/ebr.h b/runtime/gc/ebr.h
new file mode 100644
index 000000000..dbf2be156
--- /dev/null
+++ b/runtime/gc/ebr.h
@@ -0,0 +1,57 @@
+/* Copyright (C) 2021 Sam Westrick
+ * Copyright (C) 2022 Jatin Arora
+ *
+ * MLton is released under a HPND-style license.
+ * See the file MLton-LICENSE for details.
+ */
+
+/** Epoch-based reclamation (EBR) of hierarchical heap records.
+ */
+
+#ifndef EBR_H_
+#define EBR_H_
+
+#if (defined(MLTON_GC_INTERNAL_TYPES))
+
+struct EBR_local
+{
+ struct HM_chunkList limboBags[3];
+ int limboIdx;
+ uint32_t checkNext;
+} __attribute__((aligned(128)));
+
+typedef void (*EBR_freeRetiredObj) (GC_state s, void *ptr);
+
+// There is exactly one of these! Everyone shares a reference to it.
+typedef struct EBR_shared
+{
+ size_t epoch;
+
+ // announcement array, length = num procs
+ // each announcement is packed: 63 bits for epoch, 1 bit for quiescent bit
+ size_t *announce;
+
+ // processor-local data, length = num procs
+ struct EBR_local *local;
+
+ EBR_freeRetiredObj freeFun;
+} * EBR_shared;
+
+#else
+
+struct EBR_local;
+struct EBR_shared;
+typedef struct EBR_shared *EBR_shared;
+
+#endif // MLTON_GC_INTERNAL_TYPES
+
+#if (defined(MLTON_GC_INTERNAL_FUNCS))
+
+EBR_shared EBR_new(GC_state s, EBR_freeRetiredObj freeFun);
+void EBR_enterQuiescentState(GC_state s, EBR_shared ebr);
+void EBR_leaveQuiescentState(GC_state s, EBR_shared ebr);
+void EBR_retire(GC_state s, EBR_shared ebr, void *ptr);
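+
+/* Typical usage (editorial sketch; the real call sites live elsewhere in the
+ * runtime and may differ):
+ *
+ *   EBR_leaveQuiescentState(s, ebr);   // begin operating on shared state
+ *   ...                                // unlink nodes; EBR_retire(s, ebr, node)
+ *   EBR_enterQuiescentState(s, ebr);   // done; safe point for reclamation
+ *
+ * Retired objects are eventually passed to freeFun once the epoch has
+ * advanced far enough that no processor can still hold a reference. */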
+
+#endif // MLTON_GC_INTERNAL_FUNCS
+
+#endif // EBR_H_
diff --git a/runtime/gc/entangled-ebr.c b/runtime/gc/entangled-ebr.c
new file mode 100644
index 000000000..8e934762c
--- /dev/null
+++ b/runtime/gc/entangled-ebr.c
@@ -0,0 +1,29 @@
+/* Copyright (C) 2022 Jatin Arora
+ *
+ * MLton is released under a HPND-style license.
+ * See the file MLton-LICENSE for details.
+ */
+
+#if (defined (MLTON_GC_INTERNAL_FUNCS))
+
+void freeChunk(GC_state s, void *ptr) {
+ HM_freeChunkWithInfo(s, (HM_chunk)ptr, NULL, BLOCK_FOR_HEAP_CHUNK);
+}
+
+void HM_EBR_init(GC_state s) {
+ s->hmEBR = EBR_new(s, &freeChunk);
+}
+
+void HM_EBR_enterQuiescentState (GC_state s) {
+ EBR_enterQuiescentState(s, s->hmEBR);
+}
+
+void HM_EBR_leaveQuiescentState(GC_state s) {
+ EBR_leaveQuiescentState(s, s->hmEBR);
+}
+
+void HM_EBR_retire(GC_state s, HM_chunk chunk) {
+ EBR_retire(s, s->hmEBR, (void *)chunk);
+}
+
+#endif // MLTON_GC_INTERNAL_FUNCS
diff --git a/runtime/gc/entangled-ebr.h b/runtime/gc/entangled-ebr.h
new file mode 100644
index 000000000..85cdf59fe
--- /dev/null
+++ b/runtime/gc/entangled-ebr.h
@@ -0,0 +1,17 @@
+/** Epoch-based reclamation (EBR) of hierarchical heap records.
+ */
+
+#ifndef ENTANGLED_EBR_H_
+#define ENTANGLED_EBR_H_
+
+#if (defined(MLTON_GC_INTERNAL_FUNCS))
+
+
+void HM_EBR_init(GC_state s);
+void HM_EBR_enterQuiescentState(GC_state s);
+void HM_EBR_leaveQuiescentState(GC_state s);
+void HM_EBR_retire(GC_state s, HM_chunk chunk);
+
+#endif // MLTON_GC_INTERNAL_FUNCS
+
+#endif // ENTANGLED_EBR_H_
diff --git a/runtime/gc/entanglement-suspects.c b/runtime/gc/entanglement-suspects.c
index 434231b76..30220f9b8 100644
--- a/runtime/gc/entanglement-suspects.c
+++ b/runtime/gc/entanglement-suspects.c
@@ -6,7 +6,30 @@ static inline bool mark_suspect(objptr op)
{
pointer p = objptrToPointer(op, NULL);
GC_header header = __sync_fetch_and_or(getHeaderp(p), SUSPECT_MASK);
- assert (1 == (header & GC_VALID_HEADER_MASK));
+ assert(1 == (header & GC_VALID_HEADER_MASK));
+ // while (TRUE)
+ // {
+ // GC_header header = getHeader(p);
+ // GC_header newHeader = header | SUSPECT_MASK;
+ // if (header == newHeader)
+ // {
+ // /*
+ // just return because the suspect bit is already set
+ // */
+ // return false;
+ // }
+ // else
+ // {
+ // /*
+ // otherwise, install the new header with the bit set. this might fail
+ // if a concurrent thread changes the header first.
+ // */
+ // if (__sync_bool_compare_and_swap(getHeaderp(p), header, newHeader))
+ // return true;
+ // }
+ // }
+ // DIE("should be impossible to reach here");
+  /* return true if this call marked the header, false if someone else did */
return !suspicious_header(header);
}
@@ -21,14 +44,36 @@ static inline bool is_suspect(objptr op)
}
void clear_suspect(
- __attribute__((unused)) GC_state s,
- __attribute__((unused)) objptr *opp,
+ GC_state s,
+ objptr *opp,
objptr op,
- __attribute__((unused)) void *rawArgs)
+ void *rawArgs)
{
pointer p = objptrToPointer(op, NULL);
- assert(isObjptr(op) && is_suspect(op));
- __sync_fetch_and_and(getHeaderp(p), ~(SUSPECT_MASK));
+ ES_clearArgs eargs = (ES_clearArgs) rawArgs;
+
+ GC_header header = getHeader(p);
+ uint32_t unpinDepth = (header & UNPIN_DEPTH_MASK) >> UNPIN_DEPTH_SHIFT;
+
+ if (pinType(header) == PIN_ANY && unpinDepth < eargs->heapDepth) {
+ /* Not ready to be cleared */
+ HM_HierarchicalHeap unpinHeap = HM_HH_getHeapAtDepth(s, eargs->thread, unpinDepth);
+ HM_storeInChunkListWithPurpose(HM_HH_getSuspects(unpinHeap), opp, sizeof(objptr), BLOCK_FOR_SUSPECTS);
+ eargs->numMoved++;
+ return;
+ }
+
+ GC_header newHeader = header & ~(SUSPECT_MASK);
+ if (__sync_bool_compare_and_swap(getHeaderp(p), header, newHeader)) {
+ /* clearing successful */
+ eargs->numCleared++;
+ return;
+ }
+ else {
+    /* the header changed in between; try again at the next join */
+ HM_storeInChunkListWithPurpose(eargs->newList, opp, sizeof(objptr), BLOCK_FOR_SUSPECTS);
+ eargs->numFailed++;
+ }
}
bool ES_contains(__attribute__((unused)) HM_chunkList es, objptr op)
@@ -36,6 +81,14 @@ bool ES_contains(__attribute__((unused)) HM_chunkList es, objptr op)
return is_suspect(op);
}
+bool ES_mark(__attribute__((unused)) GC_state s, objptr op) {
+ return mark_suspect(op);
+}
+
+void ES_unmark(GC_state s, objptr op) {
+ clear_suspect(s, &op, op, NULL);
+}
+
void ES_add(__attribute__((unused)) GC_state s, HM_chunkList es, objptr op)
{
@@ -45,7 +98,7 @@ void ES_add(__attribute__((unused)) GC_state s, HM_chunkList es, objptr op)
return;
}
s->cumulativeStatistics->numSuspectsMarked++;
- HM_storeInchunkList(es, &op, sizeof(objptr));
+ HM_storeInChunkListWithPurpose(es, &op, sizeof(objptr), BLOCK_FOR_SUSPECTS);
}
int ES_foreachSuspect(
@@ -88,12 +141,241 @@ void ES_move(HM_chunkList list1, HM_chunkList list2) {
HM_initChunkList(list2);
}
-void ES_clear(GC_state s, HM_chunkList es)
+static size_t SUSPECTS_THRESHOLD = 10000;
+
+
+void ES_clear(GC_state s, HM_HierarchicalHeap hh)
{
+ struct timespec startTime;
+ struct timespec stopTime;
+
+ HM_chunkList es = HM_HH_getSuspects(hh);
+ uint32_t heapDepth = HM_HH_getDepth(hh);
+ struct HM_chunkList oldList = *(es);
+ HM_initChunkList(HM_HH_getSuspects(hh));
+
+ size_t numSuspects = HM_getChunkListUsedSize(&oldList) / sizeof(objptr);
+ if (numSuspects >= SUSPECTS_THRESHOLD) {
+ timespec_now(&startTime);
+ }
+
+ struct ES_clearArgs eargs = {
+ .newList = HM_HH_getSuspects(hh),
+ .heapDepth = heapDepth,
+ .thread = getThreadCurrent(s),
+ .numMoved = 0,
+ .numCleared = 0,
+ .numFailed = 0
+ };
+
struct GC_foreachObjptrClosure fObjptrClosure =
- {.fun = clear_suspect, .env = NULL};
- int numSuspects = ES_foreachSuspect(s, es, &fObjptrClosure);
- s->cumulativeStatistics->numSuspectsCleared+=numSuspects;
+ {.fun = clear_suspect, .env = &(eargs)};
+#if ASSERT
+ int ns = ES_foreachSuspect(s, &oldList, &fObjptrClosure);
+ assert(numSuspects == (size_t)ns);
+#else
+ ES_foreachSuspect(s, &oldList, &fObjptrClosure);
+#endif
+ s->cumulativeStatistics->numSuspectsCleared += numSuspects;
+
+ HM_freeChunksInListWithInfo(s, &(oldList), NULL, BLOCK_FOR_SUSPECTS);
+
+ if (eargs.numFailed > 0) {
+ LOG(LM_HIERARCHICAL_HEAP, LL_INFO,
+ "WARNING: %zu failed suspect clear(s)",
+ eargs.numFailed);
+ }
- HM_freeChunksInList(s, es);
+ if (numSuspects >= SUSPECTS_THRESHOLD) {
+ timespec_now(&stopTime);
+ timespec_sub(&stopTime, &startTime);
+ LOG(LM_HIERARCHICAL_HEAP, LL_FORCE,
+ "time to process %zu suspects (%zu cleared, %zu moved) at depth %u: %ld.%09ld",
+ numSuspects,
+ eargs.numCleared,
+ eargs.numMoved,
+ HM_HH_getDepth(hh),
+ (long)stopTime.tv_sec,
+ stopTime.tv_nsec
+ );
+ }
+}
+
+
+size_t ES_numSuspects(
+ __attribute__((unused)) GC_state s,
+ HM_HierarchicalHeap hh)
+{
+ return HM_getChunkListUsedSize(HM_HH_getSuspects(hh)) / sizeof(objptr);
+}
+
+
+ES_clearSet ES_takeClearSet(
+ __attribute__((unused)) GC_state s,
+ HM_HierarchicalHeap hh)
+{
+ struct timespec startTime;
+ timespec_now(&startTime);
+
+ size_t numSuspects = ES_numSuspects(s, hh);
+
+ ES_clearSet result = malloc(sizeof(struct ES_clearSet));
+ HM_chunkList es = HM_HH_getSuspects(hh);
+ struct HM_chunkList oldList = *es;
+ HM_initChunkList(es);
+
+ size_t numChunks = 0;
+ for (HM_chunk cursor = HM_getChunkListFirstChunk(&oldList);
+ cursor != NULL;
+ cursor = cursor->nextChunk)
+ {
+ numChunks++;
+ }
+
+ HM_chunk *chunkArray = malloc(numChunks * sizeof(HM_chunk));
+ result->chunkArray = chunkArray;
+ result->lenChunkArray = numChunks;
+ result->depth = HM_HH_getDepth(hh);
+ result->numSuspects = numSuspects;
+ result->startTime = startTime;
+
+ size_t i = 0;
+ for (HM_chunk cursor = HM_getChunkListFirstChunk(&oldList);
+ cursor != NULL;
+ cursor = cursor->nextChunk)
+ {
+ chunkArray[i] = cursor;
+ i++;
+ }
+
+ return result;
}
+
+
+size_t ES_numChunksInClearSet(
+ __attribute__((unused)) GC_state s,
+ ES_clearSet es)
+{
+ return es->lenChunkArray;
+}
+
+
+void clear_suspect_par_safe(
+ __attribute__((unused)) GC_state s,
+ objptr *opp,
+ objptr op,
+ struct HM_chunkList *output,
+ size_t lenOutput)
+{
+ pointer p = objptrToPointer(op, NULL);
+ while (TRUE) {
+ GC_header header = getHeader(p);
+ uint32_t unpinDepth = (header & UNPIN_DEPTH_MASK) >> UNPIN_DEPTH_SHIFT;
+
+ // Note: lenOutput == depth of heap whose suspects we are clearing
+ if (pinType(header) == PIN_ANY && unpinDepth < lenOutput) {
+ /* Not ready to be cleared; move it instead */
+ HM_storeInChunkListWithPurpose(&(output[unpinDepth]), opp, sizeof(objptr), BLOCK_FOR_SUSPECTS);
+ return;
+ }
+
+ GC_header newHeader = header & ~(SUSPECT_MASK);
+ if (__sync_bool_compare_and_swap(getHeaderp(p), header, newHeader)) {
+ /* clearing successful */
+ return;
+ }
+ else {
+ // oops, something changed in between
+ // Is this possible?
+ // - Seems like it could be, if there is a CGC happening
+ // simultaneously at the same level (and it's marking/unmarking objects)?
+ // - If this is the only possibility, then we should be able to just
+ // do the CAS with newHeader in a loop?
+ LOG(LM_HIERARCHICAL_HEAP, LL_INFO,
+ "WARNING: failed suspect clear; trying again");
+ }
+ }
+}
+
+
+ES_finishedClearSetGrain ES_processClearSetGrain(
+ GC_state s,
+ ES_clearSet es,
+ size_t start,
+ size_t stop)
+{
+ ES_finishedClearSetGrain result = malloc(sizeof(struct ES_finishedClearSetGrain));
+ struct HM_chunkList *output = malloc(es->depth * sizeof(struct HM_chunkList));
+ result->output = output;
+ result->lenOutput = es->depth;
+
+ // initialize output
+ for (uint32_t i = 0; i < es->depth; i++) {
+ HM_initChunkList(&(output[i]));
+ }
+
+ // process each input chunk
+ for (size_t i = start; i < stop; i++) {
+ HM_chunk chunk = es->chunkArray[i];
+ pointer p = HM_getChunkStart(chunk);
+ pointer frontier = HM_getChunkFrontier(chunk);
+ while (p < frontier)
+ {
+ objptr* opp = (objptr*)p;
+ objptr op = *opp;
+ if (isObjptr(op)) {
+ clear_suspect_par_safe(s, opp, op, output, es->depth);
+ }
+ p += sizeof(objptr);
+ }
+ }
+
+ return result;
+}
+
+
+void ES_commitFinishedClearSetGrain(
+ GC_state s,
+ GC_thread thread,
+ ES_finishedClearSetGrain es)
+{
+ for (size_t i = 0; i < es->lenOutput; i++) {
+ HM_chunkList list = &(es->output[i]);
+ if (HM_getChunkListSize(list) == 0)
+ continue;
+
+ HM_HierarchicalHeap dest = HM_HH_getHeapAtDepth(s, thread, i);
+ HM_appendChunkList(HM_HH_getSuspects(dest), list);
+ HM_initChunkList(list);
+ }
+
+ free(es->output);
+ free(es);
+}
+
+
+void ES_deleteClearSet(GC_state s, ES_clearSet es) {
+ s->cumulativeStatistics->numSuspectsCleared += es->numSuspects;
+
+ for (size_t i = 0; i < es->lenChunkArray; i++) {
+ HM_freeChunkWithInfo(s, es->chunkArray[i], NULL, BLOCK_FOR_SUSPECTS);
+ }
+
+ size_t numSuspects = es->numSuspects;
+ uint32_t depth = es->depth;
+ struct timespec startTime = es->startTime;
+ free(es->chunkArray);
+ free(es);
+
+ struct timespec stopTime;
+ timespec_now(&stopTime);
+ timespec_sub(&stopTime, &startTime);
+ LOG(LM_HIERARCHICAL_HEAP, LL_INFO,
+ "time to process %zu suspects at depth %u: %ld.%09ld",
+ numSuspects,
+ depth,
+ (long)stopTime.tv_sec,
+ stopTime.tv_nsec
+ );
+
+}
\ No newline at end of file
diff --git a/runtime/gc/entanglement-suspects.h b/runtime/gc/entanglement-suspects.h
index 235e987e6..9afabb960 100644
--- a/runtime/gc/entanglement-suspects.h
+++ b/runtime/gc/entanglement-suspects.h
@@ -6,17 +6,53 @@
#define SUSPECT_MASK ((GC_header)0x40000000)
#define SUSPECT_SHIFT 30
+typedef struct ES_clearArgs {
+ HM_chunkList newList;
+ uint32_t heapDepth;
+ GC_thread thread;
+ size_t numMoved;
+ size_t numFailed;
+ size_t numCleared;
+} * ES_clearArgs;
+
+
+typedef struct ES_clearSet {
+ HM_chunk *chunkArray; // array of chunks that need to be processed
+ size_t lenChunkArray; // len(chunkArray)
+ uint32_t depth;
+ size_t numSuspects;
+ struct timespec startTime;
+} * ES_clearSet;
+
+typedef struct ES_finishedClearSetGrain {
+ struct HM_chunkList *output; // output[d]: unsuccessful clears that were moved to depth d
+ size_t lenOutput; // len(output array)
+} * ES_finishedClearSetGrain;
+
+
+bool ES_mark(__attribute__((unused)) GC_state s, objptr op);
+void ES_unmark(GC_state s, objptr op);
+
void ES_add(GC_state s, HM_chunkList es, objptr op);
bool ES_contains(HM_chunkList es, objptr op);
-HM_chunkList ES_append (GC_state s, HM_chunkList es1, HM_chunkList es2);
+HM_chunkList ES_append(GC_state s, HM_chunkList es1, HM_chunkList es2);
+
+void ES_clear(GC_state s, HM_HierarchicalHeap hh);
-void ES_clear(GC_state s, HM_chunkList es);
+// These functions allow us to clear a suspect set in parallel, by integrating
+// with the scheduler. The idea is: ES_takeClearSet detaches the suspect set of
+// a heap, ES_processClearSetGrain processes disjoint chunk ranges (possibly in
+// parallel), ES_commitFinishedClearSetGrain merges each grain's leftover
+// suspects back into the heaps at the appropriate depths, and
+// ES_deleteClearSet frees the detached chunks and records statistics.
+size_t ES_numSuspects(GC_state s, HM_HierarchicalHeap hh);
+ES_clearSet ES_takeClearSet(GC_state s, HM_HierarchicalHeap hh);
+size_t ES_numChunksInClearSet(GC_state s, ES_clearSet es);
+ES_finishedClearSetGrain ES_processClearSetGrain(GC_state s, ES_clearSet es, size_t start, size_t stop);
+void ES_commitFinishedClearSetGrain(GC_state s, GC_thread thread, ES_finishedClearSetGrain es);
+void ES_deleteClearSet(GC_state s, ES_clearSet es);
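+
+/* Editorial sketch of the intended call order (the parallel driver is
+ * hypothetical; only the ES_* calls are real):
+ *
+ *   ES_clearSet cs = ES_takeClearSet(s, hh);
+ *   size_t n = ES_numChunksInClearSet(s, cs);
+ *   // for disjoint grains [start, stop) covering 0..n, possibly in parallel:
+ *   ES_finishedClearSetGrain g = ES_processClearSetGrain(s, cs, start, stop);
+ *   ES_commitFinishedClearSetGrain(s, thread, g);
+ *   // after all grains have been committed:
+ *   ES_deleteClearSet(s, cs);
+ */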
void ES_move(HM_chunkList list1, HM_chunkList list2);
-int ES_foreachSuspect(GC_state s, HM_chunkList storage, struct GC_foreachObjptrClosure* fObjptrClosure);
+int ES_foreachSuspect(GC_state s, HM_chunkList storage, struct GC_foreachObjptrClosure * fObjptrClosure);
#endif
#endif
\ No newline at end of file
diff --git a/runtime/gc/fixed-size-allocator.c b/runtime/gc/fixed-size-allocator.c
index 9eec57607..8a6757d08 100644
--- a/runtime/gc/fixed-size-allocator.c
+++ b/runtime/gc/fixed-size-allocator.c
@@ -6,7 +6,11 @@
#if (defined (MLTON_GC_INTERNAL_FUNCS))
-void initFixedSizeAllocator(FixedSizeAllocator fsa, size_t fixedSize) {
+void initFixedSizeAllocator(
+ FixedSizeAllocator fsa,
+ size_t fixedSize,
+ enum BlockPurpose purpose)
+{
size_t minSize = sizeof(struct FixedSizeElement);
fsa->fixedSize = align(fixedSize < minSize ? minSize : fixedSize, 8);
HM_initChunkList(&(fsa->buffer));
@@ -16,6 +20,7 @@ void initFixedSizeAllocator(FixedSizeAllocator fsa, size_t fixedSize) {
fsa->numAllocated = 0;
fsa->numLocalFreed = 0;
fsa->numSharedFreed = 0;
+ fsa->purpose = purpose;
return;
}
@@ -86,7 +91,7 @@ void* allocateFixedSize(FixedSizeAllocator fsa) {
* to their original buffer, by looking up the chunk header.
*/
- chunk = HM_allocateChunk(buffer, fsa->fixedSize + sizeof(void*));
+ chunk = HM_allocateChunkWithPurpose(buffer, fsa->fixedSize + sizeof(void*), fsa->purpose);
pointer gap = HM_shiftChunkStart(chunk, sizeof(void*));
*(FixedSizeAllocator *)gap = fsa;
diff --git a/runtime/gc/fixed-size-allocator.h b/runtime/gc/fixed-size-allocator.h
index 95cad510d..30061c696 100644
--- a/runtime/gc/fixed-size-allocator.h
+++ b/runtime/gc/fixed-size-allocator.h
@@ -29,6 +29,7 @@ typedef struct FixedSizeAllocator {
size_t numAllocated;
size_t numLocalFreed;
size_t numSharedFreed;
+ enum BlockPurpose purpose;
/** A bit of a hack. I just want quick access to pages to store elements.
* I'll reuse the frontier mechanism inherent to chunks to remember which
@@ -66,7 +67,10 @@ typedef struct FixedSizeAllocator *FixedSizeAllocator;
/** Initialize [fsa] to be able to allocate objects of size [fixedSize].
* You should never re-initialize an allocator.
*/
-void initFixedSizeAllocator(FixedSizeAllocator fsa, size_t fixedSize);
+void initFixedSizeAllocator(
+ FixedSizeAllocator fsa,
+ size_t fixedSize,
+ enum BlockPurpose purpose);
/** Allocate an object of the size specified when the allocator was initialized.
diff --git a/runtime/gc/foreach.c b/runtime/gc/foreach.c
index 8c16ab249..da5812e41 100644
--- a/runtime/gc/foreach.c
+++ b/runtime/gc/foreach.c
@@ -125,7 +125,7 @@ pointer foreachObjptrInObject (GC_state s, pointer p,
bool skip = !pred->fun(s, p, pred->env);
- header = getHeader (p);
+ header = getRacyHeader (p);
splitHeader(s, header, &tag, NULL, &bytesNonObjptrs, &numObjptrs);
if (DEBUG_DETAILED)
fprintf (stderr,
diff --git a/runtime/gc/forward.c b/runtime/gc/forward.c
index 664d94ec7..f94777f5a 100644
--- a/runtime/gc/forward.c
+++ b/runtime/gc/forward.c
@@ -25,10 +25,14 @@ objptr getFwdPtr (pointer p) {
return *(getFwdPtrp(p));
}
+bool isFwdHeader (GC_header h) {
+ return (not (GC_VALID_HEADER_MASK & h));
+}
+
/* hasFwdPtr (p)
*
* Returns true if the object pointed to by p has a valid forwarding pointer.
*/
bool hasFwdPtr (pointer p) {
- return (not (GC_VALID_HEADER_MASK & getHeader(p)));
+ return isFwdHeader (getHeader(p));
}
diff --git a/runtime/gc/forward.h b/runtime/gc/forward.h
index 154381d81..364b49cef 100644
--- a/runtime/gc/forward.h
+++ b/runtime/gc/forward.h
@@ -15,5 +15,6 @@
static inline objptr* getFwdPtrp (pointer p);
static inline objptr getFwdPtr (pointer p);
static inline bool hasFwdPtr (pointer p);
+static inline bool isFwdHeader (GC_header h);
#endif /* (defined (MLTON_GC_INTERNAL_FUNCS)) */
diff --git a/runtime/gc/garbage-collection.c b/runtime/gc/garbage-collection.c
index 5168cb3e7..ff92abe05 100644
--- a/runtime/gc/garbage-collection.c
+++ b/runtime/gc/garbage-collection.c
@@ -53,7 +53,11 @@ void growStackCurrent(GC_state s) {
/* in this case, the new stack needs more space, so allocate a new chunk,
* copy the stack, and throw away the old chunk. */
- HM_chunk newChunk = HM_allocateChunk(HM_HH_getChunkList(newhh), stackSize);
+ HM_chunk newChunk = HM_allocateChunkWithPurpose(
+ HM_HH_getChunkList(newhh),
+ stackSize,
+ BLOCK_FOR_HEAP_CHUNK);
+
if (NULL == newChunk) {
DIE("Ran out of space to grow stack!");
}
@@ -95,8 +99,15 @@ void GC_collect (GC_state s, size_t bytesRequested, bool force) {
getThreadCurrent(s)->exnStack = s->exnStack;
HM_HH_updateValues(getThreadCurrent(s), s->frontier);
beginAtomic(s);
+ // ebr for hh nodes
HH_EBR_leaveQuiescentState(s);
+ // ebr for chunks
+ HM_EBR_leaveQuiescentState(s);
+ // HM_EBR_enterQuiescentState(s);
+
+ maybeSample(s, s->blockUsageSampler);
+
// HM_HierarchicalHeap h = getThreadCurrent(s)->hierarchicalHeap;
// while (h->nextAncestor != NULL) h = h->nextAncestor;
// if (HM_HH_getDepth(h) == 0 && HM_getChunkListSize(HM_HH_getChunkList(h)) > 8192) {
diff --git a/runtime/gc/gc_state.c b/runtime/gc/gc_state.c
index 3b7008442..c821e9741 100644
--- a/runtime/gc/gc_state.c
+++ b/runtime/gc/gc_state.c
@@ -98,6 +98,22 @@ uintmax_t GC_getCumulativeStatisticsLocalBytesReclaimedOfProc(GC_state s, uint32
return s->procStates[proc].cumulativeStatistics->bytesReclaimedByLocal;
}
+uintmax_t GC_bytesInScopeForLocal(GC_state s) {
+ uintmax_t retVal = 0;
+ for (size_t i = 0; i < s->numberOfProcs; i++) {
+ retVal += s->procStates[i].cumulativeStatistics->bytesInScopeForLocal;
+ }
+ return retVal;
+}
+
+uintmax_t GC_bytesInScopeForCC(GC_state s) {
+ uintmax_t retVal = 0;
+ for (size_t i = 0; i < s->numberOfProcs; i++) {
+ retVal += s->procStates[i].cumulativeStatistics->bytesInScopeForCC;
+ }
+ return retVal;
+}
+
uintmax_t GC_getCumulativeStatisticsBytesAllocated (GC_state s) {
/* return sum across all processors */
size_t retVal = 0;
@@ -165,30 +181,17 @@ uintmax_t GC_getCumulativeStatisticsNumLocalGCsOfProc(GC_state s, uint32_t proc)
return s->procStates[proc].cumulativeStatistics->numHHLocalGCs;
}
-uintmax_t GC_getNumRootCCsOfProc(GC_state s, uint32_t proc) {
- return s->procStates[proc].cumulativeStatistics->numRootCCs;
+uintmax_t GC_getNumCCsOfProc(GC_state s, uint32_t proc) {
+ return s->procStates[proc].cumulativeStatistics->numCCs;
}
-uintmax_t GC_getNumInternalCCsOfProc(GC_state s, uint32_t proc) {
- return s->procStates[proc].cumulativeStatistics->numInternalCCs;
-}
-
-uintmax_t GC_getRootCCMillisecondsOfProc(GC_state s, uint32_t proc) {
- struct timespec *t = &(s->procStates[proc].cumulativeStatistics->timeRootCC);
- return (uintmax_t)t->tv_sec * 1000 + (uintmax_t)t->tv_nsec / 1000000;
-}
-
-uintmax_t GC_getInternalCCMillisecondsOfProc(GC_state s, uint32_t proc) {
- struct timespec *t = &(s->procStates[proc].cumulativeStatistics->timeInternalCC);
+uintmax_t GC_getCCMillisecondsOfProc(GC_state s, uint32_t proc) {
+ struct timespec *t = &(s->procStates[proc].cumulativeStatistics->timeCC);
return (uintmax_t)t->tv_sec * 1000 + (uintmax_t)t->tv_nsec / 1000000;
}
-uintmax_t GC_getRootCCBytesReclaimedOfProc(GC_state s, uint32_t proc) {
- return s->procStates[proc].cumulativeStatistics->bytesReclaimedByRootCC;
-}
-
-uintmax_t GC_getInternalCCBytesReclaimedOfProc(GC_state s, uint32_t proc) {
- return s->procStates[proc].cumulativeStatistics->bytesReclaimedByInternalCC;
+uintmax_t GC_getCCBytesReclaimedOfProc(GC_state s, uint32_t proc) {
+ return s->procStates[proc].cumulativeStatistics->bytesReclaimedByCC;
}
uintmax_t GC_getLocalGCMillisecondsOfProc(GC_state s, uint32_t proc) {
@@ -209,6 +212,14 @@ uintmax_t GC_numDisentanglementChecks(GC_state s) {
return count;
}
+uintmax_t GC_numEntanglements(GC_state s) {
+ uintmax_t count = 0;
+ for (uint32_t p = 0; p < s->numberOfProcs; p++) {
+ count += s->procStates[p].cumulativeStatistics->numEntanglements;
+ }
+ return count;
+}
+
uintmax_t GC_numChecksSkipped(GC_state s)
{
uintmax_t count = 0;
@@ -239,10 +250,56 @@ uintmax_t GC_numSuspectsCleared(GC_state s)
return count;
}
-uintmax_t GC_numEntanglementsDetected(GC_state s) {
+uintmax_t GC_bytesPinnedEntangled(GC_state s)
+{
uintmax_t count = 0;
+ for (uint32_t p = 0; p < s->numberOfProcs; p++)
+ {
+ count += s->procStates[p].cumulativeStatistics->bytesPinnedEntangled;
+ }
+ return count;
+}
+
+uintmax_t GC_bytesPinnedEntangledWatermark(GC_state s)
+{
+ uintmax_t mark = 0;
+ for (uint32_t p = 0; p < s->numberOfProcs; p++)
+ {
+ mark = max(mark,
+ s->procStates[p].cumulativeStatistics->bytesPinnedEntangledWatermark);
+ }
+ return mark;
+}
+
+// must only be called immediately after join at root depth
+void GC_updateBytesPinnedEntangledWatermark(GC_state s)
+{
+ uintmax_t total = 0;
+ for (uint32_t p = 0; p < s->numberOfProcs; p++)
+ {
+ uintmax_t *currp =
+ &(s->procStates[p].cumulativeStatistics->currentPhaseBytesPinnedEntangled);
+ uintmax_t curr = __atomic_load_n(currp, __ATOMIC_SEQ_CST);
+ __atomic_store_n(currp, 0, __ATOMIC_SEQ_CST);
+ total += curr;
+ }
+
+ // if (total > 0) {
+ // LOG(LM_HIERARCHICAL_HEAP, LL_FORCE, "hello %zu", total);
+ // }
+
+ s->cumulativeStatistics->bytesPinnedEntangledWatermark =
+ max(
+ s->cumulativeStatistics->bytesPinnedEntangledWatermark,
+ total
+ );
+}
+
+float GC_approxRaceFactor(GC_state s)
+{
+ float count = 0;
for (uint32_t p = 0; p < s->numberOfProcs; p++) {
- count += s->procStates[p].cumulativeStatistics->numEntanglementsDetected;
+ count = max(count, s->procStates[p].cumulativeStatistics->approxRaceFactor);
}
return count;
}
diff --git a/runtime/gc/gc_state.h b/runtime/gc/gc_state.h
index 68bd4e31b..ff8246d7f 100644
--- a/runtime/gc/gc_state.h
+++ b/runtime/gc/gc_state.h
@@ -32,6 +32,7 @@ struct GC_state {
volatile uint32_t atomicState;
struct BlockAllocator *blockAllocatorGlobal;
struct BlockAllocator *blockAllocatorLocal;
+ struct Sampler *blockUsageSampler;
objptr callFromCHandlerThread; /* Handler for exported C calls (in heap). */
pointer callFromCOpArgsResPtr; /* Pass op, args, and res from exported C call */
struct GC_controls *controls;
@@ -51,7 +52,8 @@ struct GC_state {
uint32_t globalsLength;
struct FixedSizeAllocator hhAllocator;
struct FixedSizeAllocator hhUnionFindAllocator;
- struct HH_EBR_shared * hhEBR;
+ struct EBR_shared * hhEBR;
+ struct EBR_shared * hmEBR;
struct GC_lastMajorStatistics *lastMajorStatistics;
pointer limitPlusSlop; /* limit + GC_HEAP_LIMIT_SLOP */
int (*loadGlobals)(FILE *f); /* loads the globals from the file. */
@@ -131,17 +133,20 @@ PRIVATE uintmax_t GC_getPromoMillisecondsOfProc(GC_state s, uint32_t proc);
PRIVATE uintmax_t GC_getCumulativeStatisticsNumLocalGCsOfProc(GC_state s, uint32_t proc);
-PRIVATE uintmax_t GC_getNumRootCCsOfProc(GC_state s, uint32_t proc);
-PRIVATE uintmax_t GC_getNumInternalCCsOfProc(GC_state s, uint32_t proc);
-PRIVATE uintmax_t GC_getRootCCMillisecondsOfProc(GC_state s, uint32_t proc);
-PRIVATE uintmax_t GC_getInternalCCMillisecondsOfProc(GC_state s, uint32_t proc);
-PRIVATE uintmax_t GC_getRootCCBytesReclaimedOfProc(GC_state s, uint32_t proc);
-PRIVATE uintmax_t GC_getInternalCCBytesReclaimedOfProc(GC_state s, uint32_t proc);
+PRIVATE uintmax_t GC_getNumCCsOfProc(GC_state s, uint32_t proc);
+PRIVATE uintmax_t GC_getCCMillisecondsOfProc(GC_state s, uint32_t proc);
+PRIVATE uintmax_t GC_getCCBytesReclaimedOfProc(GC_state s, uint32_t proc);
+PRIVATE uintmax_t GC_bytesInScopeForLocal(GC_state s);
+PRIVATE uintmax_t GC_bytesInScopeForCC(GC_state s);
PRIVATE uintmax_t GC_numDisentanglementChecks(GC_state s);
-PRIVATE uintmax_t GC_numEntanglementsDetected(GC_state s);
+PRIVATE uintmax_t GC_numEntanglements(GC_state s);
+PRIVATE float GC_approxRaceFactor(GC_state s);
PRIVATE uintmax_t GC_numChecksSkipped(GC_state s);
PRIVATE uintmax_t GC_numSuspectsMarked(GC_state s);
PRIVATE uintmax_t GC_numSuspectsCleared(GC_state s);
+PRIVATE uintmax_t GC_bytesPinnedEntangled(GC_state s);
+PRIVATE uintmax_t GC_bytesPinnedEntangledWatermark(GC_state s);
+PRIVATE void GC_updateBytesPinnedEntangledWatermark(GC_state s);
PRIVATE uint32_t GC_getControlMaxCCDepth(GC_state s);
diff --git a/runtime/gc/hierarchical-heap-collection.c b/runtime/gc/hierarchical-heap-collection.c
index dc6a7191f..444f8744f 100644
--- a/runtime/gc/hierarchical-heap-collection.c
+++ b/runtime/gc/hierarchical-heap-collection.c
@@ -40,21 +40,38 @@ void unfreezeDisentangledDepthAfter(
#endif
void tryUnpinOrKeepPinned(
- GC_state s,
- HM_remembered remElem,
- void* rawArgs);
+ GC_state s,
+ HM_remembered remElem,
+ void *rawArgs);
+
+void LGC_markAndScan(GC_state s, HM_remembered remElem, void *rawArgs);
+void unmark(GC_state s, objptr *opp, objptr op, void *rawArgs);
void copySuspect(GC_state s, objptr *opp, objptr op, void *rawArghh);
-void forwardObjptrsOfRemembered(
- GC_state s,
+void forwardFromObjsOfRemembered(
+ GC_state s,
+ HM_remembered remElem,
+ void *rawArgs);
+
+void unmarkWrapper(
+ __attribute__((unused)) GC_state s,
HM_remembered remElem,
- void* rawArgs);
+ __attribute__((unused)) void *rawArgs);
+void addEntangledToRemSet(GC_state s, objptr op, uint32_t opDepth, struct ForwardHHObjptrArgs *args);
+static inline HM_HierarchicalHeap toSpaceHH (GC_state s, struct ForwardHHObjptrArgs *args, uint32_t depth) {
+ if (args->toSpace[depth] == NULL)
+ {
+ /* Level does not exist, so create it */
+ args->toSpace[depth] = HM_HH_new(s, depth);
+ }
+ return args->toSpace[depth];
+}
// void scavengeChunkOfPinnedObject(GC_state s, objptr op, void* rawArgs);
#if ASSERT
-void checkRememberedEntry(GC_state s, HM_remembered remElem, void* args);
+void checkRememberedEntry(GC_state s, HM_remembered remElem, void *args);
bool hhContainsChunk(HM_HierarchicalHeap hh, HM_chunk theChunk);
#endif
@@ -70,7 +87,8 @@ bool hhContainsChunk(HM_HierarchicalHeap hh, HM_chunk theChunk);
*
* @return the tag of the object
*/
-GC_objectTypeTag computeObjectCopyParameters(GC_state s, pointer p,
+GC_objectTypeTag computeObjectCopyParameters(GC_state s, GC_header header,
+ pointer p,
size_t *objectSize,
size_t *copySize,
size_t *metaDataSize);
@@ -80,6 +98,8 @@ pointer copyObject(pointer p,
size_t copySize,
HM_HierarchicalHeap tgtHeap);
+void delLastObj(objptr op, size_t objectSize, HM_HierarchicalHeap tgtHeap);
+
/**
* ObjptrPredicateFunction for skipping stacks and threads in the hierarchical
* heap.
@@ -87,68 +107,69 @@ pointer copyObject(pointer p,
* @note This function takes as additional arguments the
* struct SSATOPredicateArgs
*/
-struct SSATOPredicateArgs {
+struct SSATOPredicateArgs
+{
pointer expectedStackPointer;
pointer expectedThreadPointer;
};
bool skipStackAndThreadObjptrPredicate(GC_state s,
pointer p,
- void* rawArgs);
+ void *rawArgs);
/************************/
/* Function Definitions */
/************************/
-#if (defined (MLTON_GC_INTERNAL_BASIS))
+#if (defined(MLTON_GC_INTERNAL_BASIS))
#endif /* MLTON_GC_INTERNAL_BASIS */
-#if (defined (MLTON_GC_INTERNAL_FUNCS))
+#if (defined(MLTON_GC_INTERNAL_FUNCS))
-enum LGC_freedChunkType {
+enum LGC_freedChunkType
+{
LGC_FREED_REMSET_CHUNK,
LGC_FREED_STACK_CHUNK,
LGC_FREED_NORMAL_CHUNK,
LGC_FREED_SUSPECT_CHUNK
};
-static const char* LGC_freedChunkTypeToString[] = {
- "LGC_FREED_REMSET_CHUNK",
- "LGC_FREED_STACK_CHUNK",
- "LGC_FREED_NORMAL_CHUNK",
- "LGC_FREED_SUSPECT_CHUNK"
-};
+static const char *LGC_freedChunkTypeToString[] = {
+ "LGC_FREED_REMSET_CHUNK",
+ "LGC_FREED_STACK_CHUNK",
+ "LGC_FREED_NORMAL_CHUNK",
+ "LGC_FREED_SUSPECT_CHUNK"};
-struct LGC_chunkInfo {
+struct LGC_chunkInfo
+{
uint32_t depth;
int32_t procNum;
uintmax_t collectionNumber;
enum LGC_freedChunkType freedType;
};
-
void LGC_writeFreeChunkInfo(
- __attribute__((unused)) GC_state s,
- char* infoBuffer,
- size_t bufferLen,
- void* env)
+ __attribute__((unused)) GC_state s,
+ char *infoBuffer,
+ size_t bufferLen,
+ void *env)
{
struct LGC_chunkInfo *info = env;
snprintf(infoBuffer, bufferLen,
- "freed %s at depth %u by LGC %d:%zu",
- LGC_freedChunkTypeToString[info->freedType],
- info->depth,
- info->procNum,
- info->collectionNumber);
+ "freed %s at depth %u by LGC %d:%zu",
+ LGC_freedChunkTypeToString[info->freedType],
+ info->depth,
+ info->procNum,
+ info->collectionNumber);
}
-
-uint32_t minDepthWithoutCC(GC_thread thread) {
+uint32_t minDepthWithoutCC(GC_thread thread)
+{
assert(thread != NULL);
assert(thread->hierarchicalHeap != NULL);
HM_HierarchicalHeap cursor = thread->hierarchicalHeap;
if (cursor->subHeapForCC != NULL)
- return thread->currentDepth+1;
+ return thread->currentDepth + 1;
while (cursor->nextAncestor != NULL &&
cursor->nextAncestor->subHeapForCC == NULL)
@@ -166,17 +187,19 @@ uint32_t minDepthWithoutCC(GC_thread thread) {
return HM_HH_getDepth(cursor);
}
-void HM_HHC_collectLocal(uint32_t desiredScope) {
+void HM_HHC_collectLocal(uint32_t desiredScope)
+{
GC_state s = pthread_getspecific(gcstate_key);
GC_thread thread = getThreadCurrent(s);
- struct HM_HierarchicalHeap* hh = thread->hierarchicalHeap;
+ struct HM_HierarchicalHeap *hh = thread->hierarchicalHeap;
struct rusage ru_start;
struct timespec startTime;
struct timespec stopTime;
uint64_t oldObjectCopied;
- if (NONE == s->controls->collectionType) {
+ if (NONE == s->controls->collectionType)
+ {
/* collection disabled */
return;
}
@@ -188,62 +211,65 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
// return;
// }
- if (s->wsQueueTop == BOGUS_OBJPTR || s->wsQueueBot == BOGUS_OBJPTR) {
+ if (s->wsQueueTop == BOGUS_OBJPTR || s->wsQueueBot == BOGUS_OBJPTR)
+ {
LOG(LM_HH_COLLECTION, LL_DEBUG, "Skipping collection, deque not registered yet");
return;
}
- uint64_t topval = *(uint64_t*)objptrToPointer(s->wsQueueTop, NULL);
+ uint64_t topval = *(uint64_t *)objptrToPointer(s->wsQueueTop, NULL);
uint32_t potentialLocalScope = UNPACK_IDX(topval);
uint32_t originalLocalScope = pollCurrentLocalScope(s);
- if (thread->currentDepth != originalLocalScope) {
+ if (thread->currentDepth != originalLocalScope)
+ {
LOG(LM_HH_COLLECTION, LL_DEBUG,
- "Skipping collection:\n"
- " currentDepth %u\n"
- " originalLocalScope %u\n"
- " potentialLocalScope %u\n",
- thread->currentDepth,
- originalLocalScope,
- potentialLocalScope);
+ "Skipping collection:\n"
+ " currentDepth %u\n"
+ " originalLocalScope %u\n"
+ " potentialLocalScope %u\n",
+ thread->currentDepth,
+ originalLocalScope,
+ potentialLocalScope);
return;
}
/** Compute the min depth for local collection. We claim as many levels
- * as we can without interfering with CC, but only so far as desired.
- *
- * Note that we could permit local collection at the same level as a
- * registered (but not yet stolen) CC, as long as we update the rootsets
- * stored for the CC. But this is tricky. Much simpler to just avoid CC'ed
- * levels entirely.
- */
+ * as we can without interfering with CC, but only so far as desired.
+ *
+ * Note that we could permit local collection at the same level as a
+ * registered (but not yet stolen) CC, as long as we update the rootsets
+ * stored for the CC. But this is tricky. Much simpler to just avoid CC'ed
+ * levels entirely.
+ */
uint32_t minNoCC = minDepthWithoutCC(thread);
uint32_t minOkay = desiredScope;
minOkay = max(minOkay, thread->minLocalCollectionDepth);
minOkay = max(minOkay, minNoCC);
uint32_t minDepth = originalLocalScope;
- while (minDepth > minOkay && tryClaimLocalScope(s)) {
+ while (minDepth > minOkay && tryClaimLocalScope(s))
+ {
minDepth--;
assert(minDepth == pollCurrentLocalScope(s));
}
assert(minDepth == pollCurrentLocalScope(s));
- if ( minDepth == 0 ||
- minOkay > minDepth ||
- minDepth > thread->currentDepth )
+ if (minDepth == 0 ||
+ minOkay > minDepth ||
+ minDepth > thread->currentDepth)
{
LOG(LM_HH_COLLECTION, LL_DEBUG,
- "Skipping collection:\n"
- " minDepth %u\n"
- " currentDepth %u\n"
- " minNoCC %u\n"
- " desiredScope %u\n"
- " potentialLocalScope %u\n",
- minDepth,
- thread->currentDepth,
- minNoCC,
- desiredScope,
- potentialLocalScope);
+ "Skipping collection:\n"
+ " minDepth %u\n"
+ " currentDepth %u\n"
+ " minNoCC %u\n"
+ " desiredScope %u\n"
+ " potentialLocalScope %u\n",
+ minDepth,
+ thread->currentDepth,
+ minNoCC,
+ desiredScope,
+ potentialLocalScope);
releaseLocalScope(s, originalLocalScope);
return;
@@ -260,33 +286,38 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
s->cumulativeStatistics->numHHLocalGCs++;
/* used needs to be set because the mutator has changed s->stackTop. */
- getStackCurrent(s)->used = sizeofGCStateCurrentStackUsed (s);
+ getStackCurrent(s)->used = sizeofGCStateCurrentStackUsed(s);
getThreadCurrent(s)->exnStack = s->exnStack;
assertInvariants(thread);
- if (SUPERLOCAL == s->controls->collectionType) {
+ if (SUPERLOCAL == s->controls->collectionType)
+ {
minDepth = maxDepth;
}
/* copy roots */
struct ForwardHHObjptrArgs forwardHHObjptrArgs = {
- .hh = hh,
- .minDepth = minDepth,
- .maxDepth = maxDepth,
- .toDepth = HM_HH_INVALID_DEPTH,
- .fromSpace = NULL,
- .toSpace = NULL,
- .pinned = NULL,
- .containingObject = BOGUS_OBJPTR,
- .bytesCopied = 0,
- .objectsCopied = 0,
- .stacksCopied = 0,
- .bytesMoved = 0,
- .objectsMoved = 0
- };
+ .hh = hh,
+ .minDepth = minDepth,
+ .maxDepth = maxDepth,
+ .toDepth = HM_HH_INVALID_DEPTH,
+ .fromSpace = NULL,
+ .toSpace = NULL,
+ .toSpaceStart = NULL,
+ .toSpaceStartChunk = NULL,
+ .pinned = NULL,
+ .containingObject = BOGUS_OBJPTR,
+ .bytesCopied = 0,
+ .entangledBytes = 0,
+ .objectsCopied = 0,
+ .stacksCopied = 0,
+ .bytesMoved = 0,
+ .objectsMoved = 0,
+ .concurrent = false};
+ CC_workList_init(s, &(forwardHHObjptrArgs.worklist));
struct GC_foreachObjptrClosure forwardHHObjptrClosure =
- {.fun = forwardHHObjptr, .env = &forwardHHObjptrArgs};
+ {.fun = forwardHHObjptr, .env = &forwardHHObjptrArgs};
LOG(LM_HH_COLLECTION, LL_INFO,
"collecting hh %p (L: %u):\n"
@@ -298,7 +329,7 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
" potential local scope is %u -> %u\n"
" collection scope is %u -> %u\n",
// " lchs %"PRIu64" lcs %"PRIu64,
- ((void*)(hh)),
+ ((void *)(hh)),
thread->currentDepth,
s->procNumber,
s->cumulativeStatistics->numHHLocalGCs,
@@ -311,26 +342,38 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
forwardHHObjptrArgs.minDepth,
forwardHHObjptrArgs.maxDepth);
- struct HM_chunkList pinned[maxDepth+1];
+ struct HM_chunkList pinned[maxDepth + 1];
forwardHHObjptrArgs.pinned = &(pinned[0]);
- for (uint32_t i = 0; i <= maxDepth; i++) HM_initChunkList(&(pinned[i]));
+ for (uint32_t i = 0; i <= maxDepth; i++)
+ HM_initChunkList(&(pinned[i]));
- HM_HierarchicalHeap toSpace[maxDepth+1];
+ HM_HierarchicalHeap toSpace[maxDepth + 1];
forwardHHObjptrArgs.toSpace = &(toSpace[0]);
- for (uint32_t i = 0; i <= maxDepth; i++) toSpace[i] = NULL;
+ pointer toSpaceStart[maxDepth + 1];
+ forwardHHObjptrArgs.toSpaceStart = &(toSpaceStart[0]);
+ HM_chunk toSpaceStartChunk[maxDepth + 1];
+ forwardHHObjptrArgs.toSpaceStartChunk = &(toSpaceStartChunk[0]);
+ for (uint32_t i = 0; i <= maxDepth; i++)
+ {
+ toSpace[i] = NULL;
+ toSpaceStart[i] = NULL;
+ toSpaceStartChunk[i] = NULL;
+ }
- HM_HierarchicalHeap fromSpace[maxDepth+1];
+ HM_HierarchicalHeap fromSpace[maxDepth + 1];
forwardHHObjptrArgs.fromSpace = &(fromSpace[0]);
- for (uint32_t i = 0; i <= maxDepth; i++) fromSpace[i] = NULL;
+ for (uint32_t i = 0; i <= maxDepth; i++)
+ fromSpace[i] = NULL;
for (HM_HierarchicalHeap cursor = hh;
NULL != cursor;
- cursor = cursor->nextAncestor) {
+ cursor = cursor->nextAncestor)
+ {
fromSpace[HM_HH_getDepth(cursor)] = cursor;
}
/* =====================================================================
* logging */
- size_t sizesBefore[maxDepth+1];
+ size_t sizesBefore[maxDepth + 1];
for (uint32_t i = 0; i <= maxDepth; i++)
sizesBefore[i] = 0;
size_t totalSizeBefore = 0;
@@ -406,6 +449,7 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
Trace0(EVENT_PROMOTION_ENTER);
timespec_now(&startTime);
+ forwardHHObjptrArgs.concurrent = true;
/* For each remembered entry, if possible, unpin and discard the entry.
* otherwise, copy the remembered entry to the toSpace remembered set. */
for (HM_HierarchicalHeap cursor = hh;
@@ -415,11 +459,26 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
forwardHHObjptrArgs.toDepth = HM_HH_getDepth(cursor);
struct HM_foreachDownptrClosure closure =
- {.fun = tryUnpinOrKeepPinned, .env = (void*)&forwardHHObjptrArgs};
- HM_foreachRemembered(s, HM_HH_getRemSet(cursor), &closure);
+ {.fun = tryUnpinOrKeepPinned, .env = (void *)&forwardHHObjptrArgs};
+ HM_foreachRemembered(s, HM_HH_getRemSet(cursor), &closure, true);
}
+ forwardHHObjptrArgs.concurrent = false;
forwardHHObjptrArgs.toDepth = HM_HH_INVALID_DEPTH;
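+  /* Snapshot the current frontier of each toSpace level. Everything copied so
+   * far (during the remembered-set scavenge above) has already been traced via
+   * the worklist in LGC_markAndScan, so the later pass over the toSpace chunk
+   * lists can start from these positions instead of from the beginning. */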
+ for (uint32_t i = 0; i <= maxDepth; i++)
+ {
+ if (toSpace[i] != NULL)
+ {
+ HM_chunkList toSpaceList = HM_HH_getChunkList(toSpace[i]);
+ if (toSpaceList->firstChunk != NULL)
+ {
+ toSpaceStart[i] = HM_getChunkFrontier(toSpaceList->lastChunk);
+ toSpaceStartChunk[i] = toSpaceList->lastChunk;
+ // assert(HM_getChunkOf(toSpaceStart[i]) == toSpaceList->lastChunk);
+ }
+ }
+ }
+
// assertInvariants(thread);
#if ASSERT
@@ -438,8 +497,9 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
/* ===================================================================== */
- if (needGCTime(s)) {
- startTiming (RUSAGE_THREAD, &ru_start);
+ if (needGCTime(s))
+ {
+ startTiming(RUSAGE_THREAD, &ru_start);
}
timespec_now(&startTime);
@@ -460,12 +520,12 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
&forwardHHObjptrClosure,
FALSE);
LOG(LM_HH_COLLECTION, LL_DEBUG,
- "Copied %"PRIu64" objects from stack",
+ "Copied %" PRIu64 " objects from stack",
forwardHHObjptrArgs.objectsCopied - oldObjectCopied);
Trace3(EVENT_COPY,
- forwardHHObjptrArgs.bytesCopied,
- forwardHHObjptrArgs.objectsCopied,
- forwardHHObjptrArgs.stacksCopied);
+ forwardHHObjptrArgs.bytesCopied,
+ forwardHHObjptrArgs.objectsCopied,
+ forwardHHObjptrArgs.stacksCopied);
/* forward contents of thread (hence including stack) */
oldObjectCopied = forwardHHObjptrArgs.objectsCopied;
@@ -477,26 +537,25 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
&forwardHHObjptrClosure,
FALSE);
LOG(LM_HH_COLLECTION, LL_DEBUG,
- "Copied %"PRIu64" objects from thread",
+ "Copied %" PRIu64 " objects from thread",
forwardHHObjptrArgs.objectsCopied - oldObjectCopied);
Trace3(EVENT_COPY,
- forwardHHObjptrArgs.bytesCopied,
- forwardHHObjptrArgs.objectsCopied,
- forwardHHObjptrArgs.stacksCopied);
+ forwardHHObjptrArgs.bytesCopied,
+ forwardHHObjptrArgs.objectsCopied,
+ forwardHHObjptrArgs.stacksCopied);
/* forward thread itself */
LOG(LM_HH_COLLECTION, LL_DEBUG,
- "Trying to forward current thread %p",
- (void*)s->currentThread);
+ "Trying to forward current thread %p",
+ (void *)s->currentThread);
oldObjectCopied = forwardHHObjptrArgs.objectsCopied;
forwardHHObjptr(s, &(s->currentThread), s->currentThread, &forwardHHObjptrArgs);
LOG(LM_HH_COLLECTION, LL_DEBUG,
- (1 == (forwardHHObjptrArgs.objectsCopied - oldObjectCopied)) ?
- "Copied thread from GC_state" : "Did not copy thread from GC_state");
+ (1 == (forwardHHObjptrArgs.objectsCopied - oldObjectCopied)) ? "Copied thread from GC_state" : "Did not copy thread from GC_state");
Trace3(EVENT_COPY,
- forwardHHObjptrArgs.bytesCopied,
- forwardHHObjptrArgs.objectsCopied,
- forwardHHObjptrArgs.stacksCopied);
+ forwardHHObjptrArgs.bytesCopied,
+ forwardHHObjptrArgs.objectsCopied,
+ forwardHHObjptrArgs.stacksCopied);
/* forward contents of deque */
oldObjectCopied = forwardHHObjptrArgs.objectsCopied;
@@ -507,12 +566,12 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
&forwardHHObjptrClosure,
FALSE);
LOG(LM_HH_COLLECTION, LL_DEBUG,
- "Copied %"PRIu64" objects from deque",
+ "Copied %" PRIu64 " objects from deque",
forwardHHObjptrArgs.objectsCopied - oldObjectCopied);
Trace3(EVENT_COPY,
- forwardHHObjptrArgs.bytesCopied,
- forwardHHObjptrArgs.objectsCopied,
- forwardHHObjptrArgs.stacksCopied);
+ forwardHHObjptrArgs.bytesCopied,
+ forwardHHObjptrArgs.objectsCopied,
+ forwardHHObjptrArgs.stacksCopied);
LOG(LM_HH_COLLECTION, LL_DEBUG, "END root copy");
@@ -530,107 +589,82 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
// };
/* off-by-one to prevent underflow */
- uint32_t depth = thread->currentDepth+1;
- while (depth > forwardHHObjptrArgs.minDepth) {
+ uint32_t depth = thread->currentDepth + 1;
+ while (depth > forwardHHObjptrArgs.minDepth)
+ {
depth--;
HM_HierarchicalHeap toSpaceLevel = toSpace[depth];
- if (NULL == toSpaceLevel) {
+ if (NULL == toSpaceLevel)
+ {
continue;
}
LOG(LM_HH_COLLECTION, LL_INFO,
- "level %"PRIu32": num pinned: %zu",
- depth,
- HM_numRemembered(HM_HH_getRemSet(toSpaceLevel)));
+ "level %" PRIu32 ": num pinned: %zu",
+ depth,
+ HM_numRemembered(HM_HH_getRemSet(toSpaceLevel)));
- /* use the remembered (pinned) entries at this level as extra roots */
+ /* forward the from-elements of the down-ptrs */
struct HM_foreachDownptrClosure closure =
- {.fun = forwardObjptrsOfRemembered, .env = (void*)&forwardHHObjptrArgs};
- HM_foreachRemembered(s, HM_HH_getRemSet(toSpaceLevel), &closure);
+ {.fun = forwardFromObjsOfRemembered, .env = (void *)&forwardHHObjptrArgs};
+      // HM_foreachRemembered pops the public remSet into the private one, which
+      // would interfere with the unmarking phase of GC, so use HM_foreachPrivate instead.
+ HM_foreachPrivate(s, &(HM_HH_getRemSet(toSpaceLevel)->private), &closure);
if (NULL != HM_HH_getChunkList(toSpaceLevel)->firstChunk)
{
HM_chunkList toSpaceList = HM_HH_getChunkList(toSpaceLevel);
+ pointer start = toSpaceStart[depth] != NULL ? toSpaceStart[depth] : HM_getChunkStart(toSpaceList->firstChunk);
+ HM_chunk startChunk = toSpaceStartChunk[depth] != NULL ? toSpaceStartChunk[depth] : toSpaceList->firstChunk;
HM_forwardHHObjptrsInChunkList(
- s,
- toSpaceList->firstChunk,
- HM_getChunkStart(toSpaceList->firstChunk),
- // &skipStackAndThreadObjptrPredicate,
- // &ssatoPredicateArgs,
- &trueObjptrPredicate,
- NULL,
- &forwardHHObjptr,
- &forwardHHObjptrArgs);
+ s,
+ startChunk,
+ start,
+ // &skipStackAndThreadObjptrPredicate,
+ // &ssatoPredicateArgs,
+ &trueObjptrPredicate,
+ NULL,
+ &forwardHHObjptr,
+ &forwardHHObjptrArgs);
}
}
- /* after everything has been scavenged, we have to move the pinned chunks */
- depth = thread->currentDepth+1;
- while (depth > forwardHHObjptrArgs.minDepth) {
- depth--;
- HM_HierarchicalHeap toSpaceLevel = toSpace[depth];
- if (NULL == toSpaceLevel) {
- /* check that there are also no pinned chunks at this level
- * (if there was pinned chunk, then we would have also created a
- * toSpace HH at this depth, because we would have scavenged the
- * remembered entry) */
- assert(pinned[depth].firstChunk == NULL);
- continue;
- }
-
-#if ASSERT
- // SAM_NOTE: safe to check here, because pinned chunks are separate.
- traverseEachObjInChunkList(s, HM_HH_getChunkList(toSpaceLevel));
-#endif
-
- /* unset the flags on pinned chunks and update their HH pointer */
- for (HM_chunk chunkCursor = pinned[depth].firstChunk;
- chunkCursor != NULL;
- chunkCursor = chunkCursor->nextChunk)
- {
- assert(chunkCursor->pinnedDuringCollection);
- chunkCursor->pinnedDuringCollection = FALSE;
- chunkCursor->levelHead = HM_HH_getUFNode(toSpaceLevel);
- }
-
- /* put the pinned chunks into the toSpace */
- HM_appendChunkList(HM_HH_getChunkList(toSpaceLevel), &(pinned[depth]));
- }
-
LOG(LM_HH_COLLECTION, LL_DEBUG,
- "Copied %"PRIu64" objects in copy-collection",
+ "Copied %" PRIu64 " objects in copy-collection",
forwardHHObjptrArgs.objectsCopied - oldObjectCopied);
LOG(LM_HH_COLLECTION, LL_DEBUG,
- "Copied %"PRIu64" stacks in copy-collection",
+ "Copied %" PRIu64 " stacks in copy-collection",
forwardHHObjptrArgs.stacksCopied);
Trace3(EVENT_COPY,
- forwardHHObjptrArgs.bytesCopied,
- forwardHHObjptrArgs.objectsCopied,
- forwardHHObjptrArgs.stacksCopied);
+ forwardHHObjptrArgs.bytesCopied,
+ forwardHHObjptrArgs.objectsCopied,
+ forwardHHObjptrArgs.stacksCopied);
/* ===================================================================== */
struct LGC_chunkInfo info =
- {.depth = 0,
- .procNum = s->procNumber,
- .collectionNumber = s->cumulativeStatistics->numHHLocalGCs,
- .freedType = LGC_FREED_NORMAL_CHUNK};
+ {.depth = 0,
+ .procNum = s->procNumber,
+ .collectionNumber = s->cumulativeStatistics->numHHLocalGCs,
+ .freedType = LGC_FREED_NORMAL_CHUNK};
struct writeFreedBlockInfoFnClosure infoc =
- {.fun = LGC_writeFreeChunkInfo, .env = &info};
+ {.fun = LGC_writeFreeChunkInfo, .env = &info};
for (HM_HierarchicalHeap cursor = hh;
NULL != cursor && HM_HH_getDepth(cursor) >= minDepth;
- cursor = cursor->nextAncestor) {
+ cursor = cursor->nextAncestor)
+ {
HM_chunkList suspects = HM_HH_getSuspects(cursor);
- if (suspects->size != 0) {
+ if (suspects->size != 0)
+ {
uint32_t depth = HM_HH_getDepth(cursor);
forwardHHObjptrArgs.toDepth = depth;
struct GC_foreachObjptrClosure fObjptrClosure =
- {.fun = copySuspect, .env = &forwardHHObjptrArgs};
+ {.fun = copySuspect, .env = &forwardHHObjptrArgs};
ES_foreachSuspect(s, suspects, &fObjptrClosure);
info.depth = depth;
info.freedType = LGC_FREED_SUSPECT_CHUNK;
- HM_freeChunksInListWithInfo(s, suspects, &infoc);
+ HM_freeChunksInListWithInfo(s, suspects, &infoc, BLOCK_FOR_SUSPECTS);
}
}
@@ -644,8 +678,9 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
HM_HierarchicalHeap nextAncestor = hhTail->nextAncestor;
HM_chunkList level = HM_HH_getChunkList(hhTail);
- HM_chunkList remset = HM_HH_getRemSet(hhTail);
- if (NULL != remset) {
+ HM_remSet remset = HM_HH_getRemSet(hhTail);
+ if (NULL != remset)
+ {
#if ASSERT
/* clear out memory to quickly catch some memory safety errors */
// HM_chunk chunkCursor = remset->firstChunk;
@@ -658,12 +693,13 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
#endif
info.depth = HM_HH_getDepth(hhTail);
info.freedType = LGC_FREED_REMSET_CHUNK;
- HM_freeChunksInListWithInfo(s, remset, &infoc);
+ HM_freeChunksInListWithInfo(s, &(remset->private), &infoc, BLOCK_FOR_REMEMBERED_SET);
}
#if ASSERT
HM_chunk chunkCursor = level->firstChunk;
- while (chunkCursor != NULL) {
+ while (chunkCursor != NULL)
+ {
assert(!chunkCursor->pinnedDuringCollection);
chunkCursor = chunkCursor->nextChunk;
}
@@ -671,14 +707,82 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
info.depth = HM_HH_getDepth(hhTail);
info.freedType = LGC_FREED_NORMAL_CHUNK;
- HM_freeChunksInListWithInfo(s, level, &infoc);
- HM_HH_freeAllDependants(s, hhTail, FALSE);
- freeFixedSize(getUFAllocator(s), HM_HH_getUFNode(hhTail));
- freeFixedSize(getHHAllocator(s), hhTail);
+ // HM_freeChunksInListWithInfo(s, level, &infoc);
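+  /* Free the from-space chunks at this level. Chunks flagged with retireChunk
+   * (they held objects relocated during the concurrent phase) are handed to
+   * EBR so they are reclaimed only once concurrent readers have quiesced; the
+   * rest are freed immediately and the list is reset. */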
+ HM_chunk chunk = level->firstChunk;
+ while (chunk != NULL) {
+ HM_chunk next = chunk->nextChunk;
+ if (chunk->retireChunk) {
+ HM_EBR_retire(s, chunk);
+ chunk->retireChunk = false;
+ }
+ else
+ {
+ HM_freeChunkWithInfo(s, chunk, &infoc, BLOCK_FOR_HEAP_CHUNK);
+ }
+ chunk = next;
+ }
+ HM_initChunkList(level);
+ HM_HH_freeAllDependants(s, hhTail, TRUE);
+ // freeFixedSize(getUFAllocator(s), HM_HH_getUFNode(hhTail));
+ // freeFixedSize(getHHAllocator(s), hhTail);
hhTail = nextAncestor;
}
+ HM_EBR_leaveQuiescentState(s);
+ // HM_EBR_enterQuiescentState(s);
+
+ /* after everything has been scavenged, we have to move the pinned chunks */
+ depth = thread->currentDepth + 1;
+ while (depth > forwardHHObjptrArgs.minDepth)
+ {
+ depth--;
+ HM_HierarchicalHeap fromSpaceLevel = fromSpace[depth];
+ if (NULL == fromSpaceLevel)
+ {
+ /* check that there are also no pinned chunks at this level
+       * (if there was a pinned chunk, then there must also have been a
+ * fromSpace HH at this depth which originally stored the chunk)
+ */
+ assert(pinned[depth].firstChunk == NULL);
+ assert(NULL == toSpace[depth] || (HM_HH_getRemSet(toSpace[depth])->private).firstChunk == NULL);
+ continue;
+ }
+
+ HM_HierarchicalHeap toSpaceLevel = toSpace[depth];
+ // if (fromSpaceLevel != NULL) {
+ // struct HM_foreachDownptrClosure closure =
+ // {.fun = tryUnpinOrKeepPinned, .env = (void *)&forwardHHObjptrArgs};
+ // // HM_foreachRemembered(s, HM_HH_getRemSet(toSpaceLevel), &closure);
+ // /*go through the public of fromSpaceLevel, they will be joined later anyway*/
+ // // assert((HM_HH_getRemSet(fromSpaceLevel)->private).firstChunk == NULL);
+ // forwardHHObjptrArgs.toDepth = depth;
+ // HM_foreachRemembered(s, HM_HH_getRemSet(fromSpaceLevel), &closure);
+ // }
+
+ if (toSpaceLevel != NULL) {
+ struct HM_foreachDownptrClosure unmarkClosure =
+ {.fun = unmarkWrapper, .env = NULL};
+ HM_foreachPublic(s, HM_HH_getRemSet(toSpaceLevel), &unmarkClosure, true);
+ }
+
+ /* unset the flags on pinned chunks and update their HH pointer */
+ for (HM_chunk chunkCursor = pinned[depth].firstChunk;
+ chunkCursor != NULL;
+ chunkCursor = chunkCursor->nextChunk)
+ {
+ assert(chunkCursor->levelHead == HM_HH_getUFNode(fromSpaceLevel));
+ assert(chunkCursor->pinnedDuringCollection);
+ chunkCursor->pinnedDuringCollection = FALSE;
+ chunkCursor->retireChunk = FALSE;
+ }
+
+ /* put the pinned chunks into the toSpace */
+ HM_appendChunkList(HM_HH_getChunkList(fromSpaceLevel), &(pinned[depth]));
+ }
+
+ CC_workList_free(s, &(forwardHHObjptrArgs.worklist));
+
/* Build the toSpace hh */
HM_HierarchicalHeap hhToSpace = NULL;
for (uint32_t i = 0; i <= maxDepth; i++)
@@ -691,15 +795,17 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
}
/* merge in toSpace */
- if (NULL == hhTail && NULL == hhToSpace) {
+ if (NULL == hh && NULL == hhToSpace)
+ {
/** SAM_NOTE: If we collected everything, I suppose this is possible.
- * But shouldn't the stack and thread at least be in the root-to-leaf
- * path? Should look into this...
- */
+ * But shouldn't the stack and thread at least be in the root-to-leaf
+ * path? Should look into this...
+ */
hh = HM_HH_new(s, thread->currentDepth);
}
- else {
- hh = HM_HH_zip(s, hhTail, hhToSpace);
+ else
+ {
+ hh = HM_HH_zip(s, hh, hhToSpace);
}
thread->hierarchicalHeap = hh;
@@ -716,15 +822,18 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
NULL != cursor;
cursor = cursor->nextAncestor)
{
- if (HM_getChunkListLastChunk(HM_HH_getChunkList(cursor)) != NULL) {
+ if (HM_getChunkListLastChunk(HM_HH_getChunkList(cursor)) != NULL)
+ {
lastChunk = HM_getChunkListLastChunk(HM_HH_getChunkList(cursor));
break;
}
}
thread->currentChunk = lastChunk;
- if (lastChunk != NULL && !lastChunk->mightContainMultipleObjects) {
- if (!HM_HH_extend(s, thread, GC_HEAP_LIMIT_SLOP)) {
+ if (lastChunk != NULL && !lastChunk->mightContainMultipleObjects)
+ {
+ if (!HM_HH_extend(s, thread, GC_HEAP_LIMIT_SLOP))
+ {
DIE("Ran out of space for hierarchical heap!\n");
}
}
@@ -738,7 +847,6 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
* assert(lastChunk->frontier < (pointer)lastChunk + HM_BLOCK_SIZE);
*/
-
#if 0
/** Finally, unfreeze chunks if we need to. */
if (s->controls->manageEntanglement) {
@@ -769,8 +877,19 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
cursor = cursor->nextAncestor)
{
struct HM_foreachDownptrClosure closure =
- {.fun = checkRememberedEntry, .env = (void*)cursor};
- HM_foreachRemembered(s, HM_HH_getRemSet(cursor), &closure);
+ {.fun = checkRememberedEntry, .env = (void *)cursor};
+ HM_foreachRemembered(s, HM_HH_getRemSet(cursor), &closure, false);
+ }
+
+ // make sure that original representatives haven't been messed up
+ for (HM_HierarchicalHeap cursor = hh;
+ NULL != cursor;
+ cursor = cursor->nextAncestor)
+ {
+ if (NULL != fromSpace[HM_HH_getDepth(cursor)])
+ {
+ assert(fromSpace[HM_HH_getDepth(cursor)] == cursor);
+ }
}
#endif
@@ -783,7 +902,11 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
* TODO: IS THIS A PROBLEM?
*/
thread->bytesSurvivedLastCollection =
- forwardHHObjptrArgs.bytesMoved + forwardHHObjptrArgs.bytesCopied;
+ forwardHHObjptrArgs.bytesMoved + forwardHHObjptrArgs.bytesCopied;
+
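+  /* Use the number of entangled bytes encountered during this collection as an
+   * approximate race factor, keeping the maximum observed so far. */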
+ float new_rf = forwardHHObjptrArgs.entangledBytes;
+
+ s->cumulativeStatistics->approxRaceFactor = max(s->cumulativeStatistics->approxRaceFactor, new_rf);
thread->bytesAllocatedSinceLastCollection = 0;
@@ -813,10 +936,13 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
size_t sizeBefore = sizesBefore[i];
const char *sign;
size_t diff;
- if (sizeBefore > sizeAfter) {
+ if (sizeBefore > sizeAfter)
+ {
sign = "-";
diff = sizeBefore - sizeAfter;
- } else {
+ }
+ else
+ {
sign = "+";
diff = sizeAfter - sizeBefore;
}
@@ -831,11 +957,16 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
}
}
- if (totalSizeAfter > totalSizeBefore) {
+ s->cumulativeStatistics->bytesInScopeForLocal += totalSizeBefore;
+
+ if (totalSizeAfter > totalSizeBefore)
+ {
// whoops?
- } else {
+ }
+ else
+ {
s->cumulativeStatistics->bytesReclaimedByLocal +=
- (totalSizeBefore - totalSizeAfter);
+ (totalSizeBefore - totalSizeAfter);
}
/* enter statistics if necessary */
@@ -853,8 +984,10 @@ void HM_HHC_collectLocal(uint32_t desiredScope) {
// (int)thread->minLocalCollectionDepth);
// }
- if (needGCTime(s)) {
- if (detailedGCTime(s)) {
+ if (needGCTime(s))
+ {
+ if (detailedGCTime(s))
+ {
stopTiming(RUSAGE_THREAD, &ru_start, &s->cumulativeStatistics->ru_gcHHLocal);
}
/*
@@ -881,7 +1014,7 @@ bool isObjptrInToSpace(objptr op, struct ForwardHHObjptrArgs *args)
HM_chunk c = HM_getChunkOf(objptrToPointer(op, NULL));
HM_HierarchicalHeap levelHead = HM_getLevelHeadPathCompress(c);
uint32_t depth = HM_HH_getDepth(levelHead);
- assert(depth <= args->maxDepth);
+ // assert(depth <= args->maxDepth);
assert(NULL != levelHead);
return args->toSpace[depth] == levelHead;
@@ -890,15 +1023,27 @@ bool isObjptrInToSpace(objptr op, struct ForwardHHObjptrArgs *args)
/* ========================================================================= */
objptr relocateObject(
- GC_state s,
- objptr op,
- HM_HierarchicalHeap tgtHeap,
- struct ForwardHHObjptrArgs *args)
+ GC_state s,
+ objptr op,
+ HM_HierarchicalHeap tgtHeap,
+ struct ForwardHHObjptrArgs *args,
+ bool *relocSuccess)
{
+ *relocSuccess = true;
pointer p = objptrToPointer(op, NULL);
-
assert(!hasFwdPtr(p));
assert(HM_HH_isLevelHead(tgtHeap));
+ GC_header header = getHeader(p);
+ assert (!isFwdHeader(header));
+
+ if (pinType(header) != PIN_NONE)
+ {
+    // object is pinned, so it can't be relocated;
+    // this case can only arise when tracing from a down pointer, or when the
+    // object is itself the target of one.
+ *relocSuccess = false;
+ assert(args->concurrent);
+ return op;
+ }
HM_chunkList tgtChunkList = HM_HH_getChunkList(tgtHeap);
@@ -908,37 +1053,57 @@ objptr relocateObject(
/* compute object size and bytes to be copied */
computeObjectCopyParameters(s,
+ header,
p,
&objectBytes,
©Bytes,
&metaDataBytes);
- if (!HM_getChunkOf(p)->mightContainMultipleObjects) {
+ if (!HM_getChunkOf(p)->mightContainMultipleObjects)
+ {
/* This chunk contains *only* this object, so no need to copy. Instead,
* just move the chunk. Don't forget to update the levelHead, too! */
HM_chunk chunk = HM_getChunkOf(p);
- HM_unlinkChunk(HM_HH_getChunkList(HM_getLevelHead(chunk)), chunk);
+ HM_unlinkChunkPreserveLevelHead(HM_HH_getChunkList(HM_getLevelHead(chunk)), chunk);
HM_appendChunk(tgtChunkList, chunk);
chunk->levelHead = HM_HH_getUFNode(tgtHeap);
LOG(LM_HH_COLLECTION, LL_DEBUGMORE,
- "Moved single-object chunk %p of size %zu",
- (void*)chunk,
- HM_getChunkSize(chunk));
+ "Moved single-object chunk %p of size %zu",
+ (void *)chunk,
+ HM_getChunkSize(chunk));
args->bytesMoved += copyBytes;
args->objectsMoved++;
return op;
}
+ /* Otherwise try copying the object */
pointer copyPointer = copyObject(p - metaDataBytes,
objectBytes,
copyBytes,
tgtHeap);
/* Store the forwarding pointer in the old object metadata. */
- *(getFwdPtrp(p)) = pointerToObjptr (copyPointer + metaDataBytes,
- NULL);
- assert (hasFwdPtr(p));
+ objptr newPointer = pointerToObjptr(copyPointer + metaDataBytes, NULL);
+ if (!args->concurrent)
+ {
+ assert(!isPinned(op));
+ assert (__sync_bool_compare_and_swap(getFwdPtrp(p), header, newPointer));
+ *(getFwdPtrp(p)) = newPointer;
+ }
+ else
+ {
+ bool success = __sync_bool_compare_and_swap(getFwdPtrp(p), header, newPointer);
+ if (!success)
+ {
+ delLastObj(newPointer, objectBytes, tgtHeap);
+ assert(isPinned(op));
+ *relocSuccess = false;
+ return op;
+ }
+ }
+ assert (getFwdPtr(p) == newPointer);
+ assert(hasFwdPtr(p));
args->bytesCopied += copyBytes;
args->objectsCopied++;
@@ -1065,41 +1230,369 @@ void copySuspect(
assert(isObjptr(op));
pointer p = objptrToPointer(op, NULL);
objptr new_ptr = op;
- if (hasFwdPtr(p)) {
+ if (hasFwdPtr(p))
+ {
new_ptr = getFwdPtr(p);
}
- else if (!isPinned(op)) {
+ else if (!isPinned(op))
+ {
/* the suspect does not have a fwd-ptr and is not pinned
* ==> its garbage, so skip it
*/
return;
}
uint32_t opDepth = args->toDepth;
- if (NULL == args->toSpace[opDepth])
+ HM_storeInChunkListWithPurpose(
+ HM_HH_getSuspects(toSpaceHH(s, args, opDepth)),
+ &new_ptr,
+ sizeof(objptr),
+ BLOCK_FOR_SUSPECTS);
+}
+
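+/* A header whose valid bit (GC_VALID_HEADER_MASK) is clear has been overwritten
+ * by a forwarding pointer (see the CAS on the header word in relocateObject). */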
+bool headerForwarded(GC_header h)
+{
+ return (!(GC_VALID_HEADER_MASK & h));
+}
+
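+/* Tracing step used while scavenging remembered (pinned) entries: skips objects
+ * outside the collection scope or already in the toSpace, follows forwarding
+ * pointers, and otherwise tries to relocate the object into the toSpace
+ * (PIN_DOWN objects are left to the remembered set). If relocation fails
+ * because the object is concurrently pinned PIN_ANY, it stays in place: it is
+ * recorded as entangled, its chunk is pinned, and it is retraced via the
+ * worklist. */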
+void markAndAdd(
+ GC_state s,
+ objptr *opp,
+ objptr op,
+ void *rawArgs)
+{
+ struct ForwardHHObjptrArgs *args = (struct ForwardHHObjptrArgs *)rawArgs;
+ pointer p = objptrToPointer(op, NULL);
+ HM_chunk chunk = HM_getChunkOf(p);
+ uint32_t opDepth = HM_HH_getDepth(HM_getLevelHead(chunk));
+ bool isInToSpace = isObjptrInToSpace(op, args);
+ if ((opDepth > args->maxDepth) || (opDepth < args->minDepth))
+ {
+ /*object is outside the scope of collection*/
+ return;
+ }
+ else if (isInToSpace) {
+ assert(!hasFwdPtr(p));
+ return;
+ }
+ else if (args->fromSpace[opDepth] != HM_getLevelHead(chunk))
+ {
+ /*object is outside the scope of collection*/
+ return;
+ }
+
+ if (hasFwdPtr(p))
+ {
+ objptr fop = getFwdPtr(p);
+ assert(!hasFwdPtr(objptrToPointer(fop, NULL)));
+ assert(isObjptrInToSpace(fop, args));
+ assert(HM_getObjptrDepth(fop) == opDepth);
+    *opp = fop; // SAM_UNSAFE :: potential bug here because of a race with readers
+ return;
+ }
+ else if (CC_isPointerMarked(p)) {
+ assert (pinType(getHeader(p)) == PIN_ANY);
+ return;
+ }
+
+ disentangleObject(s, op, opDepth);
+ enum PinType pt = pinType(getHeader(p));
+
+ if (pt == PIN_DOWN)
+ {
+    // It is okay not to trace PIN_DOWN objects here: the remembered set will have
+    // them and they will definitely be traced; this relies on unpinning failing for
+    // them in disentangleObject. It is dangerous, however, to skip PIN_ANY objects,
+    // because their remSet entries might be created concurrently with LGC, which
+    // could then miss them.
+ return;
+ }
+ else
+ {
+ assert (!CC_isPointerMarked(p));
+ assert(!hasFwdPtr(p));
+ assert(args->concurrent);
+ HM_HierarchicalHeap tgtHeap = toSpaceHH(s, args, opDepth);
+ assert(p == objptrToPointer(op, NULL));
+ bool relocateSuccess;
+ objptr op_new = relocateObject(s, op, tgtHeap, args, &relocateSuccess);
+ if (relocateSuccess)
+ {
+ chunk->retireChunk = true;
+ *opp = op_new;
+ assert(!hasFwdPtr(objptrToPointer(op_new, NULL)));
+ CC_workList_push(s, &(args->worklist), op_new);
+ }
+ else
+ {
+ assert (isPinned(op));
+ assert (pinType(getHeader(p)) == PIN_ANY);
+      // This is purely an optimization to prevent retracing of PIN_ANY objects,
+      // so it is okay if this header read is racy; in the worst case the object is retraced.
+ addEntangledToRemSet(s, op, opDepth, args);
+
+ if (!chunk->pinnedDuringCollection)
+ {
+ chunk->pinnedDuringCollection = TRUE;
+
+ if (chunk->levelHead != HM_HH_getUFNode(args->fromSpace[opDepth]))
+ {
+ chunk->levelHead = HM_HH_getUFNode(args->fromSpace[opDepth]);
+ }
+ HM_unlinkChunkPreserveLevelHead(
+ HM_HH_getChunkList(args->fromSpace[opDepth]),
+ chunk);
+ HM_appendChunk(&(args->pinned[opDepth]), chunk);
+ }
+ CC_workList_push(s, &(args->worklist), op);
+ }
+ }
+ return;
+}
+
+void unmarkAndAdd(
+ GC_state s,
+ __attribute__((unused)) objptr *opp,
+ objptr op,
+ void *rawArgs)
+{
+ struct ForwardHHObjptrArgs *args = (struct ForwardHHObjptrArgs *)rawArgs;
+ pointer p = objptrToPointer(op, NULL);
+ HM_chunk chunk = HM_getChunkOf(p);
+ uint32_t opDepth = HM_HH_getDepth(HM_getLevelHead(chunk));
+ assert(!hasFwdPtr(p));
+ if ((opDepth > args->maxDepth) || (opDepth < args->minDepth))
+ {
+ return;
+ }
+ else if (args->fromSpace[opDepth] != HM_getLevelHead(chunk) && !isObjptrInToSpace(op, args))
+ {
+ return;
+ }
+ else if (CC_isPointerMarked(p))
+ {
+ markObj(p);
+ CC_workList_push(s, &(args->worklist), op);
+ }
+}
+
+void unmark(
+ GC_state s,
+ __attribute__((unused)) objptr *opp,
+ objptr op,
+ void *rawArgs)
+{
+ struct ForwardHHObjptrArgs *args = (struct ForwardHHObjptrArgs *)rawArgs;
+ pointer p = objptrToPointer(op, NULL);
+ HM_chunk chunk = HM_getChunkOf(p);
+ uint32_t opDepth = HM_HH_getDepth(HM_getLevelHead(chunk));
+ assert(!hasFwdPtr(p));
+ if ((opDepth > args->maxDepth) || (opDepth < args->minDepth))
+ {
+ return;
+ }
+ else if (args->fromSpace[opDepth] != HM_getLevelHead(chunk))
+ {
+ return;
+ }
+ else if (CC_isPointerMarked(p))
+ {
+ markObj(p);
+ struct GC_foreachObjptrClosure unmarkClosure = {.fun = unmark, .env = rawArgs};
+ foreachObjptrInObject(s, p, &trueObjptrPredicateClosure, &unmarkClosure, FALSE);
+ }
+}
+
+void phaseLoop(GC_state s, void *rawArgs, GC_foreachObjptrClosure fClosure)
+{
+ struct ForwardHHObjptrArgs *args = (struct ForwardHHObjptrArgs *)rawArgs;
+
+ CC_workList worklist = &(args->worklist);
+ objptr *current = CC_workList_pop(s, worklist);
+ while (NULL != current)
{
- args->toSpace[opDepth] = HM_HH_new(s, opDepth);
+ callIfIsObjptr(s, fClosure, current);
+ current = CC_workList_pop(s, worklist);
}
- HM_storeInchunkList(HM_HH_getSuspects(args->toSpace[opDepth]), &new_ptr, sizeof(objptr));
+ assert(CC_workList_isEmpty(s, worklist));
}
-void tryUnpinOrKeepPinned(GC_state s, HM_remembered remElem, void* rawArgs) {
- struct ForwardHHObjptrArgs* args = (struct ForwardHHObjptrArgs*)rawArgs;
+void addEntangledToRemSet(
+ GC_state s,
+ objptr op,
+ uint32_t opDepth,
+ struct ForwardHHObjptrArgs *args) {
+ pointer p = objptrToPointer(op, NULL);
+ GC_header header = getHeader(p);
+
+ if (pinType(header) == PIN_ANY && !CC_isPointerMarked(p))
+ {
+ markObj(p);
+ struct HM_remembered remElem_ = {.object = op, .from = BOGUS_OBJPTR};
+ HM_remember (HM_HH_getRemSet(toSpaceHH(s, args, opDepth)), &remElem_, true);
+
+ size_t metaDataBytes;
+ size_t objectBytes;
+ size_t copyBytes;
+
+ /* compute object size and bytes to be copied */
+ computeObjectCopyParameters(s,
+ header,
+ p,
+ &objectBytes,
+ ©Bytes,
+ &metaDataBytes);
+ args->entangledBytes += copyBytes;
+ }
+}
+
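+/* Scavenge a single remembered (pinned) entry: record it as entangled if
+ * needed, pin its chunk in place (unless it already lives in the toSpace), and
+ * trace everything reachable from it with markAndAdd, draining the worklist
+ * before returning. */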
+void LGC_markAndScan(
+ GC_state s,
+ HM_remembered remElem,
+ void *rawArgs)
+{
objptr op = remElem->object;
+ pointer p = objptrToPointer(op, NULL);
+ HM_chunk chunk = HM_getChunkOf(p);
+ struct ForwardHHObjptrArgs *args = (struct ForwardHHObjptrArgs *)rawArgs;
+ uint32_t opDepth = HM_HH_getDepth(HM_getLevelHead(chunk));
+ assert(!hasFwdPtr(p));
+ assert(isPinned(op));
+ addEntangledToRemSet(s, op, opDepth, args);
-#if ASSERT
- HM_chunk fromChunk = HM_getChunkOf(objptrToPointer(remElem->from, NULL));
- HM_HierarchicalHeap fromHH = HM_getLevelHead(fromChunk);
- assert(HM_HH_getDepth(fromHH) <= args->toDepth);
-#endif
+ if (!isObjptrInToSpace(op, args) && !chunk->pinnedDuringCollection)
+ {
+ chunk->pinnedDuringCollection = TRUE;
+
+ if (chunk->levelHead != HM_HH_getUFNode(args->fromSpace[opDepth]))
+ {
+ chunk->levelHead = HM_HH_getUFNode(args->fromSpace[opDepth]);
+ }
+ HM_unlinkChunkPreserveLevelHead(
+ HM_HH_getChunkList(args->fromSpace[opDepth]),
+ chunk);
+ HM_appendChunk(&(args->pinned[opDepth]), chunk);
+ }
+
+ CC_workList_push(s, &(args->worklist), op);
+ struct GC_foreachObjptrClosure markClosure =
+ {.fun = markAndAdd, .env = (void *)args};
+ phaseLoop(s, rawArgs, &markClosure);
+ assert(CC_workList_isEmpty(s, &(args->worklist)));
+}
+// void LGC_markAndScan(
+// GC_state s,
+// __attribute__((unused)) objptr *opp,
+// objptr op,
+// void *rawArgs)
+// {
+// struct ForwardHHObjptrArgs *args = (struct ForwardHHObjptrArgs *)rawArgs;
+// pointer p = objptrToPointer(op, NULL);
+// HM_chunk chunk = HM_getChunkOf(p);
+// uint32_t opDepth = HM_HH_getDepth(HM_getLevelHead(chunk));
+// if ((opDepth > args->maxDepth) || (opDepth < args->minDepth))
+// {
+// // DOUBLE CHECK
+// return;
+// }
+// else if (args->fromSpace[opDepth] != HM_getLevelHead(chunk)) {
+// return;
+// }
+// else if (!CC_isPointerMarked(p))
+// {
+// assert(args->fromSpace[opDepth] == HM_getLevelHead(chunk));
+// markObj(p);
+// if (!chunk->pinnedDuringCollection)
+// {
+// chunk->pinnedDuringCollection = TRUE;
+
+// if (chunk->levelHead != HM_HH_getUFNode(args->fromSpace[opDepth])) {
+// chunk->levelHead = HM_HH_getUFNode(args->fromSpace[opDepth]);
+// }
+
+// // HM_unlinkChunkPreserveLevelHead(
+// // HM_HH_getChunkList(args->fromSpace[opDepth]),
+// // chunk);
+// // HM_appendChunk(&(args->pinned[opDepth]), chunk);
+
+// HM_unlinkChunkPreserveLevelHead(
+// HM_HH_getChunkList(args->fromSpace[opDepth]),
+// chunk);
+// HM_appendChunk(&(args->pinned[opDepth]), chunk);
+// }
+// struct GC_foreachObjptrClosure msClosure =
+// {.fun = LGC_markAndScan, .env = rawArgs};
+// foreachObjptrInObject(s, p, &trueObjptrPredicateClosure, &msClosure, FALSE);
+// }
+// else
+// {
+// assert(args->fromSpace[opDepth] == HM_getLevelHead(chunk));
+// assert(chunk->pinnedDuringCollection);
+// }
+// }
+
+// void unmarkLoop(
+// __attribute__((unused)) GC_state s,
+// __attribute__((unused)) objptr *opp,
+// objptr op,
+// void *rawArgs)
+// {
+// struct ForwardHHObjptrArgs *args = (struct ForwardHHObjptrArgs *)rawArgs;
+// pointer p = objptrToPointer(op, NULL);
+// HM_chunk chunk = HM_getChunkOf(p);
+// uint32_t opDepth = HM_HH_getDepth(HM_getLevelHead(chunk));
+// assert(!hasFwdPtr(p));
+// if ((opDepth > args->maxDepth) || (opDepth < args->minDepth))
+// {
+// return;
+// }
+// if (CC_isPointerMarked(p))
+// {
+// markObj(p);
+// struct GC_foreachObjptrClosure unmarkClosure = {.fun = unmark, .env = rawArgs};
+// foreachObjptrInObject(s, p, &trueObjptrPredicateClosure, &unmarkClosure, FALSE);
+// }
+// }
+
+void unmarkWrapper(
+ __attribute__((unused)) GC_state s,
+ HM_remembered remElem,
+ __attribute__((unused)) void *rawArgs)
+{
+ objptr op = remElem->object;
+ pointer p = objptrToPointer (op, NULL);
+ assert (pinType(getHeader(p)) == PIN_ANY);
+ if (CC_isPointerMarked(p)) {markObj(p);}
+ // struct ForwardHHObjptrArgs *args = (struct ForwardHHObjptrArgs *)rawArgs;
+ // struct GC_foreachObjptrClosure unmarkClosure =
+ // {.fun = unmarkAndAdd, .env = args};
+
+ // unmarkAndAdd(s, &(remElem->object), remElem->object, rawArgs);
+ // CC_workList_push(s, &(args->worklist), remElem->object);
+ // phaseLoop(s, rawArgs, &unmarkClosure);
+ // unmark(s, &(remElem->object), remElem->object, rawArgs);
+}
- if (!isPinned(op)) {
+void tryUnpinOrKeepPinned(GC_state s, HM_remembered remElem, void *rawArgs)
+{
+ struct ForwardHHObjptrArgs *args = (struct ForwardHHObjptrArgs *)rawArgs;
+ objptr op = remElem->object;
+
+ // #if ASSERT
+ // HM_chunk fromChunk = HM_getChunkOf(objptrToPointer(remElem->from, NULL));
+ // HM_HierarchicalHeap fromHH = HM_getLevelHead(fromChunk);
+ // assert(HM_HH_getDepth(fromHH) <= args->toDepth);
+ // #endif
+
+ if (!isPinned(op))
+ {
// If previously unpinned, then no need to remember this object.
- assert(HM_getLevelHead(fromChunk) == args->fromSpace[args->toDepth]);
+ // assert(HM_getLevelHead(fromChunk) == args->fromSpace[args->toDepth]);
LOG(LM_HH_PROMOTION, LL_INFO,
- "forgetting remset entry from "FMTOBJPTR" to "FMTOBJPTR,
- remElem->from, op);
+ "forgetting remset entry from " FMTOBJPTR " to " FMTOBJPTR,
+ remElem->from, op);
+ return;
+ } else if ((isObjptrInToSpace(op, args))) {
return;
}
@@ -1109,13 +1602,8 @@ void tryUnpinOrKeepPinned(GC_state s, HM_remembered remElem, void* rawArgs) {
* (getLevelHead, etc.), but this should be faster. The toDepth field
* is set by the loop that calls this function */
uint32_t opDepth = args->toDepth;
- HM_chunk chunk = HM_getChunkOf(objptrToPointer(op, NULL));
-
- if (NULL == args->toSpace[opDepth]) {
- args->toSpace[opDepth] = HM_HH_new(s, opDepth);
- }
-
#if ASSERT
+ HM_chunk chunk = HM_getChunkOf(objptrToPointer(op, NULL));
assert(opDepth <= args->maxDepth);
HM_HierarchicalHeap hh = HM_getLevelHead(chunk);
assert(args->fromSpace[opDepth] == hh);
@@ -1123,136 +1611,152 @@ void tryUnpinOrKeepPinned(GC_state s, HM_remembered remElem, void* rawArgs) {
assert(listContainsChunk(&(args->pinned[opDepth]), chunk));
else
assert(hhContainsChunk(args->fromSpace[opDepth], chunk));
+ assert(HM_getObjptrDepth(op) == opDepth);
+ assert(HM_getLevelHead(chunk) == args->fromSpace[opDepth]);
#endif
-#if 0
- /** If it's not in our from-space, then it's entangled.
- * KEEP THE ENTRY but don't do any of the other nasty stuff.
- *
- * SAM_NOTE: pinned chunks still have their HH set to the from-space,
- * despite living in separate chunklists.
- */
- if (opDepth > args->maxDepth || args->fromSpace[opDepth] != hh) {
- assert(s->controls->manageEntanglement);
- /** TODO: assert entangled here */
+ bool unpin = tryUnpinWithDepth(op, opDepth);
- HM_remember(HM_HH_getRemSet(args->toSpace[opDepth]), remElem);
+ if (unpin)
+ {
return;
}
-#endif
-
- assert(HM_getObjptrDepth(op) == opDepth);
- assert(HM_getLevelHead(chunk) == args->fromSpace[opDepth]);
uint32_t unpinDepth = unpinDepthOf(op);
- uint32_t fromDepth = HM_getObjptrDepth(remElem->from);
-
- assert(fromDepth <= opDepth);
- if (opDepth <= unpinDepth) {
- unpinObject(op);
- assert(fromDepth == opDepth);
-
- LOG(LM_HH_PROMOTION, LL_INFO,
- "forgetting remset entry from "FMTOBJPTR" to "FMTOBJPTR,
- remElem->from, op);
-
- return;
- }
-
- if (fromDepth > unpinDepth) {
- /** If this particular remembered entry came from deeper than some other
- * down-pointer, then we don't need to keep it around. There will be some
- * other remembered entry coming from the unpinDepth level.
- *
- * But note that it is very important that the condition is a strict
- * inequality: we need to keep all remembered entries that came from the
- * same shallowest level. (CC-chaining depends on this.)
- */
+ if (remElem->from != BOGUS_OBJPTR)
+ {
+ uint32_t fromDepth = HM_getObjptrDepth(remElem->from);
+ assert(fromDepth <= opDepth);
+ if (fromDepth > unpinDepth)
+ {
+ /** If this particular remembered entry came from deeper than some other
+ * down-pointer, then we don't need to keep it around. There will be some
+ * other remembered entry coming from the unpinDepth level.
+ *
+ * But note that it is very important that the condition is a strict
+ * inequality: we need to keep all remembered entries that came from the
+ * same shallowest level. (CC-chaining depends on this.)
+ */
- LOG(LM_HH_PROMOTION, LL_INFO,
- "forgetting remset entry from "FMTOBJPTR" to "FMTOBJPTR,
- remElem->from, op);
+ LOG(LM_HH_PROMOTION, LL_INFO,
+ "forgetting remset entry from " FMTOBJPTR " to " FMTOBJPTR,
+ remElem->from, op);
- return;
+ return;
+ }
}
- assert(fromDepth == unpinDepth);
-
/* otherwise, object stays pinned, and we have to scavenge this remembered
- * entry into the toSpace. */
-
- HM_remember(HM_HH_getRemSet(args->toSpace[opDepth]), remElem);
-
- if (chunk->pinnedDuringCollection) {
- return;
+ * entry into the toSpace.
+   * Entangled entries are added later, because we mark & sweep them in place
+   * and reuse the remembered set afterwards for unmarking.
+ */
+ if (remElem->from != BOGUS_OBJPTR) {
+ HM_remember(HM_HH_getRemSet(toSpaceHH(s, args, opDepth)), remElem, false);
}
+ // if (remElem->from != BOGUS_OBJPTR) {
+ // uint32_t fromDepth = HM_getObjptrDepth(remElem->from);
+ // if ((fromDepth <= args->maxDepth) && (fromDepth >= args->minDepth)) {
+ // HM_chunk chunk = HM_getChunkOf(objptrToPointer(op, NULL));
+ // uint32_t opDepth = HM_HH_getDepth(HM_getLevelHead(chunk));
+ // /* if this is a down-ptr completely inside the scope,
+ // * no need to in-place things reachable from it
+ // */
+ // if (!chunk->pinnedDuringCollection)
+ // {
+ // chunk->pinnedDuringCollection = TRUE;
+ // HM_unlinkChunkPreserveLevelHead(
+ // HM_HH_getChunkList(args->fromSpace[opDepth]),
+ // chunk);
+ // HM_appendChunk(&(args->pinned[opDepth]), chunk);
+ // }
+ // return;
+ // }
+ // }
+ LGC_markAndScan(s, remElem, rawArgs);
- chunk->pinnedDuringCollection = TRUE;
- assert(hhContainsChunk(args->fromSpace[opDepth], chunk));
- assert(HM_getLevelHead(chunk) == args->fromSpace[opDepth]);
+ // LGC_markAndScan(s, &(remElem->from), rawArgs);
- HM_unlinkChunkPreserveLevelHead(
- HM_HH_getChunkList(args->fromSpace[opDepth]),
- chunk);
- HM_appendChunk(&(args->pinned[opDepth]), chunk);
+ // if (chunk->pinnedDuringCollection)
+ // {
+ // return;
+ // }
+ // if (chunk->levelHead != HM_HH_getUFNode(args->fromSpace[opDepth]))
+ // {
+ // chunk->levelHead = HM_HH_getUFNode(args->fromSpace[opDepth]);
+ // }
- assert(HM_getLevelHead(chunk) == args->fromSpace[opDepth]);
+ // HM_unlinkChunkPreserveLevelHead(
+ // HM_HH_getChunkList(args->fromSpace[opDepth]),
+ // chunk);
+ // HM_appendChunk(&(args->pinned[opDepth]), chunk);
+ // chunk->pinnedDuringCollection = TRUE;
+ // assert(HM_getLevelHead(chunk) == args->fromSpace[opDepth]);
+ // /** stronger version of previous assertion, needed for safe freeing of
+ // * hh dependants after LGC completes
+ // */
+ // assert(chunk->levelHead == HM_HH_getUFNode(args->fromSpace[opDepth]));
}
/* ========================================================================= */
-void forwardObjptrsOfRemembered(GC_state s, HM_remembered remElem, void* rawArgs) {
+// JATIN_TODO: CHANGE NAME TO FORWARD FROM
+// COULD BE HELPFUL FOR DEBUGGING TO FORWARD THE OBJECTS ANYWAY
+void forwardFromObjsOfRemembered(GC_state s, HM_remembered remElem, void *rawArgs)
+{
+#if ASSERT
objptr op = remElem->object;
-
assert(isPinned(op));
+#endif
- struct GC_foreachObjptrClosure closure =
- {.fun = forwardHHObjptr, .env = rawArgs};
-
- foreachObjptrInObject(
- s,
- objptrToPointer(op, NULL),
- &trueObjptrPredicateClosure,
- &closure,
- FALSE
- );
+ // struct GC_foreachObjptrClosure closure =
+ // {.fun = forwardHHObjptr, .env = rawArgs};
+ // foreachObjptrInObject(
+ // s,
+ // objptrToPointer(op, NULL),
+ // &trueObjptrPredicateClosure,
+ // &closure,
+ // FALSE);
+ assert (remElem->from != BOGUS_OBJPTR);
forwardHHObjptr(s, &(remElem->from), remElem->from, rawArgs);
}
/* ========================================================================= */
void forwardHHObjptr(
- GC_state s,
- objptr* opp,
- objptr op,
- void* rawArgs)
+ GC_state s,
+ objptr *opp,
+ objptr op,
+ void *rawArgs)
{
- struct ForwardHHObjptrArgs* args = ((struct ForwardHHObjptrArgs*)(rawArgs));
- pointer p = objptrToPointer (op, NULL);
+ struct ForwardHHObjptrArgs *args = ((struct ForwardHHObjptrArgs *)(rawArgs));
+ pointer p = objptrToPointer(op, NULL);
assert(args->toDepth == HM_HH_INVALID_DEPTH);
- if (DEBUG_DETAILED) {
- fprintf (stderr,
- "forwardHHObjptr opp = "FMTPTR" op = "FMTOBJPTR" p = "
- ""FMTPTR"\n",
- (uintptr_t)opp,
- op,
- (uintptr_t)p);
+ if (DEBUG_DETAILED)
+ {
+ fprintf(stderr,
+ "forwardHHObjptr opp = " FMTPTR " op = " FMTOBJPTR " p = "
+ "" FMTPTR "\n",
+ (uintptr_t)opp,
+ op,
+ (uintptr_t)p);
}
LOG(LM_HH_COLLECTION, LL_DEBUGMORE,
- "opp = "FMTPTR" op = "FMTOBJPTR" p = "FMTPTR,
+ "opp = " FMTPTR " op = " FMTOBJPTR " p = " FMTPTR,
(uintptr_t)opp,
op,
(uintptr_t)p);
- if (!isObjptr(op) || isObjptrInRootHeap(s, op)) {
+ if (!isObjptr(op) || isObjptrInRootHeap(s, op))
+ {
/* does not point to an HH objptr, so not in scope for collection */
LOG(LM_HH_COLLECTION, LL_DEBUGMORE,
- "skipping opp = "FMTPTR" op = "FMTOBJPTR" p = "FMTPTR": not in HH.",
+ "skipping opp = " FMTPTR " op = " FMTOBJPTR " p = " FMTPTR ": not in HH.",
(uintptr_t)opp,
op,
(uintptr_t)p);
@@ -1261,71 +1765,84 @@ void forwardHHObjptr(
uint32_t opDepth = HM_getObjptrDepthPathCompress(op);
- if (opDepth > args->maxDepth) {
- DIE("entanglement detected during collection: %p is at depth %u, below %u",
- (void *)p,
- opDepth,
- args->maxDepth);
- }
+ // if (opDepth > args->maxDepth)
+ // {
+ // DIE("entanglement detected during collection: %p is at depth %u, below %u",
+ // (void *)p,
+ // opDepth,
+ // args->maxDepth);
+ // }
/* RAM_NOTE: This is more nuanced with non-local collection */
if ((opDepth > args->maxDepth) ||
/* cannot forward any object below 'args->minDepth' */
- (opDepth < args->minDepth)) {
- LOG(LM_HH_COLLECTION, LL_DEBUGMORE,
- "skipping opp = "FMTPTR" op = "FMTOBJPTR" p = "FMTPTR
- ": depth %d not in [minDepth %d, maxDepth %d].",
- (uintptr_t)opp,
- op,
- (uintptr_t)p,
- opDepth,
- args->minDepth,
- args->maxDepth);
- return;
+ (opDepth < args->minDepth))
+ {
+ LOG(LM_HH_COLLECTION, LL_DEBUGMORE,
+ "skipping opp = " FMTPTR " op = " FMTOBJPTR " p = " FMTPTR
+ ": depth %d not in [minDepth %d, maxDepth %d].",
+ (uintptr_t)opp,
+ op,
+ (uintptr_t)p,
+ opDepth,
+ args->minDepth,
+ args->maxDepth);
+ return;
}
assert(HM_getObjptrDepth(op) >= args->minDepth);
- if (isObjptrInToSpace(op, args)) {
+ if (isObjptrInToSpace(op, args))
+ {
assert(!hasFwdPtr(objptrToPointer(op, NULL)));
- assert(!isPinned(op));
+ // to space objects may be pinned now.
+ // assert(!isPinned(op));
+ return;
+ }
+ else if (HM_getLevelHead(HM_getChunkOf(objptrToPointer(op, NULL))) !=
+ args->fromSpace[HM_getObjptrDepth(op)])
+ {
+ // assert (!decheck(s, op));
return;
}
- /* Assert is in from space. This holds for pinned objects, too, because
- * their levelHead is still set to the fromSpace HH. (Pinned objects are
- * stored in a different chunklist during collection through.) */
- assert( HM_getLevelHead(HM_getChunkOf(objptrToPointer(op, NULL)))
- ==
- args->fromSpace[HM_getObjptrDepth(op)] );
-
- if (hasFwdPtr(p)) {
+ if (hasFwdPtr(p))
+ {
objptr fop = getFwdPtr(p);
assert(!hasFwdPtr(objptrToPointer(fop, NULL)));
assert(isObjptrInToSpace(fop, args));
assert(HM_getObjptrDepth(fop) == opDepth);
- assert(!isPinned(fop));
+ // assert(!isPinned(fop));
+ // assert(!CC_isPointerMarked(fop));
*opp = fop;
return;
}
assert(!hasFwdPtr(p));
+ if (CC_isPointerMarked(p))
+ {
+ // this object is collected in-place.
+ return;
+ }
+
/** REALLY SUBTLE. CC clears out remset entries, but can't safely perform
- * unpinning. So, there could be objects that (for the purposes of LC) are
- * semantically unpinned, but just haven't been marked as such yet. Here,
- * we are lazily checking to see if this object should have been unpinned.
- */
- if (isPinned(op) && unpinDepthOf(op) < opDepth) {
+ * unpinning. So, there could be objects that (for the purposes of LC) are
+ * semantically unpinned, but just haven't been marked as such yet. Here,
+ * we are lazily checking to see if this object should have been unpinned.
+ */
+ if (isPinned(op) && unpinDepthOf(op) < opDepth)
+ {
// This is a truly pinned object
- assert(listContainsChunk( &(args->pinned[opDepth]),
- HM_getChunkOf(objptrToPointer(op, NULL))
- ));
+ assert(listContainsChunk(&(args->pinned[opDepth]),
+ HM_getChunkOf(objptrToPointer(op, NULL))));
return;
}
- else {
+ else
+ {
+ disentangleObject(s, op, opDepth);
// This object should have been previously unpinned
- unpinObject(op);
+ // unpinObject(op);
}
/* ========================================================================
@@ -1348,38 +1865,38 @@ void forwardHHObjptr(
/* compute object size and bytes to be copied */
tag = computeObjectCopyParameters(s,
+ getHeader(p),
p,
&objectBytes,
©Bytes,
&metaDataBytes);
- switch (tag) {
+ switch (tag)
+ {
case STACK_TAG:
- args->stacksCopied++;
- break;
+ args->stacksCopied++;
+ break;
case WEAK_TAG:
- die(__FILE__ ":%d: "
- "forwardHHObjptr() does not support WEAK_TAG objects!",
- __LINE__);
- break;
+ die(__FILE__ ":%d: "
+ "forwardHHObjptr() does not support WEAK_TAG objects!",
+ __LINE__);
+ break;
default:
- break;
+ break;
}
- HM_HierarchicalHeap tgtHeap = args->toSpace[opDepth];
- if (tgtHeap == NULL) {
- /* Level does not exist, so create it */
- tgtHeap = HM_HH_new(s, opDepth);
- args->toSpace[opDepth] = tgtHeap;
- }
+ HM_HierarchicalHeap tgtHeap = toSpaceHH(s, args, opDepth);
assert(p == objptrToPointer(op, NULL));
/* use the forwarding pointer */
- *opp = relocateObject(s, op, tgtHeap, args);
+ bool relocateSuccess;
+ assert(!args->concurrent);
+ *opp = relocateObject(s, op, tgtHeap, args, &relocateSuccess);
+ assert(relocateSuccess);
}
LOG(LM_HH_COLLECTION, LL_DEBUGMORE,
- "opp "FMTPTR" set to "FMTOBJPTR,
+ "opp " FMTPTR " set to " FMTOBJPTR,
((uintptr_t)(opp)),
*opp);
}
@@ -1387,10 +1904,11 @@ void forwardHHObjptr(
pointer copyObject(pointer p,
size_t objectSize,
size_t copySize,
- HM_HierarchicalHeap tgtHeap) {
+ HM_HierarchicalHeap tgtHeap)
+{
-// check if you can add to existing chunk --> mightContain + size
-// If not, allocate new chunk and copy.
+ // check if you can add to existing chunk --> mightContain + size
+ // If not, allocate new chunk and copy.
assert(HM_HH_isLevelHead(tgtHeap));
assert(copySize <= objectSize);
@@ -1402,24 +1920,31 @@ pointer copyObject(pointer p,
bool mustExtend = false;
HM_chunk chunk = HM_getChunkListLastChunk(tgtChunkList);
- if(chunk == NULL || !chunk->mightContainMultipleObjects){
+ if (chunk == NULL || !chunk->mightContainMultipleObjects)
+ {
mustExtend = true;
}
- else {
+ else
+ {
pointer frontier = HM_getChunkFrontier(chunk);
pointer limit = HM_getChunkLimit(chunk);
assert(frontier <= limit);
mustExtend = ((size_t)(limit - frontier) < objectSize) ||
- (frontier + GC_SEQUENCE_METADATA_SIZE
- >= (pointer)chunk + HM_BLOCK_SIZE);
+ (frontier + GC_SEQUENCE_METADATA_SIZE >= (pointer)chunk + HM_BLOCK_SIZE);
}
- if (mustExtend) {
+ if (mustExtend)
+ {
/* Need to allocate a new chunk. Safe to use the dechecker state of where
* the object came from, as all objects in the same heap can be safely
* reassigned to any dechecker state of that heap. */
- chunk = HM_allocateChunk(tgtChunkList, objectSize);
- if (NULL == chunk) {
+ chunk = HM_allocateChunkWithPurpose(
+ tgtChunkList,
+ objectSize,
+ BLOCK_FOR_HEAP_CHUNK);
+
+ if (NULL == chunk)
+ {
DIE("Ran out of space for Hierarchical Heap!");
}
chunk->decheckState = HM_getChunkOf(p)->decheckState;
@@ -1442,50 +1967,66 @@ pointer copyObject(pointer p,
return frontier;
}
+
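+/* Roll back a speculative copy: when installing the forwarding pointer fails in
+ * relocateObject (the object was pinned concurrently), retract the target
+ * chunk's frontier so the just-copied bytes are reclaimed. */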
+void delLastObj(objptr op, size_t objectSize, HM_HierarchicalHeap tgtHeap)
+{
+ HM_chunkList tgtChunkList = HM_HH_getChunkList(tgtHeap);
+ HM_chunk chunk = HM_getChunkOf(objptrToPointer(op, NULL));
+ assert(listContainsChunk(tgtChunkList, chunk));
+ HM_updateChunkFrontierInList(tgtChunkList, chunk, HM_getChunkFrontier(chunk) - objectSize);
+}
+
#endif /* MLTON_GC_INTERNAL_FUNCS */
-GC_objectTypeTag computeObjectCopyParameters(GC_state s, pointer p,
+GC_objectTypeTag computeObjectCopyParameters(GC_state s,
+ GC_header header,
+ pointer p,
size_t *objectSize,
size_t *copySize,
- size_t *metaDataSize) {
- GC_header header;
- GC_objectTypeTag tag;
- uint16_t bytesNonObjptrs;
- uint16_t numObjptrs;
- header = getHeader(p);
- splitHeader(s, header, &tag, NULL, &bytesNonObjptrs, &numObjptrs);
-
- /* Compute the space taken by the metadata and object body. */
- if ((NORMAL_TAG == tag) or (WEAK_TAG == tag)) { /* Fixed size object. */
- if (WEAK_TAG == tag) {
- die(__FILE__ ":%d: "
- "computeObjectSizeAndCopySize() #define does not support"
- " WEAK_TAG objects!",
- __LINE__);
- }
- *metaDataSize = GC_NORMAL_METADATA_SIZE;
- *objectSize = bytesNonObjptrs + (numObjptrs * OBJPTR_SIZE);
- *copySize = *objectSize;
- } else if (SEQUENCE_TAG == tag) {
- *metaDataSize = GC_SEQUENCE_METADATA_SIZE;
- *objectSize = sizeofSequenceNoMetaData (s, getSequenceLength (p),
- bytesNonObjptrs, numObjptrs);
- *copySize = *objectSize;
- } else {
- /* Stack. */
- // bool current;
- // size_t reservedNew;
- GC_stack stack;
-
- assert (STACK_TAG == tag);
- *metaDataSize = GC_STACK_METADATA_SIZE;
- stack = (GC_stack)p;
-
- /* SAM_NOTE:
- * I am disabling shrinking here because it assumes that
- * the stack is going to be copied, which doesn't work with the
- * "stacks-in-their-own-chunks" strategy.
- */
+ size_t *metaDataSize)
+{
+ GC_objectTypeTag tag;
+ uint16_t bytesNonObjptrs;
+ uint16_t numObjptrs;
+ splitHeader(s, header, &tag, NULL, &bytesNonObjptrs, &numObjptrs);
+
+ /* Compute the space taken by the metadata and object body. */
+ if ((NORMAL_TAG == tag) or (WEAK_TAG == tag))
+ { /* Fixed size object. */
+ if (WEAK_TAG == tag)
+ {
+ die(__FILE__ ":%d: "
+ "computeObjectSizeAndCopySize() #define does not support"
+ " WEAK_TAG objects!",
+ __LINE__);
+ }
+ *metaDataSize = GC_NORMAL_METADATA_SIZE;
+ *objectSize = bytesNonObjptrs + (numObjptrs * OBJPTR_SIZE);
+ *copySize = *objectSize;
+ }
+ else if (SEQUENCE_TAG == tag)
+ {
+ *metaDataSize = GC_SEQUENCE_METADATA_SIZE;
+ *objectSize = sizeofSequenceNoMetaData(s, getSequenceLength(p),
+ bytesNonObjptrs, numObjptrs);
+ *copySize = *objectSize;
+ }
+ else
+ {
+ /* Stack. */
+ // bool current;
+ // size_t reservedNew;
+ GC_stack stack;
+
+ assert(STACK_TAG == tag);
+ *metaDataSize = GC_STACK_METADATA_SIZE;
+ stack = (GC_stack)p;
+
+ /* SAM_NOTE:
+ * I am disabling shrinking here because it assumes that
+ * the stack is going to be copied, which doesn't work with the
+ * "stacks-in-their-own-chunks" strategy.
+ */
#if 0
/* RAM_NOTE: This changes with non-local collection */
/* Check if the pointer is the current stack of my processor. */
@@ -1502,34 +2043,37 @@ GC_objectTypeTag computeObjectCopyParameters(GC_state s, pointer p,
stack->reserved = reservedNew;
}
#endif
- *objectSize = sizeof (struct GC_stack) + stack->reserved;
- *copySize = sizeof (struct GC_stack) + stack->used;
- }
+ *objectSize = sizeof(struct GC_stack) + stack->reserved;
+ *copySize = sizeof(struct GC_stack) + stack->used;
+ }
- *objectSize += *metaDataSize;
- *copySize += *metaDataSize;
+ *objectSize += *metaDataSize;
+ *copySize += *metaDataSize;
- return tag;
+ return tag;
}
-
bool skipStackAndThreadObjptrPredicate(GC_state s,
pointer p,
- void* rawArgs) {
+ void *rawArgs)
+{
/* silence compliler */
((void)(s));
/* extract expected stack */
- LOCAL_USED_FOR_ASSERT const struct SSATOPredicateArgs* args =
- ((struct SSATOPredicateArgs*)(rawArgs));
+ LOCAL_USED_FOR_ASSERT const struct SSATOPredicateArgs *args =
+ ((struct SSATOPredicateArgs *)(rawArgs));
/* run through FALSE cases */
GC_header header;
header = getHeader(p);
- if (header == GC_STACK_HEADER) {
+ if (header == GC_STACK_HEADER)
+ {
assert(args->expectedStackPointer == p);
return FALSE;
- } else if (header == GC_THREAD_HEADER) {
+ }
+ else if (header == GC_THREAD_HEADER)
+ {
assert(args->expectedThreadPointer == p);
return FALSE;
}
@@ -1540,10 +2084,11 @@ bool skipStackAndThreadObjptrPredicate(GC_state s,
#if ASSERT
void checkRememberedEntry(
- __attribute__((unused)) GC_state s,
- HM_remembered remElem,
- void* args)
+ __attribute__((unused)) GC_state s,
+ HM_remembered remElem,
+ void *args)
{
+ return;
objptr object = remElem->object;
HM_HierarchicalHeap hh = (HM_HierarchicalHeap)args;
@@ -1557,11 +2102,14 @@ void checkRememberedEntry(
assert(HM_getLevelHead(theChunk) == hh);
assert(!hasFwdPtr(objptrToPointer(object, NULL)));
- assert(!hasFwdPtr(objptrToPointer(remElem->from, NULL)));
+ if (remElem->from != BOGUS_OBJPTR)
+ {
+ assert(!hasFwdPtr(objptrToPointer(remElem->from, NULL)));
- HM_chunk fromChunk = HM_getChunkOf(objptrToPointer(remElem->from, NULL));
- HM_HierarchicalHeap fromHH = HM_getLevelHead(fromChunk);
- assert(HM_HH_getDepth(fromHH) <= HM_HH_getDepth(hh));
+ HM_chunk fromChunk = HM_getChunkOf(objptrToPointer(remElem->from, NULL));
+ HM_HierarchicalHeap fromHH = HM_getLevelHead(fromChunk);
+ assert(HM_HH_getDepth(fromHH) <= HM_HH_getDepth(hh));
+ }
}
bool hhContainsChunk(HM_HierarchicalHeap hh, HM_chunk theChunk)
diff --git a/runtime/gc/hierarchical-heap-collection.h b/runtime/gc/hierarchical-heap-collection.h
index 8999c5031..a5014f559 100644
--- a/runtime/gc/hierarchical-heap-collection.h
+++ b/runtime/gc/hierarchical-heap-collection.h
@@ -14,22 +14,25 @@
* Definition of the HierarchicalHeap collection interface
*/
-
#ifndef HIERARCHICAL_HEAP_COLLECTION_H_
#define HIERARCHICAL_HEAP_COLLECTION_H_
#include "chunk.h"
+#include "cc-work-list.h"
-#if (defined (MLTON_GC_INTERNAL_TYPES))
-struct ForwardHHObjptrArgs {
- struct HM_HierarchicalHeap* hh;
+#if (defined(MLTON_GC_INTERNAL_TYPES))
+struct ForwardHHObjptrArgs
+{
+ struct HM_HierarchicalHeap *hh;
uint32_t minDepth;
uint32_t maxDepth;
uint32_t toDepth; /* if == HM_HH_INVALID_DEPTH, preserve level of the forwarded object */
/* arrays of HH objects, e.g. HM_HH_getDepth(toSpace[i]) == i */
- HM_HierarchicalHeap* fromSpace;
- HM_HierarchicalHeap* toSpace;
+ HM_HierarchicalHeap *fromSpace;
+ HM_HierarchicalHeap *toSpace;
+ pointer *toSpaceStart;
+ HM_chunk *toSpaceStartChunk;
/* an array of pinned chunklists */
struct HM_chunkList *pinned;
@@ -37,18 +40,24 @@ struct ForwardHHObjptrArgs {
objptr containingObject;
size_t bytesCopied;
+ size_t entangledBytes;
uint64_t objectsCopied;
uint64_t stacksCopied;
/* large objects are "moved" (rather than copied). */
size_t bytesMoved;
uint64_t objectsMoved;
+
+  /* worklist for mark and scan */
+ struct CC_workList worklist;
+ bool concurrent;
};
-struct checkDEDepthsArgs {
+struct checkDEDepthsArgs
+{
int32_t minDisentangledDepth;
- HM_HierarchicalHeap* fromSpace;
- HM_HierarchicalHeap* toSpace;
+ HM_HierarchicalHeap *fromSpace;
+ HM_HierarchicalHeap *toSpace;
uint32_t maxDepth;
};
@@ -56,10 +65,10 @@ struct checkDEDepthsArgs {
#endif /* MLTON_GC_INTERNAL_TYPES */
-#if (defined (MLTON_GC_INTERNAL_BASIS))
+#if (defined(MLTON_GC_INTERNAL_BASIS))
#endif /* MLTON_GC_INTERNAL_BASIS */
-#if (defined (MLTON_GC_INTERNAL_FUNCS))
+#if (defined(MLTON_GC_INTERNAL_FUNCS))
/**
* This function performs a local collection on the current hierarchical heap
*/
@@ -73,12 +82,12 @@ void HM_HHC_collectLocal(uint32_t desiredScope);
* @param opp The objptr to forward
* @param args The struct ForwardHHObjptrArgs* for this call, cast as a void*
*/
-void forwardHHObjptr(GC_state s, objptr* opp, objptr op, void* rawArgs);
+void forwardHHObjptr(GC_state s, objptr *opp, objptr op, void *rawArgs);
/* check if `op` is in args->toSpace[depth(op)] */
bool isObjptrInToSpace(objptr op, struct ForwardHHObjptrArgs *args);
-objptr relocateObject(GC_state s, objptr obj, HM_HierarchicalHeap tgtHeap, struct ForwardHHObjptrArgs *args);
+objptr relocateObject(GC_state s, objptr obj, HM_HierarchicalHeap tgtHeap, struct ForwardHHObjptrArgs *args, bool *relocSuccess);
pointer copyObject(pointer p, size_t objectSize, size_t copySize, HM_HierarchicalHeap tgtHeap);
#endif /* MLTON_GC_INTERNAL_FUNCS */
diff --git a/runtime/gc/hierarchical-heap-ebr.c b/runtime/gc/hierarchical-heap-ebr.c
index 8aa1bb666..ff96120ba 100644
--- a/runtime/gc/hierarchical-heap-ebr.c
+++ b/runtime/gc/hierarchical-heap-ebr.c
@@ -1,4 +1,5 @@
/* Copyright (C) 2021 Sam Westrick
+ * Copyright (C) 2022 Jatin Arora
*
* MLton is released under a HPND-style license.
* See the file MLton-LICENSE for details.
@@ -6,153 +7,28 @@
#if (defined (MLTON_GC_INTERNAL_FUNCS))
-/** Helpers for packing/unpacking announcements. DEBRA packs epochs with a
- * "quiescent" bit, the idea being that processors should set the bit during
- * quiescent periods (between operations) and have it unset otherwise (i.e.
- * during an operation). Being precise about quiescent periods in this way
- * is helpful for reclamation, because in order to advance the epoch, all we
- * need to know is that every processor has been in a quiescent period since
- * the beginning of the last epoch.
- *
- * But note that updating the quiescent bits is only efficient if we can
- * amortize the cost of the setting/unsetting the bit with other nearby
- * operations. If we assumed that the typical state for each processor
- * is quiescent and then paid for non-quiescent periods, this would
- * be WAY too expensive. In our case, processors are USUALLY NON-QUIESCENT,
- * due to depth queries at the write-barrier.
- *
- * So
- */
-#define PACK(epoch, qbit) ((((size_t)(epoch)) << 1) | ((qbit) & 1))
-#define UNPACK_EPOCH(announcement) ((announcement) >> 1)
-#define UNPACK_QBIT(announcement) ((announcement) & 1)
-#define SET_Q_TRUE(announcement) ((announcement) | (size_t)1)
-#define SET_Q_FALSE(announcement) ((announcement) & (~(size_t)1))
-
-#define ANNOUNCEMENT_PADDING 16
-
-static inline size_t getAnnouncement(GC_state s, uint32_t pid) {
- return s->hhEBR->announce[ANNOUNCEMENT_PADDING*pid];
-}
-
-static inline void setAnnouncement(GC_state s, uint32_t pid, size_t ann) {
- s->hhEBR->announce[ANNOUNCEMENT_PADDING*pid] = ann;
-}
-
void HH_EBR_enterQuiescentState(GC_state s) {
- uint32_t mypid = s->procNumber;
- setAnnouncement(s, mypid, SET_Q_TRUE(getAnnouncement(s, mypid)));
+ EBR_enterQuiescentState(s, s->hhEBR);
}
-static void rotateAndReclaim(GC_state s) {
- HH_EBR_shared ebr = s->hhEBR;
- uint32_t mypid = s->procNumber;
-
- int limboIdx = (ebr->local[mypid].limboIdx + 1) % 3;
- ebr->local[mypid].limboIdx = limboIdx;
- HM_chunkList limboBag = &(ebr->local[mypid].limboBags[limboIdx]);
-
- // Free all HH records in the limbo bag.
- for (HM_chunk chunk = HM_getChunkListFirstChunk(limboBag);
- NULL != chunk;
- chunk = chunk->nextChunk)
- {
- for (pointer p = HM_getChunkStart(chunk);
- p < HM_getChunkFrontier(chunk);
- p += sizeof(HM_UnionFindNode *))
- {
- freeFixedSize(getUFAllocator(s), *(HM_UnionFindNode*)p);
- }
- }
-
- HM_freeChunksInList(s, limboBag);
- HM_initChunkList(limboBag); // clear it out
+void freeUnionFind (GC_state s, void *ptr) {
+ HM_UnionFindNode hufp = (HM_UnionFindNode)ptr;
+ assert(hufp->payload != NULL);
+ freeFixedSize(getHHAllocator(s), hufp->payload);
+ freeFixedSize(getUFAllocator(s), hufp);
}
-
void HH_EBR_init(GC_state s) {
- HH_EBR_shared ebr = malloc(sizeof(struct HH_EBR_shared));
- s->hhEBR = ebr;
-
- ebr->epoch = 0;
- ebr->announce =
- malloc(s->numberOfProcs * ANNOUNCEMENT_PADDING * sizeof(size_t));
- ebr->local =
- malloc(s->numberOfProcs * sizeof(struct HH_EBR_local));
-
- for (uint32_t i = 0; i < s->numberOfProcs; i++) {
- // Everyone starts by announcing epoch = 0 and is non-quiescent
- setAnnouncement(s, i, PACK(0,0));
- ebr->local[i].limboIdx = 0;
- ebr->local[i].checkNext = 0;
- for (int j = 0; j < 3; j++)
- HM_initChunkList(&(ebr->local[i].limboBags[j]));
- }
+ s->hhEBR = EBR_new(s, &freeUnionFind);
}
void HH_EBR_leaveQuiescentState(GC_state s) {
- HH_EBR_shared ebr = s->hhEBR;
- uint32_t mypid = s->procNumber;
- uint32_t numProcs = s->numberOfProcs;
-
- size_t globalEpoch = ebr->epoch;
- size_t myann = getAnnouncement(s, mypid);
- size_t myEpoch = UNPACK_EPOCH(myann);
- assert(globalEpoch >= myEpoch);
-
- if (myEpoch != globalEpoch) {
- ebr->local[mypid].checkNext = 0;
- /** Advance into the current epoch. To do so, we need to clear the limbo
- * bag of the epoch we're moving into.
- */
- rotateAndReclaim(s);
- }
-
- uint32_t otherpid = (ebr->local[mypid].checkNext) % numProcs;
- size_t otherann = getAnnouncement(s, otherpid);
- if ( UNPACK_EPOCH(otherann) == globalEpoch || UNPACK_QBIT(otherann) ) {
- uint32_t c = ++ebr->local[mypid].checkNext;
- if (c >= numProcs) {
- __sync_val_compare_and_swap(&(ebr->epoch), globalEpoch, globalEpoch+1);
- }
- }
-
- setAnnouncement(s, mypid, PACK(globalEpoch, 0));
+ EBR_leaveQuiescentState(s, s->hhEBR);
}
-
void HH_EBR_retire(GC_state s, HM_UnionFindNode hhuf) {
- HH_EBR_shared ebr = s->hhEBR;
- uint32_t mypid = s->procNumber;
- int limboIdx = ebr->local[mypid].limboIdx;
- HM_chunkList limboBag = &(ebr->local[mypid].limboBags[limboIdx]);
- HM_chunk chunk = HM_getChunkListLastChunk(limboBag);
-
- // fast path: bump frontier in chunk
-
- if (NULL != chunk &&
- HM_getChunkSizePastFrontier(chunk) >= sizeof(HM_UnionFindNode *))
- {
- pointer p = HM_getChunkFrontier(chunk);
- *(HM_UnionFindNode *)p = hhuf;
- HM_updateChunkFrontierInList(limboBag, chunk, p + sizeof(HM_UnionFindNode *));
- return;
- }
-
- // slow path: allocate new chunk
-
- chunk = HM_allocateChunk(limboBag, sizeof(HM_UnionFindNode *));
-
- assert(NULL != chunk &&
- HM_getChunkSizePastFrontier(chunk) >= sizeof(HM_UnionFindNode *));
-
- pointer p = HM_getChunkFrontier(chunk);
- *(HM_UnionFindNode *)p = hhuf;
- HM_updateChunkFrontierInList(limboBag, chunk, p + sizeof(HM_UnionFindNode *));
- return;
+ EBR_retire(s, s->hhEBR, (void *)hhuf);
}
-
-
#endif // MLTON_GC_INTERNAL_FUNCS
diff --git a/runtime/gc/hierarchical-heap-ebr.h b/runtime/gc/hierarchical-heap-ebr.h
index 541528684..9e8792d45 100644
--- a/runtime/gc/hierarchical-heap-ebr.h
+++ b/runtime/gc/hierarchical-heap-ebr.h
@@ -10,34 +10,6 @@
#ifndef HIERARCHICAL_HEAP_EBR_H_
#define HIERARCHICAL_HEAP_EBR_H_
-#if (defined (MLTON_GC_INTERNAL_TYPES))
-
-struct HH_EBR_local {
- struct HM_chunkList limboBags[3];
- int limboIdx;
- uint32_t checkNext;
-} __attribute__((aligned(128)));
-
-// There is exactly one of these! Everyone shares a reference to it.
-typedef struct HH_EBR_shared {
- size_t epoch;
-
- // announcement array, length = num procs
- // each announcement is packed: 63 bits for epoch, 1 bit for quiescent bit
- size_t *announce;
-
- // processor-local data, length = num procs
- struct HH_EBR_local *local;
-} * HH_EBR_shared;
-
-#else
-
-struct HH_EBR_local;
-struct HH_EBR_shared;
-typedef struct HH_EBR_shared * HH_EBR_shared;
-
-#endif // MLTON_GC_INTERNAL_TYPES
-
#if (defined (MLTON_GC_INTERNAL_FUNCS))
void HH_EBR_init(GC_state s);
diff --git a/runtime/gc/hierarchical-heap.c b/runtime/gc/hierarchical-heap.c
index 492c69276..987cb26f1 100644
--- a/runtime/gc/hierarchical-heap.c
+++ b/runtime/gc/hierarchical-heap.c
@@ -230,16 +230,17 @@ HM_HierarchicalHeap HM_HH_zip(
if (depth1 == depth2)
{
- HM_appendChunkList(HM_HH_getChunkList(hh1), HM_HH_getChunkList(hh2));
- HM_appendChunkList(HM_HH_getRemSet(hh1), HM_HH_getRemSet(hh2));
- ES_move(HM_HH_getSuspects(hh1), HM_HH_getSuspects(hh2));
- linkCCChains(s, hh1, hh2);
-
// This has to happen before linkInto (which frees hh2)
HM_HierarchicalHeap hh2anc = hh2->nextAncestor;
CC_freeStack(s, HM_HH_getConcurrentPack(hh2));
+ linkCCChains(s, hh1, hh2);
linkInto(s, hh1, hh2);
+ HM_appendChunkList(HM_HH_getChunkList(hh1), HM_HH_getChunkList(hh2));
+ ES_move(HM_HH_getSuspects(hh1), HM_HH_getSuspects(hh2));
+ HM_appendRemSet(HM_HH_getRemSet(hh1), HM_HH_getRemSet(hh2));
+
*cursor = hh1;
cursor = &(hh1->nextAncestor);
@@ -339,6 +340,26 @@ void HM_HH_merge(
assertInvariants(parentThread);
}
+
+void HM_HH_clearSuspectsAtDepth(
+ GC_state s,
+ GC_thread thread,
+ uint32_t targetDepth)
+{
+ // walk to find heap; only clear suspects at the target depth
+ for (HM_HierarchicalHeap cursor = thread->hierarchicalHeap;
+ NULL != cursor;
+ cursor = cursor->nextAncestor)
+ {
+ uint32_t d = HM_HH_getDepth(cursor);
+ if (d <= targetDepth) {
+ if (d == targetDepth) ES_clear(s, cursor);
+ return;
+ }
+ }
+}
+
+
void HM_HH_promoteChunks(
GC_state s,
GC_thread thread)
@@ -349,7 +370,6 @@ void HM_HH_promoteChunks(
{
/* no need to do anything; this function only guarantees that the
* current depth has been completely evacuated. */
- ES_clear(s, HM_HH_getSuspects(thread->hierarchicalHeap));
return;
}
@@ -378,18 +398,20 @@ void HM_HH_promoteChunks(
if (NULL == hh->subHeapForCC) {
assert(NULL == hh->subHeapCompletedCC);
+ /* don't need the snapshot for this heap now. */
+ CC_freeStack(s, HM_HH_getConcurrentPack(hh));
+ linkCCChains(s, parent, hh);
+ linkInto(s, parent, hh);
+
HM_appendChunkList(HM_HH_getChunkList(parent), HM_HH_getChunkList(hh));
- HM_appendChunkList(HM_HH_getRemSet(parent), HM_HH_getRemSet(hh));
ES_move(HM_HH_getSuspects(parent), HM_HH_getSuspects(hh));
- linkCCChains(s, parent, hh);
+ HM_appendRemSet(HM_HH_getRemSet(parent), HM_HH_getRemSet(hh));
/* shortcut. */
thread->hierarchicalHeap = parent;
- /* don't need the snapshot for this heap now. */
- CC_freeStack(s, HM_HH_getConcurrentPack(hh));
- linkInto(s, parent, hh);
hh = parent;
}
- else {
+ else
+ {
assert(HM_getLevelHead(thread->currentChunk) == hh);
#if ASSERT
@@ -453,8 +475,7 @@ void HM_HH_promoteChunks(
assert(HM_HH_getDepth(hh) == currentDepth-1);
}
-
- ES_clear(s, HM_HH_getSuspects(thread->hierarchicalHeap));
+ assert(hh == thread->hierarchicalHeap);
#if ASSERT
assert(hh == thread->hierarchicalHeap);
@@ -465,6 +486,7 @@ void HM_HH_promoteChunks(
#endif
}
+
bool HM_HH_isLevelHead(HM_HierarchicalHeap hh)
{
return (NULL != hh)
@@ -503,7 +525,7 @@ HM_HierarchicalHeap HM_HH_new(GC_state s, uint32_t depth)
hh->heightDependants = 0;
HM_initChunkList(HM_HH_getChunkList(hh));
- HM_initChunkList(HM_HH_getRemSet(hh));
+ HM_initRemSet(HM_HH_getRemSet(hh));
HM_initChunkList(HM_HH_getSuspects(hh));
return hh;
@@ -583,7 +605,10 @@ bool HM_HH_extend(GC_state s, GC_thread thread, size_t bytesRequested)
hh = newhh;
}
- chunk = HM_allocateChunk(HM_HH_getChunkList(hh), bytesRequested);
+ chunk = HM_allocateChunkWithPurpose(
+ HM_HH_getChunkList(hh),
+ bytesRequested,
+ BLOCK_FOR_HEAP_CHUNK);
if (NULL == chunk) {
return FALSE;
@@ -596,6 +621,17 @@ bool HM_HH_extend(GC_state s, GC_thread thread, size_t bytesRequested)
#endif
chunk->levelHead = HM_HH_getUFNode(hh);
thread->currentChunk = chunk;
HM_HH_addRecentBytesAllocated(thread, HM_getChunkSize(chunk));
@@ -674,8 +710,11 @@ void splitHeapForCC(GC_state s, GC_thread thread) {
HM_HierarchicalHeap newHH = HM_HH_new(s, HM_HH_getDepth(hh));
thread->hierarchicalHeap = newHH;
- HM_chunk chunk =
- HM_allocateChunk(HM_HH_getChunkList(newHH), GC_HEAP_LIMIT_SLOP);
+ HM_chunk chunk = HM_allocateChunkWithPurpose(
+ HM_HH_getChunkList(newHH),
+ GC_HEAP_LIMIT_SLOP,
+ BLOCK_FOR_HEAP_CHUNK);
+
chunk->levelHead = HM_HH_getUFNode(newHH);
#ifdef DETECT_ENTANGLEMENT
@@ -755,12 +794,21 @@ void mergeCompletedCCs(GC_state s, HM_HierarchicalHeap hh) {
HM_HierarchicalHeap completed = hh->subHeapCompletedCC;
while (completed != NULL) {
HM_HierarchicalHeap next = completed->subHeapCompletedCC;
+
+ /* consider using max instead of addition */
HM_HH_getConcurrentPack(hh)->bytesSurvivedLastCollection +=
HM_HH_getConcurrentPack(completed)->bytesSurvivedLastCollection;
- HM_appendChunkList(HM_HH_getChunkList(hh), HM_HH_getChunkList(completed));
- HM_appendChunkList(HM_HH_getRemSet(hh), HM_HH_getRemSet(completed));
+
+ /*
+ HM_HH_getConcurrentPack(hh)->bytesSurvivedLastCollection =
+ max(HM_HH_getConcurrentPack(hh)->bytesSurvivedLastCollection,
+ HM_HH_getConcurrentPack(completed)->bytesSurvivedLastCollection);
+ */
+
CC_freeStack(s, HM_HH_getConcurrentPack(completed));
linkInto(s, hh, completed);
+ HM_appendChunkList(HM_HH_getChunkList(hh), HM_HH_getChunkList(completed));
+ HM_appendRemSet(HM_HH_getRemSet(hh), HM_HH_getRemSet(completed));
completed = next;
}
@@ -809,6 +857,8 @@ bool checkPolicyforRoot(
}
size_t bytesSurvived = HM_HH_getConcurrentPack(hh)->bytesSurvivedLastCollection;
+
+ /* consider removing this: */
for (HM_HierarchicalHeap cursor = hh->subHeapCompletedCC;
NULL != cursor;
cursor = cursor->subHeapCompletedCC)
@@ -858,7 +908,11 @@ objptr copyCurrentStack(GC_state s, GC_thread thread) {
assert(isStackReservedAligned(s, reserved));
size_t stackSize = sizeofStackWithMetaData(s, reserved);
- HM_chunk newChunk = HM_allocateChunk(HM_HH_getChunkList(hh), stackSize);
+ HM_chunk newChunk = HM_allocateChunkWithPurpose(
+ HM_HH_getChunkList(hh),
+ stackSize,
+ BLOCK_FOR_HEAP_CHUNK);
+
if (NULL == newChunk) {
DIE("Ran out of space to copy stack!");
}
@@ -940,7 +994,7 @@ void HM_HH_cancelCC(GC_state s, pointer threadp, pointer hhp) {
mainhh->subHeapForCC = heap->subHeapForCC;
HM_appendChunkList(HM_HH_getChunkList(mainhh), HM_HH_getChunkList(heap));
- HM_appendChunkList(HM_HH_getRemSet(mainhh), HM_HH_getRemSet(heap));
+ HM_appendRemSet(HM_HH_getRemSet(mainhh), HM_HH_getRemSet(heap));
linkInto(s, mainhh, heap);
@@ -953,7 +1007,7 @@ void HM_HH_cancelCC(GC_state s, pointer threadp, pointer hhp) {
HM_HH_getConcurrentPack(mainhh)->bytesSurvivedLastCollection +=
HM_HH_getConcurrentPack(completed)->bytesSurvivedLastCollection;
HM_appendChunkList(HM_HH_getChunkList(mainhh), HM_HH_getChunkList(completed));
- HM_appendChunkList(HM_HH_getRemSet(mainhh), HM_HH_getRemSet(completed));
+ HM_appendRemSet(HM_HH_getRemSet(mainhh), HM_HH_getRemSet(completed));
linkInto(s, mainhh, completed);
completed = next;
}
@@ -1168,6 +1222,32 @@ void HM_HH_addRootForCollector(GC_state s, HM_HierarchicalHeap hh, pointer p) {
}
}
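+
+/* Record remElem in the remembered set of hh. When conc is set, hh may be
+ * merged concurrently, so chase the union-find representative and retry
+ * until the element has been recorded in the current level head. */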
+void HM_HH_rememberAtLevel(HM_HierarchicalHeap hh, HM_remembered remElem, bool conc) {
+  assert(hh != NULL);
+  if (!conc) {
+    HM_remember(HM_HH_getRemSet(hh), remElem, conc);
+  } else {
+    HM_UnionFindNode cursor = HM_HH_getUFNode(hh);
+    while (true) {
+      while (NULL != cursor->representative) {
+        cursor = cursor->representative;
+      }
+      hh = cursor->payload;
+      if (hh == NULL) {
+        /* Raced with a join that changed the representative; retry. This
+         * should not happen as long as heaps are retired just like the
+         * union-find nodes. */
+        assert(false);
+        continue;
+      }
+      HM_remember(HM_HH_getRemSet(hh), remElem, conc);
+      if (NULL == cursor->representative) {
+        return;
+      }
+    }
+  }
+}
+
void HM_HH_freeAllDependants(
GC_state s,
@@ -1259,7 +1339,7 @@ void HM_HH_freeAllDependants(
/*******************************/
static inline void linkInto(
- GC_state s,
+ __attribute__((unused)) GC_state s,
HM_HierarchicalHeap left,
HM_HierarchicalHeap right)
{
@@ -1279,8 +1359,9 @@ static inline void linkInto(
assert(NULL == HM_HH_getUFNode(left)->dependant2);
- HM_HH_getUFNode(right)->payload = NULL;
- freeFixedSize(getHHAllocator(s), right);
+ // HM_HH_getUFNode(right)->payload = NULL;
+ // freeFixedSize(getHHAllocator(s), right);
+ // HH_EBR_retire(s, HM_HH_getUFNode(right));
assert(HM_HH_isLevelHead(left));
}
@@ -1307,9 +1388,7 @@ void assertInvariants(GC_thread thread)
NULL != chunk;
chunk = chunk->nextChunk)
{
- assert(HM_getLevelHead(chunk) == cursor);
- assert(chunk->disentangledDepth >= 1);
- }
+      assert(HM_getLevelHead(chunk) == cursor);
+    }
}
/* check sorted by depth */
diff --git a/runtime/gc/hierarchical-heap.h b/runtime/gc/hierarchical-heap.h
index 147377811..7add64a41 100644
--- a/runtime/gc/hierarchical-heap.h
+++ b/runtime/gc/hierarchical-heap.h
@@ -10,6 +10,7 @@
#include "chunk.h"
#include "concurrent-collection.h"
+#include "remembered-set.h"
#if (defined (MLTON_GC_INTERNAL_TYPES))
@@ -62,7 +63,7 @@ typedef struct HM_HierarchicalHeap {
struct HM_HierarchicalHeap *subHeapForCC;
struct HM_HierarchicalHeap *subHeapCompletedCC;
- struct HM_chunkList rememberedSet;
+ struct HM_remSet rememberedSet;
struct ConcurrentPackage concurrentPack;
struct HM_chunkList entanglementSuspects;
@@ -104,7 +105,7 @@ static inline HM_chunkList HM_HH_getChunkList(HM_HierarchicalHeap hh)
return &(hh->chunkList);
}
-static inline HM_chunkList HM_HH_getRemSet(HM_HierarchicalHeap hh)
+static inline HM_remSet HM_HH_getRemSet(HM_HierarchicalHeap hh)
{
return &(hh->rememberedSet);
}
@@ -122,6 +123,7 @@ bool HM_HH_isLevelHead(HM_HierarchicalHeap hh);
bool HM_HH_isCCollecting(HM_HierarchicalHeap hh);
void HM_HH_addRootForCollector(GC_state s, HM_HierarchicalHeap hh, pointer p);
+void HM_HH_rememberAtLevel(HM_HierarchicalHeap hh, HM_remembered remElem, bool conc);
void HM_HH_merge(GC_state s, GC_thread parent, GC_thread child);
void HM_HH_promoteChunks(GC_state s, GC_thread thread);
@@ -158,6 +160,11 @@ void HM_HH_resetList(pointer threadp);
void mergeCompletedCCs(GC_state s, HM_HierarchicalHeap hh);
+void HM_HH_clearSuspectsAtDepth(
+ GC_state s,
+ GC_thread thread,
+ uint32_t targetDepth);
+
/** Very fancy (constant-space) loop that frees each dependant union-find
* node of hh. Specifically, calls this on each dependant ufnode:
diff --git a/runtime/gc/init.c b/runtime/gc/init.c
index 96d87a826..a39c793d8 100644
--- a/runtime/gc/init.c
+++ b/runtime/gc/init.c
@@ -80,6 +80,49 @@ static size_t stringToBytes(const char *s) {
die ("Invalid @MLton/@mpl memory amount: %s.", s);
}
+
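+/* Parse a time spec of the form <number><unit>, where the unit is one of
+ * 's' (seconds), 'm' (milliseconds), 'u' (microseconds), or 'n' (nanoseconds),
+ * e.g. "10m" for ten milliseconds. */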
+static void stringToTime(const char *s, struct timespec *t) {
+ double d;
+ char *endptr;
+ size_t factor;
+
+ d = strtod (s, &endptr);
+ if (s == endptr)
+ goto bad;
+
+ switch (*endptr++) {
+ case 's':
+ factor = 1;
+ break;
+ case 'm':
+ factor = 1000;
+ break;
+ case 'u':
+ factor = 1000 * 1000;
+ break;
+ case 'n':
+ factor = 1000 * 1000 * 1000;
+ break;
+ default:
+ goto bad;
+ }
+
+ d /= (double)factor;
+ size_t sec = (size_t)d;
+ size_t nsec = (size_t)((d - (double)sec) * 1000000000.0);
+
+ unless (*endptr == '\0'
+ and 0.0 <= d)
+ goto bad;
+
+ t->tv_sec = sec;
+ t->tv_nsec = nsec;
+ return;
+
+bad:
+ die ("Invalid @MLton/@mpl time spec: %s.", s);
+}
+
/* ---------------------------------------------------------------- */
/* GC_init */
/* ---------------------------------------------------------------- */
@@ -327,6 +370,14 @@ int processAtMLton (GC_state s, int start, int argc, char **argv,
die("%s megablock-threshold must be at least 1", atName);
}
s->controls->megablockThreshold = xx;
+ } else if (0 == strcmp(arg, "block-usage-sample-interval")) {
+ i++;
+ if (i == argc || (0 == strcmp (argv[i], "--"))) {
+ die ("%s block-usage-sample-interval missing argument.", atName);
+ }
+ struct timespec tm;
+ stringToTime(argv[i++], &tm);
+ s->controls->blockUsageSampleInterval = tm;
} else if (0 == strcmp (arg, "collection-type")) {
i++;
if (i == argc || (0 == strcmp (argv[i], "--"))) {
@@ -479,7 +530,11 @@ int GC_init (GC_state s, int argc, char **argv) {
s->controls->emptinessFraction = 0.25;
s->controls->superblockThreshold = 7; // superblocks of 128 blocks
s->controls->megablockThreshold = 18;
- s->controls->manageEntanglement = FALSE;
+ s->controls->manageEntanglement = TRUE;
+
+ // default: sample block usage once a second
+ s->controls->blockUsageSampleInterval.tv_sec = 1;
+ s->controls->blockUsageSampleInterval.tv_nsec = 0;
/* Not arbitrary; should be at least the page size and must also respect the
* limit check coalescing amount in the compiler. */
@@ -509,8 +564,8 @@ int GC_init (GC_state s, int argc, char **argv) {
s->rootsLength = 0;
s->savedThread = BOGUS_OBJPTR;
- initFixedSizeAllocator(getHHAllocator(s), sizeof(struct HM_HierarchicalHeap));
- initFixedSizeAllocator(getUFAllocator(s), sizeof(struct HM_UnionFindNode));
+ initFixedSizeAllocator(getHHAllocator(s), sizeof(struct HM_HierarchicalHeap), BLOCK_FOR_HH_ALLOCATOR);
+ initFixedSizeAllocator(getUFAllocator(s), sizeof(struct HM_UnionFindNode), BLOCK_FOR_UF_ALLOCATOR);
s->numberDisentanglementChecks = 0;
s->signalHandlerThread = BOGUS_OBJPTR;
@@ -605,8 +660,10 @@ void GC_lateInit(GC_state s) {
HM_configChunks(s);
HH_EBR_init(s);
+ HM_EBR_init(s);
initLocalBlockAllocator(s, initGlobalBlockAllocator(s));
+ s->blockUsageSampler = newBlockUsageSampler(s);
s->nextChunkAllocSize = s->controls->allocChunkSize;
@@ -642,9 +699,11 @@ void GC_duplicate (GC_state d, GC_state s) {
d->wsQueueTop = BOGUS_OBJPTR;
d->wsQueueBot = BOGUS_OBJPTR;
initLocalBlockAllocator(d, s->blockAllocatorGlobal);
- initFixedSizeAllocator(getHHAllocator(d), sizeof(struct HM_HierarchicalHeap));
- initFixedSizeAllocator(getUFAllocator(d), sizeof(struct HM_UnionFindNode));
+ d->blockUsageSampler = s->blockUsageSampler;
+ initFixedSizeAllocator(getHHAllocator(d), sizeof(struct HM_HierarchicalHeap), BLOCK_FOR_HH_ALLOCATOR);
+ initFixedSizeAllocator(getUFAllocator(d), sizeof(struct HM_UnionFindNode), BLOCK_FOR_UF_ALLOCATOR);
d->hhEBR = s->hhEBR;
+ d->hmEBR = s->hmEBR;
d->nextChunkAllocSize = s->nextChunkAllocSize;
d->lastMajorStatistics = newLastMajorStatistics();
d->numberOfProcs = s->numberOfProcs;
diff --git a/runtime/gc/logger.c b/runtime/gc/logger.c
index d48a245ab..432ae0254 100644
--- a/runtime/gc/logger.c
+++ b/runtime/gc/logger.c
@@ -205,6 +205,7 @@ bool stringToLogModule(enum LogModule* module, const char* moduleString) {
struct Conversion conversions[] =
{{.string = "allocation", .module = LM_ALLOCATION},
+ {.string = "block-allocator", .module = LM_BLOCK_ALLOCATOR},
{.string = "chunk", .module = LM_CHUNK},
{.string = "chunk-pool", .module = LM_CHUNK_POOL},
{.string = "dfs-mark", .module = LM_DFS_MARK},
diff --git a/runtime/gc/logger.h b/runtime/gc/logger.h
index 5d7680a32..5a5cce8a7 100644
--- a/runtime/gc/logger.h
+++ b/runtime/gc/logger.h
@@ -18,6 +18,7 @@
enum LogModule {
LM_ALLOCATION,
+ LM_BLOCK_ALLOCATOR,
LM_CHUNK,
LM_CHUNK_POOL,
LM_DFS_MARK,
diff --git a/runtime/gc/new-object.c b/runtime/gc/new-object.c
index ed9e0d500..9083b7593 100644
--- a/runtime/gc/new-object.c
+++ b/runtime/gc/new-object.c
@@ -147,8 +147,16 @@ GC_thread newThreadWithHeap(
* yet. */
HM_HierarchicalHeap hh = HM_HH_new(s, depth);
- HM_chunk tChunk = HM_allocateChunk(HM_HH_getChunkList(hh), threadSize);
- HM_chunk sChunk = HM_allocateChunk(HM_HH_getChunkList(hh), stackSize);
+ HM_chunk tChunk = HM_allocateChunkWithPurpose(
+ HM_HH_getChunkList(hh),
+ threadSize,
+ BLOCK_FOR_HEAP_CHUNK);
+
+ HM_chunk sChunk = HM_allocateChunkWithPurpose(
+ HM_HH_getChunkList(hh),
+ stackSize,
+ BLOCK_FOR_HEAP_CHUNK);
+
if (NULL == sChunk || NULL == tChunk) {
DIE("Ran out of space for thread+stack allocation!");
}
diff --git a/runtime/gc/object.c b/runtime/gc/object.c
index 4dc652e89..91d6ccf91 100644
--- a/runtime/gc/object.c
+++ b/runtime/gc/object.c
@@ -36,7 +36,17 @@ GC_header* getHeaderp (pointer p) {
* Returns the header for the object pointed to by p.
*/
GC_header getHeader (pointer p) {
- return *(getHeaderp(p));
+ GC_header h = *(getHeaderp(p));
+ return h;
+}
+
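+/* Like getHeader, but if the header is (a chain of) forwarding pointers,
+ * follow it and return the header of the final object. */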
+GC_header getRacyHeader (pointer ptr) {
+ GC_header header = getHeader(ptr);
+ while (isFwdHeader(header)) {
+ ptr = (pointer) header;
+ header = getHeader(ptr);
+ }
+ return header;
}
/*
@@ -90,6 +100,20 @@ void splitHeader(GC_state s, GC_header header,
*numObjptrsRet = numObjptrs;
}
+static inline bool isMutableH(GC_state s, GC_header header) {
+ GC_objectTypeTag tag;
+ uint16_t bytesNonObjptrs;
+ uint16_t numObjptrs;
+ bool hasIdentity;
+ splitHeader(s, header, &tag, &hasIdentity, &bytesNonObjptrs, &numObjptrs);
+ return hasIdentity;
+}
+
+static inline bool isMutable(GC_state s, pointer p) {
+ GC_header header = getHeader(p);
+ return isMutableH(s, header);
+}
+
/* advanceToObjectData (s, p)
*
* If p points at the beginning of an object, then advanceToObjectData
diff --git a/runtime/gc/object.h b/runtime/gc/object.h
index 0df5ce702..1250b2abe 100644
--- a/runtime/gc/object.h
+++ b/runtime/gc/object.h
@@ -82,7 +82,8 @@ COMPILE_TIME_ASSERT(sizeof_objptr__eq__sizeof_header,
static inline GC_header* getHeaderp (pointer p);
static inline GC_header getHeader (pointer p);
-static inline GC_header buildHeaderFromTypeIndex (uint32_t t);
+static inline GC_header getRacyHeader (pointer p);
+static inline GC_header buildHeaderFromTypeIndex(uint32_t t);
#endif /* (defined (MLTON_GC_INTERNAL_FUNCS)) */
@@ -180,6 +181,8 @@ enum {
static inline void splitHeader (GC_state s, GC_header header,
GC_objectTypeTag *tagRet, bool *hasIdentityRet,
uint16_t *bytesNonObjptrsRet, uint16_t *numObjptrsRet);
+static inline bool isMutable(GC_state s, pointer p);
+static inline bool isMutableH(GC_state s, GC_header h);
static inline pointer advanceToObjectData (GC_state s, pointer p);
static inline size_t objectSize(GC_state s, pointer p);
diff --git a/runtime/gc/pin.c b/runtime/gc/pin.c
index b564d5c0c..fca2c138f 100644
--- a/runtime/gc/pin.c
+++ b/runtime/gc/pin.c
@@ -6,51 +6,133 @@
#if (defined (MLTON_GC_INTERNAL_FUNCS))
-bool pinObject(objptr op, uint32_t unpinDepth)
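+/* Decode the pin bits of a header. A header without the valid-header bit
+ * (e.g. a forwarding pointer) is treated as PIN_NONE. */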
+enum PinType pinType(GC_header h) {
+ if (0 == (h & GC_VALID_HEADER_MASK))
+ return PIN_NONE;
+
+ int t = ((h & PIN_MASK) >> PIN_SHIFT);
+ if (t == 0)
+ return PIN_NONE;
+ else if (t == 2)
+ return PIN_DOWN;
+ else if (t == 3)
+ return PIN_ANY;
+ else
+ DIE("NOT supposed to reach here!");
+}
+
+enum PinType maxPT(enum PinType pt1, enum PinType pt2) {
+ if (pt1 == PIN_NONE)
+ return pt2;
+ else if (pt1 == PIN_ANY || pt2 == PIN_ANY)
+ return PIN_ANY;
+ else
+ return PIN_DOWN;
+}
+
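+/* Inverse of pinType: the header bits that encode the given pin type
+ * (0x2 or 0x3 shifted into the PIN_MASK position). */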
+static inline GC_header getRep(enum PinType pt)
+{
+ GC_header h;
+ if (pt == PIN_NONE)
+ h = 0;
+ else if (pt == PIN_DOWN)
+ h = 0x20000000;
+ else
+ h = 0x30000000;
+ return h;
+}
+
+uint32_t unpinDepthOfH(GC_header h)
+{
+ return (h & UNPIN_DEPTH_MASK) >> UNPIN_DEPTH_SHIFT;
+}
+
+bool pinObject(GC_state s, objptr op, uint32_t unpinDepth, enum PinType pt)
+{
+ bool a, b;
+ pinObjectInfo(s, op, unpinDepth, pt, &a, &b);
+ return a;
+}
+
+objptr pinObjectInfo(
+ GC_state s,
+ objptr op,
+ uint32_t unpinDepth,
+ enum PinType pt,
+ bool *headerChange,
+ bool *pinChange)
{
pointer p = objptrToPointer(op, NULL);
+ assert(pt != PIN_NONE);
+
+  /* initialize with false */
+ *headerChange = false;
+ *pinChange = false;
uint32_t maxUnpinDepth = TWOPOWER(UNPIN_DEPTH_BITS) - 1;
- if (unpinDepth > maxUnpinDepth) {
- DIE("unpinDepth %"PRIu32" exceeds max possible value %"PRIu32,
+ if (unpinDepth > maxUnpinDepth)
+ {
+ DIE("unpinDepth %" PRIu32 " exceeds max possible value %" PRIu32,
unpinDepth,
maxUnpinDepth);
- return FALSE;
+ return op;
}
assert(
- ((GC_header)unpinDepth) << UNPIN_DEPTH_SHIFT
- == (UNPIN_DEPTH_MASK & ((GC_header)unpinDepth) << UNPIN_DEPTH_SHIFT)
- );
-
- while (TRUE) {
+    ((GC_header)unpinDepth) << UNPIN_DEPTH_SHIFT
+      == (UNPIN_DEPTH_MASK & ((GC_header)unpinDepth) << UNPIN_DEPTH_SHIFT));
+ while (true)
+ {
GC_header header = getHeader(p);
+ uint32_t newUnpinDepth;
+ if (isFwdHeader(header))
+ {
+ assert(pt != PIN_DOWN);
+ op = getFwdPtr(p);
+ p = objptrToPointer(op, NULL);
+ continue;
+ }
+ else if (pinType(header) != PIN_NONE)
+ {
+ uint32_t previousUnpinDepth = unpinDepthOfH(header);
+ newUnpinDepth = min(previousUnpinDepth, unpinDepth);
+ } else {
+ newUnpinDepth = unpinDepth;
+ }
- bool notPinned = (0 == (header & PIN_MASK) >> PIN_SHIFT);
- uint32_t previousUnpinDepth =
- (header & UNPIN_DEPTH_MASK) >> UNPIN_DEPTH_SHIFT;
+    /* If we are lowering the unpinDepth, then the new pinType (nt) is the
+     * function argument pt. Otherwise it's the max of the two. */
+ enum PinType nt = newUnpinDepth < unpinDepthOfH(header) ? pt : maxPT(pt, pinType(header));
+ GC_header unpinnedHeader = header & (~UNPIN_DEPTH_MASK) & (~PIN_MASK);
GC_header newHeader =
- (header & (~UNPIN_DEPTH_MASK)) // clear unpin bits
- | ((GC_header)unpinDepth << UNPIN_DEPTH_SHIFT) // put in new unpinDepth
- | PIN_MASK; // set pin bit
-
- if (notPinned) {
- /* first, handle case where this object was not already pinned */
- if (__sync_bool_compare_and_swap(getHeaderp(p), header, newHeader))
- return TRUE;
+ unpinnedHeader
+ | ((GC_header)newUnpinDepth << UNPIN_DEPTH_SHIFT) // put in new unpinDepth
+ | getRep(nt); // setup the pin type
+
+    if (newHeader == header) {
+ assert (!hasFwdPtr(p));
+ return op;
}
else {
- /* if the object was previously pinned, we still need to do a writeMin */
- if (previousUnpinDepth <= unpinDepth)
- return FALSE;
-
- if (__sync_bool_compare_and_swap(getHeaderp(p), header, newHeader))
- return TRUE;
+ if (__sync_bool_compare_and_swap(getHeaderp(p), header, newHeader)) {
+ *headerChange = true;
+ bool didPinChange = (nt != pinType(header));
+ *pinChange = didPinChange;
+ if (nt == PIN_ANY && didPinChange) {
+ size_t sz = objectSize(s, p);
+ s->cumulativeStatistics->bytesPinnedEntangled += sz;
+ __sync_fetch_and_add(
+ &(s->cumulativeStatistics->currentPhaseBytesPinnedEntangled),
+ (uintmax_t)sz
+ );
+ }
+ assert (!hasFwdPtr(p));
+ assert(pinType(newHeader) == nt);
+ return op;
+ }
}
}
-
DIE("should be impossible to reach here");
- return FALSE;
+ return op;
}
void unpinObject(objptr op) {
@@ -72,14 +154,51 @@ bool isPinned(objptr op) {
* (otherwise, there could be a forward pointer in this spot)
* ...and then check the mark
*/
- return (1 == (h & GC_VALID_HEADER_MASK)) &&
- (1 == ((h & PIN_MASK) >> PIN_SHIFT));
+ bool result = (1 == (h & GC_VALID_HEADER_MASK)) &&
+ (((h & PIN_MASK) >> PIN_SHIFT) > 0);
+ assert (result == (pinType(h) != PIN_NONE));
+ return result;
}
uint32_t unpinDepthOf(objptr op) {
pointer p = objptrToPointer(op, NULL);
- uint32_t d = (getHeader(p) & UNPIN_DEPTH_MASK) >> UNPIN_DEPTH_SHIFT;
+ uint32_t d = unpinDepthOfH(getHeader(p));
return d;
}
+bool tryUnpinWithDepth(objptr op, uint32_t opDepth) {
+
+ pointer p = objptrToPointer(op, NULL);
+ GC_header header = getHeader(p);
+ uint32_t d = unpinDepthOfH(header);
+
+ if (d >= opDepth) {
+ GC_header newHeader =
+ getHeader(p)
+ & (~UNPIN_DEPTH_MASK) // clear counter bits
+ & (~PIN_MASK); // clear mark bit
+
+ return __sync_bool_compare_and_swap(getHeaderp(p), header, newHeader);
+ }
+ return false;
+}
+
+
+// bool tryPinDec(objptr op, uint32_t opDepth) {
+// pointer p = objptrToPointer(op, NULL);
+// GC_header header = getHeader(p);
+// uint32_t d = (header & UNPIN_DEPTH_MASK) >> UNPIN_DEPTH_SHIFT;
+
+// if (d >= opDepth && pinType(header) == PIN_ANY) {
+// GC_header newHeader =
+// getHeader(p)
+// & (~UNPIN_DEPTH_MASK) // clear counter bits
+// & (~PIN_MASK); // clear mark bit
+
+// return __sync_bool_compare_and_swap(getHeaderp(p), header, newHeader));
+// }
+
+// return false;
+// }
+
#endif
diff --git a/runtime/gc/pin.h b/runtime/gc/pin.h
index cc69a86e2..7591d3044 100644
--- a/runtime/gc/pin.h
+++ b/runtime/gc/pin.h
@@ -11,19 +11,25 @@
*
* +------+-------------------------+----------+--------------+
* header fields | mark | counter | type-tag | valid-header |
- * +------+-----------+-------------+----------+--------------+
- * sub-fields | | sus | pin | unpin-depth | | |
- * +------+-----+-----+-------------+----------+--------------+
- * ^ ^ ^ ^ ^ ^ ^
- * offsets 32 31 30 29 20 1 0
+ * +------+------------+-------------+----------+--------------+
+ * sub-fields | | sus | pin | unpin-depth | | |
+ * +------+-----+------+-------------+----------+--------------+
+ * ^ ^ ^ ^ ^ ^ ^
+ * offsets 32 31 30 28 20 1 0
*
*/
-#define UNPIN_DEPTH_BITS 9
-#define UNPIN_DEPTH_MASK ((GC_header)0x1FF00000)
+#define UNPIN_DEPTH_BITS 8
+#define UNPIN_DEPTH_MASK ((GC_header)0xFF00000)
#define UNPIN_DEPTH_SHIFT 20
-#define PIN_BITS 1
-#define PIN_MASK ((GC_header)0x20000000)
-#define PIN_SHIFT 29
+#define PIN_MASK ((GC_header)0x30000000)
+#define PIN_SHIFT 28
+
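+/* The three pin states stored in an object header (see pinType/getRep for
+ * the encoding). Objects pinned with PIN_ANY are counted towards
+ * bytesPinnedEntangled (see pinObjectInfo). */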
+enum PinType
+{
+ PIN_NONE,
+ PIN_DOWN,
+ PIN_ANY
+};
/* Pin this object, making it immovable (by GC) until it reaches
* unpinDepth (or shallower). Returns TRUE if the object was
@@ -32,11 +38,22 @@
* Note that regardless of whether or not the object was previously
* pinned, this does a writeMin on the unpinDepth of the object.
*/
-bool pinObject(objptr op, uint32_t unpinDepth);
+bool pinObject(GC_state s, objptr op, uint32_t unpinDepth, enum PinType pt);
+
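+/* Like pinObject, but follows forwarding pointers (returning the possibly
+ * forwarded objptr) and reports via headerChange/pinChange whether the
+ * header, respectively the pin type, was changed. */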
+objptr pinObjectInfo(
+ GC_state s,
+ objptr op,
+ uint32_t unpinDepth,
+ enum PinType pt,
+ bool* headerChange,
+ bool* pinChange);
/* check if an object is pinned */
bool isPinned(objptr op);
+/* Decode the pin state stored in an object header. */
+enum PinType pinType(GC_header header);
+
/* Unpin an object by clearing the mark and counter bits in its header.
* This is only safe if the object is not being concurrently pinned.
* As long as we only call this on objects that are local, it's safe.
@@ -45,5 +62,14 @@ void unpinObject(objptr op);
/* read the current unpin-depth of an object */
uint32_t unpinDepthOf(objptr op);
+uint32_t unpinDepthOfH(GC_header header);
+
+
+/* Unpin an object if its current unpinDepth allows it (i.e., the recorded
+ * depth is at least opDepth). The unpinDepth can change concurrently, so the
+ * check and the header update are done together via compare-and-swap.
+ * Returns true if the unpin succeeded, and false otherwise.
+ */
+bool tryUnpinWithDepth(objptr op, uint32_t opDepth);
+// bool tryPinDec(objptr op, uint32_t opDepth);
#endif
diff --git a/runtime/gc/remembered-set.c b/runtime/gc/remembered-set.c
index 3b5bb2932..f6ef64414 100644
--- a/runtime/gc/remembered-set.c
+++ b/runtime/gc/remembered-set.c
@@ -4,26 +4,32 @@
* See the file MLton-LICENSE for details.
*/
-void HM_remember(HM_chunkList remSet, HM_remembered remElem) {
- HM_storeInchunkList(remSet, (void*)remElem, sizeof(struct HM_remembered));
+void HM_initRemSet(HM_remSet remSet) {
+ HM_initChunkList(&(remSet->private));
+ CC_initConcList(&(remSet->public));
}
-void HM_rememberAtLevel(HM_HierarchicalHeap hh, HM_remembered remElem) {
- assert(hh != NULL);
- HM_remember(HM_HH_getRemSet(hh), remElem);
+void HM_remember(HM_remSet remSet, HM_remembered remElem, bool conc) {
+ if (!conc) {
+ HM_storeInChunkListWithPurpose(&(remSet->private), (void*)remElem, sizeof(struct HM_remembered), BLOCK_FOR_REMEMBERED_SET);
+ }
+ else {
+ CC_storeInConcListWithPurpose(&(remSet->public), (void *)remElem, sizeof(struct HM_remembered), BLOCK_FOR_REMEMBERED_SET);
+ }
}
-void HM_foreachRemembered(
+void HM_foreachPrivate(
GC_state s,
- HM_chunkList remSet,
+ HM_chunkList chunkList,
HM_foreachDownptrClosure f)
{
- assert(remSet != NULL);
- HM_chunk chunk = HM_getChunkListFirstChunk(remSet);
- while (chunk != NULL) {
+ HM_chunk chunk = HM_getChunkListFirstChunk(chunkList);
+ while (chunk != NULL)
+ {
pointer p = HM_getChunkStart(chunk);
pointer frontier = HM_getChunkFrontier(chunk);
- while (p < frontier) {
+ while (p < frontier && ((HM_remembered)p)->object != 0)
+ {
f->fun(s, (HM_remembered)p, f->env);
p += sizeof(struct HM_remembered);
}
@@ -31,10 +37,142 @@ void HM_foreachRemembered(
}
}
-size_t HM_numRemembered(HM_chunkList remSet) {
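+/* A "fishy" chunk is one that may still be growing when we scan it (it was
+ * the last chunk of the concurrent list, or is marked retireChunk). We record
+ * how far we scanned so that checkFishyChunks can re-scan it until the
+ * frontier stops moving. */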
+typedef struct FishyChunk
+{
+ HM_chunk chunk;
+ pointer scanned;
+} FishyChunk;
+
+void makeChunkFishy(FishyChunk * fc, HM_chunk chunk, pointer frontier, int* numFishyChunks) {
+ (fc[*numFishyChunks]).chunk = chunk;
+ (fc[*numFishyChunks]).scanned = frontier;
+ *numFishyChunks = *numFishyChunks + 1;
+ return;
+}
+
+FishyChunk * resizeFishyArray (FishyChunk * fishyChunks, int * currentSize) {
+ int cs = *currentSize;
+ int new_size = 2 * cs;
+ FishyChunk * fc = malloc(sizeof(struct FishyChunk) * new_size);
+ memcpy(fc, fishyChunks, sizeof(struct FishyChunk) * cs);
+ *currentSize = new_size;
+ free(fishyChunks);
+ return fc;
+}
+
+void checkFishyChunks(GC_state s,
+ FishyChunk * fishyChunks,
+ int numFishyChunks,
+ HM_foreachDownptrClosure f)
+{
+ if (fishyChunks == NULL) {
+ return;
+ }
+ bool changed = true;
+ while (changed) {
+ int i = numFishyChunks - 1;
+ changed = false;
+ while (i >= 0)
+ {
+ HM_chunk chunk = fishyChunks[i].chunk;
+ pointer p = fishyChunks[i].scanned;
+ pointer frontier = HM_getChunkFrontier(chunk);
+ while (TRUE)
+ {
+ while (p < frontier && ((HM_remembered)p)->object != 0)
+ {
+ f->fun(s, (HM_remembered)p, f->env);
+ p += sizeof(struct HM_remembered);
+ }
+ frontier = HM_getChunkFrontier(chunk);
+ if (p >= frontier) {
+ break;
+ }
+ }
+ if (p != fishyChunks[i].scanned) {
+ fishyChunks[i].scanned = p;
+ changed = true;
+ }
+ i --;
+ }
+ }
+}
+
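+/* Apply f to every element of the public (concurrent) part of the remembered
+ * set. With trackFishyChunks set, chunks that may be growing concurrently are
+ * re-scanned until they stabilize. Afterwards, the scanned chunks are moved
+ * into the private part. */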
+void HM_foreachPublic (
+ GC_state s,
+ HM_remSet remSet,
+ HM_foreachDownptrClosure f,
+ bool trackFishyChunks)
+{
+
+ if ((remSet->public).firstChunk == NULL) {
+ return;
+ }
+
+ if (!trackFishyChunks) {
+ struct HM_chunkList _chunkList;
+ HM_chunkList chunkList = &(_chunkList);
+ CC_popAsChunkList(&(remSet->public), chunkList);
+ HM_foreachPrivate(s, chunkList, f);
+ HM_appendChunkList(&(remSet->private), chunkList);
+ return;
+ }
+
+ HM_chunk chunk = (remSet->public).firstChunk;
+ HM_chunk lastChunk = CC_getLastChunk (&(remSet->public));
+ int array_size = 2 * s->numberOfProcs;
+ FishyChunk* fishyChunks = malloc(sizeof(struct FishyChunk) * array_size);
+ int numFishyChunks = 0;
+ while (chunk != NULL)
+ {
+ while (chunk != NULL)
+ {
+ pointer p = HM_getChunkStart(chunk);
+ pointer frontier = HM_getChunkFrontier(chunk);
+ while (p < frontier && ((HM_remembered)p)->object != 0)
+ {
+ f->fun(s, (HM_remembered)p, f->env);
+ p += sizeof(struct HM_remembered);
+ }
+ if ((chunk->retireChunk || chunk->nextChunk == NULL))
+ {
+ if (numFishyChunks >= array_size) {
+ fishyChunks = resizeFishyArray(fishyChunks, &array_size);
+ }
+ makeChunkFishy(fishyChunks, chunk, p, &numFishyChunks);
+ }
+ chunk = chunk->nextChunk;
+ }
+ checkFishyChunks(s, fishyChunks, numFishyChunks, f);
+    lastChunk = CC_getLastChunk(&(remSet->public));
+    HM_chunk lastFishy = fishyChunks[numFishyChunks - 1].chunk;
+    if (lastChunk != lastFishy) {
+      /* New chunks were appended concurrently; resume scanning at the chunk
+       * after the last one we have seen (note that `chunk` is NULL here). */
+      assert(lastFishy->nextChunk != NULL);
+      chunk = lastFishy->nextChunk;
+    }
+ }
+ free(fishyChunks);
+ struct HM_chunkList _chunkList;
+ HM_chunkList chunkList = &(_chunkList);
+ CC_popAsChunkList(&(remSet->public), chunkList);
+ HM_appendChunkList(&(remSet->private), chunkList);
+}
+
+void HM_foreachRemembered(
+ GC_state s,
+ HM_remSet remSet,
+ HM_foreachDownptrClosure f,
+ bool trackFishyChunks)
+{
+ assert(remSet != NULL);
+ HM_foreachPrivate(s, &(remSet->private), f);
+ HM_foreachPublic(s, remSet, f, trackFishyChunks);
+}
+
+
+size_t HM_numRemembered(HM_remSet remSet) {
assert(remSet != NULL);
size_t count = 0;
- HM_chunk chunk = HM_getChunkListFirstChunk(remSet);
+ HM_chunk chunk = HM_getChunkListFirstChunk(&(remSet->private)); // ignore public for now.
while (chunk != NULL) {
pointer p = HM_getChunkStart(chunk);
pointer frontier = HM_getChunkFrontier(chunk);
@@ -44,3 +182,13 @@ size_t HM_numRemembered(HM_chunkList remSet) {
return count;
}
+
+void HM_appendRemSet(HM_remSet r1, HM_remSet r2) {
+ HM_appendChunkList(&(r1->private), &(r2->private));
+ CC_appendConcList(&(r1->public), &(r2->public));
+}
+
+void HM_freeRemSetWithInfo(GC_state s, HM_remSet remSet, void* info) {
+ HM_freeChunksInListWithInfo(s, &(remSet->private), info, BLOCK_FOR_REMEMBERED_SET);
+ CC_freeChunksInConcListWithInfo(s, &(remSet->public), info, BLOCK_FOR_REMEMBERED_SET);
+}
diff --git a/runtime/gc/remembered-set.h b/runtime/gc/remembered-set.h
index c1859d3f6..ce0cfad40 100644
--- a/runtime/gc/remembered-set.h
+++ b/runtime/gc/remembered-set.h
@@ -9,6 +9,7 @@
#if (defined (MLTON_GC_INTERNAL_TYPES))
+#include "gc/concurrent-list.h"
/* Remembering that there exists a downpointer to this object. The unpin
* depth of the object will be stored in the object header. */
@@ -17,6 +18,24 @@ typedef struct HM_remembered {
objptr object;
} * HM_remembered;
+
+/* A remembered set has two parts: a private chunk list, accessed only by the
+ * owning processor, and a public concurrent list, which other processors may
+ * append to (see the `conc` flag of HM_remember). */
+typedef struct HM_remSet {
+ struct HM_chunkList private;
+ struct CC_concList public;
+} * HM_remSet;
+
typedef void (*HM_foreachDownptrFun)(GC_state s, HM_remembered remElem, void* args);
typedef struct HM_foreachDownptrClosure {
@@ -29,10 +48,14 @@ typedef struct HM_foreachDownptrClosure {
#if (defined (MLTON_GC_INTERNAL_BASIS))
-void HM_remember(HM_chunkList remSet, HM_remembered remElem);
-void HM_rememberAtLevel(HM_HierarchicalHeap hh, HM_remembered remElem);
-void HM_foreachRemembered(GC_state s, HM_chunkList remSet, HM_foreachDownptrClosure f);
-size_t HM_numRemembered(HM_chunkList remSet);
+void HM_initRemSet(HM_remSet remSet);
+void HM_freeRemSetWithInfo(GC_state s, HM_remSet remSet, void* info);
+void HM_remember(HM_remSet remSet, HM_remembered remElem, bool conc);
+void HM_appendRemSet(HM_remSet r1, HM_remSet r2);
+void HM_foreachRemembered(GC_state s, HM_remSet remSet, HM_foreachDownptrClosure f, bool trackFishyChunks);
+size_t HM_numRemembered(HM_remSet remSet);
+void HM_foreachPublic(GC_state s, HM_remSet remSet, HM_foreachDownptrClosure f, bool trackFishyChunks);
+void HM_foreachPrivate(GC_state s, HM_chunkList list, HM_foreachDownptrClosure f);
#endif /* defined (MLTON_GC_INTERNAL_BASIS) */
diff --git a/runtime/gc/sampler.c b/runtime/gc/sampler.c
new file mode 100644
index 000000000..bc7b7ac5f
--- /dev/null
+++ b/runtime/gc/sampler.c
@@ -0,0 +1,63 @@
+/* Copyright (C) 2022 Sam Westrick
+ *
+ * MLton is released under a HPND-style license.
+ * See the file MLton-LICENSE for details.
+ */
+
+
+void initSampler(
+ __attribute__((unused)) GC_state s,
+ Sampler samp,
+ SamplerClosure func,
+ struct timespec *desiredInterval)
+{
+ samp->func = *func;
+ samp->desiredInterval = *desiredInterval;
+ samp->currentEpoch = 0;
+ timespec_now(&(samp->absoluteStart));
+}
+
+
+static void timespec_mul(struct timespec *dst, size_t multiplier) {
+ size_t sec = dst->tv_sec;
+ size_t nsec = dst->tv_nsec;
+
+ size_t nps = 1000L * 1000 * 1000;
+
+ size_t new_nsec = (nsec * multiplier) % nps;
+ size_t add_sec = (nsec * multiplier) / nps;
+
+ dst->tv_sec = (sec * multiplier) + add_sec;
+ dst->tv_nsec = new_nsec;
+}
+
+
+static inline double timespec_to_seconds(struct timespec *tm) {
+ return (double)tm->tv_sec + ((double)tm->tv_nsec * 0.000000001);
+}
+
+
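+/* Check whether at least one sampling interval has elapsed since the last
+ * sample; if so, advance the epoch with a CAS (so that only one processor
+ * wins per epoch) and run the sampling function. */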
+void maybeSample(GC_state s, Sampler samp) {
+ size_t oldEpoch = samp->currentEpoch;
+
+ // compute the time of the last successful sample (relative to start)
+ struct timespec lastSample;
+ lastSample = samp->desiredInterval;
+ timespec_mul(&lastSample, oldEpoch);
+
+ // compare against current time by computing epoch diff
+ struct timespec now;
+ timespec_now(&now);
+ timespec_sub(&now, &(samp->absoluteStart));
+ double diff = timespec_to_seconds(&now) - timespec_to_seconds(&lastSample);
+ long epochDiff = (long)(diff / timespec_to_seconds(&samp->desiredInterval));
+
+ if (epochDiff < 1)
+ return;
+
+ size_t newEpoch = oldEpoch + epochDiff;
+
+ if (__sync_bool_compare_and_swap(&samp->currentEpoch, oldEpoch, newEpoch)) {
+ samp->func.fun(s, &now, samp->func.env);
+ }
+}
\ No newline at end of file
diff --git a/runtime/gc/sampler.h b/runtime/gc/sampler.h
new file mode 100644
index 000000000..c7f9b88c2
--- /dev/null
+++ b/runtime/gc/sampler.h
@@ -0,0 +1,34 @@
+/* Copyright (C) 2022 Sam Westrick
+ *
+ * MLton is released under a HPND-style license.
+ * See the file MLton-LICENSE for details.
+ */
+
+#ifndef SAMPLER_H_
+#define SAMPLER_H_
+
+#if (defined (MLTON_GC_INTERNAL_FUNCS))
+
+typedef void (*SamplerFun) (GC_state s, struct timespec *tm, void *env);
+
+typedef struct SamplerClosure {
+ SamplerFun fun;
+ void *env;
+} *SamplerClosure;
+
+typedef struct Sampler {
+ struct SamplerClosure func;
+ struct timespec desiredInterval;
+ struct timespec absoluteStart;
+ size_t currentEpoch;
+} * Sampler;
+
+
+void initSampler(GC_state s, Sampler samp, SamplerClosure func, struct timespec *desiredInterval);
+
+void maybeSample(GC_state s, Sampler samp);
+
+#endif /* MLTON_GC_INTERNAL_FUNCS */
+
+
+#endif /* SAMPLER_H_ */
\ No newline at end of file
diff --git a/runtime/gc/statistics.c b/runtime/gc/statistics.c
index 176908d4d..fc3fc2880 100644
--- a/runtime/gc/statistics.c
+++ b/runtime/gc/statistics.c
@@ -47,8 +47,9 @@ struct GC_cumulativeStatistics *newCumulativeStatistics(void) {
cumulativeStatistics->bytesScannedMinor = 0;
cumulativeStatistics->bytesHHLocaled = 0;
cumulativeStatistics->bytesReclaimedByLocal = 0;
- cumulativeStatistics->bytesReclaimedByRootCC = 0;
- cumulativeStatistics->bytesReclaimedByInternalCC = 0;
+ cumulativeStatistics->bytesReclaimedByCC = 0;
+ cumulativeStatistics->bytesInScopeForLocal = 0;
+ cumulativeStatistics->bytesInScopeForCC = 0;
cumulativeStatistics->maxBytesLive = 0;
cumulativeStatistics->maxBytesLiveSinceReset = 0;
cumulativeStatistics->maxHeapSize = 0;
@@ -67,22 +68,23 @@ struct GC_cumulativeStatistics *newCumulativeStatistics(void) {
cumulativeStatistics->numMarkCompactGCs = 0;
cumulativeStatistics->numMinorGCs = 0;
cumulativeStatistics->numHHLocalGCs = 0;
- cumulativeStatistics->numRootCCs = 0;
- cumulativeStatistics->numInternalCCs = 0;
+ cumulativeStatistics->numCCs = 0;
cumulativeStatistics->numDisentanglementChecks = 0;
+ cumulativeStatistics->numEntanglements = 0;
cumulativeStatistics->numChecksSkipped = 0;
cumulativeStatistics->numSuspectsMarked = 0;
cumulativeStatistics->numSuspectsCleared = 0;
- cumulativeStatistics->numEntanglementsDetected = 0;
+ cumulativeStatistics->bytesPinnedEntangled = 0;
+ cumulativeStatistics->currentPhaseBytesPinnedEntangled = 0;
+ cumulativeStatistics->bytesPinnedEntangledWatermark = 0;
+ cumulativeStatistics->approxRaceFactor = 0;
cumulativeStatistics->timeLocalGC.tv_sec = 0;
cumulativeStatistics->timeLocalGC.tv_nsec = 0;
cumulativeStatistics->timeLocalPromo.tv_sec = 0;
cumulativeStatistics->timeLocalPromo.tv_nsec = 0;
- cumulativeStatistics->timeRootCC.tv_sec = 0;
- cumulativeStatistics->timeRootCC.tv_nsec = 0;
- cumulativeStatistics->timeInternalCC.tv_sec = 0;
- cumulativeStatistics->timeInternalCC.tv_nsec = 0;
+ cumulativeStatistics->timeCC.tv_sec = 0;
+ cumulativeStatistics->timeCC.tv_nsec = 0;
rusageZero (&cumulativeStatistics->ru_gc);
rusageZero (&cumulativeStatistics->ru_gcCopying);
diff --git a/runtime/gc/statistics.h b/runtime/gc/statistics.h
index 349aaf7f7..391d7f647 100644
--- a/runtime/gc/statistics.h
+++ b/runtime/gc/statistics.h
@@ -36,8 +36,9 @@ struct GC_cumulativeStatistics {
uintmax_t bytesScannedMinor;
uintmax_t bytesHHLocaled;
uintmax_t bytesReclaimedByLocal;
- uintmax_t bytesReclaimedByRootCC;
- uintmax_t bytesReclaimedByInternalCC;
+ uintmax_t bytesReclaimedByCC;
+ uintmax_t bytesInScopeForLocal;
+ uintmax_t bytesInScopeForCC;
size_t maxBytesLive;
size_t maxBytesLiveSinceReset;
@@ -63,19 +64,21 @@ struct GC_cumulativeStatistics {
uintmax_t numMarkCompactGCs;
uintmax_t numMinorGCs;
uintmax_t numHHLocalGCs;
- uintmax_t numRootCCs;
- uintmax_t numInternalCCs;
- uintmax_t numDisentanglementChecks;
+ uintmax_t numCCs;
+  uintmax_t numDisentanglementChecks; // counts full read barriers
+  uintmax_t numEntanglements; // counts instances where entanglement is detected
uintmax_t numChecksSkipped;
uintmax_t numSuspectsMarked;
uintmax_t numSuspectsCleared;
- uintmax_t numEntanglementsDetected;
+ uintmax_t bytesPinnedEntangled;
+ uintmax_t currentPhaseBytesPinnedEntangled;
+ uintmax_t bytesPinnedEntangledWatermark;
+ float approxRaceFactor;
struct timespec timeLocalGC;
struct timespec timeLocalPromo;
- struct timespec timeRootCC;
- struct timespec timeInternalCC;
+ struct timespec timeCC;
struct rusage ru_gc; /* total resource usage in gc. */
struct rusage ru_gcCopying; /* resource usage in major copying gcs. */
diff --git a/runtime/gc/thread.c b/runtime/gc/thread.c
index 495843cde..4cbc6b209 100644
--- a/runtime/gc/thread.c
+++ b/runtime/gc/thread.c
@@ -164,6 +164,75 @@ void GC_HH_promoteChunks(pointer threadp) {
HM_HH_promoteChunks(s, thread);
}
+void GC_HH_clearSuspectsAtDepth(GC_state s, pointer threadp, uint32_t depth) {
+ getStackCurrent(s)->used = sizeofGCStateCurrentStackUsed(s);
+ getThreadCurrent(s)->exnStack = s->exnStack;
+ HM_HH_updateValues(getThreadCurrent(s), s->frontier);
+ assert(threadAndHeapOkay(s));
+
+ GC_thread thread = threadObjptrToStruct(s, pointerToObjptr(threadp, NULL));
+ assert(thread != NULL);
+ assert(thread->hierarchicalHeap != NULL);
+ HM_HH_clearSuspectsAtDepth(s, thread, depth);
+}
+
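+/* Return the number of entanglement suspects in the heap at targetDepth,
+ * or 0 if the thread has no heap at that depth. */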
+Word64 GC_HH_numSuspectsAtDepth(GC_state s, pointer threadp, uint32_t targetDepth) {
+ getStackCurrent(s)->used = sizeofGCStateCurrentStackUsed(s);
+ getThreadCurrent(s)->exnStack = s->exnStack;
+ HM_HH_updateValues(getThreadCurrent(s), s->frontier);
+ assert(threadAndHeapOkay(s));
+ GC_thread thread = threadObjptrToStruct(s, pointerToObjptr(threadp, NULL));
+ assert(thread != NULL);
+ assert(thread->hierarchicalHeap != NULL);
+
+ for (HM_HierarchicalHeap cursor = thread->hierarchicalHeap;
+ NULL != cursor;
+ cursor = cursor->nextAncestor)
+ {
+ uint32_t d = HM_HH_getDepth(cursor);
+ if (d <= targetDepth) {
+ if (d == targetDepth) return (Word64)ES_numSuspects(s, cursor);
+ return 0;
+ }
+ }
+
+ return 0;
+}
+
+Pointer /*ES_clearSet*/
+GC_HH_takeClearSetAtDepth(GC_state s, pointer threadp, uint32_t targetDepth) {
+ getStackCurrent(s)->used = sizeofGCStateCurrentStackUsed(s);
+ getThreadCurrent(s)->exnStack = s->exnStack;
+ HM_HH_updateValues(getThreadCurrent(s), s->frontier);
+ assert(threadAndHeapOkay(s));
+ GC_thread thread = threadObjptrToStruct(s, pointerToObjptr(threadp, NULL));
+ assert(thread != NULL);
+ assert(thread->hierarchicalHeap != NULL);
+ return (pointer)ES_takeClearSet(s, HM_HH_getHeapAtDepth(s, thread, targetDepth));
+}
+
+Word64 GC_HH_numChunksInClearSet(GC_state s, pointer clearSet) {
+ return (Word64)ES_numChunksInClearSet(s, (ES_clearSet)clearSet);
+}
+
+Pointer /*ES_finishedClearSetGrain*/
+GC_HH_processClearSetGrain(GC_state s, pointer clearSet, Word64 start, Word64 stop) {
+ return (pointer)ES_processClearSetGrain(s, (ES_clearSet)clearSet, (size_t)start, (size_t)stop);
+}
+
+void GC_HH_commitFinishedClearSetGrain(GC_state s, pointer threadp, pointer finClearSetGrain) {
+ getStackCurrent(s)->used = sizeofGCStateCurrentStackUsed(s);
+ getThreadCurrent(s)->exnStack = s->exnStack;
+ HM_HH_updateValues(getThreadCurrent(s), s->frontier);
+ assert(threadAndHeapOkay(s));
+ GC_thread thread = threadObjptrToStruct(s, pointerToObjptr(threadp, NULL));
+ ES_commitFinishedClearSetGrain(s, thread, (ES_finishedClearSetGrain)finClearSetGrain);
+}
+
+void GC_HH_deleteClearSet(GC_state s, pointer clearSet) {
+ ES_deleteClearSet(s, (ES_clearSet)clearSet);
+}
+
void GC_HH_moveNewThreadToDepth(pointer threadp, uint32_t depth) {
GC_state s = pthread_getspecific(gcstate_key);
GC_thread thread = threadObjptrToStruct(s, pointerToObjptr(threadp, NULL));
diff --git a/runtime/gc/thread.h b/runtime/gc/thread.h
index 0c7c03c23..68047b5ad 100644
--- a/runtime/gc/thread.h
+++ b/runtime/gc/thread.h
@@ -129,6 +129,15 @@ PRIVATE void GC_HH_setMinLocalCollectionDepth(pointer thread, Word32 depth);
*/
PRIVATE void GC_HH_moveNewThreadToDepth(pointer thread, Word32 depth);
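+
+/* Management of entanglement suspects: query and clear the suspects at a
+ * given depth, and split the clearing work into "clear set" grains that can
+ * be processed independently and then committed. */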
+PRIVATE void GC_HH_clearSuspectsAtDepth(GC_state s, pointer threadp, uint32_t depth);
+
+PRIVATE Word64 GC_HH_numSuspectsAtDepth(GC_state s, pointer threadp, uint32_t depth);
+PRIVATE Pointer /*ES_clearSet*/ GC_HH_takeClearSetAtDepth(GC_state s, pointer threadp, uint32_t depth);
+PRIVATE Word64 GC_HH_numChunksInClearSet(GC_state s, pointer clearSet);
+PRIVATE Pointer /*ES_finishedClearSetGrain*/ GC_HH_processClearSetGrain(GC_state s, pointer clearSet, Word64 start, Word64 stop);
+PRIVATE void GC_HH_commitFinishedClearSetGrain(GC_state s, pointer threadp, pointer finClearSetGrain);
+PRIVATE void GC_HH_deleteClearSet(GC_state s, pointer clearSet);
+
PRIVATE Bool GC_HH_checkFinishedCCReadyToJoin(GC_state s);
#endif /* MLTON_GC_INTERNAL_BASIS */