Adapt the MTBDD data structure so that it can be exploited profitably on GPU architectures. Why is this hard? An MTBDD is a rooted, directed acyclic graph: a symbolic data structure that is difficult to decompose into subproblems with coalesced memory accesses (a GPU performance requirement).
Improve the performance of DTMC and CTMC verification in the PRISM model checker without a large impact on memory usage.
(Discrete-Time Markov Chains and Continuous-Time Markov Chains)
$$\mathbf{A} \underline{x} = \underline{b} \;\; \to \;\; \mathbf{A} = \mathbf{D} + \mathbf{R} \;\;\;\; \text{where } \mathbf{D} \text{ is the diagonal of } \mathbf{A}$$
$$\underline{x}^{k+1} = \mathbf{D}^{-1} (\underline{b} - \mathbf{R} \underline{x}^{k})$$ $$\max_i \left( \frac{| \underline{x}^{k+1}_i - \underline{x}^{k}_i |}{| \underline{x}^{k}_i |} \right) < \epsilon \;\; \text{or} \;\; \max_i \left( | \underline{x}^{k+1}_i - \underline{x}^{k}_i | \right) < \epsilon$$
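The Jacobi splitting above can be sketched in plain C. This is a CPU-only illustration, not PRISM code; `jacobi_sweep` and `jacobi_solve` are hypothetical helper names, and the absolute convergence criterion (second formula) is used:

```c
#include <math.h>
#include <stddef.h>

/* One Jacobi sweep: x_next = D^{-1} (b - R x), with D the diagonal of A.
 * A is a dense n x n matrix stored row-major in a flat array.
 * Returns max_i |x_next[i] - x[i]| for the convergence test. */
static double jacobi_sweep(size_t n, const double *A, const double *b,
                           const double *x, double *x_next)
{
    double max_diff = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = b[i];
        for (size_t j = 0; j < n; j++)
            if (j != i)
                s -= A[i * n + j] * x[j];  /* subtract (R x)_i */
        x_next[i] = s / A[i * n + i];      /* apply D^{-1} */
        double d = fabs(x_next[i] - x[i]);
        if (d > max_diff)
            max_diff = d;
    }
    return max_diff;
}

/* Iterate until max_i |x^{k+1}_i - x^k_i| < eps (absolute criterion). */
static void jacobi_solve(size_t n, const double *A, const double *b,
                         double *x, double eps)
{
    double x_next[64];  /* assumes n <= 64 for this sketch */
    while (jacobi_sweep(n, A, b, x, x_next) >= eps)
        for (size_t i = 0; i < n; i++)
            x[i] = x_next[i];
    for (size_t i = 0; i < n; i++)
        x[i] = x_next[i];
}
```

Jacobi (rather than Gauss-Seidel) is the natural choice here because each entry of $\underline{x}^{k+1}$ depends only on $\underline{x}^{k}$, so all entries can later be computed in parallel.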
Also enables automatic scalability: each block of threads can be scheduled on any of the available multiprocessors within a GPU, in any order, concurrently or sequentially.
Defines a special kind of function, called a kernel, that, when called, is executed N times in parallel by N different CUDA threads (the programmer chooses N).
Thread Hierarchy:
Blocks are organized in a 1-, 2-, or 3-dimensional grid of blocks.
Thread blocks are required to execute independently: in any order, in parallel or in series (this allows scalability).
Threads within a block can cooperate using some (fast) shared memory and by synchronizing their execution to coordinate memory accesses.
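The grid/block/thread indexing above can be illustrated with a CPU sketch. The nested loops below stand in for the hardware scheduler, and `blockIdx`/`threadIdx` are ordinary loop variables here, not the real CUDA built-ins:

```c
#include <stddef.h>

/* CPU emulation of a 1-D CUDA launch: a grid of num_blocks blocks,
 * each with threads_per_block threads. Each "thread" adds one pair
 * of vector entries. */
static void launch_vec_add(size_t num_blocks, size_t threads_per_block,
                           const double *a, const double *b,
                           double *out, size_t n)
{
    for (size_t blockIdx = 0; blockIdx < num_blocks; blockIdx++) {
        for (size_t threadIdx = 0; threadIdx < threads_per_block; threadIdx++) {
            /* global index, exactly as a 1-D kernel would compute it */
            size_t i = blockIdx * threads_per_block + threadIdx;
            if (i < n)          /* guard: the grid may overshoot n */
                out[i] = a[i] + b[i];
        }
    }
}
```

Because the loop order is irrelevant to the result, the blocks really can run in any order, which is exactly the independence requirement stated above.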
(rooted, directed acyclic graph)
An MTBDD represents a function $f:\mathbb{B}^{n} \to \mathbb{R}$ over ordered variables $x_1, \dots, x_n \in \mathbb{B}$ with $x_1 < \dots < x_n$
The reason that MTBDDs can often provide compact storage is that they are stored in a reduced form.
This also results in a canonical form.
Map matrix or vector indices to their Boolean representation.
Define,
$enc : Integer \to \mathbb{B}^{n}$ is an encoding function and
$f : \mathbb{B}^{n} \times \mathbb{B}^{n} \to \mathbb{R}$ is an MTBDD function
Then,
$M(i, j) = f(enc(i), enc(j)) \;\;\; \forall i,j : 0 \leq i,j < 2^{n}$
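As a sketch (not PRISM's actual node layout), the lookup $M(i,j) = f(enc(i), enc(j))$ can be traced on a tiny hand-built diagram. The code below assumes a quasi-reduced MTBDD (no variable levels skipped) with interleaved row/column variables, and `enc` produces the most-significant-bit-first encoding:

```c
/* A tiny MTBDD node: terminals carry a real value, internal nodes
 * branch on one Boolean variable (hypothetical layout, illustration only). */
typedef struct Node {
    int is_terminal;
    double value;              /* valid if is_terminal */
    const struct Node *low;    /* variable = 0 branch */
    const struct Node *high;   /* variable = 1 branch */
} Node;

/* enc: integer index -> n Boolean variables, most significant bit first. */
static void enc(unsigned idx, int n, int bits[])
{
    for (int k = 0; k < n; k++)
        bits[k] = (idx >> (n - 1 - k)) & 1;
}

/* Evaluate f(enc(i), enc(j)) by following one root-to-terminal path.
 * Variables interleave: row bit 0, col bit 0, row bit 1, col bit 1, ... */
static double mtbdd_lookup(const Node *root, unsigned i, unsigned j, int n)
{
    int ri[32], ci[32];        /* assumes n <= 32 for this sketch */
    enc(i, n, ri);
    enc(j, n, ci);
    const Node *cur = root;
    for (int k = 0; k < 2 * n && !cur->is_terminal; k++) {
        int bit = (k % 2 == 0) ? ri[k / 2] : ci[k / 2];
        cur = bit ? cur->high : cur->low;
    }
    return cur->value;
}
```

Interleaving row and column bits is the ordering PRISM-style symbolic engines favor, since it tends to keep the diagram small for structured matrices.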
The transition matrix is represented with an MTBDD
Vectors are represented with arrays
Problem to solve
Algorithm :
res[r] := res[r] + v * b[c]
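One way to read the update is as a recursive descent over the diagram: each root-to-terminal path fixes a row offset r, a column offset c, and a terminal value v. A CPU sketch, again assuming a hypothetical quasi-reduced node layout with interleaved row/column variables:

```c
/* MTBDD node layout (hypothetical, illustration only). */
typedef struct Node {
    int is_terminal;
    double value;
    const struct Node *low, *high;
} Node;

/* Recursive matrix-vector multiply over an MTBDD.
 * level: 0-based; even levels are row variables, odd are column variables.
 * r, c: row/column offsets accumulated so far on this path. */
static void mv_mult(const Node *nd, int level, int n,
                    long r, long c, const double *b, double *res)
{
    if (nd->is_terminal) {
        if (nd->value != 0.0)
            res[r] = res[r] + nd->value * b[c];   /* the update above */
        return;
    }
    long half = 1L << (n - 1 - level / 2);  /* index weight of this bit */
    if (level % 2 == 0) {                   /* row variable */
        mv_mult(nd->low,  level + 1, n, r,        c, b, res);
        mv_mult(nd->high, level + 1, n, r + half, c, b, res);
    } else {                                /* column variable */
        mv_mult(nd->low,  level + 1, n, r, c,        b, res);
        mv_mult(nd->high, level + 1, n, r, c + half, b, res);
    }
}
```

Shared subgraphs are visited once per incoming path here; this repeated traversal is precisely what makes the access pattern irregular and hard to coalesce on a GPU.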
An MTBDD is a recursive structure: each node is itself an MTBDD.
Try to divide the problem into independent parallel tasks, where each task is solved by one block of threads.
The final amount of memory is the sum of the task representation and the injected-matrix representation.
I believe this was the "main observation of the work".
Injected matrices are stored in COO representation $\to$ a vector of tuples (r, c, value).
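Multiplying an injected COO matrix against the vector is then a direct loop over the tuples; a minimal sketch, assuming the hypothetical `CooEntry` layout below:

```c
#include <stddef.h>

/* One COO entry of an injected matrix: (row, col, value). */
typedef struct { int r, c; double value; } CooEntry;

/* res[r] += value * b[c] for every stored non-zero: the same update
 * the MTBDD traversal performs, but over an explicit tuple list
 * with predictable, coalescing-friendly memory accesses. */
static void coo_mult(const CooEntry *m, size_t nnz,
                     const double *b, double *res)
{
    for (size_t k = 0; k < nnz; k++)
        res[m[k].r] += m[k].value * b[m[k].c];
}
```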
Divide the result vector into slices of a fixed size (256).
Each task processes one slice of the result vector (a group of path instances).
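The slicing reduces to index arithmetic; `num_tasks` and `task_range` are hypothetical helper names, with task t owning the half-open row range [t*256, min((t+1)*256, n)):

```c
#include <stddef.h>

#define SLICE_SIZE 256  /* fixed slice size from the text */

/* Number of tasks needed for an n-entry result vector (ceiling division). */
static size_t num_tasks(size_t n)
{
    return (n + SLICE_SIZE - 1) / SLICE_SIZE;
}

/* Half-open row range [*first, *last) owned by task t. */
static void task_range(size_t t, size_t n, size_t *first, size_t *last)
{
    *first = t * SLICE_SIZE;
    *last  = (*first + SLICE_SIZE < n) ? *first + SLICE_SIZE : n;
}
```

Each task would then be handed to one thread block, so different blocks never write the same entry of the result vector.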
New memory requirements:
Where should matrices be injected? A heuristic is needed to decide.