  ## A brief introduction to the theory of the Integrative Model for Parallelism

### Distributed data

For purposes of this discussion an object is an array, so it can be indexed by an integer running over $0\ldots N-1$.

A distribution is then a mapping from processors to sets of indices: each processor $p$ is mapped to a subset $u(p)\in 2^N$
(where $2^N$ is the usual shorthand for the set of all subsets of $0\ldots N-1$).
Note that mapping processors to data, rather than the other way around, easily allows redundant assignment of data,
and later redundant replication of work.
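As a minimal sketch (representation and names hypothetical, not part of any IMP implementation), a distribution can be modeled as a dictionary from processor rank to a set of indices; here a block distribution of $N$ indices over $P$ processors:

```python
def block_distribution(N, P):
    """Map each processor p in 0..P-1 to a contiguous block of indices 0..N-1."""
    dist = {}
    lo = 0
    for p in range(P):
        # spread the remainder N % P over the first processors
        hi = lo + N // P + (1 if p < N % P else 0)
        dist[p] = set(range(lo, hi))
        lo = hi
    return dist

alpha = block_distribution(10, 3)
# alpha[0] = {0,1,2,3}, alpha[1] = {4,5,6}, alpha[2] = {7,8,9}
```

Because a distribution assigns a *set* of indices to each processor, nothing prevents the same index from appearing in several processors' sets; that is exactly the redundant assignment mentioned above.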

### Data parallel work

A data parallel computation $y=f(x)$, where $x,y$ are distributed objects as just defined,
is a computation of the form $\forall_i\colon y_i=f(x_{i_1},\ldots,x_{i_k})$ for some $k$, possibly dependent on $i$.

We define the signature function of a data parallel computation as the function
$\sigma_f(i) = \{ i_1,\ldots,i_k \},$
describing for each output index which input indices are needed.
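For example, a three-point stencil $y_i=f(x_{i-1},x_i,x_{i+1})$ has signature function $\sigma_f(i)=\{i-1,i,i+1\}$, clipped at the array boundaries. A sketch in Python (function name hypothetical):

```python
def sigma_stencil(i, N):
    """Signature function of a 1D three-point stencil on indices 0..N-1."""
    # y_i needs x_{i-1}, x_i, x_{i+1}, except where those fall outside the array
    return {j for j in (i - 1, i, i + 1) if 0 <= j < N}

sigma_stencil(5, 10)  # {4, 5, 6}
sigma_stencil(0, 10)  # {0, 1}: here k depends on i, as the definition allows
```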

### Beta distribution

Now let $x,y$ have distributions $\alpha,\gamma$ respectively. The signature function extends straightforwardly from single indices to sets of indices, and hence to distributions, by taking unions: $\sigma_f(S)=\bigcup_{i\in S}\sigma_f(i)$. We can now define the $\beta$ distribution as $\beta=\sigma_f\circ\gamma$.
Informally, the $\beta$ distribution describes for each processor $p$ which indices of $x$ it needs to compute its share of $y$.

The $\beta$ distribution is a dynamically defined distribution that combines knowledge of the data (the $\gamma$ distribution) and of
the structure of the algorithm (the $\sigma_f$ function).
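Extending $\sigma_f$ to index sets by taking unions, the $\beta$ distribution can be computed pointwise per processor. A sketch under the same hypothetical representation of distributions as dictionaries of index sets:

```python
def beta_distribution(gamma, sigma):
    """beta(p) = union of sigma(i) over all output indices i in gamma(p)."""
    return {p: set().union(*(sigma(i) for i in idxs))
            for p, idxs in gamma.items()}

# gamma: processor -> output indices of y it computes (blocks of 6 indices)
gamma = {0: {0, 1, 2}, 1: {3, 4, 5}}
# sigma: signature function of a three-point stencil on 6 indices
sigma = lambda i: {j for j in (i - 1, i, i + 1) if 0 <= j < 6}

beta = beta_distribution(gamma, sigma)
# beta[0] = {0,1,2,3}: processor 0 also needs x_3, owned by its neighbor
```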

Since $x$ has the distribution $\alpha$, not $\beta$, some data needs to be communicated/synchronized before the local computation
of $y$ can commence.

### Communication and synchronization

Let $q$ be a process that has just computed part of $x$, and $p$ a process that is about to compute part of $y$. Then
$q$ sends data to $p$ (equivalently, $q$ is a predecessor of $p$) if and only if $\alpha(q) \cap \beta(p) \not = \emptyset$.
That is, all messages and predecessor relations are derived by the system, ultimately, from the user-specified $\alpha,\gamma$
data distributions and the $\sigma_f$ function that describes the structure of the operation.
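Given $\alpha$ and $\beta$, the set of messages falls out of the intersection condition directly. A sketch (representation hypothetical, continuing the stencil example):

```python
def derive_messages(alpha, beta):
    """Return {(q, p): indices} for every pair with nonempty alpha(q) & beta(p)."""
    return {(q, p): alpha[q] & beta[p]
            for q in alpha for p in beta
            if alpha[q] & beta[p]}

# x lives in blocks; each processor needs one halo element from its neighbor
alpha = {0: {0, 1, 2}, 1: {3, 4, 5}}
beta  = {0: {0, 1, 2, 3}, 1: {2, 3, 4, 5}}

msgs = derive_messages(alpha, beta)
# msgs[(1, 0)] = {3}: q=1 is a predecessor of p=0
# pairs with q == p are local copies and need no actual communication
```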

### Practical importance

A programming system that lets the user specify the data distributions and the signature function can
derive all communication and synchronization from them.

If we define a task as the computation of one object on a specific processor, then the IMP (Integrative Model for Parallelism) system can derive the
task data flow graph from a sequential program.
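With tasks identified as (step, processor) pairs, the edges of the task data flow graph are just the predecessor relation above, applied at each step of the sequential program. A hypothetical sketch for one step of the stencil example:

```python
def task_graph_edges(alpha, beta, step):
    """Edges ((step-1, q), (step, p)) whenever q's data intersects p's needs."""
    return {((step - 1, q), (step, p))
            for q in alpha for p in beta
            if alpha[q] & beta[p]}

alpha = {0: {0, 1, 2}, 1: {3, 4, 5}}        # distribution of x after step 0
beta  = {0: {0, 1, 2, 3}, 1: {2, 3, 4, 5}}  # what each p needs for step 1

edges = task_graph_edges(alpha, beta, step=1)
# ((0, 1), (1, 0)) is in edges: the step-0 task on q=1 precedes the step-1 task on p=0
```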