A Formal Model for Distributed Computations (Parallel Substitution Algorithm - PSA)

Summary:

List of the participants:

O.Bandman - leading researcher, supervisor of the project,
S.Achasova - senior researcher,
V.Markova - senior researcher,
S.Piskunov - senior researcher.

A Parallel Substitution Algorithm (PSA) is a new model of distributed (cellular) computations. It provides a concise mapping of distributed computation processes into cellular arrays. A PSA is specified by a set of parallel substitutions operating over a cellular array. The main goal of the project is to develop a complete theory of PSA which comprises the following:

PSA representation analitic and graphic language,
conditions of computation process validity for synchronous and asynchronous operation modes,
methods of PSA composition and decomposition,
equivalent space-time, space-structure, global-local transformations,
method of interpretation with automata nets.

On the basis of PSA theory a variety of tools and techniques is developed for designing algorithmic-oriented cellular VLSI and optical architecture. An original experimental system for simula- ting computation processes and designing cellular algorithm is also elaborated.

Theoretical results:

PSA properties

The following fundamental concepts form the background of the Parallel Substitution Algorithm.

Fine-grained parallelism. Each data--item is introduced being attached to a point in the computation space represented as a countable discrete set of names. The computation is an iterative procedure over a data array in this space. At each step certain data subarrays are replaced by other ones, these actions being done all over the whole data--array.

Decentralized control. No order of operation execution is defined in the model. Each substitution is performed when and where the ready conditions coincide with a data--pattern in the space. Thus, the associative mechanism of the time--spatial control of the computation process is performed, the spatial relations being explicitly represented by means of the functions defined in the computational space.

Synchronous mode of execution. An abstract outer clock is presumed to exist, making the computational process to obey the following rules:

all operations ready to be executed should be executed simultaneously with the first coming clock pulse,
no changes in the array are allowed out of the clock pulses. The asynchronous mode of execution is also of great theoretical interest. Its investigation is very important for understanding the behavioral properties of computations in the cellular spaces.

Interpretability by automata nets. A set of substitutions representing the cellular computation admits the direct mapping onto a net of automata. This allows one to construct methods and tools for architectural design of hardware implementation of cellular algorithms.

The direction of the development of the PSA theory is stipulated by the objective of its creation: to constitute the fundamentals for methods of synthesis of algorithm--oriented architectures of cellular processors.

Equivalent Transformations of PSAs

2D -> 3D PSA Transformation

The implementation of optical data processing is expected to give a qualitative shift in the connections problem solution. The most efficient solution is in using some kind of multiplanar structure organized as follows. Inside the layers the connections are made with metallized strips like those in solid--state chips.

Between the layers the switching elements are connected by light signals. The whole length of connections is reduced firstly due to the cellular architecture and, secondly, due to the transfer of a great deal of connections to the third dimension.

Three methods have been elaborated:

Coarse-Grained Stratification is to partition the set of parallel substitutions into several subsets, each subset being put into correspondence to a layer of the 3D naming set.

Fine-Grained Stratification is the method of 2D -> 3D transformation when each substitution is stratified apart being decomposed into a set of more simple ones corresponding to one or several components of left--hand side and right--hand side configurations.

Computation Space Reshaping results in stratification in which the template pattern becomes stretched along the third axis.

Synchronous-Asynchronous Transformation of PSAs

The main goal is to get answer on three following questions.

What are the features a parallel substitution set (PSS) is to possess in order to meet validity conditions.
How to transform the PSS of a given PSA into an equivalent PSS which meets the validity conditions.
What price in hardware and in execution time is to be paid for the abandonment of synchronizing clock.

A method of constructing a PSS, such that the asynchronous mode of its execution simulates a given PSA is presented. The method is based on the concepts of behavioral properties and validity conditions.

Space--Time Transformation of PSAs

A procedure is created which transforms a PSA, intended to process an r--dimensional cellular array into a result equivalent PSA, which deals with a (r-1)--dimensional cellular array. Obviously, the time needed for obtaining the result increases.

The proposed procedure is shown to be effective (in the sense that it results in a PSA), if the initial PSA meets certain conditions, which are proved to be necessary and sufficient. These conditions are based on the PSS validity properties for the asynchronous mode of computation.

Computer Tools for Simulating a Cellular Computation

The computer simulating system, which is called ALT (Animating Language Tools), combines textual and graphical programming tools for representing fine-grained parallel computations.

ALT has a multiwindow interface. It includes tools for editing the textual and graphical forms and a translator from a special high-level language which uses C-language as a prototype, into the interior representation, as well as a number of tools for displaying the parameters of computation process under simulation.

Such a system have been urgently needed for experimenting with fine-grained parallel algorithms and watching distributed computation processes.

Of even greater importance is the possibility to improve and modify a PSA according to the technical requirements when performing the architectural design of a cellular processor.

The main peculiarity of ALT is the independence of the program representation of the three following things used:

the graphical objects, representing cellular arrays and configurations,
the PSA execution program,
the procedures for computing values of functions when functional PSA are simulated.

Applications:

Obtained theoretical results have the following applications:

computer aided design of fine-grained parallel algorithms,

Investigation of Cellular Algorithms

2D Cellular Processors
Pipelined Addition of Many Binary Numbers Based on 3 -> 2 Algorithm is convenient for being used as a base of pipelined summation of several groups of numbers.
The organization of the pipeline is based on the fact, that at each step a row is excluded from the summation process.
So, it may be loaded by a new number. When a new triplet of items are loaded a summation of a new group of numbers may start.
2D Cellular Algorithm for Sorting Integers is based on the well known sorting algorithm referred to as even-odd permutation sorting.
The simulation results confirm the suggestion that the cellular array may be constructed which performs the sorting of many lists of numbers with the performance equal to one sorted list at each step.
Such a performance is achieved if the proper length of the sorting field (the size along the i axis) is provided, and after the first result enters the output buffer.
An Algorithm for Obtaining the Skeleton of an Image for reducing the line thickness of drawn symbols. The algorithm is developed whose performance is proportional to the amount of pixels in the most thick part of the image.
The automata net interpreting this algorithm is completely homogeneous all connection being local. Each automaton has one bit of internal memory and an amount of logic, implementing two functions of 15 boolean variables.
computer aided design of algorithmical oriented cellular architecture for VLSI and optical devices,

Designing of Architectures of 3D Cellular Processors
3D Pipelined Cellular Addition of Many Groups of Binary Numbers is developed which allows to get the period of the pipeline equal to the time of one step, which is just the time of obtaining a resulting sum of a group of numbers.
This time depends neither of the number of items to be summed up, nor of the number of bits in them. So, a very high performance may be achieved if such a PSA is implemented in a cellular processor and used in signal processing systems.
Processor for Hopfield Neuromachine for making the computational work in a Hopfield type neural machine is investigated. The period of the pipeline equals to the time needed for summing up two n-bit numbers, and such is the time needed for obtaining a component of a resulting vector.
In the neural type computing system this is the time for calculating the sum of connection weights for each neuron.
3D cellular algorithm for complex number multiplication based on Knuth number system allows to multiply two complex numbers with the speed close to the multiplication speed of conventional binary numbers of the same length.
Such a speed is achieved due to the possibility of representing a complex number as a single vector, on the one hand, and overcoming the main bottleneck of this system, on the other hand. It is the reduction of the intermediate results (the modification unstaff digits, i.e., digits non belonging to the base) which precedes by their summation.
In the algorithm, the time needed for the reduction is kept to a minimum by means of the processing the staff and the unstaff digits of the intermediate results in parallel in different layers of a 3D array, i.e., the summation of the intermediate results are performed without waiting for them to be reduced.
3D Cellular Pipelined Algorithm for Many Number Pairs Multiplication has a minimal period (4 steps) and is well adapted to electrooptical implementation. A small period is accomplished first by using the binary signed-digit number system, and second by pipelining at both the initial data and the computation process levels.
Pipelining at the computation process level is achieved by the stratification of a 2D pairwise summation over a 3D array. Here the stratification is based on the replacement of the neighborhood in the rows in the 2D array by the neighborhood in the layers in the 3D one. As a result, the one-layer summation is transformed into the four-layer one.
The use of the third dimension allows not only to speed up the 2D algorithm, but to simplify the topology of each layer of the 3D structure, interpreting the 3D algorithm, and to transfer the greater part of interconnections of the 2D cellular structure into interlayer connections of the 3D cellular one as well.
simulation of computation processes in cellular arrays.

The main result:

A new formal model of fine-grained cellular parallel algorithm is proposed. Based on it a set of methods for synthesis of algorithmical-oriented architecture are developed. Computer tools for cellular algorithms simulation are elaborated.

Perspectives:

Time and space complexity of fine-grained parallel algorithms is to be assessed. To accomplish this, some new methods for PSA investigation both theoretically and experimentally (by means of simulating) are to be developed.
Cellular algorithms for solving the following problems should be elaborated and studied:
- arithmetic and vector-matrix operations in the nontraditional scales of notation,
- cellular-neural algorithms for pattern retrieval and restoration,
- universal programmable arrays for optical implementation.

List of publications:

S.M.Achasova, O.L.Bandman, V.P.Markova, S.V.Piskunov Parallel Substitution Algorithm. Theory and Application. //- World Scientific. Singapore. 1994. 220 p.
O.L.Bandman, V.P.Markova, S.V.Piskunov 2D -> 3D Transformation of Cellular Algorithms // Parallel Computing Technologies. Proceedings of the International Conference (Ed. V.E. Malyshkin). Obninsk, Russia, Aug. 30 - Sept.4,1993, vol.1, p.117-134.
S.M.Achasova Correctness of Synchronized Cellular Computations // Parallel Computing Technologies. Proceedings of the International Conference (Ed. V.E.Malyshkin). Obninsk, Russia, Aug. 30 - Sept. 4, 1993, vol.3, p. 543-556.
S.V.Piskunov Construction of maximally stratified electrooptical cell devices // Avtometriya.-1993.- N3.-p.3-14.
V.P.Markova Electrooptical implementation of cellular multiplier // Avtometriya. - 1993.- N 3. - p.14-27.
O.L.Bandman Synchronous-asynchronous transformations of cellular algorithms // Bulletin of the Novosibirsk Computing Center, series: Computer Science, issue: 2(1993), p. 25-44.
V.P.Markova The Cellular Knuth Algorithm for Complex Number Multiplication // Parcella'94. Proceedings of the VI International Workshop on Parallel Processing by Cellular Automata and Arrays, Potsdam, Sept. 21-23, 1994, p.91-98.
S.M.Achasova Correctness of cellular computations in the pyramid and hypercube spaces // Programmirovanie.- 1995.-N4.- p.3-12.
V.P.Markova, S.V.Piskunov Computer Models of 3D Cellular Structures // Lect. Notes Comput. Sci., N 964, 1995, p.70-84.
O.L.Bandman Cellular-Neural Computations. Formal Model and Possible Applications // Lect. Notes Comput. Sci., 964, 1995, p.21-35.
S.M.Achasova Synchronous-Asynchronous Cellular Compu- tations // Lect. Notes Comput. Sci., 964, 1995.P.1-6.
O.L.Bandman Cellular-Neural Computations: Formal Model // Bulletin of the Novosibirsk Computing Center, series: Computer Science, issue: 3(1995), p.13-30.
V.P.Markova Multilayer Cellular Algorithm for Complex Number Multiplication // Proceedings of ASAP-95, Strasburg, France, July 16-20, 1995.- IEEE Computer Society Press: Los-Alamos, USA.

Please contact Dr. Olga Bandman for all the questions concerning this project.
E-mail: bandman@ssd.sscc.ru

Last update: June 27, 1996