The following fundamental concepts form the background of the Parallel Substitution Algorithm.
Fine-grained parallelism. Each data--item is introduced being attached to a point in the computation space represented as a countable discrete set of names. The computation is an iterative procedure over a data array in this space. At each step certain data subarrays are replaced by other ones, these actions being done all over the whole data--array.
Decentralized control. No order of operation execution is defined in the model. Each substitution is performed when and where the ready conditions coincide with a data--pattern in the space. Thus, the associative mechanism of the time--spatial control of the computation process is performed, the spatial relations being explicitly represented by means of the functions defined in the computational space.
Synchronous mode of execution. An abstract outer clock is presumed to exist, making the computational process to obey the following rules:
Interpretability by automata nets. A set of substitutions representing the cellular computation admits the direct mapping onto a net of automata. This allows one to construct methods and tools for architectural design of hardware implementation of cellular algorithms.
The direction of the development of the PSA theory is stipulated by the objective of its creation: to constitute the fundamentals for methods of synthesis of algorithm--oriented architectures of cellular processors.
The implementation of optical data processing is expected to give a qualitative shift in the connections problem solution. The most efficient solution is in using some kind of multiplanar structure organized as follows. Inside the layers the connections are made with metallized strips like those in solid--state chips.
Between the layers the switching elements are connected by light signals. The whole length of connections is reduced firstly due to the cellular architecture and, secondly, due to the transfer of a great deal of connections to the third dimension.
Three methods have been elaborated:
Coarse-Grained Stratification is to partition the set of parallel substitutions into several subsets, each subset being put into correspondence to a layer of the 3D naming set.
Fine-Grained Stratification is the method of 2D -> 3D transformation when each substitution is stratified apart being decomposed into a set of more simple ones corresponding to one or several components of left--hand side and right--hand side configurations.
Computation Space Reshaping results in stratification in which the template pattern becomes stretched along the third axis.
A method of constructing a PSS, such that the asynchronous mode of its execution simulates a given PSA is presented. The method is based on the concepts of behavioral properties and validity conditions.
A procedure is created which transforms a PSA, intended to process an r--dimensional cellular array into a result equivalent PSA, which deals with a (r-1)--dimensional cellular array. Obviously, the time needed for obtaining the result increases.
The proposed procedure is shown to be effective (in the sense that it results in a PSA), if the initial PSA meets certain conditions, which are proved to be necessary and sufficient. These conditions are based on the PSS validity properties for the asynchronous mode of computation.
ALT has a multiwindow interface. It includes tools for editing the textual and graphical forms and a translator from a special high-level language which uses C-language as a prototype, into the interior representation, as well as a number of tools for displaying the parameters of computation process under simulation.
Such a system have been urgently needed for experimenting with fine-grained parallel algorithms and watching distributed computation processes.
Of even greater importance is the possibility to improve and modify a PSA according to the technical requirements when performing the architectural design of a cellular processor.
The main peculiarity of ALT is the independence of the program representation of the three following things used:
The organization of the pipeline is based on the fact, that at each step a row is excluded from the summation process.
So, it may be loaded by a new number. When a new triplet of items are loaded a summation of a new group of numbers may start.
2D Cellular Algorithm for Sorting Integers is based on the well known sorting algorithm referred to as even-odd permutation sorting.
The simulation results confirm the suggestion that the cellular array may be constructed which performs the sorting of many lists of numbers with the performance equal to one sorted list at each step.
Such a performance is achieved if the proper length of the sorting field (the size along the i axis) is provided, and after the first result enters the output buffer.
An Algorithm for Obtaining the Skeleton of an Image for reducing the line thickness of drawn symbols. The algorithm is developed whose performance is proportional to the amount of pixels in the most thick part of the image.
The automata net interpreting this algorithm is completely homogeneous all connection being local. Each automaton has one bit of internal memory and an amount of logic, implementing two functions of 15 boolean variables.
This time depends neither of the number of items to be summed up, nor of the number of bits in them. So, a very high performance may be achieved if such a PSA is implemented in a cellular processor and used in signal processing systems.
Processor for Hopfield Neuromachine for making the computational work in a Hopfield type neural machine is investigated. The period of the pipeline equals to the time needed for summing up two n-bit numbers, and such is the time needed for obtaining a component of a resulting vector.
In the neural type computing system this is the time for calculating the sum of connection weights for each neuron.
3D cellular algorithm for complex number multiplication based on Knuth number system allows to multiply two complex numbers with the speed close to the multiplication speed of conventional binary numbers of the same length.
Such a speed is achieved due to the possibility of representing a complex number as a single vector, on the one hand, and overcoming the main bottleneck of this system, on the other hand. It is the reduction of the intermediate results (the modification unstaff digits, i.e., digits non belonging to the base) which precedes by their summation.
In the algorithm, the time needed for the reduction is kept to a minimum by means of the processing the staff and the unstaff digits of the intermediate results in parallel in different layers of a 3D array, i.e., the summation of the intermediate results are performed without waiting for them to be reduced.
3D Cellular Pipelined Algorithm for Many Number Pairs Multiplication has a minimal period (4 steps) and is well adapted to electrooptical implementation. A small period is accomplished first by using the binary signed-digit number system, and second by pipelining at both the initial data and the computation process levels.
Pipelining at the computation process level is achieved by the stratification of a 2D pairwise summation over a 3D array. Here the stratification is based on the replacement of the neighborhood in the rows in the 2D array by the neighborhood in the layers in the 3D one. As a result, the one-layer summation is transformed into the four-layer one.
The use of the third dimension allows not only to speed up the 2D algorithm, but to simplify the topology of each layer of the 3D structure, interpreting the 3D algorithm, and to transfer the greater part of interconnections of the 2D cellular structure into interlayer connections of the 3D cellular one as well.
Please contact Dr. Olga Bandman for all the questions concerning this project.
E-mail: bandman@ssd.sscc.ru