Ya. Fet, D. Pospelov. Parallel Computing in Russia.

Parallel Computing in Russia

by Ya. Fet ¹ and D. Pospelov ²

¹ Computing Center, Siberian Division of the
Russian Academy of Sciences, Novosibirsk, Russia

² Computing Center of the
Russian Academy of Sciences, Moscow, Russia

Published in: Lect. Notes in Comp. Sci., Vol. 964, Springer-Verlag, Berlin, 1995. pp. 464-476

Abstract.  A brief sketch of the history of parallel  computing  in
Russia is presented.  Due to certain circumstances, this history is
practically unknown in  the  West.  Meanwhile,  Russian  scientists
seem to have made a valuable  and  original  contribution  to  this
important field of computer science.
The overview contains  short  summaries  of  the  most  interesting
Russian investigations in  the  domains  of  Parallel  Programming,
Parallel Computing Systems, and Distributed Computing  in  Cellular
Structures.
The list of references covers about 50 works.

CONTENT

1. INTRODUCTION
2. PARALLEL PROGRAMMING
2.1. LARGE-BLOCK PROGRAMMING SYSTEM
2.2. GRAPH STRUCTURE OF PARALLEL PROGRAMS
2.3. NETWORK STRUCTURE OF COMPUTATIONAL PROCESSES. WAVE ALGORITHMS
2.4. MODELS OF COLLECTIVE BEHAVIOUR IN ORGANIZING PARALLEL PROCESSES
2.5. PARALLEL HIGH-ACCURACY COMPUTATIONS
3. PARALLEL COMPUTING SYSTEMS
3.1. VECTOR PIPELINE PROCESSOR AM
3.2. HOMOGENEOUS COMPUTING SYSTEMS
3.3. MULTIPROCESSORS WITH PROGRAMMABLE ARCHITECTURE
3.4. SIMD COMPUTER PS-2000
3.5. HIGH-PERFORMANCE HETEROGENEOUS SYSTEM "SIBERIA"
3.6. COMBINED ARCHITECTURE SYSTEMS
4. DISTRIBUTED COMPUTING IN CELLULAR STRUCTURES
4.1. HOMOGENEOUS COMPUTING MEDIA
4.2. PARALLEL SUBSTITUTION SYSTEMS
4.3. DESIGN AND ANALYSIS OF SYSTOLIC ARRAYS
4.4. DISTRIBUTED FUNCTIONAL STRUCTURES
5. CONCLUSION
References


1. INTRODUCTION
     At various periods in the history of science, Russia has given
the world prominent, original works in mathematics and other exact
sciences. The same is true of the comparatively young field of
computer science.
     The history of computers and computer science in Russia
abounds in contradictions. From the very beginning of computer
science, the leading Russian scientists made valuable contributions
to the development of numerical mathematics, mathematical logic,
linear programming, the theory of automata, etc. They originated
new trends in computer hardware and software, particularly
concerning the parallel paradigms. Also notable was the research on
cellular arrays, formal methods of design and analysis of digital
devices, computer-aided design of computers, etc.
     The early works of Mikhail Gavrilov on the application of
Boolean algebra to the design of digital circuits should be
mentioned. These works were carried out simultaneously with, and
independently of, those of Claude Shannon. The famous "Schools"
organized by Gavrilov played a very important role in the
development of the theory of automata and of logical design in
Russia.
     Fundamental research in logic and the theory of automata was
done in the late 50s and early 60s by Boris Trakhtenbrot and Victor
Glushkov.
     The early research of Alexey Lyapunov, Yuri Yanov, and Andrey
Ershov laid the theoretical foundations of programming as such, as
well as of the automation of programming and of parallel
programming.
     Unfortunately, poor Russian technology and the incompetence of
Soviet management left Russia persistently behind the West in
building and using computers.

     There are a number of papers on the history of Soviet
computers (see, for instance, [1-4]). As a rule, these papers are
restricted to the description and analysis of production models and
families of computers, compiled from official Soviet sources. An
interesting survey of Russian research in programming was published
by Andrey Ershov [5]. Some information concerning the late 80s is
contained in a special issue of the "Communications of the ACM"
[6]. Recently, two interesting books by Boris Malinovsky were
published in Kiev, presenting some important pages of the history
of Soviet computers [7,8].
     Of course, each of the mentioned surveys concerns the  subject 
of computer performance. However, one can  hardly  find  there  any 
specific information on the state and the development  of  parallel 
computing in Russia. Meanwhile, Russian  scientists  seem  to  have 
made a valuable and original contribution to this  important  field 
of computer science. These investigations are  practically  unknown 
in the West. 
     Generally, the Russian research in  computer  science  can  be 
divided into five  fields:  1.  Theory  of  automata.  2.  Parallel 
Programming.  3.  Parallel  Computing   Systems.   4.   Distributed 
Processing. 5. Artificial Intelligence. In the present survey, most 
attention will be paid to the second,  third,  and  fourth  of  the 
mentioned topics. 
     It should be noted that the contents of this paper reflect,
in the first place, the authors' point of view. Besides, the paper
does not claim to be exhaustive.
     Historically, several centers of computer science and
technology appeared in Russia, the main ones being Moscow,
Leningrad, Kiev, Minsk, and Novosibirsk. The Siberian Scientific
Center, created near the city of Novosibirsk in the late 50s and
also known as "Akademgorodok" ("Academic Village"), was conceived
from the beginning as a complex Cybernetic Center, where applied
research in different areas would be supported by the priority
development of mathematics and computer science. During the 60s and
70s, such distinguished scientists as Sergey Sobolev, Leonid
Kantorovich, Alexey Lyapunov, Andrey Ershov, and others worked in
the Academic Village.
     At the beginning of the 60s, we had in Novosibirsk only a few
models of first-generation computers of Soviet production. Later,
in 1968, the comparatively powerful second-generation BESM-6
appeared, with a peak performance of 1 MFlops. It was clear to all
of us that the clock frequency could not be increased beyond a
definite limit, while the requirements for performance would keep
rising. Hence, the only way to future high-performance computers
had to lie in parallel computation, in parallel systems.
     In  retrospect,  we  can  state  that,  without  exaggeration, 
beginning from a certain moment, the Russian research  in  parallel 
computing became concentrated in Siberia. 



2. PARALLEL PROGRAMMING
2.1. LARGE-BLOCK PROGRAMMING SYSTEM
     A valuable contribution to the formation of the ideas of
parallel programming and parallel computation was made by Leonid
Kantorovich, a noted Russian mathematician and economist and a
Nobel laureate. As early as 1949 he made practical use of what we
would now call a "multiprocessing system", made up of a large
number of punched card tabulating machines, to compute
simultaneously tables of Bessel functions for all integer values of
indices from 0 to 120.
     One of the earliest studies in massively parallel processing
was also due to Leonid Kantorovich, who in 1957 described the
so-called "large-block programming system" [9]. He suggested that
the basic objects operated on by the system should be ordered sets
called quantities (such as vectors, matrices, etc.), a single
number being the simplest quantity, called an element. Some special
operations on quantities were introduced: arithmetical operations,
which extend the usual arithmetic to every element of a quantity,
and geometrical operations, which do not change the values of
quantities but only transform their structures.
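     The distinction between the two kinds of operations can be
sketched in a modern notation. The Python fragment below is only an
illustration under stated assumptions: a quantity is modelled as a
list of rows, and the names arithmetic and transpose are
hypothetical and do not reproduce the notation of [9].

```python
from operator import add, mul


def arithmetic(op, a, b):
    """Arithmetical operation: extend the scalar operation `op`
    to every pair of corresponding elements of two quantities."""
    return [[op(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def transpose(q):
    """Geometrical operation: the elements keep their values,
    only the structure of the quantity is rearranged."""
    return [list(column) for column in zip(*q)]


# Two quantities of the same shape (2 rows, 3 columns); assumed data.
a = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20, 30], [40, 50, 60]]

total = arithmetic(add, a, b)     # [[11, 22, 33], [44, 55, 66]]
product = arithmetic(mul, a, b)   # [[10, 40, 90], [160, 250, 360]]
turned = transpose(a)             # [[1, 4], [2, 5], [3, 6]]
```

     In both cases the program names whole quantities rather than
individual elements, which is what leaves the system free to
process the elements concurrently.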
     Later on, some of the ideas of the large-block approach were
developed further in such programming languages as APL, PL/1,
Algol-68, etc. Recently, the need for efficient use of highly
parallel systems led to the appearance of "data parallel
programming", which has much in common with Kantorovich's
large-block approach.

2.2. GRAPH STRUCTURE OF PARALLEL PROGRAMS
     In the early 60s, research was started at the Power
Engineering Institute (Moscow), under the leadership of Dimitri
Pospelov, on models for describing the structure of complicated
programs and, in particular, for selecting those program branches
which could be executed independently and concurrently. In 1966 the
first publications appeared concerning these models, called
level-parallel forms (LPF). The LPF language, which allows the most
important issues in the field of parallel computing to be
formalized, was widely used in the USSR by researchers working in
this field.
     The level-parallel form is a graph whose vertices are
identified with the segments of the program subject to
parallelization, while the arcs correspond to the functional
dependences between the segments and to the communications between
the processes. A typology of LPFs was developed based on the
topology of the corresponding graphs, and requirements for the
segmentation of the initial program were formulated that ensure
effective execution of LPFs in parallel computing systems
consisting of identical or different computers [10].
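     A small sketch may help to picture such a graph. It assumes
one common reading of the level-parallel form, in which each
segment is placed one level below its deepest predecessor, so that
all segments of one level are mutually independent. The function
name level_parallel_form, the segment names and the dependence
table are hypothetical and are not taken from [10].

```python
def level_parallel_form(deps):
    """Group program segments into levels.

    `deps` maps each segment to the set of segments it depends on;
    every dependency is assumed to be a key of `deps` as well.
    """
    level = {}

    def level_of(segment, path=()):
        if segment in path:
            raise ValueError(f"cyclic dependence at {segment!r}")
        if segment not in level:
            # One level below the deepest predecessor; segments
            # without predecessors are placed on level 0.
            level[segment] = 1 + max(
                (level_of(d, path + (segment,)) for d in deps[segment]),
                default=-1,
            )
        return level[segment]

    for segment in deps:
        level_of(segment)

    tiers = {}
    for segment, lvl in level.items():
        tiers.setdefault(lvl, []).append(segment)
    return [sorted(tiers[lvl]) for lvl in sorted(tiers)]


# Hypothetical program of five segments.
deps = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"C", "D"},
}

forms = level_parallel_form(deps)          # [['A', 'B'], ['C', 'D'], ['E']]
width = max(len(tier) for tier in forms)   # 2
```

     Under these assumptions the number of levels bounds the
parallel execution time from below, and the width indicates how
many computers can be kept busy at once.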
     The notion of LPF led to the correct statement of the  problem 
of optimizing the distribution of a  program  in  a  multiprocessor 
system with a given number of computers, as well as to  the  search 
for an optimal configuration of the system executing  a  given  LPF 
[10,11]. 

2.3. NETWORK STRUCTURE OF COMPUTATIONAL PROCESSES. 
     WAVE ALGORITHMS
     The inefficiency of programs written beforehand motivated the
search for means of describing algorithms that would explicitly
expose all the possibilities for parallel execution of the future
program. Several versions of specific languages for the description
of the parallel features of algorithms were suggested. One of the
most interesting was the computational models language proposed in
the late 60s by Enn Tyugu [12].
     Tyugu treated a computational model as a network the nodes  of 
which correspond  to  some  functional  modules,  while  the  edges 
characterize the interconnections between these modules  reflecting 
possible ways of organization of computations. The final version of 
the program to be realized (either in  sequential  or  in  parallel 
form) is derived from the model by  means  of  appropriate  logical 
inference. 
     Based on these models, conceptual programming languages were
constructed, whose main feature is the presence of a semantic
memory intended for storing the concepts of a given application
domain. This memory is accessed by the system during compilation
[13].
     The computational models turned the attention of specialists
in parallel programming to the possibility of exploiting more
elaborate semantic networks in order to specify the variety of
alternative ways of executing the computational process. Further
investigations in this field led to the design of a wave model of
computations in semantic networks, where computing proper is
replaced by the procedures of pattern matching and logical
inference well known in artificial intelligence. Pattern matching
became the basic procedure of the VOLNA-0 language [14], ensuring
the highest possible degree of parallelization.
     Later  on,  various  powerful   parallel   logical   inference 
procedures in  semantic  networks  have  been  designed  [15].  The 
research in models of parallel computing based on semantic networks 
led also to the design  of  a  high-performance  computing  system, 
under an international project PAMIR [16]. 

2.4. MODELS OF COLLECTIVE BEHAVIOUR IN ORGANIZING 
     PARALLEL PROCESSES
     A unique direction in the theory of parallel processes is
associated with the ideas of collective behaviour of automata
promoted by Mikhail Tsetlin and his followers beginning from the
late 50s. These works preceded the Western research in multi-agent
systems by more than 30 years. In the framework of the theory of
collective behaviour of automata, many problems of the functioning
of distributed computing systems without centralized control were
stated and solved.
     If the system uses synchronization at times, then the range
of problems of decentralized control reduces to the classical
problem of spreading signals along a chain of riflemen (the firing
squad synchronization problem). This problem, formulated for the
first time by J. Myhill, found an efficient solution in the works
of Victor Varshavsky and his followers [17]. The latest results on
this subject are contained in [18].
     If, however, the distributed system operates in a completely
asynchronous mode, then the models of collective behaviour provide
efficient control algorithms which considerably outperform the
known "notice board" procedure [19].
     The technique of collective behaviour made it possible to
create a theory of aperiodic automata able to handle numerous hard
problems in organizing parallel computations, in particular the
arbitration problem [20].
     The fundamental concepts of the theory of asynchronous
parallel processes are stated in [21].

2.5. PARALLEL HIGH-ACCURACY COMPUTATIONS
     The main cause of inaccuracy in computer calculations is known
to be rounding errors. The necessity of rounding is attributed to
the fixed and relatively short word length of operands in most
computers.
     At the end of the 80s, Alexander Vazhenin (Novosibirsk)
developed a virtual vector processor for the implementation of
high-accuracy arithmetic, called SPARTH (Super-precision Parallel
ARiTHmetic) [22].
     In SPARTH, high accuracy is achieved by the use of  ultra-high 
length of operands, as well as by dynamic control of their capacity 
in  the  course  of  computations.  This   technique   allows   for 
elimination of rounding errors. The overall high performance of the 
SPARTH-processor  is  due  to  concurrent  processing  of  multiple 
operands. 
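     The effect of operand length on accuracy can be shown with a
small experiment. The sketch below is not the SPARTH algorithm of
[22]; it merely assumes that Python's standard Fraction type, whose
operands grow as needed, may stand in for arithmetic with
dynamically controlled operand length. The function names and the
test data are hypothetical.

```python
from fractions import Fraction


def dot_rounded(xs, ys):
    """Inner product in fixed-length (64-bit) floating point."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y          # each step may round the result
    return total


def dot_exact(xs, ys):
    """Inner product with operands that grow as needed (no rounding)."""
    total = Fraction(0)
    for x, y in zip(xs, ys):
        total += Fraction(x) * Fraction(y)
    return total


# A data set in which a small term is swamped by large ones.
xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]

rounded = dot_rounded(xs, ys)   # 0.0, the term 1.0 is lost
exact = dot_exact(xs, ys)       # Fraction(1, 1)

# The same exact routine applied to several independent data sets:
# a sequential stand-in for one data set per processing element.
data_sets = [(xs, ys), ([0.1] * 10, [1.0] * 10)]
results = [dot_exact(a, b) for a, b in data_sets]
```

     The price of exactness is that the operands become longer
during the computation, which is why processing many of them
concurrently matters for overall performance.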
     An example of this approach is the implementation of the
SPARTH-processor within a basic fine-grained SIMD architecture
oriented towards solving problems containing many vector and matrix
operations [23]. A number of new parallel algorithms were developed
for solving problems of linear algebra. Comparison with known
dedicated programming systems for high-precision computations on
sequential computers shows that the SPARTH-processor ensures
similar accuracy of results. Moreover, in this case the accuracy is
achieved simultaneously for numerous data sets in the corresponding
processing elements of a massively parallel system.



3. PARALLEL COMPUTING SYSTEMS
3.1. VECTOR PIPELINE PROCESSOR AM
     In 1960, Leonid Kantorovich proposed the concept of attached
units. Analysis of the quantities and operations of Kantorovich's
large-block programming system makes it possible to define some
typical forms of processing and to formulate the requirements for
various specialized devices for concurrent execution of massive
operations.
     In the early 60s, a project for an attached unit called the
Arithmetic Machine (AM) [24] was developed at the Institute of
Mathematics (Novosibirsk) under Kantorovich's direction. It was
intended primarily for speeding up the solution of problems of
linear algebra and linear programming. Accordingly, vector
operations were emphasized in its design.
     The main principles used in the AM computer were  as  follows: 
1) Exhaustive use of the  number  flow  obtainable  from  the  main 
memory of the host computer by direct access; 2)  Organizing  of  a 
continuous number flow with simultaneous processing  in  a  special 
high-speed arithmetic unit; 3) Use  of  special  features  of  data 
(numeric vectors of  large  dimensions)  and  those  of  the  basic 
operators (the main one being the  inner  product  of  vectors)  in 
order to get very high processing speed. 
     In the arithmetic unit of the AM, a four-stage pipeline was
implemented with four levels of buffer registers [25]. To
coordinate the functioning of all the pipeline stages, it was
necessary to ensure a working speed of the accumulator considerably
exceeding the abilities of the logic elements available at that
time. Thus, a novel, powerful multiple-input carry-save adder was
developed, which made it possible to process six digits of the
multiplier in each cycle, ensuring the necessary speed [26].
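     The general principle of carry-save addition can be sketched
as follows. This is a textbook illustration on non-negative Python
integers, not a model of the AM adder of [26]: the function names
are hypothetical, and the sketch reduces three operands at a time
rather than handling six multiplier digits per cycle.

```python
def carry_save_step(a, b, c):
    """Reduce three operands to two without propagating carries.

    Every bit position is handled independently, so the delay does
    not depend on the word length.
    """
    partial_sum = a ^ b ^ c                       # sum bits
    carries = ((a & b) | (a & c) | (b & c)) << 1  # carry bits, shifted
    return partial_sum, carries


def carry_save_sum(operands):
    """Add many non-negative operands, with one final ordinary add."""
    pending = list(operands)
    while len(pending) > 2:
        a, b, c = pending.pop(), pending.pop(), pending.pop()
        pending.extend(carry_save_step(a, b, c))
    return sum(pending)          # the only carry-propagating addition


def multiply(x, y):
    """Multiply by accumulating the shifted partial products of x."""
    partials = [x << i for i in range(y.bit_length()) if (y >> i) & 1]
    return carry_save_sum(partials)


step = carry_save_step(5, 3, 6)   # (0, 14), and 0 + 14 == 5 + 3 + 6
result = multiply(13, 11)         # 143
```

     Because the slow carry propagation is postponed to a single
final addition, several partial products can be absorbed in one
cycle, which is the property exploited by multiple-input adders.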
     A pilot AM computer was built and operated at the Computing
Center of the Academy of Sciences in Novosibirsk. This computer was
one of the first pipeline processors, and thus a prototype of
modern vector supercomputers.

3.2. HOMOGENEOUS COMPUTING SYSTEMS
     In 1962, Edward Yevreinov suggested the concept of a Universal
parallel Computing System with programmable structure (UCS) [27].
The main principles of UCSs were: the basic element of a UCS is a
general-purpose computer (Elementary Machine, EM); the UCS has a
homogeneous structure, that is, it consists of identical,
identically connected EMs; the number of EMs in the system can be
changed; the instruction set, memory size and word length of an EM
can also be changed.
     It was also proposed to distinguish UCSs according to their
topology (one-, two-, and multidimensional), according to the type
of exchange between EMs (parallel, sequential, and
parallel-sequential), and according to the spatial arrangement of
EMs (concentrated and distributed).
     In Yevreinov's concept, two levels of organization of
parallel computing systems were considered: the macrostructural,
which has just been briefly described, and the microstructural,
concerning the inner structure of elementary machines, where a
homogeneous approach was again proposed, based on Homogeneous
Computing Media (see below).
     Several projects of homogeneous parallel computing systems
were undertaken in Russia in the late 60s and early 70s under the
direction of E. Yevreinov, namely Minsk-222, Summa, and Minimax.

3.3. MULTIPROCESSORS WITH PROGRAMMABLE ARCHITECTURE
     At the end of the 70s, Anatoly Kaliaev in Taganrog proposed
the concept of multiprocessor systems with programmable
architecture. In these systems, the interconnection between the
processors is accomplished by programming special commutation
structures which can be reconfigured in the course of system
operation.
     In accordance with Kaliaev's concept, the node processors of
the parallel system also have a programmable structure and can be
configured for the execution of large operators (elementary
functions, matrix computations, differentiation, FFT, etc.).
     These ideas were used at the Taganrog Research Institute of
Multiprocessor Computing Systems for the implementation of a number
of experimental, as well as industrial, general-purpose and
problem-oriented parallel computers.
     The foundations of the  theory  of  programmable  architecture 
systems were formulated in [28,29].
     One of the interesting developments of the ideas just
described was the research on neuron-like networks for adaptive
robot control [30].

3.4. SIMD COMPUTER PS-2000
     In the mid-70s, a high-performance SIMD system called PS-2000
(Parallel System 2000) was designed at the Moscow Institute of
Control Problems under the direction of Ivery Prangishvili and
Sergey Vilenkin [31]. This system could be extended
from 8 up to 64 PEs (in eight-PE  blocks).  Each  PE  had  a  local 
memory of 16k 24-bit words and a 24-bit ALU. Each PE was  connected 
with  two  nearest  neighbours  and  could  communicate  with  them 
independently from  the  other  PEs.  Besides,  all  the  PEs  were 
connected into a ring network; at any moment,  only  one  PE  could 
transfer data into the  ring  bus  while  an  arbitrary  number  of 
specified PEs could receive data from the bus. 
     These features, together with the priority chain and the
activity control, made the PS-2000 an associative processor capable
of efficiently solving various non-numerical problems.
     Serial production of the PS-2000 was organized in the early
80s at the Severodonetsk Computer Plant (Ukraine).
     Experience in solving various problems of geophysics, nuclear
physics, aerodynamics, etc. on the PS-2000 demonstrated a gain of
one to two orders of magnitude in performance over the
general-purpose computers of that time.

3.5. HIGH-PERFORMANCE HETEROGENEOUS SYSTEM "SIBERIA"
     At the end of the 80s, the "Siberia" project was developed at
the Computing Center of SD RAS [32]. This project, carried out
under the direction of Nikolay Mirenkov, produced one of the first
working high-performance systems of heterogeneous architecture. The
system was built from complete large modules (off-the-shelf
computers). The design of such systems was especially important in
Russia at that time, when Soviet research laboratories,
universities and enterprises had no adequate high-performance
computers.
     The "Siberia" system consisted of  modules  assembled  into  a 
single installation by the principle of extensibility. The  modules 
were grouped into  several  subsystems.  The  central  part  was  a 
multimachine subsystem  including  three  Soviet  ES-1066  (IBM-370 
compatible)  mainframes.  In  addition  to  its  main  function  of 
general-purpose data processing, this subsystem  acted  as  a  host 
computer  for  vector-pipeline,  vector-parallel,  and  associative 
subsystems. 
     The  vector-pipeline  subsystem  was  a   set   of   Bulgarian 
processors ES-2706 (AP-190L  compatible).  This  subsystem  enabled 
pipelined,  macro-pipelined  and  parallel  data  processing.   The 
vector-parallel subsystem consisted of Russian  PS  computers  (see 
above). The associative subsystem was a Staran-like computer  which 
enabled the use of various operations on vertical bit-slices. 
     Several novel programming tools were designed for the
"Siberia" system, aimed at achieving maximum parallelism.

3.6. COMBINED ARCHITECTURE SYSTEMS
     The combined architecture [33] is a cooperation  of  a  highly 
parallel host computer with a set  of  specialized  processors.  In 
this  architecture,  solving  of  any  problem  is  considered   as 
interaction of several processes, so that execution of each process 
is  delegated  to  a  specialized  subsystem,  most  efficient   in 
implementation of this process. The subsystems  are  controlled  in 
such a way that their balanced  operation  might  be  ensured,  and 
special  complementing  features  of  subsystems  might   be   best 
exploited. For each subsystem a  structure  is  chosen  which  best 
corresponds to the function it should perform. 
     In the combined architecture, the main working load of the
processing is delegated to the coprocessors. Hence, extremely high
demands are placed on the performance of each coprocessor. This
means that special care is needed in selecting the structures of
the coprocessors.
     The novelty of this approach is that the specific type, or
"technology", of processing necessary for efficient execution of
the most labor-intensive procedures involved in the implementation
of a problem is used as a criterion for the selection of the
appropriate hardware architecture. As a rule, similar technologies
are encountered as well in solving problems of other classes.
     In [34], a classification of processing styles is suggested,
providing for a reasonable mapping "Processing Type - Hardware
Module". This classification suggests that the variety of
"technologies" involved in the machine realization of a broad range
of applications is not too large.
     The concept of combined architectures provides for the design
of a family of efficient concentrated heterogeneous systems which,
in contrast to the existing distributed heterogeneous systems, do
not need high-bandwidth communication networks and do not suffer
from the delays arising in these networks during data transfer.



4. DISTRIBUTED COMPUTING IN CELLULAR STRUCTURES
4.1. HOMOGENEOUS COMPUTING MEDIA
     This important  concept  was  introduced  in  1962  by  Edward 
Yevreinov [35].  The   homogeneous  computing  medium  (HCM)  is  a 
logical   network   consisting   of   identical   and   identically 
interconnected cells. Usually square  cells  are  considered,  each 
connected with its four nearest neighbours. The square form of  the 
element is essential from the viewpoint of complete utilization  of 
the chip area, though the cells can also be triangles or hexagons. 
    The main idea of computing  media  is  embedding  of  arbitrary 
automata into a planar homogeneous cellular structure. 
     The cell should be  a   universal   one,  i.e.  it  should  be 
configurable to  the  implementation  of  each  elementary  logical 
function from some complete basis (for instance, {AND,  OR,  NOT}), 
the memory element function, and interconnection functions ensuring 
construction of arbitrary graphs from accordingly configured chains 
of cells. 
     The  main  properties  of  the  HCM  are   homogeneity,  local 
interaction,  universality of the cells,   possibility  of  setting 
each cell to  implementation  of   any  function  from  the  chosen 
complete set. 
     According to Yevreinov, the HCM should be  manufactured  in  a 
single technological process, like some "computing tissue", getting 
the required "pattern" at the last stage of production, by means of 
appropriate configuring.
     It is clear now that, as early as the 60s,  Yevreinov  foresaw 
the trends of development of parallel  computing  systems  and  the 
potentialities of future VLSI. The early  ideas  of  Yevreinov  (as 
well as of Daniel Slotnick in  the  USA)  by  far  anticipated  the 
present  state  of  computer  science  and  outlined  most  of  the 
fundamental problems of development of  high-performance  computing 
systems.

4.2. PARALLEL SUBSTITUTION SYSTEMS
     A specific approach  to  distributed  (cellular)  computations 
called  parallel substitution  algorithm  (PSA)  was  suggested  in 
[36]. It represents an abstract  automata  model  providing  for  a 
concise mapping of distributed computational process into  cellular 
arrays. 
     In [37], the problems of interpretation of PSA by networks  of 
automata have been presented in detail. 
     The parallel substitution system deals with so-called cellular
spaces, that is, sets of identical cells (automata). With each
cell, at each moment (cycle), two values are associated: the unique
name of the cell, and the state of the cell, a variable essentially
expressing the processed data. A finite set of cells forms a word,
or a configuration. Each configuration runs through different
states in a binary alphabet. Data processing in this system is
specified by listing the substitutions corresponding to the chosen
algorithm.
     It has been proved  that  these  systems  are  algorithmically 
complete. Based on the PSA theory,  a variety  of  techniques  were 
developed for  designing  algorithmic-oriented  cellular  VLSI  and 
optical architectures [38]. 

4.3. DESIGN AND ANALYSIS OF SYSTOLIC ARRAYS
     Systolic arrays occupy a special place among the modern
high-performance parallel data processing architectures. On the one
hand, they are an outcome of the development of the known ideas of
Edward Yevreinov. On the other hand, in the systolic approach, the
advantages of these models were successfully combined by H. T. Kung
with the fruitful principle of pipelining the data streams.
     Each processing element of a systolic array pumps data through
itself while performing some prescribed fragments of an appropriate
computational process. Thus, in some cases systolic processing can
achieve the theoretical limits of performance.
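     The "pumping" of data can be made concrete by a cycle-by-cycle
simulation. The sketch below assumes one possible linear layout for
a matrix-vector product: cell i keeps the accumulator of the i-th
result, the vector elements travel from cell to cell, and each cell
receives its own matrix coefficients. The layout and the name
systolic_matvec are illustrative assumptions, not a design taken
from the works cited here.

```python
def systolic_matvec(a, x):
    """Simulate a linear systolic array computing y = a * x.

    a is a list of n rows of length m, x a list of length m.
    """
    n, m = len(a), len(x)
    if n == 0:
        return []
    acc = [0] * n          # one stationary accumulator per cell
    pipe = [None] * n      # element of x currently held by each cell

    for t in range(n + m - 1):
        # Every cell passes its element to the right-hand neighbour,
        # and the next element of x (if any) enters cell 0.
        entering = (t, x[t]) if t < m else None
        pipe = [entering] + pipe[:-1]

        # All cells work in the same cycle: multiply and accumulate.
        for i, item in enumerate(pipe):
            if item is not None:
                j, xj = item
                acc[i] += a[i][j] * xj
    return acc


a = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 0, -1]
y = systolic_matvec(a, x)   # [-2, -2]
```

     In this layout the whole product takes n + m - 1 cycles, and
each cell exchanges data only with its nearest neighbour, which is
the property that makes such arrays attractive for VLSI.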
     Important research in the field of systolic processing was
carried out by Stanislav Sedukhin (Novosibirsk). He developed a
formal method for the synthesis and analysis of systolic algorithms
and structures, based on an initial specification of the algorithm
given as a system of linear recurrence equations [39]. This method
allows for systematic synthesis of all equivalent systolic
structures admissible for a VLSI implementation with certain
constraints. The method was used as the basis for an interactive
automated design system, S4CAD [40]. Using this method and system,
a number of systolic structures for VLSI implementation were
obtained that are optimal for solving problems of linear algebra,
digital signal processing, graph theory, etc.
     We would like to note here that, up to the present, the great
potential of systolic devices has not been sufficiently used in
practice because of difficulties in organizing appropriately
powerful data streams. One approach to overcoming this problem is
the combined architecture described above.

4.4. DISTRIBUTED FUNCTIONAL STRUCTURES
     Most of the cellular automata  models  (including  Yevreinov's 
HCM) are  universal.  They  can  realize  arbitrary  functions  and 
algorithms, and  the  synthesis  of  necessary  logical  structures 
proceeds using classical automata theory techniques. Unfortunately, 
most specific functions will incur  time  and  hardware  redundancy 
when implemented in this way. 
     Specialized homogeneous structures, which map algorithms
directly into circuits, represent an alternative to the universal
ones. In these structures, the given algorithm is simulated by
signal propagation through a specialized logical net. A classical
example of such a structure is the content-addressed, or
associative, memory with its special basic operation of "equality
search". Other specialized structures realizing other basic
operations have emerged as well. In 1971 Yakov Fet in Novosibirsk
proposed a specialized cellular array, called the a-structure, with
the basic operation of "extremum search" [41]. Later on, numerous
arrays were designed implementing various basic operations
(threshold searches, nearest neighbour searches, component-wise
comparison, compression, etc.). Arrays of this type have been
called Distributed Functional structures (DF-structures) [42].
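     To give an idea of what an "extremum search" over an array of
words involves, the sketch below uses the bit-serial technique
familiar from associative processors: the words are scanned bit
slice by bit slice from the most significant position, and words
that cannot be the maximum are switched off. This is an assumed
software analogue only; it does not describe the circuit of the
a-structure in [41], and the name extremum_search is hypothetical.

```python
def extremum_search(words, width):
    """Return the indices of all maximal words.

    words are non-negative integers of at most `width` bits.
    """
    active = [True] * len(words)
    for bit in reversed(range(width)):
        # One bit slice across all words, examined simultaneously.
        bit_slice = [(w >> bit) & 1 for w in words]
        if any(a and s for a, s in zip(active, bit_slice)):
            # Some candidate has a 1 here: candidates with 0 drop out.
            active = [a and s == 1 for a, s in zip(active, bit_slice)]
    return [i for i, a in enumerate(active) if a]


words = [5, 12, 7, 12, 3]
winners = extremum_search(words, width=4)   # [1, 3]
```

     The number of steps depends on the word length and not on the
number of words, since every word takes part in each step at the
same time.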
     An  important  feature   of   the   DF-structures   is   their 
multifunctionality. Thus, an a-structure can  be  efficiently  used 
not only for extremum selection, but also as an associative memory, 
a PLA, an interconnection network, etc. 
     The concept of DF-structures allows for the design of
efficient parallel accelerators for diverse computer architectures.
Indeed, modern technology makes it possible to implement
distributed functional arrays of sufficient size, which can become
a new type of VLSI product: cellular microprocessors.



5. CONCLUSION
     Due to restricted size of the paper we  needed  to  limit  our 
overview to short summaries of the above topics. There are  however 
other Russian works in the field of  parallel  computing  which  we 
would like also distinguish. 
     Examples of such works are the investigations of parallel
asynchronous computing processes made by Vadim Kotov and Alexander
Narin'yani in the late 1960s [43], the research on networks of
automata by Arkady Makarevsky [44], the work of Victor Malyshkin
on methods of program linearization for efficient parallel
execution [45], the project MARS carried out at the Novosibirsk
Computing Center in the mid-1980s by Vadim Kotov and his
colleagues [6], the work by Vladimir Torgashov et al. on the design
of recursive computing systems with dynamically changing structure
[46], and many others.

R e f e r e n c e s

1. Davis N.C. and Goodman S.E. The Soviet Bloc's Unified System of computers. Computing Surveys, 1978, Vol.10, No.2, pp.93-122.
2. Wolcott P. and Goodman S.E. High-speed computers of the Soviet Union. Computer, 1988, Vol.21, No.9, pp.32-41.
3. Goodman S.E. The information technologies and Soviet society: problems and prospects. IEEE Trans. on Syst., Man, and Cybern., 1987, Vol. SMC-17, No.4, pp.529-552.
4. Judy R.W. and Clough R.W. Soviet computing in 1980s. In: Advances in Computers (M.Yovits, ed.), 1989, Vol.29, pp.251-330.
5. Ershov A.P. A history of computing in the USSR. Datamation, 1975, Vol.21, No.6, pp.80-88.
6. Communications of the ACM, 1991, Vol.34, No.6.
7. Malinovsky B.N. Academician S.Lebedev. Naukova Dumka, Kiev, 1992.
8. Malinovsky B.N. Academician V.Glushkov. Naukova Dumka, Kiev, 1993.
9. Kantorovich L.V. On a system of mathematical symbols, convenient for computer operations. Dokl. Acad. Nauk SSSR, 1957, Vol.113, pp.738-741. (In Russian).
10. Pospelov D.A. Introduction to the Theory of Computing Systems. Soviet Radio Publ., Moscow, 1972. (In Russian).
11. Pashkeev S.D. Principles of Multiprogramming for Specialized Computing Systems. Soviet Radio Publ., Moscow, 1972. (In Russian).
12. Tyugu E. Solving problems on computational models. Journal of comp. math. and math. phys., 1970, Vol.10, No.5. (In Russian).
13. Tyugu E. Knowledge-Based Programming. Addison-Wesley, New York, 1988.
14. Sapaty P.S. VOLNA-0 language as a basis of navigation in knowledge bases on semantic networks. Trans. of the USSR Acad. Sci., Series: Engineering Cybernetics, 1986, No.5. (In Russian).
15. Vagin V.N. Deduction and Generalization in Decision-Making Systems. Nauka Publ., Moscow, 1988. (In Russian).
16. Vagin V.N., Zakharov V.N., Pospelov D.A., Sapaty P.S., Uvarova T.G., and Khoroshevsky V.F. Project PAMIR. Trans. of the USSR Acad. Sci., Series: Engineering Cybernetics, 1988, No.2, pp.161-170. (In Russian).
17. Varshavsky V.I. Collective Behaviour of Automata. Nauka Publ., Moscow, 1973. (In Russian).
18. Varshavsky V.I. and Pospelov D.A. The Orchestra is Playing Without Conductor. Reflections on the evolution of some engineering systems and their control. Nauka Publ., Moscow, 1984. (In Russian).
19. Pospelov D.A. and Eivazov A.R. Decentralized computing systems. Trans. of the USSR Acad. Sci., Series: Engineering Cybernetics, 1968, No.5. (In Russian).
20. Aperiodic Automata (V.Varshavsky, ed.). Nauka Publ., Moscow, 1976. (In Russian).
21. Automata Control of Asynchronous Processes in Computers and Discrete Systems (V.Varshavsky, ed.). Nauka Publ., Moscow, 1986. (In Russian).
22. Vazhenin A.P. Hardware and algorithmic support of high accuracy computations in vertical processing systems. Proc. of Int. Conf. "Parallel Computing Technologies", Obninsk, Russia, 1993, Vol.1, pp.149-161.
23. Vazhenin A.P. Efficient high-accuracy computations in massively parallel systems. Proc. of the First Int. Workshop on Parallel Scientific Computing (PARA'94), Lyngby, Denmark, 1994, pp.505-519. (LNCS, Vol.879).
24. Kantorovich L.V. and Fet Ya.I. Computing system comprising a universal digital computer and a small digital computer. USSR Inventor's Certificate No.172567, 1963. (In Russian).
25. Kantorovich L.V., Fet Ya.I., and Ilovayski I.V. Arithmetic unit of a digital computer. USSR Inventor's Certificate No.209032, 1965. (In Russian).
26. Kantorovich L.V., Fet Ya.I., and Ilovayski I.V. Adder for concurrent addition of several binary summands. USSR Inventor's Certificate No.188151, 1965. (In Russian).
27. Yevreinov E.V. and Kosarev Yu.G. High-Performance Homogeneous Universal Computing Systems. Nauka Publ., Novosibirsk, 1966. (In Russian).
28. Kaliaev A.V. Homogeneous Commutation Register Structures. Soviet Radio, Moscow, 1978. (In Russian).
29. Kaliaev A.V. Multiprocessor Systems with Programmable Architecture. Soviet Radio, Moscow, 1984. (In Russian).
30. Kaliaev I.A. Homogeneous neurolike structures for optimization variation problems solving. In: Proc. of the 5th Int. Conf. "Parallel Architectures and Languages Europe (PARLE'93)", Munich, Germany, 1993, pp.438-451. (LNCS, Vol.694).
31. Prangishvili I.V., Vilenkin S.Ya., and Medvedev I.L. Parallel Computer Systems with Common Control. Energoizdat, Moscow, 1983.
32. Mirenkov N.N. The Siberian approach for an open-system high-performance computing architecture. Computing and Control Engineering Journal, 1992, Vol.3, No.3, pp.137-142.
33. Vazhenin A.P., Sedukhin S.G., and Fet Ya.I. High-performance computing systems of combined architecture. In: "Parallel Computing Technologies (PaCT-91)", Novosibirsk, Russia, 1991 (N.N. Mirenkov, ed.), World Scientific, Singapore, 1991, pp.246-257.
34. Fet Ya.I. and Vazhenin A.P. Heterogeneous processing: a combined approach. In: "Workshop on Parallel Scientific Computing (PARA'94-L)", Lyngby, Denmark, 1994, Springer-Verlag, Berlin, 1994, pp.194-206. (LNCS, Vol.879).
35. Yevreinov E.V. On the microstructure of the elementary machines of a computing system. In: "Computing Systems", Inst. of Math. of Sib. Div. of USSR Acad. Sci., 1962, Vol.4, pp.3-28. (In Russian).
36. Kornev Yu.N., Piskunov S.V., and Sergeev S.N. Algorithms of general substitutions and their interpretation in automata networks and homogeneous machines. Trans. of the USSR Acad. Sci., Series: Engineering Cybernetics, 1971, No.6, pp.131-142. (In Russian).
37. Parallel Microprogramming Methods (O.L.Bandman, ed.). Nauka Publ., Novosibirsk, 1981. (In Russian).
38. Achasova S.M., Bandman O.L., Markova V.P., and Piskunov S.V. Parallel Substitution Algorithm: Theory and Design. World Scientific Publ., Singapore, 1994.
39. Sedukhin S.G. Design and analysis of systolic algorithms and structures. Programmirovanie, 1991, No.2, pp.20-40. (In Russian).
40. Sedukhin S.G. and Sedukhin I.S. An interactive graphic CAD tool for the synthesis and analysis of VLSI systolic structures. Proc. of Int. Conf. "Parallel Computing Technologies", Obninsk, Russia, 1993, Vol.1, pp.163-175.
41. Fet Ya.I. Data Sorting Device. USSR Inventor's Certificate No.424141, 1971. (In Russian).
42. Fet Ya.I. Parallel Processing in Cellular Arrays. Research Studies Press, Ltd., Taunton, UK, 1995.
43. Kotov V.E. and Narin'yani A.S. Asynchronous processes over shared memory. Kybernetica, 1966, No.3, pp.64-71.
44. Makarevsky A.Ya. Realization of Discrete Control Devices in Homogeneous Media. Inst. of Control Problems, Moscow, 1970.
45. Malyshkin V.E. Linearized mass computation. In: "Parallel Computing Technologies (PaCT-91)", Novosibirsk, Russia, 1991 (N.N. Mirenkov, ed.), World Scientific, Singapore, 1991, pp.339-353.
46. Torgashov V.A. and Plyusnin V.U. Dynamic architecture computers (DAC). Proc. of Int. Conf. "Parallel Computing Technologies", Obninsk, Russia, 1993, Vol.1, pp.25-29.

Page last updated on April 11, 1997. Web design by Oleg Yu. Repin.
