The design of a multithreaded vector co-processor is described. The co-processor is intended to sit on an expansion board together with its private vector memory, linked to the scalar processor and its cache-based memory hierarchy. It can run up to 8 vector tasks (threads) in parallel. Vector registers can be accessed either as independent sets of scalar values or as array sets. Dependent instructions are scheduled dynamically by Tomasulo's algorithm, simplified to keep the issue and termination logic simple in a multithreaded context. A locking feature is provided to handle both reductions and complex recurrences in vector form.
Keywords: hardware, architecture
Source:
B. Goossens, A Multithreaded Vector Co-processor. In V. Malyshkin (ed.),
Parallel Computing Technologies: Proceedings of the 4th International Conference,
Lect. Notes in Comp. Sci., Vol. 1277, Springer, 1997, pp. 311-321
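
Illustration (not from the paper): the abstract mentions a locking feature for reductions across vector threads. The sketch below is only a software analogy of that idea, assuming a shared accumulator guarded by a lock; the function name `locked_reduction`, the default of 8 workers, and the chunking scheme are hypothetical and say nothing about how the hardware actually implements its lock.

```python
import threading


def locked_reduction(values, num_threads=8):
    """Sum `values` using at most `num_threads` workers and one locked accumulator.

    Hypothetical software analogy: each worker reduces its own slice,
    then takes the lock to fold its partial result into the shared total.
    """
    total = [0]              # shared accumulator (boxed so workers can update it)
    lock = threading.Lock()  # serializes updates to the accumulator

    def worker(chunk):
        partial = sum(chunk)  # independent work, no lock needed
        with lock:            # critical section: one update at a time
            total[0] += partial

    # Ceiling division so that at most `num_threads` chunks are created.
    step = max(1, -(-len(values) // num_threads))
    threads = [
        threading.Thread(target=worker, args=(values[i:i + step],))
        for i in range(0, len(values), step)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Only the final accumulation is serialized; the per-slice sums proceed independently, which is the general reason a lock is enough to make a reduction safe when several threads contribute to one result.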