Research on Architectures

Martti Forsell, Department of Computer Science, University of Joensuu, Finland.


The performance of computers has been doubling every two years during the last three decades. This has been the result of major progress in VLSI technology and processor architectures. Now physical limits such as the speed of light and the size of molecules are beginning to slow down this development.

The use of parallelism, either within a processor (instruction-level parallelism) or between many processors in a single computer (parallel processing), nevertheless makes it possible to increase the performance of computers even when physical limits prevent manufacturing faster and smaller processors. New architectures and models such as VLIW, MTA and PRAM play an important role in fulfilling this promise.

The goal of this research is to find and evaluate efficient processor, memory system and communication architectures for parallel computers by combining the instruction level parallel model with the parallel processing model.

Our proposal for a combined model, the Instruction Level Parallel Shared Memory Architecture (IPSM), consists of four main components, described below.

Limited-size multiport memories are fairly simple extensions of single-port memories. They can be used as building blocks of limited-size shared memory systems or as the low-level part of a two-level communication solution. Unfortunately, a p-port memory system is p*p times more expensive than a single-port memory system of the same size.
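The cost scaling of multiport memories can be made concrete with a small calculation. The function below is a hypothetical sketch that simply applies the p*p factor stated in the text to a memory of fixed capacity; the name `multiport_cost` and the unit cost per word are illustrative assumptions, not a hardware cost model.

```python
def multiport_cost(ports: int, words: int, cost_per_word: float = 1.0) -> float:
    """Relative cost of a memory with `ports` ports and `words` words.

    Assumption: cost grows as ports * ports times the cost of a single-port
    memory of the same size, as stated in the text; cost_per_word is the
    (hypothetical) unit cost of single-port storage.
    """
    if ports < 1 or words < 1:
        raise ValueError("need at least one port and one word")
    return ports * ports * words * cost_per_word


# Same total capacity (4096 words), increasing number of ports.
for p in (1, 2, 4, 8):
    print(p, multiport_cost(p, 4096) / multiport_cost(1, 4096))
```

For a fixed capacity the relative cost grows as 1, 4, 16 and 64 for 1, 2, 4 and 8 ports, which is why the multiport memories are kept small.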

Coated Mesh (CM) is a topology in which a mesh of routers is coated with processor-memory modules. This raises the communication capacity and volume of the network to a level where parallel slackness (simulating more virtual processors than there are physical processors, so that each processor has other work to do while its memory requests are in transit) can be used to hide the latency of the network. Coated Block Mesh (CBM) is a variation of the coated mesh in which routers are grouped and replaced with router blocks, and processors are grouped with small multiport memories. CBM can be used as a scalable two-level communication network for time-processor optimal shared memory simulations.
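The latency-hiding argument can be illustrated with a toy simulation. It assumes (a simplification, not the CM design itself) that every instruction is a memory reference with a fixed network latency and that the processor switches between threads round-robin every cycle; the thread count plays the role of parallel slackness. The name `processor_utilization` is hypothetical.

```python
def processor_utilization(threads: int, latency: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which the processor issues an instruction.

    Toy model (assumption): each instruction is a memory reference whose reply
    arrives `latency` cycles later, and a thread cannot issue again until its
    reply is back. Ready threads are picked round-robin, one per cycle.
    """
    if threads < 1 or latency < 1:
        raise ValueError("threads and latency must be positive")
    ready_at = [0] * threads  # cycle at which each thread's reply has arrived
    next_thread = 0
    issued = 0
    for cycle in range(cycles):
        # Look for the next ready thread, starting after the last one served.
        for k in range(threads):
            t = (next_thread + k) % threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency
                issued += 1
                next_thread = (t + 1) % threads
                break
    return issued / cycles


for n in (1, 4, 8, 16):
    print(n, processor_utilization(n, latency=8))
```

In this model utilization approaches min(1, threads / latency): with a latency of 8 cycles, 8 or more threads keep the processor busy every cycle, while a single thread leaves it idle 7 cycles out of 8.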

An alternative to superscalar architectures is a novel VLIW architecture, the Minimal Pipeline Architecture (MPA), in which the pipeline is kept as short as possible to minimize both data and control delays. This requires novel techniques such as general forwarding, uncoded instructions, a simple instruction set and fast branching. MPA gives better scalar unit performance than superscalar architectures using out-of-order execution while taking no more silicon area.
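Why a short pipeline reduces control and data delays can be seen from a common textbook estimate of cycles per instruction (CPI) for a single-issue, in-order pipeline without branch prediction. This is an illustrative assumption, not a model of MPA itself; the function name and the instruction mix in the example are hypothetical rather than measured.

```python
def effective_cpi(branch_resolve_stage: int,
                  branch_freq: float,
                  taken_fraction: float,
                  load_use_freq: float = 0.0,
                  load_use_delay: int = 0) -> float:
    """CPI of a hypothetical single-issue, in-order pipeline, no branch prediction.

    branch_resolve_stage: stage (1 = fetch) in which a branch outcome is known;
        each taken branch discards the instructions fetched before that point.
    load_use_freq / load_use_delay: fraction of instructions that wait for a
        previous result, and how many cycles they wait despite forwarding.
    """
    if branch_resolve_stage < 1:
        raise ValueError("stages are numbered from 1")
    control_stalls = branch_freq * taken_fraction * (branch_resolve_stage - 1)
    data_stalls = load_use_freq * load_use_delay
    return 1.0 + control_stalls + data_stalls


# Hypothetical mix: 20 % branches, half of them taken, 10 % dependent pairs.
for stage, delay in [(1, 0), (3, 1), (6, 2)]:
    print(stage, delay, round(effective_cpi(stage, 0.2, 0.5, 0.1, delay), 2))
```

In this estimate, resolving branches earlier and shrinking the result-use delay both drive CPI toward 1; fast branching and general forwarding address, respectively, the control and data terms.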

Our proposal for an efficient parallel unit architecture is a novel multithreaded architecture, the MultiThreaded Architecture with Chaining (MTAC). It combines an extensive set of techniques: superpipelining, multithreading, functional unit chaining and VLIW scheduling. Multithreading hides the latency of memory requests travelling through the communication network. VLIW scheduling simplifies the structure of the processor. Functional unit chaining makes it possible to execute a block of code containing true data dependencies within a single clock cycle. Finally, extensive superpipelining reduces the clock cycle to a minimum.
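Functional unit chaining can be illustrated by comparing two ways of executing one VLIW instruction (bundle). In the conventional case all slots read their operands at the start of the cycle; with chaining, a later slot can consume the result of an earlier slot in the same cycle. This is a minimal sketch of the semantics only; the tuple encoding, register names and function names are hypothetical and not MTAC's actual instruction format.

```python
from operator import add, mul

# Each slot is (operation, destination, source_a, source_b).


def execute_parallel(bundle, regs):
    """Conventional VLIW: every slot reads register values from the start of the cycle."""
    old = dict(regs)
    new = dict(regs)
    for op, dst, a, b in bundle:
        new[dst] = op(old[a], old[b])
    return new


def execute_chained(bundle, regs):
    """Chained functional units: a slot sees results written by earlier slots
    in the same cycle, so a dependent sequence completes in one step."""
    new = dict(regs)
    for op, dst, a, b in bundle:
        new[dst] = op(new[a], new[b])
    return new


bundle = [(add, "r3", "r1", "r2"),   # r3 = r1 + r2
          (mul, "r4", "r3", "r1")]   # r4 = r3 * r1, depends on r3
regs = {"r1": 2, "r2": 3, "r3": 0, "r4": 0}
print(execute_parallel(bundle, regs))
print(execute_chained(bundle, regs))
```

Both versions compute r3 = 5, but only the chained one also obtains r4 = 10 in the same step; the conventional bundle reads the stale r3 and would need a second cycle for the dependent multiply.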

Research topics

Instruction level parallelism: Parallel computation:


Last modified January 29, 1997 MF.