Research on Architectures
Martti Forsell, Department of Computer Science, University of Joensuu,
Finland.
The performance of computers has been doubling every second year during
the last three decades. This has been the result of major progress in the
areas of VLSI technology and processor architecture. Now certain physical
limits, like the speed of light and the size of molecules, are beginning to
slow down this development.
The use of parallelism, either within a processor (instruction level parallelism)
or between many processors in a single computer (parallel processing), provides,
however, possibilities to increase the performance of computers even if
physical limits prevent the manufacturing of faster and smaller processors.
New architectures and models like VLIW, MTA and PRAM play an important
role in fulfilling these promises.
The goal of this research is to find and evaluate efficient processor, memory
system and communication architectures for parallel computers by combining
the instruction level parallel model with the parallel processing model.
Our proposal for a combined model - Instruction Level Parallel Shared Memory
Architecture (IPSM) - consists of four main components:
- Limited size multiport memories [Forsell94a]
- Scalable two-level communication network [Forsell96a]
- Minimally pipelined scalar unit architecture [Forsell96b]
- Superpipelined parallel unit architecture [Forsell96c]
Limited size multiport memories are quite simple extensions of single port
memories. They can be used as building blocks of limited size shared memory
systems or as the low-level part of a two-level communication solution.
Unfortunately, p-port memory systems are p*p times more expensive than single
port memory systems of the same size.
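The p*p cost growth can be illustrated with a small calculation. The sketch
below is only an illustration of the scaling stated above; the function name
multiport_cost and the unit cost of 1.0 are assumptions made for the example,
not figures from [Forsell94a].

```python
def multiport_cost(single_port_cost: float, ports: int) -> float:
    """Relative cost of a p-port memory, assuming the p*p scaling stated in the text."""
    if ports < 1:
        raise ValueError("ports must be at least 1")
    return single_port_cost * ports * ports


# Hypothetical unit cost of 1.0 for a single port memory of a given size.
for p in (1, 2, 4, 8):
    print(f"{p}-port memory: {multiport_cost(1.0, p)} x single port cost")
```

Under this assumption, doubling the number of ports quadruples the cost, which
is why the proposal limits multiport memories to small sizes.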
Coated Mesh (CM) is a topology in which a mesh of routers is coated with
processor-memory modules. This raises the communication capacity and volume
of the network to a level where parallel slackness can be used to hide the latency
of the network. Coated Block Mesh (CBM) is a variation of a coated mesh,
in which routers are grouped and replaced with router blocks and processors
are grouped with small multiport memories. CBM can be used as a scalable
two-level communication network for time-processor optimal shared memory
simulations.
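The idea of hiding latency with parallel slackness can be sketched with a
simple model. The model is an assumption made for illustration, not taken from
[Forsell96a]: each processor issues one memory request per cycle, switching
round-robin between its threads, and a thread may issue again only after its
reply has returned network_latency cycles later. The function name and
parameters are hypothetical.

```python
def processor_utilization(threads_per_processor: int, network_latency: int) -> float:
    """Fraction of cycles in which a processor can issue a request.

    Assumed model: round-robin issue, one request per cycle, and each
    thread waits network_latency cycles for its reply before it can issue
    again. With at least network_latency threads the processor never stalls.
    """
    if threads_per_processor < 1 or network_latency < 1:
        raise ValueError("both arguments must be at least 1")
    return min(1.0, threads_per_processor / network_latency)


# Hypothetical round-trip latency of 16 cycles.
for threads in (1, 4, 16, 32):
    print(threads, processor_utilization(threads, 16))
```

In this model the latency is fully hidden once the number of threads per
processor reaches the latency of the network, provided the network has enough
capacity to carry all the outstanding requests.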
An alternative to superscalar architecture is a novel VLIW architecture
(Minimal Pipeline Architecture, MPA), in which the length of the pipeline is
minimized so that both data and control delays are minimized. This requires
novel techniques like general forwarding, uncoded instructions, a simple
instruction set and fast branching. MPA gives better scalar unit performance
than superscalar architectures using out-of-order execution while taking no
more silicon area.
Our proposal for an efficient parallel unit architecture is a novel multithreaded
architecture (MultiThreaded Architecture with Chaining, MTAC). The list
of techniques used is extensive: superpipelining, multithreading, functional
unit chaining and VLIW scheduling. The architecture uses multithreading
to hide the latency of memory requests in a communication network. VLIW
scheduling is used to simplify the structure of the processor. Functional
unit chaining makes it possible to execute a block of code containing true
data dependencies within a single clock cycle. Finally, extensive superpipelining
decreases the clock cycle to a minimum.
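The effect of functional unit chaining can be sketched as follows. This is a
toy model under stated assumptions, not the MTAC design from [Forsell96c]: the
unit names, the UNITS table and both functions are hypothetical, and the cycle
counts simply assume one cycle per dependent operation without chaining and
one cycle for the whole chain with chaining.

```python
import operator

# Hypothetical functional units, each a two-input operation.
UNITS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_unchained(start, chain):
    """Each dependent operation waits for the previous result: one cycle each (assumed)."""
    value, cycles = start, 0
    for unit, operand in chain:
        value = UNITS[unit](value, operand)
        cycles += 1
    return value, cycles


def run_chained(start, chain):
    """The result of each unit feeds the next unit directly, so the whole
    block of dependent operations is counted as a single cycle (assumed)."""
    value = start
    for unit, operand in chain:
        value = UNITS[unit](value, operand)
    return value, 1


# A block with true data dependencies: ((x + 3) * 2) - 1
chain = [("add", 3), ("mul", 2), ("sub", 1)]
print(run_unchained(5, chain))
print(run_chained(5, chain))
```

Both functions compute the same value; only the number of cycles charged
differs, which is the point of chaining in this simplified view.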
Research topics
Instruction level parallelism:
- Pipelined and superscalar execution models
- Very Long Instruction Word (VLIW) architectures
Parallel computation:
- MultiThreaded Architectures (MTAs)
- Fast communication
- Parallel memories
References
[Forsell94a]
- M. Forsell, Are multiport memories physically feasible?, Computer
Architecture News 22, 4 (September 1994), 47-54.
[Forsell94b]
- M. Forsell, MPASim - A simulator for MPA, Report B-1994-3, Department
of Computer Science, University of Joensuu, Finland, 1994.
[Forsell96a]
- M. Forsell, V. Leppänen and M. Penttonen, Efficient Two-Level
Mesh based Simulation of PRAMs, Proceedings of the International Symposium
on Parallel Architectures, Algorithms and Networks, June 12-14, 1996, Beijing,
China, 29-35.
[Forsell96b]
- M. Forsell, Minimal Pipeline Architecture - an Alternative to Superscalar
Architecture, Microprocessors and Microsystems 20, 5 (1996), 277-284.
[Forsell96c]
- M. Forsell, MTAC - a multithreaded VLIW architecture for PRAM simulation,
Report A-1996-7, Department of Computer Science, University of Joensuu,
Finland, 1996.
Last modified January 29, 1997 MF.