Claims
- 1. A method of static macro-scheduling of parallel execution of programs on a single-chip multiprocessor with explicit parallelism architecture processors, the method comprising:
scheduling of parallel program execution on multiple processors in a static clock-precise manner;
dividing the scheduled program into parallel streams in order to minimize data and control dependencies between different streams, wherein the number of instructions in a group for parallel execution in each stream does not exceed the abilities of the processor intended for parallel execution of a given stream;
defining a sequence of execution in synchronization points of each pair of streams as “one later than other,” “one earlier than other” and “simultaneous”;
executing the sequence of execution in the synchronization points;
directly exchanging data and address information between different streams at a register file level and data cache level; and
mutually controlling the streams from the executed program.
- 2. The method of claim 1, wherein synchronization between separate streams is performed directly by special instructions without using semaphores.
- 3. The method of claim 1, wherein synchronization between separate streams is performed using semaphores.
- 4. The method of claim 1, wherein data exchange is performed at the register file level.
- 5. The method of claim 1, wherein the exchange of store data is performed at the data cache level.
- 6. The method of claim 1, wherein the control is transferred to target addresses received from another stream.
- 7. The method of claim 1, wherein branching in a multitude of streams is performed under control of one stream.
- 8. The method of claim 1, wherein synchronization is performed using a synchronized event counter, whose overflow causes the event source stream to lock and whose zero state causes the event receiver stream to lock.
- 9. The method of claim 1, wherein the number of streams is equal to or less than the number of available processors.
- 10. A single-chip multiprocessor system with explicit parallelism architecture processors for executing programs compiled using a static clock-precise macro-scheduling program and dividing schedules into streams for parallel execution on different processors, ensuring data exchange at a register file level, address and data exchange at a data cache level between the processors, as well as synchronization of parallel processor execution using special operations and control signals connecting all processors, the system comprising:
a plurality of explicit parallelism architecture processors, each processor including an instruction cache, control unit, multiple execution units, register file, data cache, memory control unit, array access unit, predicate file, bypass bus, and an interprocessor exchange subsystem for exchanging data, address information and signals controlling synchronization of the parallel processor execution, the interprocessor exchange subsystem comprising:
synchronization means for parallel processor operation including means for signal issue permitting synchronization operation execution in other processors and means for receiving other processor signals permitting synchronization operation execution in its respective processor;
means for data exchange between the register files, including means for data issue from the processor registers to other processors and means for receiving data from other processors for writing them into the registers of its own processor; and
means for address and data exchange to support processor cache coherence, including means for address and data transfer to other processors during execution of a store operation and a buffer for receiving and temporary storage of addresses and data from other processors, reviewed by all memory access operations of its respective processor;
interprocessor interfaces including multiple communication buses between the processors for transfer of data, address information and signals controlling synchronization of the parallel processor execution;
a system interface unit;
a shared cache unit; and
a plurality of units including I/O controllers, memory controllers, co-processors.
- 11. The single-chip multiprocessor system of claim 10, wherein virtual numbering of explicit parallelism architecture processors is used and the interprocessor exchange subsystem additionally comprises:
a configuration unit containing information about physical numbers of the processors included in a particular working configuration and controlling conversion of virtual processor numbers used in the program into physical processor numbers of a real working configuration, while exchanging data and controlling signals between the processors.
- 12. The single-chip multiprocessor system of claim 10, wherein each interprocessor exchange subsystem is connected to communication buses of other processors.
- 13. The single-chip multiprocessor system of claim 12, wherein interprocessor interfaces comprise communication buses of full cross type for transfer of data, address information and signals controlling synchronization of the parallel processor execution.
- 14. The single-chip multiprocessor system of claim 12, wherein the interprocessor interfaces comprise a multitude of buses shared by all processors for data and address information transfer and a multitude of buses of full cross type for transfer of synchronization control signals.
- 15. The single-chip multiprocessor system of claim 12, wherein the interprocessor interfaces comprise only a multitude of buses of full cross type for transfer of synchronization control signals.
- 16. A multi-chip multiprocessor system comprising a multitude of single-chip multiprocessors based on explicit parallelism architecture processors for executing programs compiled using static clock-precise macro-scheduling of a program and dividing a schedule into streams for parallel execution on different processors, ensuring interaction between the single-chip multiprocessors by means of a system interface and main memory and interaction between the processors of the single-chip multiprocessor by means of data exchange at a register file level, address and data exchange at a data cache level, and synchronization of parallel processor execution using special operations and control signals connecting all processors of the single-chip multiprocessor.
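The event-counter synchronization recited in claim 8 can be illustrated with a small software model. This is only a sketch under stated assumptions: the claims describe a hardware mechanism, and the class name `EventCounter`, the `capacity` parameter, and the `signal` and `wait` methods are hypothetical names introduced here for illustration. A stream lock is modeled simply as a `False` return value, meaning the calling stream would have to stall and retry.

```python
class EventCounter:
    """Toy model of a bounded event counter shared by two streams.

    Assumption: 'capacity' stands in for the largest count the counter can
    hold before it would overflow. The claims do not specify a value.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.count = 0

    def signal(self):
        """Event source stream posts one event.

        Returns False when the increment would overflow the counter,
        which models the source stream being locked.
        """
        if self.count == self.capacity:
            return False
        self.count += 1
        return True

    def wait(self):
        """Event receiver stream consumes one event.

        Returns False when the counter is in the zero state,
        which models the receiver stream being locked.
        """
        if self.count == 0:
            return False
        self.count -= 1
        return True


def run_demo():
    """Replay a fixed sequence of source and receiver actions."""
    counter = EventCounter(capacity=2)
    trace = []
    trace.append(("wait", counter.wait()))      # zero state: receiver stalls
    trace.append(("signal", counter.signal()))  # count becomes 1
    trace.append(("signal", counter.signal()))  # count becomes 2 (full)
    trace.append(("signal", counter.signal()))  # would overflow: source stalls
    trace.append(("wait", counter.wait()))      # count becomes 1
    trace.append(("signal", counter.signal()))  # room again: count becomes 2
    return trace
```

In this model the receiver stalls on an empty counter and the source stalls on a full one, so neither stream can run arbitrarily far ahead of the other. A real implementation would hold the stalled stream in hardware rather than return a flag.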
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority from Provisional Application No. 60/183,176, filed Feb. 17, 2000, the disclosure of which is incorporated herein by reference.
Provisional Applications (1)
| Number | Date | Country |
| 60183176 | Feb 2000 | US |