Claims
- 1. A method for processing a matrix of elements in a processor, the method comprising steps of:
loading a first subset of matrix elements from a first location; loading a second subset of matrix elements from a second location; storing a third subset of matrix elements in a first destination; and storing a fourth subset of matrix elements in a second destination, wherein the loading and storing steps result from a first instruction issue.
- 2. The method for processing the matrix of elements in the processor as recited in claim 1, wherein n sub-instructions perform an n-by-n matrix transpose.
- 3. The method for processing the matrix of elements in the processor as recited in claim 1, wherein the first loading step is performed with a first processing path and the second loading step is performed with a second processing path.
- 4. The method for processing the matrix of elements in the processor as recited in claim 1, further comprising the steps of:
loading a fifth subset of matrix elements from a fifth location; loading a sixth subset of matrix elements from a sixth location; storing a seventh subset of matrix elements in a third destination; and storing an eighth subset of matrix elements in a fourth destination.
- 5. The method for processing the matrix of elements in the processor as recited in claim 4, wherein the loading and storing steps introduced in claim 4 result from a second instruction issue.
- 6. The method for processing the matrix of elements in the processor as recited in claim 4, wherein each of the first through fourth destinations includes a matrix column.
- 7. The method for processing the matrix of elements in the processor as recited in claim 1, wherein each of the first through fourth locations includes a matrix row.
- 8. The method for processing the matrix of elements in the processor as recited in claim 1, wherein the third and fourth subsets each comprise elements from the first and second subsets.
- 9. A processing core for transposing a matrix, comprising:
a first source register comprising a first plurality of matrix elements; a second source register comprising a second plurality of matrix elements; a third source register comprising a third plurality of matrix elements; a fourth source register comprising a fourth plurality of matrix elements; a first destination register comprising a fifth plurality of matrix elements; a second destination register comprising a sixth plurality of matrix elements; a first processing path coupled to the first through fourth source registers and the first destination register; and a second processing path coupled to the first through fourth source registers and the second destination register.
- 10. The processing core for transposing the matrix of claim 9, wherein:
the first through fourth registers each include a plurality of source fields, and each source field includes a matrix element.
- 11. The processing core for transposing the matrix of claim 9, wherein:
the first and second destination registers each include a plurality of result fields, and each result field includes a matrix element.
- 12. The processing core for transposing the matrix of claim 9, further comprising:
first and second instruction processors; and an exchange path between the first and second instruction processors.
- 13. The processing core for transposing the matrix of claim 9, wherein the first processing path receives a first sub-instruction and the second processing path receives a second sub-instruction.
- 14. The processing core for transposing the matrix of claim 9, wherein each of the first through fourth source registers includes a matrix row.
- 15. The processing core for transposing the matrix of claim 9, wherein each of the first and second destination registers includes a matrix column.
- 16. The processing core for transposing the matrix of claim 9, wherein the first and second destination registers are addressed by first and second sub-instructions which are included in a very long instruction word.
- 17. A method for processing a matrix of elements, the method comprising steps of:
loading a first instruction; loading a second instruction, wherein the first and second instructions address a first source register, second source register, third source register, fourth source register, first destination register and second destination register; loading a third instruction; loading a fourth instruction, wherein the third and fourth instructions address the first source register, the second source register, the third source register, the fourth source register, a third destination register and a fourth destination register; storing a first element of the first source register in the first destination register; and storing a fourth element of the first source register in the fourth destination register, wherein a plurality of the first through fourth elements comprise a same instruction issue.
- 18. The method for processing the matrix of elements of claim 17, wherein the first and second instructions include a first operation code and the third and fourth instructions include a second operation code different from the first operation code.
- 19. The method for processing the matrix of elements of claim 17, wherein the first and second instructions include a first operation code and the third and fourth instructions include a second operation code different from the first operation code.
- 20. The method for processing the matrix of elements of claim 17, wherein the first instruction is a sub-instruction in a very long instruction word.
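The data movement recited in the claims above can be sketched in software. The following Python model is illustrative only and is not the claimed hardware: the 4-by-4 matrix size, the list-based "registers", and the function names `transpose_sub_instruction`, `issue`, and `transpose_4x4` are all assumptions introduced for this sketch. It treats each source register as one matrix row, each destination register as one matrix column, and each instruction issue as two sub-instructions, one per processing path.

```python
# Illustrative model only. The register layout, the 4x4 size, and all
# function names are assumptions made for this sketch, not claim language.

def transpose_sub_instruction(sources, column):
    """Model one sub-instruction executing on one processing path.

    Reads the element at index `column` from every source register
    (each a matrix row) and returns them as one destination register
    (a matrix column).
    """
    return [row[column] for row in sources]


def issue(sources, columns):
    """Model one instruction issue: one sub-instruction per processing path."""
    return [transpose_sub_instruction(sources, c) for c in columns]


def transpose_4x4(sources):
    """Transpose four source rows using two issues of two sub-instructions."""
    # First issue: the two paths fill the first and second destinations.
    d0, d1 = issue(sources, (0, 1))
    # Second issue: the two paths fill the third and fourth destinations.
    d2, d3 = issue(sources, (2, 3))
    return [d0, d1, d2, d3]


rows = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
columns = transpose_4x4(rows)
```

In this sketch, four sub-instructions spread over two issues produce the four columns of the transposed matrix, which is one way to read the n sub-instructions for an n-by-n transpose of claim 2 with n equal to four.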
CROSS-REFERENCES TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 60/187,779 filed on Mar. 8, 2000.
[0002] This application is being filed concurrently with related U.S. patent applications: Attorney Docket Number 016747-00991, entitled “VLIW Computer Processing Architecture with On-chip DRAM Usable as Physical Memory or Cache Memory”; Attorney Docket Number 016747-01001, entitled “VLIW Computer Processing Architecture Having a Scalable Number of Register Files”; Attorney Docket Number 016747-01780, entitled “Computer Processing Architecture Having a Scalable Number of Processing Paths and Pipelines”; Attorney Docket Number 016747-01051, entitled “VLIW Computer Processing Architecture with On-chip Dynamic RAM”; Attorney Docket Number 016747-01211, entitled “Computer Processing Architecture Having the Program Counter Stored in a Register File Register”; Attorney Docket Number 016747-01461, entitled “Processing Architecture Having Parallel Arithmetic Capability”; Attorney Docket Number 016747-01471, entitled “Processing Architecture Having an Array Bounds Check Capability”; Attorney Docket Number 016747-01481, entitled “Processing Architecture Having an Array Bounds Check Capability”; and, Attorney Docket Number 016747-01531, entitled “Processing Architecture Having a Compare Capability”; all of which are incorporated herein by reference.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60187779 | Mar 2000 | US |