Claims
- 1. A group of instructions in a Matrix Processor that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file; a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array; a first, second, third, and fourth matrix register wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; wherein said first, second, third, and fourth matrix registers simultaneously swap rows or columns between said first, second, third, and fourth matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations: swapping rows between said first, second, third, and fourth matrix registers, or swapping columns between said first, second, third, and fourth matrix registers.
- 2. A Matrix Processor that includes instructions that rearrange data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file; a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array; a first, second, third, and fourth matrix register wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; wherein said first, second, third, and fourth matrix registers simultaneously swap rows or columns between said first, second, third, and fourth matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations: swapping rows between said first, second, third, and fourth matrix registers, or swapping columns between said first, second, third, and fourth matrix registers.
- 3. A system that includes a Matrix Processor with instructions that rearrange data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file; a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array; a first, second, third, and fourth matrix register wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; wherein said first, second, third, and fourth matrix registers simultaneously swap rows or columns between different said first, second, third, and fourth registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations: swapping rows between said first, second, third, and fourth matrix registers, or swapping columns between said first, second, third, and fourth matrix registers.
- 4. A method to make a Matrix Processor that includes instructions that rearrange data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
providing 16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file; coupling said processing elements into a 4×4 matrix processing array with a mesh row column interconnect; providing a first, second, third, and fourth matrix register wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; wherein said first, second, third, and fourth matrix registers simultaneously swap rows or columns between said first, second, third, and fourth matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations: swapping rows between said first, second, third, and fourth matrix registers, or swapping columns between said first, second, third, and fourth matrix registers.
- 5. A method to use instructions in a Matrix Processor that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
providing 16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file; providing a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array; providing a first, second, third, and fourth matrix register wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; and simultaneously swapping rows or columns between said first, second, third, and fourth matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations: swapping rows between said first, second, third, and fourth matrix registers, or swapping columns between said first, second, third, and fourth matrix registers.
- 6. A dependent claim according to claims 1, 2, 3, 4, or 5 wherein successive iterations or combinations of the instructions perform standard tensor matrix operations from the following group of matrix operations: transpose, shuffle, and deal.
- 7. A dependent claim according to claims 1, 2, 3, 4, or 5 wherein the swapping of rows or columns converts the data in the data matrix into one of the following matrix data orders: 4 vectors of the larger data matrix to a 4×4 data sub-matrix in row major order, and 4 vectors of the larger data matrix to a 4×4 data sub-matrix in column major order.
- 8. A group of instructions in a Matrix Processor that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises 16 PE register entries in a PE register file; a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array; 16 matrix registers wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; wherein a group of said 16 matrix registers comprises a first, second, third, and fourth matrix register of said 16 matrix registers that simultaneously swap rows or columns between said first, second, third, and fourth matrix registers of said group of matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations: swapping rows between said first, second, third, and fourth matrix register of said group of matrix registers, or swapping columns between said first, second, third, and fourth matrix register of said group of matrix registers; and wherein the swapping of rows or columns converts the data in the data matrix into one of the following matrix data orders: 4 vectors of the larger data matrix to a 4×4 data sub-matrix in row major order, and 4 vectors of the larger data matrix to a 4×4 data sub-matrix in column major order.
- 9. A Matrix Processor that includes instructions that rearrange data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file; a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array; 16 matrix registers wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; wherein a group of said 16 matrix registers comprises a first, second, third, and fourth matrix register of said 16 matrix registers that simultaneously swap rows or columns between said first, second, third, and fourth matrix register of said group of matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations: swapping rows between said first, second, third, and fourth matrix register of said group of matrix registers, or swapping columns between said first, second, third, and fourth matrix register of said group of matrix registers; and wherein the swapping of rows or columns converts the data in the data matrix into one of the following matrix data orders: 4 vectors of the larger data matrix to a 4×4 data sub-matrix in row major order, and 4 vectors of the larger data matrix to a 4×4 data sub-matrix in column major order.
- 10. A system that includes a Matrix Processor with instructions that rearrange data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file; a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array; 16 matrix registers wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; wherein a group of said 16 matrix registers comprises a first, second, third, and fourth matrix register that simultaneously swap rows or columns between said first, second, third, and fourth matrix register of said group of matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations: swapping rows between said first, second, third, and fourth matrix register of said group of matrix registers, or swapping columns between said first, second, third, and fourth matrix register of said group of matrix registers; and wherein the swapping of rows or columns converts the data in the data matrix into one of the following matrix data orders: 4 vectors of the larger data matrix to a 4×4 data sub-matrix in row major order, and 4 vectors of the larger data matrix to a 4×4 data sub-matrix in column major order.
- 11. A method to make a Matrix Processor that includes instructions that rearrange data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
providing 16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file; coupling said processing elements into a 4×4 matrix processing array with a mesh row column interconnect; providing 16 matrix registers wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; wherein a group of said 16 matrix registers comprises a first, second, third, and fourth matrix register that simultaneously swap rows or columns between said first, second, third, and fourth matrix register of said group of matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations: swapping rows between said first, second, third, and fourth matrix register of said group of matrix registers, or swapping columns between said first, second, third, and fourth matrix register of said group of matrix registers; and wherein the swapping of rows or columns converts the data in the data matrix into one of the following matrix data orders: 4 vectors of the larger data matrix to a 4×4 data sub-matrix in row major order, and 4 vectors of the larger data matrix to a 4×4 data sub-matrix in column major order.
- 12. A method to use instructions in a Matrix Processor that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
providing 16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file; providing a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array; providing 16 matrix registers wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; and simultaneously swapping rows or columns between a group of said 16 matrix registers that comprise a first, second, third, and fourth matrix register according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations: swapping rows between said first, second, third, and fourth matrix register of said group of matrix registers, or swapping columns between said first, second, third, and fourth matrix register of said group of matrix registers; wherein the swapping of rows or columns converts the data in the data matrix into one of the following matrix data orders: 4 vectors of the larger data matrix to a 4×4 data sub-matrix in row major order, and 4 vectors of the larger data matrix to a 4×4 data sub-matrix in column major order.
- 13. A dependent claim according to claims 8, 9, 10, 11, or 12 wherein successive iterations or combinations of the instructions perform standard tensor matrix operations from the following group of matrix operations: transpose, shuffle, and deal.
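By way of illustration only, the register organization and row/column swapping recited in the claims above can be sketched as a small software model. This is a minimal sketch under stated assumptions, not the claimed hardware or its instruction set: the function names (`make_pe_array`, `write_matrix_register`, `read_matrix_register`, `swap_rows`, `swap_columns`) are hypothetical, and the particular swap pattern (row i of the j-th register trades places with row j of the i-th register) is one assumed reading of "swapping rows between" four matrix registers. Under that assumption, four 16-element vectors loaded one per matrix register become four 4×4 sub-matrices in row major order.

```python
# Hypothetical software model; names and swap semantics are assumptions
# made for illustration, not taken from the claims.
N = 4  # 4x4 PE array; four matrix registers take part in each swap


def make_pe_array(num_entries=16):
    """Build 16 PEs in a 4x4 grid, each with its own register file."""
    return [[[0] * num_entries for _ in range(N)] for _ in range(N)]


def write_matrix_register(pes, k, values):
    """Matrix register k is entry k of every PE; load 16 values row by row."""
    for r in range(N):
        for c in range(N):
            pes[r][c][k] = values[N * r + c]


def read_matrix_register(pes, k):
    """Gather entry k from every PE into a 4x4 list of lists."""
    return [[pes[r][c][k] for c in range(N)] for r in range(N)]


def swap_rows(pes, regs):
    """Row i of regs[j] trades places with row j of regs[i], all at once."""
    old = [read_matrix_register(pes, k) for k in regs]
    for i, k in enumerate(regs):
        for j in range(N):
            for c in range(N):
                pes[j][c][k] = old[j][i][c]


def swap_columns(pes, regs):
    """Column i of regs[j] trades places with column j of regs[i]."""
    old = [read_matrix_register(pes, k) for k in regs]
    for i, k in enumerate(regs):
        for j in range(N):
            for r in range(N):
                pes[r][j][k] = old[j][r][i]


# Treat a 4x16 data matrix as four 16-element vectors, one per register.
pes = make_pe_array()
for v in range(N):
    write_matrix_register(pes, v, [16 * v + e for e in range(16)])

swap_rows(pes, [0, 1, 2, 3])
sub0 = read_matrix_register(pes, 0)  # first 4x4 sub-matrix, row major
```

In this model the row swap is its own inverse, so applying it a second time restores the original vector layout; how combinations of such swaps compose into the transpose, shuffle, and deal operations of claims 6 and 13 is not modeled here.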
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of the earlier-filed U.S. Provisional Application Serial No. 60/296,410, filed Jun. 6, 2001, which is incorporated by reference for all purposes into this specification.
[0002] Additionally, this application claims the benefit of the earlier-filed U.S. Provisional Application Serial No. 60/374,174, filed Apr. 19, 2002, which is incorporated by reference for all purposes into this specification.
Provisional Applications (2)
| Number | Date | Country |
| 60296410 | Jun 2001 | US |
| 60374174 | Apr 2002 | US |