Orthogonal data transposition system and method during data transfers to/from a processing array

Description

FIELD

The disclosure relates generally to a device and method for connecting a processing array to a storage memory and in particular to a device and method that 1) transposes the data words output from the processing array for storage in the storage memory; and 2) transposes the data words from the storage to the processing array since the storage memory and the processing array store data differently.

BACKGROUND

A storage memory in a computer system and another conventional storage memory may be connected to each other by a data bus, DBus(63:0), as shown in FIG. 1. As shown in FIG. 1A, the data words in the storage memory are stored in the same orientation as the data words in the other conventional storage memory.

In contrast, each bit line in a processing array essentially functions as a mini-processor and has a plurality of computational memory cells connected to each bit line. In a processing array with the plurality of computational memory cells, reading multiple computational memory cells along the same bit line simultaneously produces a logical function (e.g. logical AND) of the memory cell contents on the read bit line. Additional circuitry can be implemented around the bit line and its associated memory cells to enable more complex logical operations on the data stored in those memory cells. The processing array may have a plurality of sections wherein each section has a plurality of bit line sections and each bit line section has a plurality of computational memory cells whose read bit lines are connected together to produce the logical function. Because the bit line is the central processing element in the processing array, and because all bit lines within each section receive the same control signals and therefore perform the same computations on their respective data, data words are stored in the processing array along bit lines (with each bit on a separate word line)—either entirely on the same bit line within a section or along the same relative bit line across multiple sections.
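As a rough software illustration of the bit line behavior described above, the following sketch models one read bit line. The function name `read_bit_line` and the list-based cell model are assumptions made for illustration only; the logical AND result follows the example given in the text.

```python
def read_bit_line(cells, active_word_lines):
    """Model reading several cells on one bit line at the same time.

    cells: list of 0/1 values, one per word line (hypothetical model).
    active_word_lines: indices of the word lines enabled for the read.
    Returns the logical AND of the selected cell contents.
    """
    result = 1
    for wl in active_word_lines:
        result &= cells[wl]
    return result


column = [1, 0, 1, 1]
print(read_bit_line(column, [0, 2, 3]))  # 1: every selected cell holds 1
print(read_bit_line(column, [0, 1]))     # 0: one selected cell holds 0
```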

For example, one way to add two 16-bit data words is to store the first LSB of each data word in memory cells on bit line 0 in section 0, the second LSB of each data word in memory cells on bit line 0 in section 1, etc. Then, the software algorithm performs the logic and shift operations necessary to add the two 16-bit data words together; shift operations are needed to shift the carry bit result from section 0 to section 1 after adding the first LSBs in section 0, to shift the carry bit result from section 1 to section 2 after adding the carry bit from section 0 to the second LSBs in section 1, etc. In this way, if the processing array consists of 16 sections with “n” bit lines per section, then “n” simultaneous 16-bit ADD operations can be performed.
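The addition scheme above may be sketched in software as follows. This is a minimal model under stated assumptions: the helper names (`to_sections`, `from_sections`, `bit_serial_add`) are hypothetical, the inner loop over bit lines stands in for what the array would do on all bit lines at once, and the full-adder logic is ordinary Boolean arithmetic rather than the specific logic sequence of any particular processing array.

```python
def to_sections(words, width=16):
    # sections[s][b] holds bit s (LSB first) of the word stored on bit line b.
    return [[(w >> s) & 1 for w in words] for s in range(width)]


def from_sections(sections):
    # Reassemble one integer per bit line from the per-section bits.
    n = len(sections[0])
    return [sum(sections[s][b] << s for s in range(len(sections)))
            for b in range(n)]


def bit_serial_add(a_words, b_words, width=16):
    a = to_sections(a_words, width)
    b = to_sections(b_words, width)
    n = len(a_words)
    carry = [0] * n                      # carry into section 0 is zero
    total = []
    for s in range(width):               # one section per bit position
        row, next_carry = [], []
        for bl in range(n):              # all bit lines act in parallel
            x, y, c = a[s][bl], b[s][bl], carry[bl]
            row.append(x ^ y ^ c)
            next_carry.append((x & y) | (c & (x ^ y)))
        total.append(row)
        carry = next_carry               # carry "shifts" to the next section
    return from_sections(total)          # results wrap modulo 2**width


print(bit_serial_add([1, 65535, 1234], [2, 1, 4321]))  # [3, 0, 5555]
```

With 16 sections and three bit lines in this example, three 16-bit additions are produced in one pass over the sections.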

The way that data words are stored in the processing array—i.e. along the same bit line (with each bit on a separate word line)—is orthogonal to how data words are conventionally stored in a memory array used exclusively for storage (e.g. storage memory)—i.e. along the same word line (with each bit on a separate bit line). Consequently, when data words are transferred between storage memory and the processing array, a mechanism is needed to orthogonally transpose the data and it is to that end that this disclosure is directed.
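The two orientations may be compared with a small sketch. The function names and the 4-bit word width are assumptions chosen for readability; rows stand for word lines and columns stand for bit lines.

```python
def word_to_bits(word, width):
    # Bit 0 (LSB) first.
    return [(word >> i) & 1 for i in range(width)]


def storage_layout(words, width):
    # One word per word line (row); each bit sits on its own bit line (column).
    return [word_to_bits(w, width) for w in words]


def processing_layout(words, width):
    # One word per bit line (column); each bit sits on its own word line (row).
    rows = storage_layout(words, width)
    return [list(col) for col in zip(*rows)]


words = [0b0001, 0b0110, 0b1111]
print(storage_layout(words, 4))     # [[1, 0, 0, 0], [0, 1, 1, 0], [1, 1, 1, 1]]
print(processing_layout(words, 4))  # [[1, 0, 1], [0, 1, 1], [0, 1, 1], [0, 0, 1]]
```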

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts storage memory connected to other conventional storage memory. In this depiction, a 64-bit data bus facilitates data transfers between the two memory arrays.

FIG. 1A is an expansion of FIG. 1 that illustrates the data byte layout in storage memory and the other conventional storage memory when data is transferred between them.

FIG. 2 depicts storage memory connected to a processing array via an intermediate transposer that orthogonally transposes the data during data transfers between the two memory arrays. In this depiction, a 64-bit data bus facilitates data transfers between storage memory and the transposer, and an n-bit data bus facilitates data transfers between the transposer and the processing array.

FIGS. 3A-3C illustrate an example of the processing array shown in FIGS. 2, 4, 6A and 6B that may be used with the transposer.

FIG. 4 is an expansion of FIG. 2 that depicts the transposer constructed as a 64-row by n-column array of register bits. The diagram illustrates the data byte layout in storage memory, the transposer, and the processing array when data is transferred between them.

FIG. 5 depicts an exemplary implementation of the transposer from FIG. 2 or 4 constructed as a two-way shift register. The diagram illustrates how data is column-shifted through the transposer during data transfers between it and storage memory, and how data is row-shifted through the transposer during data transfers between it and the processing array.

FIG. 6A depicts storage memory connected to a processing array via an intermediate transposer that orthogonally transposes the data during data transfers between the two memory arrays, and via an intermediate “buffer” that buffers data transfers between the transposer and the processing array. In this depiction, a 64-bit data bus facilitates data transfers between storage memory and the transposer, a 64-bit data bus facilitates data transfers between the transposer and the buffer, and an n-bit data bus facilitates data transfers between the buffer and the processing array.

FIG. 6B is an expansion of FIG. 6A that depicts the transposer constructed as a 64-row by 64-column array of register bits, and that depicts the buffer constructed as a 64-row by n-column array of storage elements. The diagram illustrates the data byte layout in storage memory, the transposer, the buffer, and the processing array when data is transferred between them.

FIG. 6C depicts the transposer from FIG. 6B constructed as a two-way shift register. The diagram illustrates how data is column-shifted through the transposer during data transfers between it and storage memory, and how data is row-shifted through the transposer during data transfers between it and the buffer.

FIG. 6D depicts the detailed circuit implementation of the transposer in FIG. 6C.

FIG. 7 depicts an alternate form of the transposer depicted in FIGS. 5 and 6C, constructed as a 32-row by 64-column array of register bits, and as a two-way shift register.

FIG. 8 depicts an alternate form of the transposer depicted in FIGS. 5 and 6C, constructed as a 16-row by 64-column array of register bits, and as a two-way shift register.

FIG. 9 depicts an alternate form of the transposer depicted in FIGS. 5 and 6C, constructed as an 8-row by 64-column array of register bits, and as a two-way shift register.

DETAILED DESCRIPTION OF ONE OR MORE EMBODIMENTS

The disclosure is particularly applicable to a processing array device using SRAM memory cells in which data words are stored along word lines in storage memory and other conventional storage memory, and data words are stored along bit lines in the processing array, and it is in this context that the disclosure will be described. It will be appreciated, however, that the apparatus and method have greater utility, such as being used with other processing array devices. The device and method may orthogonally transpose the data transferred between the storage memory and the processing array device. The device and method may use a transposer or the combination of a transposer and a buffer to perform the orthogonal transposing of the data. In the embodiments described below, the data being transposed may be a certain number of bit units of data that are transferred between the storage memory and the processing array device. Examples using a sixty-four bit unit of data, a thirty-two bit unit of data, a sixteen bit unit of data and an eight bit unit of data are provided below, although the device and method may operate with other different sized units of data. For purposes of the disclosure, all of the above different sized bit units of data that may be transferred by the device and method will be known as a unit of data.

FIRST EMBODIMENT

FIG. 2 depicts a first embodiment of a processing unit 20 that includes a storage memory 22 connected to a processing array 30 (constructed from a plurality of computational memory cells) via a data path 26, that may be an intermediate transposer 26 in one implementation, that orthogonally transposes the data during data transfers between the storage memory array 22 and the processing array 30 since data words are stored along word lines in storage memory 22 and other conventional storage memory and data words are stored along bit lines in the processing array 30. In this depiction, a 64-bit data bus 24, DBus(63:0) for example shown in FIG. 2, facilitates data transfers between storage memory 22 and the transposer 26, and an n-bit data bus 28, TBus(n−1:0) for example shown in FIG. 2, facilitates data transfers between the transposer 26 and the processing array 30. The elements of the processing unit 20 shown in FIG. 2 may be housed in a single semiconductor chip/integrated circuit (with a single die or multiple dies) or multiple integrated circuits. Alternatively, the processing array 30 and transposer 26 may be part of the integrated processing unit 20 (a single integrated circuit or semiconductor chip with one or multiple dies) while the storage memory 22 may be separate.

One way to orthogonally transpose data during data transfers between storage memory 22 and the processing array 30 is to implement a two-dimensional storage array block between them that facilitates the orthogonal data transposition. This intermediate storage block is henceforth referred to as the “transposer” 26. When a transposer is utilized in this manner, data transfers between storage memory 22 and the processing array 30 may be accomplished in two steps:

1. A data transfer between storage memory 22 and the transposer 26.

2. A data transfer between the transposer 26 and the processing array 30.

The execution order of the two steps depends on the direction of data transfer, storage memory→processing array or processing array→storage memory.

An example of the processing array 30 that may be used with the processing unit 20 and transposer 26 is shown in FIGS. 3A-3C, although it is understood that the transposer 26 may be used in a similar manner with other processing arrays 30 that are within the scope of this disclosure. FIG. 3A illustrates an example of a processing array 30 with computational memory cells in an array that may be incorporated into a semiconductor memory or computer system and may include transposer circuitry. The processing array 30 may include an array of computational memory cells (cell 00, . . . , cell 0n and cell m0, . . . , cell mn). In one embodiment, the array of computational memory cells may be rectangular as shown in FIG. 3A and may have a plurality of columns and a plurality of rows wherein the computational memory cells in a particular column may also be connected to the same read bit line (RBL0, . . . , RBLn). The processing array 30 may further include a wordline (WL) generator and read/write logic control circuit 32 that may be connected to and generate signals for the read word line (RE) and write word line (WE) for each memory cell (such as RE0, . . . , REn and WE0, . . . , WEn) to control the read and write operations, as is well known, and one or more read/write circuits 34 that are connected to the read and write bit lines of the computational memory cells. The wordline (WL) generator and read/write logic control circuit 32 may also generate one or more control signals that control each read/write circuit 34.

In the embodiment shown in FIG. 3A, the processing array may have read/write circuitry 34 for each set of bit line signals of the computational memory cells (e.g., for each column of the computational memory cells whose read bit lines are connected to each other). For example, BL0 read/write logic 340 may be coupled to the read and write bit lines (WBLb0, WBL0 and RBL0) for the computational memory cells in column 0 of the array and BLn read/write logic 34n may be coupled to the read and write bit lines (WBLbn, WBLn and RBLn) for the computational memory cells in column n of the array as shown in FIG. 3A.

During a read operation, the wordline (WL) generator and read/write logic control circuit 32 may activate one or more word lines that activate one or more computational memory cells so that the read bit lines of those one or more computational memory cells may be read out. Further details of the read operation are not provided here since the read operation is well known.

Each computational memory cell in the processing array may be a static random access memory (SRAM) cell based computational memory cell that is able to perform a computation as described above. It is noted that the processing array 30 may be constructed using other different types of memory cells. The details of an exemplary computational memory cell that may be used as part of the processing array 30 may be found in co-pending U.S. patent application Ser. No. 15/709,399, filed Sep. 19, 2017 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells For Xor And Xnor Computations”, U.S. patent application Ser. No. 15/709,401, filed Sep. 19, 2017 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells For Xor And Xnor Computations”, U.S. patent application Ser. No. 15/709,379, filed Sep. 19, 2017 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells”, U.S. patent application Ser. No. 15/709,382, filed Sep. 19, 2017 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells”, and U.S. patent application Ser. No. 15/709,385, filed Sep. 19, 2017 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells” and U.S. Provisional Patent Application No. 62/430,767, filed Dec. 6, 2016 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells For Xor And Xnor Computations” and U.S. Provisional Patent Application No. 62/430,762, filed Dec. 6, 2016 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells”, all of which are incorporated by reference herein.

FIGS. 3B and 3C illustrate the processing array 30 with computational memory cells having sections having the same elements as shown in FIG. 3A. The array 30 in FIG. 3B has one section (Section 0) with “n” bit lines (bit line 0 (BL0), . . . , bit line n (BLn)) in different bit line sections (bl-sect), where each bit line connects to “m” computational memory cells (cell 00, . . . , cell m0 for bit line 0, for example). In the example in FIG. 3B, the m cells may be the plurality of computational memory cells that are part of each column of the array 30. FIG. 3C illustrates the processing array 30 with computational memory cells having multiple sections. In the example in FIG. 3C, the processing array device 30 comprises “k” sections with “n” bit lines each, where each bit line within each section connects to “m” computational memory cells. Note that the other elements of the processing array 30 are present in FIG. 3C, but not shown for clarity. In FIG. 3C, the BL-Sect(0,0) block shown corresponds to the BL-Sect(0,0) shown in FIG. 3B with the plurality of computational memory cells and the read/write logic 340, and each other block shown in FIG. 3C corresponds to a separate portion of the processing array. As shown in FIG. 3C, the set of control signals, generated by the wordline generator and read/write logic controller 32, for each section may include one or more read enable control signals (for example S[0]_RE[m:0] for section 0), one or more write enable control signals (for example S[0]_WE[m:0] for section 0) and one or more read/write control signals (for example S[0]_RW_Ctrl[p:0] for section 0). As shown in FIG. 3C, the array 30 may have a plurality of sections (0, . . . , k in the example in FIG. 3C) and each section may have multiple bit line sections (0, . . . , n per section, in the example in FIG. 3C).

Returning to FIG. 2, in the first embodiment, the transposer 26 is constructed as a two-way shift register array with “x” rows and “y” columns. The number of rows “x” is equal to the width of the data bus “DBus” that connects the transposer 26 to storage memory 22; e.g. x=64 if DBus=64 bits. The number of columns “y” is equal to the number of columns (i.e. bit lines) in a processing array section—e.g. y=n if a processing array section has “n” bit lines, and establishes the width of the data bus “TBus” that connects the transposer to the processing array—e.g. if y=n then TBus=n bits. The number of columns in the transposer is equal to the number of columns in a processing array section so that all columns associated with a particular row in a particular processing array section can be written to or read from simultaneously, thereby eliminating the need for any sort of column-addressability within a processing array section.

An example of the transposer 26 constructed as a 64-row by n-column array of register bits, where “n” is equal to the number of columns (bit lines) in a section of the processing array 30, is shown in FIG. 4. FIG. 4 illustrates: the data byte layout in storage memory 22 initially, or after “n” units of 64-bit column data have been transferred from the transposer to storage memory; the data byte layout in the transposer after “n” units of 64-bit column data have been transferred from storage memory to the transposer, or after 64 units of n-bit row data have been transferred from the processing array to the transposer; and the data byte layout in the processing array after 64 units of n-bit row data have been transferred from the transposer to the processing array.

Operation of First Embodiment

The processing unit 20 with the storage memory 22, the transposer 26 and the processing array 30 may be operated to transfer data from the storage memory 22 to the processing array 30 and to transfer data from the processing array 30 to the storage memory 22 using different processes, each of which is now described in more detail.

During Storage Memory 22 to Processing Array 30 Data Transfers

1. When data is transferred from storage memory 22 to the transposer 26 on the 64-bit data bus “DBus(63:0)”, the following occurs simultaneously:
- The 64-bit unit of transfer data on DBus(63:0) is stored in column “n−1”, rows 63:0 of the transposer.
- The 64-bit units of previously-loaded data in columns 1 through “n−1” of the transposer are column-shifted to columns 0 through “n−2”.
- The 64-bit unit of previously-loaded data in column 0, rows 63:0 of the transposer is discarded.

This procedure repeats until all “n” columns of the transposer have been loaded with storage memory data. The column shifting is the first of the two ways in which the transposer can be shifted.

2. When data is subsequently transferred from the transposer 26 to the processing array 30 on an n-bit data bus “TBus(n−1:0)”, the following occurs simultaneously:
- The n-bit unit of data in row 0, columns n−1:0 of the transposer is output to TBus(n−1:0).
- The n-bit units of data in rows 1 through 63 of the transposer are row-shifted to rows 0 through 62.
- A logic “0” is stored in row 63, columns n−1:0 of the transposer. This is arbitrary—the state of the transposer doesn't matter after storage memory data has been transferred to the processing array.

This procedure repeats until all 64 rows of transposer data have been transferred to the processing array 30. The row shifting is the second of the two ways in which the transposer can be shifted.
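The two steps above may be modeled in software as follows. This is a sketch only, under stated assumptions: the array is reduced to 4 rows and 3 columns (standing in for 64 rows and “n” columns) so the result can be inspected, the transposer is modeled as a list of rows, and the names `column_write`, `row_read` and `storage_to_array` are hypothetical. DBus(r) is assumed to feed row r and TBus(c) is assumed to be driven by column c.

```python
ROWS = 4   # stands in for the 64 rows (DBus width)
COLS = 3   # stands in for "n" columns (bit lines per section)


def column_write(t, dbus):
    """Shift every column down by one index and load dbus into column COLS-1."""
    for r in range(ROWS):
        t[r] = t[r][1:] + [dbus[r]]      # the old column 0 is discarded


def row_read(t):
    """Output row 0, shift rows 1.. to rows 0.., and zero-fill the last row."""
    tbus = t.pop(0)
    t.append([0] * COLS)
    return tbus


def storage_to_array(words):
    t = [[0] * COLS for _ in range(ROWS)]
    for w in words:                                  # step 1: COLS column writes
        column_write(t, [(w >> r) & 1 for r in range(ROWS)])
    return [row_read(t) for _ in range(ROWS)]        # step 2: ROWS row reads


rows = storage_to_array([0b0001, 0b0110, 0b1111])
print(rows)  # [[1, 0, 1], [0, 1, 1], [0, 1, 1], [0, 0, 1]]
```

In the printed result, row r holds bit r of every word and column c holds word c, which is the bit line orientation used by the processing array.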

During Processing Array 30 to Storage Memory 22 Data Transfers

1. When data is transferred from the processing array to the transposer on an n-bit data bus “TBus(n−1:0)”, the following occurs simultaneously:
- The n-bit unit of transfer data on TBus(n−1:0) is stored in row 63, columns n−1:0 of the transposer.
- The n-bit units of previously-loaded data in rows 1 through 63 of the transposer are row-shifted to rows 0 through 62.
- The n-bit unit of previously-loaded data in row 0, columns n−1:0 of the transposer is discarded.

This procedure repeats until all 64 rows of the transposer have been loaded with processing array data.

2. When data is subsequently transferred from the transposer to storage memory on a 64-bit data bus “DBus(63:0)”, the following occurs simultaneously:
- The 64-bit unit of data in column 0, rows 63:0 of the transposer is output to DBus(63:0).
- The 64-bit units of data in columns 1 through “n−1” of the transposer are column-shifted to columns 0 through “n−2”.
- A logic “0” is stored in column “n−1”, rows 63:0 of the transposer. This is arbitrary—the state of the transposer doesn't matter after the processing array data has been transferred to storage memory.

This procedure repeats until all “n” columns of transposer data have been transferred to storage memory.
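The reverse direction may be sketched in the same way. As before, this is an illustrative model and not a circuit description: the 4-row by 3-column size, the list-of-rows representation and the names `row_write`, `column_read` and `array_to_storage` are assumptions. The processing array rows are assumed to be sent starting with the row that holds bit 0.

```python
ROWS = 4   # stands in for the 64 rows (DBus width)
COLS = 3   # stands in for "n" columns (bit lines per section)


def row_write(t, tbus):
    """Shift rows 1.. to rows 0.. (row 0 is discarded); load tbus into the last row."""
    t.pop(0)
    t.append(list(tbus))


def column_read(t):
    """Output column 0, shift the other columns down by one index, zero-fill the last."""
    dbus = [t[r][0] for r in range(ROWS)]
    for r in range(ROWS):
        t[r] = t[r][1:] + [0]
    return dbus


def array_to_storage(array_rows):
    t = [[0] * COLS for _ in range(ROWS)]
    for row in array_rows:                   # step 1: ROWS row writes
        row_write(t, row)
    words = []
    for _ in range(COLS):                    # step 2: COLS column reads
        dbus = column_read(t)
        words.append(sum(bit << r for r, bit in enumerate(dbus)))
    return words


print(array_to_storage([[1, 0, 1], [0, 1, 1], [0, 1, 1], [0, 0, 1]]))  # [1, 6, 15]
```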

Transposer Exemplary Implementation

FIG. 5 depicts an exemplary implementation of the transposer 26 from FIG. 2 or 4 constructed as a two-way shift register. The diagram illustrates how data is column-shifted through the transposer during data transfers between it and storage memory, and how data is row-shifted through the transposer during data transfers between it and the processing array. More specifically, FIG. 5 shows the two data buses of FIG. 2 and:

- The data byte layout in the transposer after “n” units of 64-bit column data have been transferred from storage memory to the transposer, or after 64 units of n-bit row data have been transferred from the processing array to the transposer.
- How column-shifts are used when writing data into the transposer from DBus(63:0), reading data from the transposer onto DBus(63:0), and/or shifting data through the transposer during data transfers between it and storage memory.
- How row-shifts are used when writing data into the transposer from TBus(n−1:0), reading data from the transposer onto TBus(n−1:0), and/or shifting data through the transposer during data transfers between it and the processing array.

In the first embodiment, the transposer 26 has the same number of columns as a processing array section. In that case, the column pitch of the transposer should match that of the processing array to allow for a simple data connection between them. However, that may be difficult to implement in some transposer and processing array designs without wasting die area, due to the differences in the circuit design of the two blocks.

A second embodiment described below adds a buffer between the transposer and the processing array with the same number of columns as the processing array and allows for the transposer to have fewer columns than the processing array with no particular column pitch requirements. The second embodiment still eliminates the need for column-addressability in the processing array.

SECOND EMBODIMENT

A second embodiment implements a second way to orthogonally transpose data during data transfers between storage memory 22 and the processing array 30 by using the transposer 26 and a buffer 29 between them as shown in FIG. 6A. The transposer 26 facilitates the orthogonal data transposition (as in the first embodiment), and the buffer 29 buffers data transfers between the transposer 26 and the processing array 30. In this depiction, a 64-bit data bus, DBus(63:0) for example shown in FIG. 6A, facilitates data transfers between storage memory 22 and the transposer 26, a 64-bit data bus, TBus(63:0) for example shown in FIG. 6A, facilitates data transfers between the transposer 26 and the buffer 29, and an n-bit data bus, PBus(n−1:0) for example shown in FIG. 6A, facilitates data transfers between the buffer 29 and the processing array 30.

In the second embodiment, the transposer 26 may have fewer columns than the processing array section 30 while the buffer 29 has the same number of columns as a processing array section 30 as shown in FIG. 6B, thereby eliminating the need for column-addressability in the processing array (as in the first embodiment). The buffer 29 is a simpler circuit than both the transposer 26 and the processing array 30, and therefore easier to implement with the same column pitch as the processing array without wasting die area. In addition, the buffer 29 is a conventional storage circuit (unlike the processing array) so that it is easier to support the column-addressability needed therein to facilitate data transfers between it and the (fewer-column) transposer.

When the transposer 26 and buffer 29 are utilized in this manner, data transfers between storage memory 22 and the processing array 30 are accomplished in three steps:

1. The data transfer between storage memory 22 and the transposer 26.

2. The data transfer between the transposer 26 and the buffer 29.

3. The data transfer between the buffer 29 and the processing array 30.

The execution order of the three steps depends on the direction of data transfer, storage memory→processing array or processing array→storage memory.

In the second embodiment, as shown in FIG. 6C, the transposer 26 is constructed as a two-way shift register array with 64 rows (to match the width of the data bus that connects it to storage memory, as in the first embodiment) and 64 columns. The number of columns “64” is a whole fraction of the number of columns “n” in the buffer, and establishes the width of the data bus “TBus” that connects the transposer to the buffer. As shown in FIG. 6C, the buffer 29 is constructed as an array of storage elements with 64 rows to match the number of rows in the transposer 26, and “n” columns to match the number of columns in a processing array section 30 and establish the width of the data bus “PBus” that connects the buffer to the processing array.

During Storage Memory→Processing Array Data Transfers

1. When data is transferred from storage memory to the transposer on a 64-bit data bus “DBus(63:0)”, the following occurs simultaneously:

- The 64-bit unit of transfer data on DBus(63:0) is stored in column 63, rows 63:0 of the transposer.
- The 64-bit units of previously-loaded data in columns 1 through 63 of the transposer are column-shifted to columns 0 through 62.
- The 64-bit unit of previously-loaded data in column 0, rows 63:0 of the transposer is discarded.

This procedure repeats until all 64 columns of the transposer have been loaded with storage memory data.

2. When data is subsequently transferred from the transposer to the buffer on a 64-bit data bus “TBus(63:0)”, the following occurs simultaneously:

- The 64-bit unit of data in row 0, columns 63:0 of the transposer is output to TBus(63:0).
- The 64-bit units of data in rows 1 through 63 of the transposer are row-shifted to rows 0 through 62.
- The 64-bit unit of data in row 0, columns 63:0 of the transposer is stored in row 63, columns 63:0 in the transposer. This facilitates a “row wrap” feature in the transposer.

This procedure repeats until all 64 rows of transposer data have been transferred to the buffer.

The “row wrap” feature means that after all 64 rows of transposer data have been transferred to the buffer, the transposer contains the same data as it did before the transfer started, as if the transposer data had been copied to the buffer. Such an implementation allows for the same transposer data to be copied to multiple 64-column groups in the buffer and, ultimately, in the processing array, without having to reload the transposer from storage memory each time. This is a desirable feature in some use cases.
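The row wrap behavior may be illustrated with a short sketch. The names `row_read_with_wrap` and `copy_to_group` are hypothetical, the sizes are reduced (3 rows, 2 transposer columns, 4 buffer columns), and the mapping of a column group to a contiguous range of buffer columns is an assumption made for illustration.

```python
def row_read_with_wrap(t):
    """Output row 0, shift the other rows to lower indices, wrap row 0 to the end."""
    tbus = t.pop(0)
    t.append(tbus)
    return list(tbus)


def copy_to_group(t, buffer, group):
    """Copy the whole transposer into one column group of the buffer."""
    width = len(t[0])
    for r in range(len(t)):
        buffer[r][group * width:(group + 1) * width] = row_read_with_wrap(t)


t = [[1, 0], [0, 1], [1, 1]]
buffer = [[0] * 4 for _ in range(3)]
copy_to_group(t, buffer, 0)
copy_to_group(t, buffer, 1)   # no reload from storage memory in between
print(buffer)  # [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
print(t)       # [[1, 0], [0, 1], [1, 1]]
```

After each full pass of row reads the transposer holds its original contents, so the second copy uses the same data as the first.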

3. Data is subsequently transferred from the buffer to the processing array row by row, on an n-bit data bus “PBus(n−1:0)”. This is accomplished via conventional means, and beyond the scope of this disclosure.

Steps 1 through 3 may repeat until storage memory data has been transferred to all “n” columns of the processing array. Step 1 of the next iteration may overlap with step 3 of the previous iteration; i.e., storage memory to transposer transfers may overlap with buffer to processing array transfers.
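One possible ordering of the three steps may be sketched as follows. This model is an assumption-laden simplification: sizes are reduced (4 rows, 2 transposer columns and 4 buffer columns stand in for 64, 64 and “n”), the buffer is filled one column group at a time before a single row-by-row transfer to the processing array, no overlap between iterations is modeled, and the names `load_transposer`, `transposer_to_buffer` and `storage_to_array` are hypothetical.

```python
ROWS, TCOLS, N = 4, 2, 4   # scaled-down stand-ins for 64, 64 and "n"


def load_transposer(words):
    """Step 1: one column write per word (DBus bit r is assumed to feed row r)."""
    t = [[0] * TCOLS for _ in range(ROWS)]
    for w in words:
        for r in range(ROWS):
            t[r] = t[r][1:] + [(w >> r) & 1]
    return t


def transposer_to_buffer(t, buffer, group):
    """Step 2: ROWS row reads with row wrap, written into one buffer column group."""
    for r in range(ROWS):
        row = t.pop(0)
        t.append(row)                                   # row wrap
        buffer[r][group * TCOLS:(group + 1) * TCOLS] = row


def storage_to_array(words):
    buffer = [[0] * N for _ in range(ROWS)]
    for group in range(N // TCOLS):
        chunk = words[group * TCOLS:(group + 1) * TCOLS]
        t = load_transposer(chunk)
        transposer_to_buffer(t, buffer, group)
    return [row[:] for row in buffer]                   # step 3: row by row


print(storage_to_array([1, 6, 15, 8]))
# [[1, 0, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
```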

During Processing Array→Storage Memory Data Transfers

1. Data is initially transferred from the processing array to the buffer row by row, on an n-bit data bus “PBus(n−1:0)”. This is accomplished via conventional means, and beyond the scope of this disclosure.

2. When data is subsequently transferred from the buffer to the transposer on a 64-bit data bus “TBus(63:0)”, the following occurs simultaneously:

- The 64-bit unit of transfer data on TBus(63:0) is stored in row 63, columns 63:0 of the transposer.
- The 64-bit units of previously-loaded data in rows 1 through 63 of the transposer are row-shifted to rows 0 through 62.
- The 64-bit unit of previously-loaded data in row 0, columns 63:0 of the transposer is discarded.

This procedure repeats until all 64 rows of the transposer have been loaded with processing array data.

3. When data is subsequently transferred from the transposer to storage memory on a 64-bit data bus “DBus(63:0)”, the following occurs simultaneously:

- The 64-bit unit of data in column 0, rows 63:0 of the transposer is output to DBus(63:0).
- The 64-bit units of data in columns 1 through 63 of the transposer are column-shifted to columns 0 through 62.
- A logic “0” is stored in column 63, rows 63:0 of the transposer. This is arbitrary—it doesn't matter what data resides in the transposer after it has been transferred to storage memory.

This procedure repeats until all 64 columns of transposer data have been sent to storage memory.

Steps 1 through 3 may repeat until all “n” columns of processing array data have been transferred to storage memory. Step 1 of the next iteration may overlap with step 3 of the previous iteration; i.e., processing array to buffer transfers may overlap with transposer to storage memory transfers.
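The processing array to storage memory direction may be sketched in the same reduced form. The sizes, the contiguous mapping of column groups to buffer columns, and the names `buffer_to_transposer`, `transposer_to_storage` and `array_to_storage` are assumptions for illustration; overlap between iterations is not modeled.

```python
ROWS, TCOLS, N = 4, 2, 4   # scaled-down stand-ins for 64, 64 and "n"


def buffer_to_transposer(buffer, group):
    """Step 2: ROWS row writes taken from one column group of the buffer."""
    t = [[0] * TCOLS for _ in range(ROWS)]
    for r in range(ROWS):
        t.pop(0)                                         # row 0 is discarded
        t.append(buffer[r][group * TCOLS:(group + 1) * TCOLS])
    return t


def transposer_to_storage(t):
    """Step 3: TCOLS column reads, each producing one word on DBus."""
    words = []
    for _ in range(TCOLS):
        dbus = [t[r][0] for r in range(ROWS)]
        for r in range(ROWS):
            t[r] = t[r][1:] + [0]                        # zero-fill last column
        words.append(sum(bit << r for r, bit in enumerate(dbus)))
    return words


def array_to_storage(array_rows):
    buffer = [row[:] for row in array_rows]              # step 1: row by row
    words = []
    for group in range(N // TCOLS):
        t = buffer_to_transposer(buffer, group)
        words += transposer_to_storage(t)
    return words


print(array_to_storage([[1, 0, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 1]]))
# [1, 6, 15, 8]
```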

Second Embodiment Transposer and Buffer Details

FIG. 6B depicts the transposer 26 constructed as a 64-row by 64-column array of register bits. FIG. 6B illustrates: the initial data byte layout in storage memory, or after “n” units of 64-bit column data have been transferred from the transposer to storage memory; the data byte layout in the transposer after 64 units of 64-bit column data have been transferred from storage memory to the transposer, or after 64 units of 64-bit row data have been transferred from the buffer to the transposer; the data byte layout in the buffer after (n/64)*64 units of 64-bit row data have been transferred from the transposer to the buffer, or after 64 units of n-bit row data have been transferred from the processing array to the buffer; and the data byte layout in the processing array after 64 units of n-bit row data have been transferred from the buffer to the processing array.

FIG. 6C depicts the transposer implemented as a two-way shift register and illustrates:

- The data byte layout in the transposer after 64 units of 64-bit column data have been transferred from storage memory to the transposer, or after 64 units of 64-bit row data have been transferred from the buffer to the transposer.
- The data byte layout in the buffer after 64 units of 64-bit row data have been transferred from the transposer to the buffer.
- How column-shifts are used when writing data into the transposer from DBus(63:0), reading data from the transposer onto DBus(63:0), and/or shifting data through the transposer during data transfers between it and storage memory.
- How row-shifts are used when writing data into the transposer from TBus(63:0), reading data from the transposer onto TBus(63:0), and/or shifting data through the transposer during data transfers between it and the buffer.

FIG. 6D depicts an example implementation of the transposer of the second embodiment, showing the detailed circuit of the transposer in FIG. 6C. In this implementation, multiplexers (muxes) as shown in FIG. 6D are used to select the data input to each register bit in the transposer 26, to facilitate its column-shift and row-shift capabilities. The mux input that is selected as the data input to each register bit depends on whether:

- A “column write” is occurring—i.e. a data transfer from storage memory to the transposer.
- A “column read” is occurring—i.e. a data transfer from the transposer to storage memory.
- A “row write” is occurring—i.e. a data transfer from the buffer to the transposer.
- A “row read” is occurring—i.e. a data transfer from the transposer to the buffer.

For example, for row 63, column 63, a 4:1 mux 600 is utilized to select one of four data input sources:

- DBus(63) is selected during column write.
- A logic “0” is selected during column read.
- TBus(63) is selected during row write.
- Row 0, column 63 is selected during row read.

For rows 62:0, column 63, a 3:1 mux 602 for each row and column pair is utilized to select one of three data input sources:

- For row “x”: DBus(x) is selected during column write.
- For row “x”: a logic “0” is selected during column read.
- For row “x”: row “x+1”, column 63 is selected during row read and row write.

For row 63, columns 62:0, a 3:1 mux 604 for each row and column pair is utilized to select one of three data input sources:

- For column “y”: row 63, column “y+1” is selected during column write and column read.
- For column “y”: TBus(y) is selected during row write.
- For column “y”: row 0, column “y” is selected during row read.

For rows 62:0, columns 62:0, a 2:1 mux for each column and row pair is utilized to select one of two data input sources:

- For row “x”, column “y”: row “x”, column “y+1” is selected during column write and column read.
- For row “x”, column “y”: row “x+1”, column “y” is selected during row write and row read.
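The mux selections listed above can be summarized as a next-state function over the 64-by-64 register array. The following is a minimal behavioral sketch, not the circuit of FIG. 6D: the names `next_state`, `MODES`, `dbus` and `tbus` are hypothetical, and the sketch assumes that row 63 of column “y” loads TBus(y) during a row write.

```python
N = 64  # rows and columns of the second-embodiment transposer
MODES = ("column_write", "column_read", "row_write", "row_read")

def next_state(t, mode, dbus=None, tbus=None):
    """Return the register array after one clock in the given mode.

    t[x][y] is the bit at row x, column y; dbus[x] models DBus(x) and
    tbus[y] models TBus(y) (assumed indexing). All values are 0 or 1.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    column_op = mode in ("column_write", "column_read")
    new = [[0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            if column_op and y < N - 1:
                new[x][y] = t[x][y + 1]    # column-shift toward column 0
            elif column_op:
                # Column 63: DBus(x) on a column write, logic 0 on a column read.
                new[x][y] = dbus[x] if mode == "column_write" else 0
            elif x < N - 1:
                new[x][y] = t[x + 1][y]    # row-shift toward row 0
            elif mode == "row_write":
                new[x][y] = tbus[y]        # row 63 loads TBus(y)
            else:
                new[x][y] = t[0][y]        # row read: row 0 wraps to row 63
    return new
```

In this model, the value driven onto DBus during a column read is column 0 of the current state, and the value driven onto TBus during a row read is row 0 of the current state. Sixty-four column writes followed by sixty-four row reads therefore deliver the loaded words in transposed order.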

In the first and second embodiments, all 64 bits of data captured in the transposer 26 during any single data transfer from storage memory 22 to the transposer 26 are ultimately stored along the same bit line in the processing array 30. That is desirable if the processing array 30 is used to process 64-bit data words. But if the processing array 30 is used to process, say, 32-bit data words, then it is desirable to be able to store each 32-bit data word captured in the transposer during a sequence of data transfers from storage memory to the transposer in a different bit line in the processing array. That is not possible with the transposers described in the first and second embodiments, because each pair of 32-bit data words comprising the 64 bits of data captured in the transposer during any single data transfer from storage memory to the transposer is ultimately stored along the same bit line in the processing array. However, the third, fourth, and fifth embodiments disclosed below provide implementations of transposers that transpose 32-bit, 16-bit, and 8-bit data words, respectively, onto separate bit lines in the processing array while the data bus that connects storage memory to the transposer remains 64 bits regardless of the data word size.

THIRD EMBODIMENT

FIG. 7 depicts an alternate form of the transposer 26 and buffer 29 constructed as a 32-row by 64-column array of register bits and as a two-way shift register. In the third embodiment, as in the second embodiment, a transposer 26 (with fewer columns than the processing array) and a buffer 29 (with the same number of columns as the processing array) are implemented between storage memory 22 and the processing array 30 as shown in FIG. 6A. The difference is that the transposer 26 in the third embodiment is constructed to transpose 32-bit data words onto separate bit lines in the processing array 30 when the data bus that connects storage memory 22 to the transposer 26 remains 64 bits. Specifically, the transposer is constructed as a two-way shift register array with 32 rows (instead of 64 rows) and 64 columns. In addition, a portion of the buffer 29 is unused since the buffer 29 may be used for different embodiments of the transposer shown in FIGS. 6C and 7-9.

During Storage Memory→Processing Array Data Transfers

1. When data is transferred from storage memory to the transposer on a 64-bit data bus “DBus(63:0)”, the following occurs simultaneously:

- The 32-bit unit of transfer data on DBus(63:32) is stored in column 63, rows 31:0 of the transposer.
- The 32-bit unit of transfer data on DBus(31:0) is stored in column 62, rows 31:0 of the transposer.
- The 32-bit units of previously-loaded data in columns 3, 5, 7, . . . 63 of the transposer are column-shifted to columns 1, 3, 5, . . . 61.
- The 32-bit units of previously-loaded data in columns 2, 4, 6, . . . 62 of the transposer are column-shifted to columns 0, 2, 4, . . . 60.
- The 32-bit unit of previously-loaded data in column 1, rows 31:0 of the transposer is discarded.
- The 32-bit unit of previously-loaded data in column 0, rows 31:0 of the transposer is discarded.

This procedure repeats until all 64 columns of the transposer have been loaded with storage memory data.

2. When data is subsequently transferred from the transposer to the buffer on a 64-bit data bus “TBus(63:0)”, the following occurs simultaneously:

- The 64-bit unit of data in row 0, columns 63:0 of the transposer is output to TBus(63:0).
- The 64-bit units of data in rows 1 through 31 of the transposer are row-shifted to rows 0 through 30.
- The 64-bit unit of data in row 0, columns 63:0 of the transposer is stored in row 31, columns 63:0 in the transposer. This facilitates the “row wrap” feature in the transposer, as in the second embodiment.

This procedure repeats until all 32 rows of transposer data have been transferred to the buffer.

Note that the buffer may still have 64 rows, as in the second embodiment, but only 32 rows are utilized.

3. Data is subsequently transferred from the buffer to the processing array row by row, on an n-bit data bus “PBus(n−1:0)”. This is accomplished via conventional means, and beyond the scope of this disclosure.

Steps 1˜3 may repeat until storage memory data has been transferred to all “n” columns of the processing array.
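The column-write and row-read behavior of the 32-row transposer described above can be illustrated with a short behavioral model. This is a sketch under stated assumptions rather than the disclosed circuit: the function names are hypothetical, each DBus transfer is modeled as a Python integer, and bit “r” of each 32-bit bus field is assumed to land on row “r”.

```python
ROWS, COLS = 32, 64  # third-embodiment transposer geometry

def column_write(t, dbus):
    """Model one 64-bit transfer from storage memory into the transposer."""
    hi = (dbus >> 32) & 0xFFFFFFFF   # DBus(63:32) -> column 63
    lo = dbus & 0xFFFFFFFF           # DBus(31:0)  -> column 62
    new = []
    for r in range(ROWS):
        # Columns 63:2 shift down to columns 61:0; old columns 1 and 0 drop out.
        new.append(t[r][2:] + [(lo >> r) & 1, (hi >> r) & 1])
    return new

def row_read(t):
    """Return (TBus bits indexed by column, next state with row wrap)."""
    return t[0][:], t[1:] + [t[0][:]]

def load_and_drain(transfers):
    """Load 32 transfers, then read out the 32 rows sent to the buffer."""
    t = [[0] * COLS for _ in range(ROWS)]
    for dbus in transfers:
        t = column_write(t, dbus)
    buffer_rows = []
    for _ in range(ROWS):
        out, t = row_read(t)
        buffer_rows.append(out)
    return buffer_rows
```

After 32 transfers, the low word of transfer “k” sits in column 2k and the high word in column 2k+1, so each 32-bit word occupies its own column and is read out one bit per row.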

During Processing Array→Storage Memory Data Transfers

1. Data is initially transferred from the processing array to the buffer row by row, on an n-bit data bus “PBus(n−1:0)”. This is accomplished via conventional means, and beyond the scope of this disclosure.

2. When data is subsequently transferred from the buffer to the transposer on a 64-bit data bus “TBus(63:0)”, the following occurs simultaneously:

- The 64-bit unit of transfer data on TBus(63:0) is stored in row 31, columns 63:0 of the transposer.
- The 64-bit units of previously-loaded data in rows 1 through 31 of the transposer are row-shifted to rows 0 through 30.
- The 64-bit unit of previously-loaded data in row 0, columns 63:0 of the transposer is discarded.

This procedure repeats until all 32 rows of the transposer have been loaded with processing array data.

3. When data is subsequently transferred from the transposer to storage memory on a 64-bit data bus “DBus(63:0)”, the following occurs simultaneously:

- The 32-bit unit of data in column 1, rows 31:0 of the transposer is output to DBus(63:32).
- The 32-bit unit of data in column 0, rows 31:0 of the transposer is output to DBus(31:0).
- The 32-bit units of data in columns 3, 5, 7, . . . 63 of the transposer are column-shifted to columns 1, 3, 5, . . . 61.
- The 32-bit units of data in columns 2, 4, 6, . . . 62 of the transposer are column-shifted to columns 0, 2, 4, . . . 60.
- A logic “0” is stored in column 63, rows 31:0 of the transposer.
- A logic “0” is stored in column 62, rows 31:0 of the transposer.

This procedure repeats until all 64 columns of transposer data have been sent to storage memory.

Steps 1˜3 may repeat until all “n” columns of processing array data have been transferred to storage memory.
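The reverse direction can be modeled the same way. The sketch below, with hypothetical function names and the same bit-ordering assumption (row “r” maps to bit “r” of each 32-bit bus field), shows the row write from TBus and the column read onto DBus with zero fill of columns 63 and 62.

```python
ROWS, COLS = 32, 64  # third-embodiment transposer geometry

def row_write(t, tbus):
    """Shift rows toward row 0 and load tbus (64 bits) into row 31."""
    return t[1:] + [list(tbus)]

def column_read(t):
    """Return (DBus value as an int, next state) for one transfer."""
    hi = sum(t[r][1] << r for r in range(ROWS))  # column 1 -> DBus(63:32)
    lo = sum(t[r][0] << r for r in range(ROWS))  # column 0 -> DBus(31:0)
    nxt = [row[2:] + [0, 0] for row in t]        # logic 0 into columns 62, 63
    return (hi << 32) | lo, nxt

def fill_and_unload(buffer_rows):
    """Write 32 buffer rows, then read 32 DBus words back to storage."""
    t = [[0] * COLS for _ in range(ROWS)]
    for row in buffer_rows:
        t = row_write(t, row)
    out = []
    for _ in range(COLS // 2):
        dbus, t = column_read(t)
        out.append(dbus)
    return out, t
```

Because every column read shifts in logic “0”, the transposer holds all zeros once the last pair of columns has been sent to storage memory.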

FIG. 7 depicts the transposer constructed as a 32-row by 64-column array of register bits, and as a two-way shift register. It illustrates:

- The data byte layout in the transposer after 32 units of 64-bit column data have been transferred from storage memory to the transposer, or after 32 units of 64-bit row data have been transferred from the buffer to the transposer.
- The data byte layout in the buffer after 32 units of 64-bit row data have been transferred from the transposer to the buffer.
- How column-shifts are used when writing data into the transposer from DBus(63:0), reading data from the transposer onto DBus(63:0), and/or shifting data through the transposer during data transfers between it and storage memory.
- How row-shifts are used when writing data into the transposer from TBus(63:0), reading data from the transposer onto TBus(63:0), and/or shifting data through the transposer during data transfers between it and the buffer.

FOURTH EMBODIMENT

FIG. 8 depicts an alternate form of the transposer depicted in FIGS. 5 and 6C, constructed as a 16-row by 64-column array of register bits, and as a two-way shift register. In the fourth embodiment, as in the second embodiment, a transposer (with fewer columns than the processing array) and a buffer (with the same number of columns as the processing array) are implemented between storage memory and the processing array. The difference is that the transposer in the fourth embodiment is constructed to transpose 16-bit data words onto separate bit lines in the processing array when the data bus that connects storage memory to the transposer remains 64 bits. Specifically, the transposer is constructed as a two-way shift register array with 16 rows (instead of 64 rows) and 64 columns. In addition, a larger portion of the buffer 29 is unused than with the third embodiment since this embodiment is 16 bits instead of 32 bits.

During Storage Memory→Processing Array Data Transfers

1. When data is transferred from storage memory to the transposer on a 64-bit data bus “DBus(63:0)”, the following occurs simultaneously:

- The 16-bit unit of transfer data on DBus(63:48) is stored in column 63, rows 15:0 of the transposer.
- The 16-bit unit of transfer data on DBus(47:32) is stored in column 62, rows 15:0 of the transposer.
- The 16-bit unit of transfer data on DBus(31:16) is stored in column 61, rows 15:0 of the transposer.
- The 16-bit unit of transfer data on DBus(15:0) is stored in column 60, rows 15:0 of the transposer.
- The 16-bit units of previously-loaded data in columns 7, 11, 15, . . . 63 of the transposer are column-shifted to columns 3, 7, 11, . . . 59.
- The 16-bit units of previously-loaded data in columns 6, 10, 14, . . . 62 of the transposer are column-shifted to columns 2, 6, 10, . . . 58.
- The 16-bit units of previously-loaded data in columns 5, 9, 13, . . . 61 of the transposer are column-shifted to columns 1, 5, 9, . . . 57.
- The 16-bit units of previously-loaded data in columns 4, 8, 12, . . . 60 of the transposer are column-shifted to columns 0, 4, 8, . . . 56.
- The 16-bit unit of previously-loaded data in column 3, rows 15:0 of the transposer is discarded.
- The 16-bit unit of previously-loaded data in column 2, rows 15:0 of the transposer is discarded.
- The 16-bit unit of previously-loaded data in column 1, rows 15:0 of the transposer is discarded.
- The 16-bit unit of previously-loaded data in column 0, rows 15:0 of the transposer is discarded.

This procedure repeats until all 64 columns of the transposer have been loaded with storage memory data.

2. When data is subsequently transferred from the transposer to the buffer on a 64-bit data bus “TBus(63:0)”, the following occurs simultaneously:

- The 64-bit unit of data in row 0, columns 63:0 of the transposer is output to TBus(63:0).
- The 64-bit units of data in rows 1 through 15 of the transposer are row-shifted to rows 0 through 14.
- The 64-bit unit of data in row 0, columns 63:0 of the transposer is stored in row 15, columns 63:0 in the transposer. This facilitates the “row wrap” feature in the transposer, as in the second embodiment.

This procedure repeats until all 16 rows of transposer data have been transferred to the buffer.

Note that the buffer may still have 64 rows, as in the second embodiment, but only 16 rows are utilized.

3. Data is subsequently transferred from the buffer to the processing array row by row, on an n-bit data bus “PBus(n−1:0)”. This is accomplished via conventional means, and beyond the scope of this disclosure.

Steps 1˜3 may repeat until storage memory data has been transferred to all “n” columns of the processing array.
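The column-shift pattern enumerated in step 1 above follows a single rule: with 16-bit words, each column takes its data from the column four positions higher, and the top four columns load from the four bus fields. The sketch below generates the per-lane column lists from the word size so they can be checked against the text; `shift_lanes` and `bus_field` are hypothetical helper names.

```python
BUS_BITS = 64  # width of DBus, which also equals the number of columns

def shift_lanes(word_bits):
    """Return (lane, source columns, destination columns) for one column-shift."""
    stride = BUS_BITS // word_bits   # data words carried per 64-bit transfer
    lanes = []
    for lane in range(stride):
        src = list(range(lane + stride, BUS_BITS, stride))
        dst = [c - stride for c in src]
        lanes.append((lane, src, dst))
    return lanes

def bus_field(word_bits, lane):
    """Return (msb, lsb, column) for the bus field loaded on a column write."""
    stride = BUS_BITS // word_bits
    lsb = lane * word_bits
    return lsb + word_bits - 1, lsb, BUS_BITS - stride + lane

# Lane 0 of the 16-bit case: columns 4, 8, ... 60 shift to columns 0, 4, ... 56.
lane, src, dst = shift_lanes(16)[0]
print(src[:3], src[-1], "->", dst[:3], dst[-1])
```

The same two helpers describe the third embodiment when called with a word size of 32 and the fifth embodiment when called with a word size of 8.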

During Processing Array→Storage Memory Data Transfers

1. Data is initially transferred from the processing array to the buffer row by row, on an n-bit data bus “PBus(n−1:0)”. This is accomplished via conventional means, and beyond the scope of this disclosure.

2. When data is subsequently transferred from the buffer to the transposer on a 64-bit data bus “TBus(63:0)”, the following occurs simultaneously:

- The 64-bit unit of transfer data on TBus(63:0) is stored in row 15, columns 63:0 of the transposer.
- The 64-bit units of previously-loaded data in rows 1 through 15 of the transposer are row-shifted to rows 0 through 14.
- The 64-bit unit of previously-loaded data in row 0, columns 63:0 of the transposer is discarded.

This procedure repeats until all 16 rows of the transposer have been loaded with processing array data.

3. When data is subsequently transferred from the transposer to storage memory on a 64-bit data bus “DBus(63:0)”, the following occurs simultaneously:

- The 16-bit unit of data in column 3, rows 15:0 of the transposer is output to DBus(63:48).
- The 16-bit unit of data in column 2, rows 15:0 of the transposer is output to DBus(47:32).
- The 16-bit unit of data in column 1, rows 15:0 of the transposer is output to DBus(31:16).
- The 16-bit unit of data in column 0, rows 15:0 of the transposer is output to DBus(15:0).
- The 16-bit units of data in columns 7, 11, 15, . . . 63 of the transposer are column-shifted to columns 3, 7, 11, . . . 59.
- The 16-bit units of data in columns 6, 10, 14, . . . 62 of the transposer are column-shifted to columns 2, 6, 10, . . . 58.
- The 16-bit units of data in columns 5, 9, 13, . . . 61 of the transposer are column-shifted to columns 1, 5, 9, . . . 57.
- The 16-bit units of data in columns 4, 8, 12, . . . 60 of the transposer are column-shifted to columns 0, 4, 8, . . . 56.
- A logic “0” is stored in column 63, rows 15:0 of the transposer.
- A logic “0” is stored in column 62, rows 15:0 of the transposer.
- A logic “0” is stored in column 61, rows 15:0 of the transposer.
- A logic “0” is stored in column 60, rows 15:0 of the transposer.

This procedure repeats until all 64 columns of transposer data have been sent to storage memory.

Steps 1˜3 may repeat until all “n” columns of processing array data have been transferred to storage memory.
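The column read of step 3 above can be expressed compactly if each transposer column is held as one integer. The following sketch makes that modeling assumption (bit “r” of `cols[c]` stands for the register bit at row “r”, column “c”) and uses hypothetical names; it is parameterized by word size but is exercised here for the 16-bit case.

```python
def column_read(cols, word_bits):
    """Model one transposer-to-storage transfer.

    Returns (64-bit DBus value, next list of 64 column values).
    """
    stride = 64 // word_bits
    dbus = 0
    for lane in range(stride):
        # Column `lane` drives bus field DBus(lane*w + w - 1 : lane*w).
        dbus |= cols[lane] << (lane * word_bits)
    # Remaining columns shift down by `stride`; the top columns load logic 0.
    return dbus, cols[stride:] + [0] * stride

def drain(cols, word_bits):
    """Read all 64 columns out; needs 64 / (64 / word_bits) transfers."""
    out = []
    for _ in range(word_bits):
        dbus, cols = column_read(cols, word_bits)
        out.append(dbus)
    return out, cols
```

For 16-bit words, the first transfer carries columns 3, 2, 1 and 0 on DBus(63:48), DBus(47:32), DBus(31:16) and DBus(15:0) respectively, and sixteen transfers empty the transposer.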

FIFTH EMBODIMENT

FIG. 9 depicts an alternate form of the transposer depicted in FIGS. 5 and 6C, constructed as an 8-row by 64-column array of register bits, and as a two-way shift register. In the fifth embodiment, as in the second embodiment, a transposer (with fewer columns than the processing array) and a buffer (with the same number of columns as the processing array) are implemented between storage memory and the processing array. The difference is that the transposer in the fifth embodiment is constructed to transpose 8-bit data words onto separate bit lines in the processing array when the data bus that connects storage memory to the transposer remains 64 bits. Specifically, the transposer is constructed as a two-way shift register array with 8 rows (instead of 64 rows) and 64 columns. In addition, a still larger portion of the buffer 29 is unused than with the third or fourth embodiments since this embodiment is 8 bits instead of 32 bits or 16 bits.

During Storage Memory→Processing Array Data Transfers

1. When data is transferred from storage memory to the transposer on a 64-bit data bus “DBus(63:0)”, the following occurs simultaneously:
- The 8-bit unit of transfer data on DBus(63:56) is stored in column 63, rows 7:0 of the transposer.
- The 8-bit unit of transfer data on DBus(55:48) is stored in column 62, rows 7:0 of the transposer.
- The 8-bit unit of transfer data on DBus(47:40) is stored in column 61, rows 7:0 of the transposer.
- The 8-bit unit of transfer data on DBus(39:32) is stored in column 60, rows 7:0 of the transposer.
- The 8-bit unit of transfer data on DBus(31:24) is stored in column 59, rows 7:0 of the transposer.
- The 8-bit unit of transfer data on DBus(23:16) is stored in column 58, rows 7:0 of the transposer.
- The 8-bit unit of transfer data on DBus(15:8) is stored in column 57, rows 7:0 of the transposer.
- The 8-bit unit of transfer data on DBus(7:0) is stored in column 56, rows 7:0 of the transposer.
- The 8-bit units of previously-loaded data in columns 15, 23, 31, . . . 63 of the transposer are column-shifted to columns 7, 15, 23, . . . 55.
- The 8-bit units of previously-loaded data in columns 14, 22, 30, . . . 62 of the transposer are column-shifted to columns 6, 14, 22, . . . 54.
- The 8-bit units of previously-loaded data in columns 13, 21, 29, . . . 61 of the transposer are column-shifted to columns 5, 13, 21, . . . 53.
- The 8-bit units of previously-loaded data in columns 12, 20, 28, . . . 60 of the transposer are column-shifted to columns 4, 12, 20, . . . 52.
- The 8-bit units of previously-loaded data in columns 11, 19, 27, . . . 59 of the transposer are column-shifted to columns 3, 11, 19, . . . 51.
- The 8-bit units of previously-loaded data in columns 10, 18, 26, . . . 58 of the transposer are column-shifted to columns 2, 10, 18, . . . 50.
- The 8-bit units of previously-loaded data in columns 9, 17, 25, . . . 57 of the transposer are column-shifted to columns 1, 9, 17, . . . 49.
- The 8-bit units of previously-loaded data in columns 8, 16, 24, . . . 56 of the transposer are column-shifted to columns 0, 8, 16, . . . 48.
- The 8-bit unit of previously-loaded data in column 7, rows 7:0 of the transposer is discarded.
- The 8-bit unit of previously-loaded data in column 6, rows 7:0 of the transposer is discarded.
- The 8-bit unit of previously-loaded data in column 5, rows 7:0 of the transposer is discarded.
- The 8-bit unit of previously-loaded data in column 4, rows 7:0 of the transposer is discarded.
- The 8-bit unit of previously-loaded data in column 3, rows 7:0 of the transposer is discarded.
- The 8-bit unit of previously-loaded data in column 2, rows 7:0 of the transposer is discarded.
- The 8-bit unit of previously-loaded data in column 1, rows 7:0 of the transposer is discarded.
- The 8-bit unit of previously-loaded data in column 0, rows 7:0 of the transposer is discarded.
  
This procedure repeats until all 64 columns of the transposer have been loaded with storage memory data.

2. When data is subsequently transferred from the transposer to the buffer on a 64-bit data bus “TBus(63:0)”, the following occurs simultaneously:
- The 64-bit unit of data in row 0, columns 63:0 of the transposer is output to TBus(63:0).
- The 64-bit units of data in rows 1 through 7 of the transposer are row-shifted to rows 0 through 6.
- The 64-bit unit of data in row 0, columns 63:0 of the transposer is stored in row 7, columns 63:0 in the transposer. This facilitates the “row wrap” feature in the transposer, as in the second embodiment.
  
This procedure repeats until all 8 rows of transposer data have been transferred to the buffer.

Note that the buffer may still have 64 rows, as in the second embodiment, but only 8 rows are utilized.

3. Data is subsequently transferred from the buffer to the processing array row by row, on an n-bit data bus “PBus(n−1:0)”. This is accomplished via conventional means, and beyond the scope of this disclosure.

Steps 1˜3 may repeat until storage memory data has been transferred to all “n” columns of the processing array.
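The net effect of steps 1 and 2 above is that each 8-bit data word ends up on its own column, one bit per row. The sketch below checks this by running the shift sequence on 64 bytes and comparing against the expected layout. It holds each column as an integer, uses hypothetical function names, and assumes that the byte on DBus(7:0) is the first byte of each 8-byte group (little-endian packing), which the text above does not specify.

```python
def column_write(cols, dbus, word_bits):
    """Shift columns down and load the bus fields into the top columns."""
    stride = 64 // word_bits
    mask = (1 << word_bits) - 1
    fields = [(dbus >> (lane * word_bits)) & mask for lane in range(stride)]
    return cols[stride:] + fields

def load_bytes(data):
    """Load 64 bytes as eight 64-bit transfers into the 8-row transposer."""
    cols = [0] * 64
    for k in range(0, 64, 8):
        dbus = int.from_bytes(data[k:k + 8], "little")  # assumed byte order
        cols = column_write(cols, dbus, 8)
    return cols

def buffer_rows(cols, rows):
    """Rows sent to the buffer by successive row reads: bit r of every column."""
    return [[(cols[c] >> r) & 1 for c in range(64)] for r in range(rows)]

data = bytes((7 * j + 3) % 256 for j in range(64))
cols = load_bytes(data)
rows = buffer_rows(cols, 8)
```

Under these assumptions, `cols[c]` equals `data[c]` after the eight transfers, so byte “c” of the 64-byte block is destined for a single bit line.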

During Processing Array→Storage Memory Data Transfers

1. Data is initially transferred from the processing array to the buffer row by row, on an n-bit data bus “PBus(n−1:0)”. This is accomplished via conventional means, and beyond the scope of this disclosure.

2. When data is subsequently transferred from the buffer to the transposer on a 64-bit data bus “TBus(63:0)”, the following occurs simultaneously:
- The 64-bit unit of transfer data on TBus(63:0) is stored in row 7, columns 63:0 of the transposer.
- The 64-bit units of previously-loaded data in rows 1 through 7 of the transposer are row-shifted to rows 0 through 6.
- The 64-bit unit of previously-loaded data in row 0, columns 63:0 of the transposer is discarded.

This procedure repeats until all 8 rows of the transposer have been loaded with processing array data.

3. When data is subsequently transferred from the transposer to storage memory on a 64-bit data bus “DBus(63:0)”, the following occurs simultaneously:
- The 8-bit unit of data in column 7, rows 7:0 of the transposer is output to DBus(63:56).
- The 8-bit unit of data in column 6, rows 7:0 of the transposer is output to DBus(55:48).
- The 8-bit unit of data in column 5, rows 7:0 of the transposer is output to DBus(47:40).
- The 8-bit unit of data in column 4, rows 7:0 of the transposer is output to DBus(39:32).
- The 8-bit unit of data in column 3, rows 7:0 of the transposer is output to DBus(31:24).
- The 8-bit unit of data in column 2, rows 7:0 of the transposer is output to DBus(23:16).
- The 8-bit unit of data in column 1, rows 7:0 of the transposer is output to DBus(15:8).
- The 8-bit unit of data in column 0, rows 7:0 of the transposer is output to DBus(7:0).
- The 8-bit units of data in columns 15, 23, 31, . . . 63 of the transposer are column-shifted to columns 7, 15, 23, . . . 55.
- The 8-bit units of data in columns 14, 22, 30, . . . 62 of the transposer are column-shifted to columns 6, 14, 22, . . . 54.
- The 8-bit units of data in columns 13, 21, 29, . . . 61 of the transposer are column-shifted to columns 5, 13, 21, . . . 53.
- The 8-bit units of data in columns 12, 20, 28, . . . 60 of the transposer are column-shifted to columns 4, 12, 20, . . . 52.
- The 8-bit units of data in columns 11, 19, 27, . . . 59 of the transposer are column-shifted to columns 3, 11, 19, . . . 51.
- The 8-bit units of data in columns 10, 18, 26, . . . 58 of the transposer are column-shifted to columns 2, 10, 18, . . . 50.
- The 8-bit units of data in columns 9, 17, 25, . . . 57 of the transposer are column-shifted to columns 1, 9, 17, . . . 49.
- The 8-bit units of data in columns 8, 16, 24, . . . 56 of the transposer are column-shifted to columns 0, 8, 16, . . . 48.
- A logic “0” is stored in column 63, rows 7:0 of the transposer.
- A logic “0” is stored in column 62, rows 7:0 of the transposer.
- A logic “0” is stored in column 61, rows 7:0 of the transposer.
- A logic “0” is stored in column 60, rows 7:0 of the transposer.
- A logic “0” is stored in column 59, rows 7:0 of the transposer.
- A logic “0” is stored in column 58, rows 7:0 of the transposer.
- A logic “0” is stored in column 57, rows 7:0 of the transposer.
- A logic “0” is stored in column 56, rows 7:0 of the transposer.

This procedure repeats until all 64 columns of transposer data have been sent to storage memory.

Steps 1˜3 may repeat until all “n” columns of processing array data have been transferred to storage memory.

The multiple embodiments of the transposers 26 (e.g. embodiments 2, 3, 4, and 5) may be implemented in a single design in which a processing array is implemented. In this case, only one transposer is enabled/selected during any particular storage memory ↔ processing array data transfer.
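When several transposers coexist in one design, the one to enable follows from the data word size. The sketch below tabulates the geometry of embodiments 2 through 5 as described above; the `TransposerConfig` type and the `select_transposer` function are hypothetical names used only for illustration, and the actual enable/select mechanism is not specified here.

```python
from dataclasses import dataclass

BUS_BITS = 64  # width of DBus and TBus in the embodiments above

@dataclass(frozen=True)
class TransposerConfig:
    rows: int               # register rows, equal to the data word size
    cols: int               # register columns, 64 in every embodiment
    column_shift: int       # columns moved per storage memory transfer
    transfers_to_fill: int  # DBus transfers needed to load all columns

def select_transposer(word_bits):
    """Return the geometry of the transposer suited to the given word size."""
    if word_bits not in (64, 32, 16, 8):
        raise ValueError(f"unsupported data word size: {word_bits}")
    words_per_transfer = BUS_BITS // word_bits
    return TransposerConfig(
        rows=word_bits,
        cols=BUS_BITS,
        column_shift=words_per_transfer,
        transfers_to_fill=BUS_BITS // words_per_transfer,
    )
```

For example, a 16-bit word size selects the 16-row transposer of the fourth embodiment, which shifts four columns per transfer and is filled by sixteen transfers.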

The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical applications, to thereby enable others skilled in the art to best utilize the disclosure and various embodiments with various modifications as are suited to the particular use contemplated.

The system and method disclosed herein may be implemented via one or more components, systems, servers, appliances, other subcomponents, or distributed between such elements. When implemented as a system, such systems may include and/or involve, inter alia, components such as software modules, general-purpose CPU, RAM, etc. found in general-purpose computers. In implementations where the innovations reside on a server, such a server may include or involve components such as CPU, RAM, etc., such as those found in general-purpose computers.

Additionally, the system and method herein may be achieved via implementations with disparate or entirely different software, hardware and/or firmware components, beyond that set forth above. With regard to such other components (e.g., software, processing components, etc.) and/or computer-readable media associated with or embodying the present inventions, for example, aspects of the innovations herein may be implemented consistent with numerous general purpose or special purpose computing systems or configurations. Various exemplary computing systems, environments, and/or configurations that may be suitable for use with the innovations herein may include, but are not limited to: software or other components within or embodied on personal computers, servers or server computing devices such as routing/connectivity components, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, consumer electronic devices, network PCs, other existing computer platforms, distributed computing environments that include one or more of the above systems or devices, etc.

In some instances, aspects of the system and method may be achieved via or performed by logic and/or logic instructions including program modules, executed in association with such components or circuitry, for example. In general, program modules may include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular instructions herein. The inventions may also be practiced in the context of distributed software, computer, or circuit settings where circuitry is connected via communication buses, circuitry or links. In distributed settings, control/instructions may occur from both local and remote computer storage media including memory storage devices.

The software, circuitry and components herein may also include and/or utilize one or more types of computer readable media. Computer readable media can be any available media that is resident on, associable with, or can be accessed by such circuits and/or computing components. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and can be accessed by a computing component. Communication media may comprise computer readable instructions, data structures, program modules and/or other components. Further, communication media may include wired media such as a wired network or direct-wired connection; however, no media of any such type herein includes transitory media. Combinations of any of the above are also included within the scope of computer readable media.

In the present description, the terms component, module, device, etc. may refer to any type of logical or functional software elements, circuits, blocks and/or processes that may be implemented in a variety of ways. For example, the functions of various circuits and/or blocks can be combined with one another into any other number of modules. Each module may even be implemented as a software program stored on a tangible memory (e.g., random access memory, read only memory, CD-ROM memory, hard disk drive, etc.) to be read by a central processing unit to implement the functions of the innovations herein. Or, the modules can comprise programming instructions transmitted to a general purpose computer or to processing/graphics hardware via a transmission carrier wave. Also, the modules can be implemented as hardware logic circuitry implementing the functions encompassed by the innovations herein. Finally, the modules can be implemented using special purpose instructions (SIMD instructions), field programmable logic arrays or any mix thereof which provides the desired level of performance and cost.

As disclosed herein, features consistent with the disclosure may be implemented via computer-hardware, software and/or firmware. For example, the systems and methods disclosed herein may be embodied in various forms including, for example, a data processor, such as a computer that also includes a database, digital electronic circuitry, firmware, software, or in combinations of them. Further, while some of the disclosed implementations describe specific hardware components, systems and methods consistent with the innovations herein may be implemented with any combination of hardware, software and/or firmware. Moreover, the above-noted features and other aspects and principles of the innovations herein may be implemented in various environments. Such environments and related applications may be specially constructed for performing the various routines, processes and/or operations according to the invention or they may include a general-purpose computer or computing platform selectively activated or reconfigured by code to provide the necessary functionality. The processes disclosed herein are not inherently related to any particular computer, network, architecture, environment, or other apparatus, and may be implemented by a suitable combination of hardware, software, and/or firmware. For example, various general-purpose machines may be used with programs written in accordance with teachings of the invention, or it may be more convenient to construct a specialized apparatus or system to perform the required methods and techniques.

Aspects of the method and system described herein, such as the logic, may also be implemented as functionality programmed into any of a variety of circuitry, including programmable logic devices (“PLDs”), such as field programmable gate arrays (“FPGAs”), programmable array logic (“PAL”) devices, electrically programmable logic and memory devices and standard cell-based devices, as well as application specific integrated circuits. Some other possibilities for implementing aspects include: memory devices, microcontrollers with memory (such as EEPROM), embedded microprocessors, firmware, software, etc. Furthermore, aspects may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types. The underlying device technologies may be provided in a variety of component types, e.g., metal-oxide semiconductor field-effect transistor (“MOSFET”) technologies like complementary metal-oxide semiconductor (“CMOS”), bipolar technologies like emitter-coupled logic (“ECL”), polymer technologies (e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures), mixed analog and digital, and so on.

It should also be noted that the various logic and/or functions disclosed herein may be enabled using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media), though again these do not include transitory media. Unless the context clearly requires otherwise, throughout the description, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.

Although certain presently preferred implementations of the invention have been specifically described herein, it will be apparent to those skilled in the art to which the invention pertains that variations and modifications of the various implementations shown and described herein may be made without departing from the spirit and scope of the invention. Accordingly, it is intended that the invention be limited only to the extent required by the applicable rules of law.

While the foregoing has been with reference to a particular embodiment of the disclosure, it will be appreciated by those skilled in the art that changes in this embodiment may be made without departing from the principles and spirit of the disclosure, the scope of which is defined by the appended claims.

Claims

1. An apparatus, comprising: a storage memory having an array of memory cells, wherein a unit of data is stored in a plurality of memory cells connected to a same word line with each memory cell connected to a different bit line; a processing array device, connected to the storage memory, comprising a plurality of memory cells arranged in an array having a plurality of columns and a plurality of rows, each memory cell having a storage element wherein the array has a plurality of sections and each section has a plurality of bit line sections and a plurality of bit lines with one bit line per bit line section, wherein the memory cells in each bit line section are all connected to a single read bit line to perform a computation using the unit of data communicated between the processing array device and the storage memory; a data path between the storage memory and the processing array device having a transposer that orthogonally transposes the unit of data transferred between the storage memory and the processing array and wherein the transfer of the unit of data between the storage memory and the processing array device is performed in two steps, wherein the transposer shifts the unit of data from being stored along the same word line in the storage memory to the unit of data being stored along the same bit line in the processing array device; and wherein the processing array device stores the unit of data in memory cells connected to the single read bit line in a particular section of the processing array device or connected to a same relative bit line in a plurality of sections of the processing array device and each memory cell is connected to a different word line.
2. The apparatus of claim 1, wherein the data transfer from the storage memory to the processing array device further comprises a first data transfer between the storage memory and the transposer and a second data transfer between the transposer and the processing array device.
3. The apparatus of claim 1, wherein the processing array device and the transposer are integrated into a single processing unit.
4. The apparatus of claim 1, wherein the transposer is a two-dimensional storage array block.
5. The apparatus of claim 1 further comprising a data path between the storage memory and the processing array device having a transposer that orthogonally transposes the unit of data transferred between the storage memory and the processing array and a two-dimensional buffer that buffers the unit of data transfers between the transposer and the processing array device and wherein a transfer of the unit of data between the storage memory and the processing array device is performed in three steps.
6. The apparatus of claim 5, wherein the data transfer from the storage memory to the processing array device comprising a first data transfer between the storage memory and the transposer, a second data transfer between the transposer and the buffer and a third data transfer between the buffer and the processing array device.
7. The apparatus of claim 6, wherein the transposer is a two-way shift register array with “x” rows and “y” columns, wherein a number of “x” rows is equal to a width of a data bus that connects the transposer to the storage memory and wherein a number of “y” columns is a whole fraction of a number of columns in the processing array device section.
8. The apparatus of claim 7, wherein the width of the data bus that connects the transposer to the storage memory is sixty four bits, wherein the buffer has an array of storage elements with a number of rows that match the width of the data bus that connects the transposer and storage memory and a number of columns that match the “n” columns in the processing array device section, wherein a width of the data bus that connects the transposer to the buffer is equal to the numbers of columns “y” in the transposer and wherein a width of the data bus that connects the buffer to the processing array device is equal to the number of columns “n” in the buffer and in a section of the processing array device.
9. The apparatus of claim 7, wherein the transposer is constructed to transpose a thirty two bit unit of data from storage memory onto separate bit lines in the processing array device.
10. The apparatus of claim 7, wherein the transposer is constructed to transpose a sixteen bit unit of data from storage memory onto separate bit lines in the processing array device.
11. The apparatus of claim 7, wherein the transposer is constructed to transpose an eight bit unit of data from storage memory onto separate bit lines in the processing array device.
12. The apparatus of claim 5, wherein the transposer further comprises one or more transposers wherein each transposer is one of a transposer that transposes a sixty-four bit unit of data, a transposer that transposes a thirty two bit unit of data, a transposer that transposes a sixteen bit unit of data and a transposer that transposes an eight bit unit of data.
13. An apparatus, comprising: a storage memory having an array of memory cells, wherein a unit of data is stored in a plurality of memory cells connected to a same word line with each memory cell connected to a different bit line; a processing array device, connected to the storage memory, comprising a plurality of memory cells arranged in an array having a plurality of columns and a plurality of rows, each memory cell having a storage element wherein the array has a plurality of sections and each section has a plurality of bit line sections and a plurality of bit lines with one bit line per bit line section, wherein the memory cells in each bit line section are all connected to a single read bit line to perform a computation using the unit of data communicated between the processing array device and the storage memory; a data path between the storage memory and the processing array device having a transposer that orthogonally transposes the unit of data transferred between the storage memory and the processing array and wherein the transfer of the unit of data between the storage memory and the processing array device is performed in two steps, wherein the transposer is a two-way shift register array with “x” rows and “y” columns, wherein a number of “x” rows is equal to a width of a data bus that connects the transposer to the storage memory and wherein a number of “y” columns is equal to a number of bit lines in the processing array device; and wherein the processing array device stores the unit of data in memory cells connected to the single read bit line in a particular section of the processing array device or connected to a same relative bit line in a plurality of sections of the processing array device and each memory cell is connected to a different word line.
14. The apparatus of claim 13, wherein the width of the data bus that connects the transposer to the storage memory is sixty four bits and wherein a width of a data bus that connects the transposer to the processing array device is equal to the “y” columns in the transposer and in the processing array device section.
15. A method, comprising: providing a storage memory having an array of memory cells, wherein a unit of data is stored in a plurality of memory cells connected to a same word line with each memory cell connected to a different bit line; connecting a processing array device to the storage memory, the processing array device comprising a plurality of memory cells arranged in an array having a plurality of columns and a plurality of rows, each memory cell having a storage element wherein the array has a plurality of sections and each section has a plurality of bit line sections and a plurality of bit lines with one bit line per bit line section, wherein the memory cells in each bit line section are all connected to a single read bit line to perform a computation using a unit of data; and communicating the unit of data between the processing array device and the storage memory, wherein the processing array device stores the unit of data in memory cells connected to the single read bit line in a particular section of the processing array device or connected to a same relative bit line in a plurality of sections of the processing array device and each memory cell is connected to a different word line, wherein communicating the unit of data between the processing array device and the storage memory further comprises orthogonally transposing, using a transposer, the unit of data transferred between the storage memory and the processing array and wherein the transfer of the unit of data between the storage memory and the processing array device is performed in two steps and wherein orthogonally transposing the unit of data further comprises shifting the unit of data from being stored along the same word line in the storage memory to the unit of data being stored along the same bit line in the processing array device.
16. The method of claim 15, wherein communicating the unit of data word between the processing array device and the storage memory further comprises transferring, for a transfer from storage memory to the processing array device, the unit of data from the storage memory to transposer and transferring, for a transfer from storage memory to the processing array device, from the transposer to the processing array device.
17. The method of claim 16, wherein transferring, for the transfer from the storage memory to the transposer, the unit of data further comprises transferring, a sixty four bit unit of data on a 64 bit data bus between the storage memory and the transposer into column n−1 and rows 63:0 of the transposer, simultaneously column shifting, in the transposer, sixty four bit units of data stored in column 1:n−1 to column 0:n−2 and simultaneously discarding the sixty four bit unit of data stored in column 0 rows 63:0 of the transposer until all columns of the transposer are loaded with data from the storage memory.
18. The method of claim 16, wherein transferring, for the transfer from the transposer to the processing array device further comprises transferring, an n bit unit of data from row 0 columns n−1:0 of the transposer on an n bit data bus between the transposer and the processing array device, simultaneously row shifting, in the transposer, n bit units of data in row 1:63 in the transposer to rows 0:62 in the transposer and simultaneously storing a logic “0” into row 63, columns n−1:0 of the transposer until all rows of the transposer are transferred to the processing array device.
19. The method of claim 15, wherein communicating the unit of data between the processing array device and the storage memory further comprises transferring, for a transfer from the processing array device to the storage memory, the unit of data from the processing array device to the transposer and transferring, for a transfer from the processing array device to the storage memory, from the transposer to the storage memory.
20. The method of claim 19, wherein transferring, for the transfer from the processing array device to the transposer, further comprises transferring, an n bit unit of data on a n-bit data bus between the processing array device and the transposer into row 63, columns n−1:0 of the transposer, simultaneously row shifting, in the transposer, the n bit units of data stored in row 1:63 of the transposer to rows 0:62 of the transposer and simultaneously discarding the n-bit unit of data stored in row 0, columns n−1:0 of the transposer until all rows of the transposer are loaded with data from the processing array device.
21. The method of claim 19, wherein transferring, for the transfer from the transposer to the storage memory further comprises transferring, a sixty four bit unit of data in column 0, rows 63:0 of the transposer on a 64 bit data bus between the transposer and the storage memory, simultaneously column shifting, in the transposer, sixty four bit units of data in columns 1:n−1 of the transposer to columns 0:n−2 of the transposer and simultaneously storing a logic “0” into a set of rows of the transposer until all n columns of the transposer are transferred to the storage memory.
22. A method, comprising: providing a storage memory having an array of memory cells, wherein a unit of data is stored in a plurality of memory cells connected to a same word line with each memory cell connected to a different bit line; connecting a processing array device to the storage memory, the processing array device comprising a plurality of memory cells arranged in an array having a plurality of columns and a plurality of rows, each memory cell having a storage element wherein the array has a plurality of sections and each section has a plurality of bit line sections and a plurality of bit lines with one bit line per bit line section, wherein the memory cells in each bit line section are all connected to a single read bit line to perform a computation using a unit of data; and communicating the unit of data between the processing array device and the storage memory, wherein the processing array device stores the unit of data in memory cells connected to the single bit line in a particular section of the processing array device or connected to a same relative bit line in a plurality of sections of the processing array device and each memory cell is connected to a different word line, wherein communicating the unit of data between the processing array device and the storage memory further comprises orthogonally transposing, using a transposer, the unit of data transferred between the storage memory and the processing array and wherein the transfer of the unit of data between the storage memory and the processing array device is performed in two steps, wherein the transposer is a two-way shift register array with “x” rows and “y” columns, wherein a number of “x” rows is equal to a width of a data bus that connects the transposer to the storage memory and wherein a number of “y” columns is equal to a number of bit lines in the processing array device.
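
For illustration only, the shift sequences recited in claims 17, 18, 20 and 21 may be modeled in software. The following Python listing is a minimal behavioral sketch and is not the claimed circuitry. The class name Transposer, its method names and the two helper functions are hypothetical and do not appear in the disclosure. The sketch assumes that bit i of the 64-bit data bus is written to row i of the transposer, and that in the transposer-to-storage direction the logic “0” is stored into column n-1 (the claim language only recites “a set of rows”).

```python
from typing import List

ROWS = 64  # assumed: equals the width of the storage-memory data bus


class Transposer:
    """Behavioral model of a two-way shift register array, ROWS x n bits."""

    def __init__(self, n_cols: int) -> None:
        self.n = n_cols
        self.cells = [[0] * n_cols for _ in range(ROWS)]  # cells[row][col]

    def load_column(self, word: int) -> None:
        """Claim 17: shift columns 1..n-1 to 0..n-2 (column 0 is discarded)
        and load the 64-bit word into column n-1."""
        for r in range(ROWS):
            row = self.cells[r]
            row.pop(0)
            row.append((word >> r) & 1)  # assumption: bus bit r -> row r

    def unload_row(self) -> List[int]:
        """Claim 18: output row 0, shift rows 1..63 to 0..62, zero row 63."""
        out = self.cells.pop(0)
        self.cells.append([0] * self.n)
        return out

    def load_row(self, bits: List[int]) -> None:
        """Claim 20: shift rows 1..63 to 0..62 (row 0 is discarded)
        and load the n-bit unit into row 63."""
        self.cells.pop(0)
        self.cells.append(list(bits))

    def unload_column(self) -> int:
        """Claim 21: output column 0, shift columns 1..n-1 to 0..n-2.
        Assumption: the vacated column n-1 is filled with zeros."""
        word = 0
        for r in range(ROWS):
            row = self.cells[r]
            word |= row.pop(0) << r
            row.append(0)
        return word


def storage_to_array(words: List[int]) -> List[List[int]]:
    """Load n words column by column, then unload 64 rows (one per word line)."""
    t = Transposer(len(words))
    for w in words:
        t.load_column(w)
    return [t.unload_row() for _ in range(ROWS)]


def array_to_storage(word_lines: List[List[int]]) -> List[int]:
    """Load 64 rows from the array, then unload n 64-bit words by column."""
    n = len(word_lines[0])
    t = Transposer(n)
    for bits in word_lines:
        t.load_row(bits)
    return [t.unload_column() for _ in range(n)]
```

In this model, entry j of each list returned by storage_to_array corresponds to bit line j, so the j-th word supplied ends up spread along bit line j with one bit per word line. Passing that result to array_to_storage returns the original words in their original order.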

PRIORITY CLAIM/RELATED APPLICATIONS

This application is a continuation in part of and claims priority under 35 USC 120 to U.S. patent application Ser. No. 15/709,399, filed Sep. 19, 2017 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells For Xor And Xnor Computations”, U.S. patent application Ser. No. 15/709,401, filed Sep. 19, 2017 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells For Xor And Xnor Computations”, U.S. patent application Ser. No. 15/709,379, filed Sep. 19, 2017 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells”, U.S. patent application Ser. No. 15/709,382, filed Sep. 19, 2017 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells”, and U.S. patent application Ser. No. 15/709,385, filed Sep. 19, 2017 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells” that in turn claim priority under 35 USC 119(e) and 120 and claim the benefit of U.S. Provisional Patent Application No. 62/430,767, filed Dec. 6, 2016 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells For Xor And Xnor Computations” and U.S. Provisional Patent Application No. 62/430,762, filed Dec. 6, 2016 and entitled “Computational Dual Port Sram Cell And Processing Array Device Using The Dual Port Sram Cells”, the entirety of all of which are incorporated herein by reference.

Provisional Applications (2)

	Number	Date	Country
	62430767	Dec 2016	US
	62430762	Dec 2016	US

Continuations (2)

	Number	Date	Country
Parent	15709379	Sep 2017	US
Child	16150176		US
Parent	16150176		US
Child	16150176		US

Continuation in Parts (6)

	Number	Date	Country
Parent	15709399	Sep 2017	US
Child	16150176		US
Parent	15709401	Sep 2017	US
Child	15709399		US
Parent	16150176		US
Child	15709399		US
Parent	15709382	Sep 2017	US
Child	16150176		US
Parent	16150176		US
Child	16150176		US
Parent	15709385	Sep 2017	US
Child	16150176		US
