Multi-Port Memory Architecture For Storing Multi-Dimensional Arrays II

Information

  • Patent Application
  • 20070277004
  • Publication Number
    20070277004
  • Date Filed
    May 23, 2006
  • Date Published
    November 29, 2007
Abstract
An N-port memory architecture is disclosed that stores multi-dimensional arrays so that: (1) N contiguous elements in a row can be accessed without blocking, (2) N contiguous elements in a column can be accessed without blocking, (3) some N-element two-dimensional sub-arrays can be accessed without blocking, and (4) all N/2-element two-dimensional sub-arrays can be accessed without blocking. In addition, the architecture preserves these properties while allowing any element to be accessed on any data port. The architecture is particularly advantageous for loading and unloading data into the vector registers of a single-instruction, multiple-data processor, such as that used for video decoding.
Description

BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 depicts a block diagram of a multi-processor and a multi-port memory.



FIG. 2 depicts a graph of the space-time complexity for three multi-port architectures in the prior art.



FIG. 3 depicts a block diagram of an N-port memory in accordance with the illustrative embodiment of the present invention in which N=8.



FIG. 4 depicts a block diagram of the logical structure of memory 301, which is a linear memory with P memory locations identified by addresses 0 through P−1, wherein P is a positive integer greater than 1.



FIG. 5 depicts a block diagram of the salient components of memory 301, which comprises storage 501, N=8×N=8 data switch 502, and N=8×N=8 address switch and decoder 503, interconnected as shown.



FIG. 6 depicts a block diagram of the salient components of storage 501, which comprises N=8 independent memory banks 601-1 through 601-8.



FIG. 7a depicts a mapping of the elements in a multi-dimensional array to memory banks.



FIG. 7b depicts how N contiguous elements of the first column are all stored in different memory banks and, therefore, can be read without contention.



FIG. 7c depicts how N contiguous elements in the third row are all stored in different memory banks and, therefore, can be read without contention.



FIG. 7d depicts how the N/2 contiguous elements of a subarray, namely elements (3,2), (4,2), (3,3), and (4,3), are all stored in different memory banks and, therefore, can be read without contention.



FIG. 7e depicts how some, but not all, subarrays of N contiguous elements are stored entirely in different memory banks and, therefore, can be read without contention.



FIG. 8 depicts a mapping of multi-dimensional array elements to logical addresses.



FIG. 9 depicts a mapping of logical addresses to memory banks.



FIG. 10 depicts a block diagram of the salient components of address switch and decoder 503, which comprises N=8×N=8 address switch 1001 and address decoder 1002.





DETAILED DESCRIPTION


FIG. 3 depicts a block diagram of an N-port memory in accordance with the illustrative embodiment of the present invention in which N=8. It will be clear to those skilled in the art, after reading this specification, how to make and use alternative embodiments of the present invention in which N is any positive integer.


Memory 301 comprises N=8 data ports and N=8 address ports. A word can be read from or written to memory 301 on a data port independently of whether a word is read from or written to memory 301 on another port. In other words, any combination of N=8 words can be read from or written into memory 301 in one cycle. For example, words can be written into memory 301 on data ports 1, 6, and 8, while words are read from memory 301 on data ports 2, 3, 4, 5, and 7. In all cases, the data on port n, wherein n∈{1, 2, . . . , N}, is associated with the address on address port n.



FIG. 4 depicts a block diagram of the logical structure of memory 301, which is a linear memory with P memory locations identified by addresses 0 through P−1, and wherein P is a positive integer greater than 1. In accordance with the illustrative embodiment, P=16,384=0x4000=2^14, so that the addresses run from 0x0000 through 0x3FFF, but it will be clear to those skilled in the art how to make and use alternative embodiments of the present invention for any value of P. Although memory 301 has multiple ports, reading an address on one address port yields the same data as reading that address on another port, because both refer to the same logical memory location.



FIG. 5 depicts a block diagram of the salient components of memory 301, which comprises storage 501, N=8×N=8 data switch 502, and N=8×N=8 address switch and decoder 503, interconnected as shown.


Storage 501 comprises P memory locations, N address ports, 510-1 through 510-8, and N data ports, 513-1 through 513-8. In accordance with the illustrative embodiment, each logical memory location corresponds to only one of the address ports 510-1 through 510-8 and one of the data ports 513-1 through 513-8.


The constraint that each logical memory location in storage 501 corresponds to only one of the address ports 510-1 through 510-8 means that a logical address on one of address ports 511-1 through 511-8 must be routed to the correct one of address ports 510-1 through 510-8. This is the function performed by address switch and decoder 503. In other words, address switch and decoder 503 must:

    • i. decode each logical address on each of address ports 511-1 through 511-8,
    • ii. generate a physical memory address in storage 501 that corresponds to that logical address, and
    • iii. route the physical address to the appropriate one of address ports 510-1 through 510-8.


In accordance with the illustrative embodiment, address switch and decoder 503 comprises an N×N non-blocking crossbar switch, but it will be clear to those skilled in the art, after reading this specification, how to make and use alternative embodiments of the present invention in which another structure provides the requisite functionality.


The shuffling of addresses between address ports 511-1 through 511-8 and address ports 510-1 through 510-8, without more, destroys the one-to-one relationship in which the data on port 512-n is associated with the address on address port 511-n. To preserve this relationship, data switch 502 performs the inverse shuffle of address switch and decoder 503. For example, if logical address 0x0000 is presented on address port 511-3 during a read operation, the data in logical address 0x0000 should appear on data port 512-3. But within memory 301, address switch and decoder 503 might route the corresponding physical address to address port 510-1, which would cause the desired word to emerge on data port 513-1. To ensure that the word emerges on data port 512-3, data switch 502 routes the word from data port 513-1 to data port 512-3.
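The address shuffle and its inverse can be sketched in software. The following is only a minimal model under stated assumptions, not the circuit of FIG. 5: the names `decode`, `read_cycle`, and `banks` are hypothetical, the memory is shrunk to P=64 words for readability, ports are numbered from 0, and `decode` uses plain low-order interleaving as a stand-in for whatever decoding the embodiment actually uses.

```python
N = 8    # ports and banks, as in the illustrative embodiment
P = 64   # assumption: a small memory for readability (the embodiment uses 16,384)

def decode(addr):
    """Hypothetical decoder: logical address -> (bank, physical address in bank)."""
    return addr % N, addr // N

# Storage: N single-port banks of P/N words, each holding a recognizable value.
banks = [[None] * (P // N) for _ in range(N)]
for addr in range(P):
    b, phys = decode(addr)
    banks[b][phys] = f"word@{addr}"

def read_cycle(port_addresses):
    """port_addresses[n] is the logical address presented on address port n."""
    # Address switch: send each physical address to the bank that owns it,
    # remembering which external port made the request.
    bank_request = {}
    for port, addr in enumerate(port_addresses):
        b, phys = decode(addr)
        if b in bank_request:
            raise ValueError(f"bank {b} requested twice in one cycle")
        bank_request[b] = (phys, port)
    # Data switch: the inverse shuffle returns each word to the requesting port.
    out = [None] * len(port_addresses)
    for b, (phys, port) in bank_request.items():
        out[port] = banks[b][phys]
    return out

requested = [5, 0, 10, 23, 44, 9, 62, 3]
result = read_cycle(requested)
print(result[2])  # word@10, returned on the port that asked for address 10
```

In this model the word for the address on port n always comes back on port n, regardless of which bank held it, which is the relationship the data switch is described as preserving.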


In accordance with the illustrative embodiment, data switch 502 is an N×N non-blocking crossbar switch, but it will be clear to those skilled in the art, after reading this specification, how to make and use alternative embodiments of the present invention in which another structure provides the requisite functionality.


The combination of address switch and decoder 503 and data switch 502 has another advantage: it enables the word at any logical address to be read from, or written to, any of data ports 512-1 through 512-8. This is particularly advantageous when, for example, memory 301 is used to load and unload the vector registers in a single-instruction, multiple-data processor.



FIG. 6 depicts a block diagram of the salient components of storage 501, which comprises N=8 independent memory banks 601-1 through 601-8. Each memory bank is a single-port memory that comprises P/N=2^11=2,048 words. Because storage 501 comprises independent single-port memory banks, only one word from each memory bank can be read or written in a single cycle.


Although worst-case contention cannot be eliminated, average-case contention can be reduced by distributing words that are often accessed together across different memory banks. There are special-purpose applications in which groups of words are often accessed together, and one of those applications involves the storage of multi-dimensional arrays, such as those commonly manipulated in video coding and decoding (e.g., H.264, MPEG, etc.). For example, in video decoding, the elements in a row, a column, and a contiguous block tend to be accessed together far more frequently than random elements in the array.
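The cost of contention can be illustrated with a short sketch. It assumes, purely for illustration, a naive mapping in which the bank is the logical address modulo N and an array that is N elements wide; the names `cycles_needed` and `interleaved` are hypothetical and do not come from the specification.

```python
from collections import Counter

N = 8  # single-port banks, as in the illustrative embodiment

def cycles_needed(addresses, bank_of):
    """Cycles to serve all requests when each bank delivers one word per cycle."""
    per_bank = Counter(bank_of(a) for a in addresses)
    return max(per_bank.values())

def interleaved(addr):
    # Assumption: naive low-order interleaving, not the mapping of FIG. 7a.
    return addr % N

row = [16 + c for c in range(N)]        # 8 consecutive addresses
column = [3 + N * r for r in range(N)]  # 8 addresses with stride N (one column)

print(cycles_needed(row, interleaved))     # 1
print(cycles_needed(column, interleaved))  # 8
```

Under this naive mapping a row is read in one cycle, but every element of a column falls in the same bank and the read is fully serialized, which is the kind of access pattern a better distribution of words across banks is meant to avoid.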


In accordance with the illustrative embodiment, each element of a J×K two-dimensional array, wherein J and K are both positive integers greater than 1, is assigned to one of the memory banks so that three conditions are satisfied:

    • i. the coordinates for N contiguous elements in a row of the two-dimensional array decode into different memory banks; and
    • ii. the coordinates for N contiguous elements in a column of the two-dimensional array decode into different memory banks; and
    • iii. the coordinates for the elements in an L by M two-dimensional subarray of the two-dimensional array decode into different memory banks, wherein L and M are both positive integers, 1≦L≦J, 1≦M≦K, and 2≦L*M≦N/2.


It will be clear to those skilled in the art, after reading this disclosure, how to generate any of the many suitable mappings between array coordinates and memory banks—and one illustrative mapping is depicted in FIG. 7a.
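One family of mappings that meets the three conditions rotates each row by a fixed number of banks. The sketch below is an assumption for illustration and is not claimed to be the mapping of FIG. 7a: the rotation `SKEW = 3`, the 16×16 array size, the 0-based coordinates, and the helper names `bank`, `block`, and `conflict_free` are all hypothetical.

```python
from itertools import product

N = 8      # banks and ports
SKEW = 3   # assumption: each row is rotated 3 banks relative to the previous row

def bank(row, col):
    return (col + SKEW * row) % N

def block(r0, c0, height, width):
    """Coordinates of a height-by-width subarray whose corner is (r0, c0)."""
    return [(r0 + i, c0 + j) for i in range(height) for j in range(width)]

def conflict_free(cells):
    banks = [bank(r, c) for r, c in cells]
    return len(set(banks)) == len(banks)

J, K = 16, 16  # assumed array dimensions for the exhaustive check

# Condition i: N contiguous elements in any row.
rows_ok = all(conflict_free(block(r, c, 1, N))
              for r in range(J) for c in range(K - N + 1))
# Condition ii: N contiguous elements in any column.
cols_ok = all(conflict_free(block(r, c, N, 1))
              for r in range(J - N + 1) for c in range(K))
# Condition iii: every L-by-M subarray with 2 <= L*M <= N/2.
small_ok = all(conflict_free(block(r, c, h, w))
               for h, w in product(range(1, N // 2 + 1), repeat=2)
               if 2 <= h * w <= N // 2
               for r in range(J - h + 1) for c in range(K - w + 1))
# An N-element subarray that this particular mapping does not serve in one cycle.
two_by_four_ok = conflict_free(block(0, 0, 2, 4))

print(rows_ok, cols_ok, small_ok, two_by_four_ok)  # True True True False
```

The rotation must be odd so that N consecutive rows land in N different banks, and it must differ from ±1 modulo N so that 2×2 blocks do not collide. With this rotation a 2×4 block still collides, so the sketch satisfies the three listed conditions but says nothing about which N-element subarrays the mapping of FIG. 7a can serve.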


FIG. 7b depicts how N contiguous elements of the first column are all stored in different memory banks and, therefore, can be read without contention. The reader can verify that the same is true for all columns.



FIG. 7c depicts how N contiguous elements in the third row are all stored in different memory banks and, therefore, can be read without contention. The reader can verify that the same is true for all rows.



FIG. 7d depicts how the N/2 contiguous elements of a subarray, namely elements (3,2), (4,2), (3,3), and (4,3), are all stored in different memory banks and, therefore, can be read without contention. The reader can verify that the same is true for all subarrays of N/2 contiguous elements.



FIG. 7e depicts how some, but not all, subarrays of N contiguous elements are stored entirely in different memory banks and, therefore, can be read without contention.


Because there are only N memory banks, it follows from the Pigeonhole Principle that the coordinates of at least two elements in any set of N+1 elements decode into the same memory bank.


In accordance with the illustrative embodiment, each element of a J×K two-dimensional array is assigned a logical address in, for example, row-column order as depicted in FIG. 8. It will be clear to those skilled in the art how to assign the elements to logical addresses in accordance with a different, but suitable, scheme.


In addition, address switch and decoder 503 comprises logic for decoding each of the addresses into:

    • i. a memory bank, and
    • ii. a unique physical address into that memory bank


so that the following three conditions are satisfied:
    • i. addresses p+(c−1) decode into different memory banks for all p and all c, wherein 0≦p+(c−1)<P, wherein p is a non-negative integer and p∈{0, . . . , P−1}, wherein c is a positive integer and c∈{1, . . . , C}, and wherein C is a positive integer and C≦N; and
    • ii. addresses p+N(r−1) decode into different memory banks for all p and all r, wherein 0≦p+N(r−1)<P, wherein r is a positive integer and r∈{1, . . . , R}, and wherein R is a positive integer and R≦N; and
    • iii. addresses p+(c−1)+N(r−1) decode into different memory banks for all p, all c, and all r, wherein 0≦p+(c−1)+N(r−1)<P, and wherein 1≦C*R≦N/2.


The result will be a mapping of logical addresses to memory banks, such as that depicted in FIG. 9.
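The decoding of a logical address into a memory bank and a unique physical address within that bank can be sketched as follows. This is a minimal sketch, assuming an array that is N elements wide stored in row-major order and a hypothetical row rotation `SKEW = 3`; FIG. 9 may depict a different mapping, and the names `decode` and `banks_distinct` are illustrative. The sketch checks only a run of N addresses that starts on a row boundary and a run with stride N; with this particular mapping, a run of N consecutive addresses that crosses a row boundary is not guaranteed to be conflict-free.

```python
N = 8          # memory banks
P = 1 << 14    # 16,384 logical addresses, as in the illustrative embodiment
SKEW = 3       # assumption: per-row bank rotation, not taken from FIG. 9

def decode(addr):
    """Return (bank, physical address within that bank) for a logical address.

    Assumes row-major storage of an N-wide array: row = addr // N, col = addr % N.
    """
    row, col = divmod(addr, N)
    bank = (col + SKEW * row) % N
    physical = row  # each row places exactly one word in each bank
    return bank, physical

def banks_distinct(addresses):
    banks = [decode(a)[0] for a in addresses]
    return len(set(banks)) == len(banks)

# Every logical address receives its own (bank, physical address) slot.
slots = {decode(a) for a in range(P)}
unique = len(slots) == P
in_range = all(phys < P // N for _, phys in slots)

row_ok = banks_distinct(range(40, 40 + N))       # one full row, addresses 40..47
col_ok = banks_distinct(range(5, 5 + N * N, N))  # one column, addresses 5, 13, ..., 61

print(unique, in_range, row_ok, col_ok)  # True True True True
```

Because every aligned group of N consecutive addresses touches each bank exactly once, the row index can serve directly as the physical address, and it fits in the P/N=2,048 words of each bank.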

Here too, because there are only N memory banks, the Pigeonhole Principle applies: at least two addresses in every set of N+1 addresses decode into the same memory bank.



FIG. 10 depicts a block diagram of the salient components of address switch and decoder 503, which comprises N=8×N=8 address switch 1001 and address decoder 1002.


Address switch 1001 is combinational logic that receives a (log2P)-bit logical address on each of address ports 511-1 through 511-8 and that outputs a (log2P−log2N)-bit physical address on each of address ports 510-1 through 510-8. Address switch 1001 shuffles the addresses under the control of address decoder 1002 using a non-blocking crossbar switch, but performs the translation from logical address to physical memory address on its own, so that each logical address assigned to a given memory bank generates a unique (log2P−log2N)-bit physical address within that bank. It will be clear to those skilled in the art how to accomplish this.
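For the illustrative embodiment, the address widths can be worked out as follows, reading the logical address width as log2P bits and using the values P=16,384 and N=8 given earlier in the specification:

```latex
\begin{align*}
\log_2 P &= \log_2 16{,}384 = 14 \text{ bits (logical address)} \\
\log_2 N &= \log_2 8 = 3 \text{ bits (enough to select one of the 8 banks)} \\
\log_2 P - \log_2 N &= 14 - 3 = 11 \text{ bits (physical address)}, \qquad 2^{11} = 2{,}048 = P/N
\end{align*}
```

The 11-bit physical address therefore spans exactly the 2,048 words held by each memory bank.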


It is to be understood that the above-described embodiments are merely illustrative of the present invention and that many variations of the above-described embodiments can be devised by those skilled in the art without departing from the scope of the invention. It is therefore intended that such variations be included within the scope of the following claims and their equivalents.

Claims
  • 1. An apparatus comprising: (i) A address ports, wherein A is a positive integer greater than one; (ii) D data ports, wherein D is a positive integer greater than one; (iii) N independent memory banks, wherein N is a positive integer greater than one; (iv) an address switch for routing addresses on said address ports to said memory banks, wherein said routing is based, at least in part, on said addresses; and (v) a data switch for routing data between said data ports and said memory banks, wherein said routing is based, at least in part, on (1) said addresses, and (2) which of said addresses is on which of said address ports.
  • 2. The apparatus of claim 1 wherein said apparatus comprises P memory locations identified by addresses 0 through P−1, wherein P is a positive integer greater than 1; and wherein said address switch is non-blocking for all combinations of addresses p+(c−1) for all p and all c, wherein 0≦p+(c−1)<P, wherein p is a positive integer and p∈{0, . . . , P−1}, wherein c is a positive integer and c∈{1, . . . , C}, and wherein C is a positive integer and C≦N; wherein said address switch is blocking for at least one combination of addresses p+(c−1) for all p and all c when C>N; wherein said address switch is non-blocking for all combinations of addresses p+N(r−1) for all p and all r, wherein 0≦p+N(r−1)<P, wherein r is a positive integer and r∈{1, . . . , R}, and wherein R is a positive integer and R≦N; and wherein said address switch is blocking for at least one combination of addresses p+N(r−1) for all p and all r when R>N.
  • 3. The apparatus of claim 2 wherein said address switch is non-blocking for addresses p+(c−1)+N(r−1) for all p, all c, and all r, wherein 0≦p+(c−1)+N(r−1)<P, and wherein 1≦C*R≦N.
  • 4. The apparatus of claim 1 wherein said apparatus comprises P memory locations identified by addresses 0 through P−1, wherein P is a positive integer greater than 1; and wherein said address switch is non-blocking for all combinations of addresses p+(c−1)+N(r−1) for all p, all c, and all r, wherein 0≦p+(c−1)+N(r−1)<P, wherein p is a positive integer and p∈{0, . . . , P−1}, wherein c is a positive integer and c∈{1, . . . , C}, wherein r is a positive integer and r∈{1, . . . , R}, and wherein C and R are positive integers and 1≦C*R≦N; and wherein said address switch is blocking for at least one combination of addresses p+(c−1)+N(r−1) for all p and all c when C*R>N.
  • 5. The apparatus of claim 1 wherein said apparatus comprises P memory locations identified by addresses 0 through P−1, wherein P is a positive integer greater than 1; and wherein said data switch is non-blocking for all combinations of addresses p+(c−1) for all p and all c, wherein 0≦p+(c−1)<P, wherein p is a positive integer and p∈{0, . . . , P−1}, wherein c is a positive integer and c∈{1, . . . , C}, and wherein C is a positive integer and C≦N; wherein said data switch is blocking for at least one combination of addresses p+(c−1) for all p and all c when C>N; wherein said data switch is non-blocking for all combinations of addresses p+N(r−1) for all p and all r, wherein 0≦p+N(r−1)<P, wherein r is a positive integer and r∈{1, . . . , R}, and wherein R is a positive integer and R≦N; and wherein said data switch is blocking for at least one combination of addresses p+N(r−1) for all p and all r when R>N.
  • 6. The apparatus of claim 5 wherein said data switch is non-blocking for addresses p+(c−1)+N(r−1) for all p, all c, and all r, wherein 0≦p+(c−1)+N(r−1)<P, and wherein 1≦C*R≦N.
  • 7. The apparatus of claim 1 wherein said apparatus comprises P memory locations identified by addresses 0 through P−1, wherein P is a positive integer greater than 1; and wherein said data switch is non-blocking for all combinations of addresses p+(c−1)+N(r−1) for all p, all c, and all r, wherein 0≦p+(c−1)+N(r−1)<P, wherein p is a positive integer and p∈{0, . . . , P−1}, wherein c is a positive integer and c∈{1, . . . , C}, wherein r is a positive integer and r∈{1, . . . , R}, and wherein C and R are positive integers and 1≦C*R≦N; and wherein said data switch is blocking for at least one combination of addresses p+(c−1)+N(r−1) for all p and all c when C*R>N.
  • 8. An apparatus comprising: (a) a processor comprising an N-word register, wherein N is a positive integer greater than one; (b) an N-port memory comprising P memory locations identified by addresses 0 through P−1, wherein P is a positive integer greater than one; (c) an N by N data switch interposed between said memory and said register that is: (i) non-blocking for all combinations of addresses p+(c−1) for all p and all c, wherein 0≦p+(c−1)<P, wherein p is a positive integer and p∈{0, . . . , P−1}, wherein c is a positive integer and c∈{1, . . . , C}, and wherein C is a positive integer and C≦N; (ii) blocking for at least one combination of addresses p+(c−1) for all p and all c when C>N; (iii) non-blocking for all combinations of addresses p+N(r−1) for all p and all r, wherein 0≦p+N(r−1)<P, wherein r is a positive integer and r∈{1, . . . , R}, and wherein R is a positive integer and R≦N; and (iv) blocking for at least one combination of addresses p+N(r−1) for all p and all r when R>N.
  • 9. The apparatus of claim 8 wherein said data switch is (v) non-blocking for addresses p+(c−1)+N(r−1) for all p, all c, and all r, wherein 0≦p+(c−1)+N(r−1)<P, and wherein 1≦C*R≦N.
  • 10. The apparatus of claim 9 wherein said data switch is (vi) blocking for at least one combination of addresses p+(c−1)+N(r−1) for all p, all c, and all r, when C*R>N.
  • 11. An apparatus comprising: (a) a processor comprising an N-word register, wherein N is a positive integer greater than one; (b) an N-port memory comprising P memory locations identified by addresses 0 through P−1, wherein P is a positive integer greater than one; (c) an N by N data switch interposed between said memory and said register that is: (i) non-blocking for addresses p+(c−1)+N(r−1) for all p, all c, and all r, wherein 0≦p+(c−1)+N(r−1)<P, wherein p is a positive integer and p∈{0, . . . , P−1}, wherein c is a positive integer and c∈{1, . . . , C}, wherein r is a positive integer and r∈{1, . . . , R}, and wherein C and R are positive integers and 1≦C*R≦N; and (ii) blocking for at least one combination of addresses p+(c−1)+N(r−1) for all p, all c, and all r, when C*R>N.