The present invention relates to a memory access unit and to a memory controller comprising one or more memory access units.
Motor vehicles are increasingly being equipped with different forms of sensors for detecting harmful situations. These sensors produce large amounts of data in multiple dimensions. Ideally, samples should be stored sequentially in memory to allow hardware accelerators to process them efficiently. However, there is often a need to process samples in more than one dimension.
One solution is to re-order samples between processing steps. However, this increases computation overhead and memory requirements.
Another solution is to employ memory which allows data to be read out in orthogonal dimensions. An example of such a memory is described in WO 2009/003115 A1.
The present invention seeks to provide a memory access unit for handling transfers of data words of fixed size (herein referred to as “samples”).
According to a first aspect of the present invention there is provided a memory access unit for handling transfers of samples in a d-dimensional array between one of m data buses, where m≥1, and k*m memories, where k≥2. The memory access unit comprises k address calculators and k sample collectors. Each address calculator is configured to receive a bus address, to add a respective offset to generate a sample bus address and to generate, from the sample bus address according to an addressing scheme, a respective address in each of the d dimensions for accessing a respective sample. Each sample collector is operable to generate a memory select for a respective one of the k*m memories so as to transfer the sample between a predetermined position in a bus data word and the respective one of the k*m memories. Each sample collector is configured to calculate a respective memory select in dependence upon the address in each of the d dimensions such that each sample collector selects a different one of the k memories so as to allow the sample collectors to access k of the k*m memories concurrently. For a single bus, m=1 and k*m=k. Each sample collector may be operable to transfer the respective sample to or from the respective one of the k*m memories.
Thus, samples can be accessed (read or written) along any of the dimensions, in the order of that dimension, without sample re-ordering.
A sample data width and a sample storage data width may be the same. The sample data width may be an integer multiple of (for example, double) the sample storage data width. The sample data width may be settable to be, for example, four bytes, consisting of 32 bits, or eight bytes, consisting of 64 bits. The storage data width may be, for example, four bytes, consisting of 32 bits. However, other storage data widths may be used. The data bus width may be 16 bytes, consisting of 128 bits, or 32 bytes, consisting of 256 bits.
The memory access unit may further comprise a set of registers for changeably setting the numbers of samples in each of the d dimensions and/or sample data width.
The number d of dimensions may be settable to be two or three.
Each sample collector may be configured to calculate the respective memory select in dependence upon a sum of the addresses in each of the d dimensions.
Each sample collector may be configured to calculate the memory select, CS, using:
CS=[(A_1+A_2+ . . . +A_d)]% k
Each address calculator may be configured to generate an index. This may be used when the sample data width is greater than the sample storage or memory data width.
Each sample collector may be configured to calculate a memory select, CS, using:
CS=[I_S+(w_s/w_m)*(A_1+A_2+ . . . +A_d)]% k
where w_s is the sample data width and w_m is memory data width.
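By way of illustration only, the memory select calculation can be sketched in Python as follows. The function name `memory_select`, its parameter names and the zero-based numbering of the memories are assumptions made for this sketch and do not appear in the description; widths are taken to be in bytes.

```python
def memory_select(dim_addrs, k, index=0, w_s=4, w_m=4):
    """CS = [I_S + (w_s / w_m) * (A_1 + ... + A_d)] % k, in integer arithmetic.

    dim_addrs: the dimension addresses (A_1, ..., A_d)
    k:         number of memories that can be accessed concurrently
    index:     sample index I_S (only non-zero when w_s > w_m)
    """
    ratio = w_s // w_m  # sample width as a multiple of the memory width
    return (index + ratio * sum(dim_addrs)) % k


# Four neighbouring samples along each dimension of a 3-D array, with k = 4.
k = 4
base = (1, 2, 0)  # arbitrary starting point (A_1, A_2, A_3), chosen for illustration
for dim in range(3):
    selects = []
    for step in range(k):
        addr = list(base)
        addr[dim] += step  # move along one dimension only
        selects.append(memory_select(addr, k))
    print(f"dimension {dim + 1}: {selects}")
```

In this sketch, stepping along any single dimension changes the sum of the dimension addresses by one per step, so k neighbouring samples map to k different memory selects.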
Each address calculator may be configured to calculate the respective addresses in dependence upon a linearly-increasing word address, sample data width or size and numbers of samples in each dimension.
Each address calculator may be configured to receive a bus address and to adjust the bus address based on a respective offset. Thus, the address calculators can generate respective sets of addresses and, thus, generate different memory selects.
The memory access unit may further comprise a bus interface coupled to the address calculators and the sample collectors. The bus interface may be configured to pass a bus address to each of the address calculators.
The memory access unit is preferably implemented in hardware logic.
The memory access unit may be suitable for handling transfers of samples in a d-dimensional array between a data bus and k*m memories, where the k*m memories are shared by m data buses.
According to a second aspect of the present invention there is provided a memory controller comprising at least one memory access unit.
The memory controller may comprise at least two memory access units. The at least two memory access units may access a common (or “global”) set of registers.
The memory controller may comprise m memory access units for handling transfers of samples in a d-dimensional array between m data buses and k*m memories.
According to a third aspect of the present invention there is provided a memory system comprising a memory controller and k*m memories, each set of k*m memories operatively connected to the m memory access units.
There may be one data bus, i.e. m=1. There may be more than one data bus, i.e. m≥2. There may be between two and ten, or more, data buses, i.e. 10≥m≥2, or m>10.
According to a fourth aspect of the present invention there is provided an integrated circuit comprising a memory access unit or a memory controller.
The integrated circuit may be a microcontroller. The integrated circuit may be an application specific integrated circuit (ASIC). The integrated circuit may be a system on a chip (SoC). The integrated circuit may be a hardware accelerator. The integrated circuit may be a graphics processing unit (GPU). The integrated circuit may be a digital signal processor (DSP).
According to a fifth aspect of the present invention there is provided a motor vehicle comprising a computing device comprising a memory access unit or a memory controller.
The motor vehicle may be a motorcycle, an automobile (sometimes referred to as a “car”), a minibus, a bus, a truck or lorry. The motor vehicle may be powered by an internal combustion engine and/or one or more electric motors.
According to a sixth aspect of the present invention there is provided a method of transferring samples in a d-dimensional array between one of m data buses and k*m memories. The method comprises, for each of k samples: receiving a bus address and adding a respective offset to generate a sample bus address and generating, from the sample bus address according to an addressing scheme, a respective address in each of the d dimensions for accessing samples along one of the dimensions and generating a memory select for a respective one of the k*m memories, where m≥1, and k≥2, so as to transfer a sample between a predetermined position in a bus data word and the respective one of the k*m memories, wherein generating the memory select comprises calculating a memory select in dependence upon the addresses in each of the d dimensions so as to allow the k samples to be written to or read from k of the k*m memories concurrently.
The method can be performed by a unit (or module) such that it allows the unit and other units to access the k*m memories concurrently.
The method is preferably a hardware-implemented method. The unit (or module) may be a logic unit (or “logic module”).
According to a seventh aspect of the present invention there is provided a computer program comprising instructions which, when executed by one or more processors, cause the one or more processors to perform the method.
According to an eighth aspect of the present invention there is provided a computer readable medium (which may be non-transitory) carrying or storing the computer program.
Certain embodiments of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:
In the following description, all constants, variables and registers are of type integer unless otherwise specified.
Memory System 1
Referring to
The memory access circuit 4 includes an address calculator 8 (which may also be referred to as an “address decoder”), a data multiplexer 9 and a bus interface 10.
Each memory module 2_1, . . . , 2_n has a memory data width w_m. In this example, the memory data width w_m is 128 bits. In this example, the data bus 7 is 128 bits wide.
The address calculator 8 selects one RAM module 2_1, . . . , 2_n from n RAM modules 2_1, . . . , 2_n using a chip select signal CS and specifies a part of memory 2 to be accessed using a memory address A_M, via chip select and memory address buses 11. The address calculator 8 calculates chip select CS and memory address A_M using a bus address A_B received on the address bus 6 via the bus interface 10.
In a write transfer, bus data D_B are placed on memory data bus 12_1, . . . , 12_k. The data are stored in one of the RAM modules 2_1, . . . , 2_n upon selection using chip select CS at a respective address A_M. The bus interface can request only part of the bus data D_B to be stored.
In a read transfer, a selected RAM module 2_1, . . . , 2_n can transfer memory data D_M to the bus interface 10. The bus interface 10 can request only part of memory data D_M be transferred.
Referring also to
The picture 14 (
As shown in
Thus, when samples 13 are read out along the y dimension, performance is degraded by a factor of four.
Performance can be degraded even further if data transfer occurs in bursts. Bus systems can have long latencies. If an access request is made, it may be necessary to wait several cycles before data is transferred. Hence, modern bus systems transfer multiple data words in sequence (called a “burst”) with each request. However, bursts require that the address for each data word increases linearly without jumps. Therefore, they cannot be used to transfer the samples in the y direction, reducing the performance even more.
If more than one sample is transferred in a bus word, then the full width of a bus cannot be used unless the sample order matches the storage order. However, often there is a need to read out samples along more than one dimension, for example, in image processing and radar applications, and other applications which involve multi-dimensional data access or transpositions, such as in multi-dimensional fast Fourier transform (FFT) processing. Some applications, such as radar, are automotive applications. In accordance with the present invention, memory access units and memory access methods can allow samples to be read out along different dimensions without the need for sample re-ordering and/or which can help improve bus utilization.
Memory System 21
Referring to
The memory access unit 24 includes an array of k address calculator modules 28_1, . . . , 28_k, an array of k sample collector modules 29_1, . . . , 29_k, a common bus interface 30 and a set of configuration registers 31.
Each memory module 22_1, . . . , 22_n has a memory data width w_m. The memory data width w_m and the minimum sample data width are the same. In this example, the data bus 27 is 128 bits wide. However, the data bus 27 can be narrower or wider, for example, 256 bits wide.
Referring also to
Referring again to
A_B′=A_B+(Nr−1)*w_m (1)
where Nr=1, 2, . . . , k is the position of the sample in the bus data word and w_m is the memory data width.
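As a minimal sketch of equation 1, assuming byte addressing and a memory data width expressed in bytes, the k sample bus addresses derived from one bus address can be listed as follows. The function name `sample_bus_addresses` is hypothetical.

```python
def sample_bus_addresses(a_b, k, w_m):
    """Return A_B' = A_B + (Nr - 1) * w_m for Nr = 1, ..., k (equation 1)."""
    return [a_b + (nr - 1) * w_m for nr in range(1, k + 1)]


# One bus word carrying four samples, each four bytes wide (assumed values).
print(sample_bus_addresses(0x40, 4, 4))  # [64, 68, 72, 76]
```

Each address calculator would thus see its own sample bus address, offset from its neighbour by one memory data width.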
An example of an address decoding scheme for three-dimensional accesses in radar signal processing is described hereinafter. However, any suitable address decoding scheme can be used. The dimension address A_L is the linear address along the access dimension.
Referring to
A sample collector 29_1, . . . , 29_k identifies a sample storage 22_1, . . . , 22_k in which a requested sample 33 is stored or is to be stored, selects the identified sample storage 22_1, . . . , 22_k using a select signal CS, requests transfer (read or write) of the sample 34 at the address A_M and fills or retrieves the sample 34 at the assigned position in the bus data word D_B.
A dedicated sample collector 29_1, . . . , 29_k exists for each sample 34 in the bus data word D_B. As will be explained in more detail later, the sample collectors 29_1, . . . , 29_k perform memory access in parallel using a chip select arrangement which ensures that there are no access conflicts in sample storage 22_1, . . . , 22_k. Thus, the full bus data width can be utilized without introducing wait states.
As shown in
And, as shown in
Accesses along the x and y dimensions can utilize the full bus width, in this case, four samples per bus data word. Furthermore, bursts are possible since A_L increases linearly, without jumps. This can further improve performance. For example, if the bus 25 is an Advanced eXtensible Interface (AXI) bus with ten pipeline stages, a single access requires ten cycles per word. A 16-beat burst requires ten cycles for the initial word and fifteen cycles for the remaining words, i.e. 25 cycles in total. On average, a burst therefore requires 25/16≈1.6 cycles per word. This is a further speed-up by a factor of about six.
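The burst arithmetic above can be checked with a short calculation. This is only a sketch under the simple model used in the example (a fixed latency for the first word and one cycle for each further word); the function name `cycles_per_word` is an assumption.

```python
def cycles_per_word(latency, beats):
    """Average cycles per word when the first word costs `latency` cycles
    and each remaining word of the burst costs one cycle."""
    total_cycles = latency + (beats - 1)
    return total_cycles / beats


single = cycles_per_word(latency=10, beats=1)   # 10 cycles per word
burst = cycles_per_word(latency=10, beats=16)   # 25 / 16 cycles per word
print(f"single access: {single} cycles per word")
print(f"16-beat burst: {burst} cycles per word")
print(f"speed-up: {single / burst:.2f}")
```

Without rounding, the average is 1.5625 cycles per word and the ratio is 6.4, which is consistent with the rounded figures of 1.6 and about six.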
Thus, compared with the memory system 1 shown in
Referring again to
Each sample 34 is addressed within the array by its dimension address A_1, A_2, . . . , A_d and sample index I_S. The sample index I_S is used if the sample width w_s is greater than the sample storage width w_m.
Since the bus data word D_B contains more than one sample, a bus access field indicates along which dimension access is requested. Several bus address calculation schemes are possible which encode the access direction, the linear address A_L and the index.
Each sample collector 29_1, . . . , 29_k computes, or calculates, a sample address, a physical word address, a selected memory chip select (CS) and a memory address (steps S3 to S6) using equations 2 to 5 below:
Each sample collector 29_1, . . . , 29_k accesses a respective sample storage 22_1, . . . , 22_k (step S7). The bus interface answers the access request (step S8).
The selected memory chip select CS(1), . . . , CS(k) is different for adjacent samples in every dimension. If accesses occur along dimension j (j={1, . . . , d}), only the dimension address A_j changes in equation 4. The other dimension addresses A_i (i={1, . . . , d}, i≠j) remain constant. However, the term which changes takes a different value for each adjacent address in the accessed dimension, so the resulting chip selects differ. Thus, parallel accesses to sample storages are possible.
There is, however, an exception to this arrangement, namely when a transaction crosses an access dimension boundary. In that case, more than one term changes in the selected memory CS equation or calculation. Depending on the chosen sizes, wait states may be necessary. No wait states occur if each dimension size S_1, S_2, . . . , S_d is a multiple of w_m*k/w_s. A memory access system which has multiple bus interfaces and which uses wait states is described hereinafter.
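The boundary-crossing exception can be explored with the following sketch. It assumes that the sample width equals the memory width (so that the condition reduces to each dimension size being a multiple of k), that bus words are aligned to k samples, and that the accessed dimension varies fastest with the remaining dimensions following in ascending order. The function names `decode` and `conflicting_words` are hypothetical.

```python
def decode(sample_no, sizes, dim):
    """Split a linear sample number into dimension addresses, `dim` varying fastest."""
    order = [dim] + [i for i in range(len(sizes)) if i != dim]
    addrs = [0] * len(sizes)
    for i in order:
        addrs[i] = sample_no % sizes[i]
        sample_no //= sizes[i]
    return addrs


def conflicting_words(sizes, dim, k):
    """Count aligned bus words in which two of the k samples share a memory select."""
    total = 1
    for size in sizes:
        total *= size
    bad = 0
    for first in range(0, total - k + 1, k):
        selects = {sum(decode(first + n, sizes, dim)) % k for n in range(k)}
        if len(selects) < k:  # fewer than k distinct selects means a conflict
            bad += 1
    return bad


k = 4
for sizes in [(4, 8, 4), (4, 6, 4)]:
    counts = [conflicting_words(sizes, dim, k) for dim in range(3)]
    print(f"sizes {sizes}: conflicting bus words per access dimension = {counts}")
```

With sizes that are all multiples of k, no bus word crosses a dimension boundary and the sketch reports no conflicts. With a second dimension of size six, some bus words accessed along that dimension wrap around and two samples map to the same memory select, which is the situation in which wait states may be needed.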
Application of the Memory System 21 in a Radar Application
Referring to
In radar and other applications, a change of dimension requires re-ordering samples in memory (referred to as performing “corner turns”). A large amount of processing time can be spent on the corner turns.
Referring to
The multi-dimensional memory system 21 can be used to store and then read out the right concatenation of the samples 35 in the bus response for the given dimension. Thus, data reorganisation can be avoided.
Simple Examples Showing Address Calculation and Sample Storage Selection
Referring to
The maximum number of samples per bus word k is four, the total number of dimensions d is three, the memory data width w_m is four bytes and the sample data width w_s is four bytes. There are four, five and two samples in first, second and third dimensions S_1, S_2, S_3, respectively.
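For these example parameters, the memory select of every sample can be tabulated with the short sketch below. Because the sample data width and the memory data width are equal here, the sample index is zero and the select reduces to the sum of the dimension addresses modulo k. The function name `select_map` and the zero-based numbering are assumptions.

```python
def select_map(sizes, k, a3):
    """Memory select for every (A_1, A_2) at a fixed A_3, assuming w_s == w_m."""
    s1, s2, _ = sizes
    return [[(a1 + a2 + a3) % k for a1 in range(s1)] for a2 in range(s2)]


sizes = (4, 5, 2)  # S_1, S_2, S_3 from the example
k = 4
for a3 in range(sizes[2]):
    print(f"A_3 = {a3}")
    for a2, row in enumerate(select_map(sizes, k, a3)):
        print(f"  A_2 = {a2}: CS = {row}")
```

In the printed table, each row (four samples along the first dimension) and any four consecutive entries of a column (four samples along the second dimension) use four different memory selects.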
As shown in
Referring to
Referring to
Bus Address Decoding Scheme for Radar Devices
Referring again to
An address decoding scheme will now be described with reference to
Range Addressing Mode
Range addressing mode can be selected by setting DIM to 0, i.e. 2b00. The first, second and third dimensional addresses A_1, A_2 and A_3 are calculated as follows:
A_1=(A_L/(w_s)) % S_1 (6-R-1)
A_2=(A_L/(S_1*w_s))% S_2 (6-R-2)
A_3=(A_L/(S_2*S_1*w_s))% S_3 (6-R-3)
Range addressing mode can also be used to access the memory in a traditional way by setting the sample width to be the same as memory width, i.e., w_s=w_m, and the dimension size S_1, . . . , S_d to be the maximum physical memory size.
Pulse Addressing Mode
Pulse addressing mode can be selected by setting DIM to 1, i.e. 2b01. The first, second and third dimensional addresses A_1, A_2 and A_3 are calculated as follows:
A_1=(A_L/(S_2*w_s))% S_1 (6-P-1)
A_2=(A_L/(w_s))% S_2 (6-P-2)
A_3=(A_L/(S_2*S_1*w_s)) % S_3 (6-P-3)
Channel Addressing Mode
Channel addressing mode can be selected by setting DIM to 2, i.e. 2b10. The first, second and third dimensional addresses A_1, A_2 and A_3 are calculated as follows:
A_1=(A_L/(S_3*w_s))% S_1 (6-C-1)
A_2=(A_L/(S_3*S_1*w_s))% S_2 (6-C-2)
A_3=(A_L/(w_s))% S_3 (6-C-3)
Sample Index
The sample index I_S is calculated using:
I_S=(A_L/w_m) % (w_s/w_m) (7)
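The three addressing modes and the sample index can be sketched together in Python, using integer division throughout. The function name `decode_bus_address`, the tuple return value and the use of byte widths are assumptions made for illustration; the DIM values 0, 1 and 2 follow the range, pulse and channel modes described above.

```python
def decode_bus_address(a_l, dim, sizes, w_s, w_m):
    """Return (A_1, A_2, A_3, I_S) for linear address a_l in the given mode."""
    s1, s2, s3 = sizes
    if dim == 0:    # range mode, equations 6-R-1 to 6-R-3
        a1 = (a_l // w_s) % s1
        a2 = (a_l // (s1 * w_s)) % s2
        a3 = (a_l // (s2 * s1 * w_s)) % s3
    elif dim == 1:  # pulse mode, equations 6-P-1 to 6-P-3
        a1 = (a_l // (s2 * w_s)) % s1
        a2 = (a_l // w_s) % s2
        a3 = (a_l // (s2 * s1 * w_s)) % s3
    elif dim == 2:  # channel mode, equations 6-C-1 to 6-C-3
        a1 = (a_l // (s3 * w_s)) % s1
        a2 = (a_l // (s3 * s1 * w_s)) % s2
        a3 = (a_l // w_s) % s3
    else:
        raise ValueError("DIM must be 0 (range), 1 (pulse) or 2 (channel)")
    i_s = (a_l // w_m) % (w_s // w_m)  # equation 7
    return a1, a2, a3, i_s


# The eighth sample (linear address 28 with four-byte samples) in each mode.
for dim, name in enumerate(("range", "pulse", "channel")):
    print(name, decode_bus_address(28, dim, (4, 5, 2), w_s=4, w_m=4))
```

The same linear address thus selects a different sample in each mode, because the dimension that varies fastest is different.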
Referring to
Referring to
Referring to
Memory System 71
Referring to
A RDY feedback can also be used to handle conflicts in CS selection. If a plurality of sample collectors accesses the same sample storage, then conflict can be resolved by an arbiter.
To support m bus interfaces, there are an equivalent number of memory segments. In the best case, m masters can access each segment without conflicts.
Referring to
Each memory access unit 74_1, . . . , 74_m includes an array of k address calculator modules 79_1, . . . , 79_k, an array of k sample collector modules 80_1, . . . , 80_k and a common bus interface 81. The memory system 71 is provided with a set of configuration registers 82.
Each memory module 72_1, . . . , 72_m has a memory data width w_m. In this example, memory data width w_m is 32 bits and each data bus 77 is 128 bits wide.
Referring also to
Referring to
The address calculator 28_1, . . . , 28_k, 79_1, . . . , 79_k comprises an adder unit 28a, 79a and an address calculator arithmetic logic unit 28b, 79b.
The adder unit 28a, 79a adjusts the bus address A_B to match the position Nr=1, 2, . . . , k of the sample in the bus data word using equation (1) above.
The address calculator arithmetic logic unit 28b, 79b converts the sample bus address A_B′ into the dimensional address A_1, . . . , A_d and the sample index I_S. Different address decoding schemes are possible.
Referring to
The sample collector 29_1, . . . , 29_k, 80_1, . . . , 80_k comprises a sample calculator arithmetic logic unit 29a, 80a and a multiplexer 29b, 80b.
To support parallel accesses without conflicts from m bus interfaces, the sample storage chip select signal CS partitions memory into m segments, where m is a positive integer greater than one. Each segment contains S_M words.
Within the segment, sample collector 29_1, . . . , 29_k, 80_1, . . . , 80_k calculates sample storage chip select signal CS(MULTIPLE BUS) (hereinafter also referred to simply as CS) using:
CS(MULTIPLE BUS)=(A_P/S_M)*k+CS(SINGLE BUS) (4′)
where CS(SINGLE BUS) can be calculated using equation 4 above.
The sample collector 29_1, . . . , 29_k, 80_1, . . . , 80_k calculates address in memory A_M using equation 5′ below:
A_M=(A_P % S_M)/k (5′)
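Equations 4′ and 5′ can be sketched as follows, assuming integer division and zero-based numbering of segments and memories. The function name `multi_bus_select` is hypothetical, and the single-bus chip select is simply passed in as a value already obtained from equation 4.

```python
def multi_bus_select(a_p, cs_single, k, s_m):
    """Return (CS, A_M) for physical word address a_p.

    a_p:       physical word address A_P
    cs_single: chip select within a segment, from equation 4
    k:         memories per segment
    s_m:       words per segment S_M
    """
    segment = a_p // s_m
    cs = segment * k + cs_single   # equation 4'
    a_m = (a_p % s_m) // k         # equation 5'
    return cs, a_m


# Assumed values: four memories per segment, 1024 words per segment.
print(multi_bus_select(a_p=2050, cs_single=2, k=4, s_m=1024))  # (10, 0)
print(multi_bus_select(a_p=1023, cs_single=3, k=4, s_m=1024))  # (3, 255)
```

Addresses in different segments therefore select disjoint groups of k memories, which is what allows different bus masters to work in different segments without conflict.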
D_B is bi-directional depending on transfer direction (read/write).
RDY indicates that a bus must wait, which can occur when there are multiple masters and/or when multiple sample collectors access the same RAM module.
Referring to
The sample storage module 22_1, . . . , 22_(k*m), 72_1, . . . , 72_(k*m) comprises a RAM macro 22a, 72a, a multiplexer 22b, 72b, an arbiter 22c, 72c and comparators 22d(1), . . . , 22d(k*m), 72d(1), . . . , 72d(k*m).
Each sample storage 72_1, . . . , 72_(k*m) reacts only to a CS request that matches its number Nr, where Nr={1, . . . , k*m}. The arbiter 22c, 72c selects one request from all active CS requests by a suitable policy, for example round-robin. All other sample collectors 29_1, . . . , 29_k, 80_1, . . . , 80_k are paused using the RDY signal.
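The behaviour of one sample storage during one cycle can be modelled in software as sketched below. This is only a behavioural illustration, not a description of the hardware: the class `RoundRobinArbiter`, the function `storage_cycle`, the zero-based numbering and the active-high interpretation of RDY (true meaning "not paused") are all assumptions.

```python
class RoundRobinArbiter:
    """Grants one request per cycle, starting the search after the last winner."""

    def __init__(self, n_requesters):
        self.n = n_requesters
        self.last = n_requesters - 1  # so that requester 0 is considered first

    def grant(self, requests):
        """requests: one bool per sample collector. Returns the granted index or None."""
        for step in range(1, self.n + 1):
            candidate = (self.last + step) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None


def storage_cycle(storage_nr, cs_requests, arbiter):
    """Compare each collector's CS with this storage's number, then arbitrate.

    Returns (granted collector, RDY per collector). A collector that requested
    this storage but was not granted sees RDY low and has to wait.
    """
    matching = [cs == storage_nr for cs in cs_requests]  # the comparators
    granted = arbiter.grant(matching)
    rdy = [(not hit) or (i == granted) for i, hit in enumerate(matching)]
    return granted, rdy


arbiter = RoundRobinArbiter(4)
requests = [2, 0, 2, 1]  # collectors 0 and 2 both target storage 2
print(storage_cycle(2, requests, arbiter))
print(storage_cycle(2, requests, arbiter))
```

If the same conflicting requests are presented again, the round-robin policy in this sketch grants the other collector on the following cycle, so no requester is starved.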
RAM 22a, 72a handles the data word transfer depending on the direction (read or write).
Referring to
The bus interface 81 includes a concatenate block 81a and k-input AND gate 81b.
Each sample collector 80_1, . . . , 80_k handles a fixed sample position in the bus word. The bus interface 81 concatenates data D_M from each sample collector 80_1, . . . , 80_k for output as bus data, namely:
D_B={D_M(k), . . . ,D_M(1)} (7)
The k-input AND gate 81b receives RDY(1), . . . , RDY(k) from the sample collectors 80_1, . . . , 80_k and outputs a RDY signal.
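The concatenation and the combined RDY signal can be modelled with the following sketch. It assumes that the brace notation places D_M(1) in the least-significant bits of the bus word, as in common hardware description languages, and that RDY is active high; the function names `concatenate` and `bus_ready` are hypothetical.

```python
def concatenate(samples, width_bits):
    """Build D_B = {D_M(k), ..., D_M(1)} from samples listed as D_M(1), ..., D_M(k)."""
    mask = (1 << width_bits) - 1
    word = 0
    for position, sample in enumerate(samples):
        word |= (sample & mask) << (position * width_bits)
    return word


def bus_ready(rdy_signals):
    """k-input AND: the bus is ready only when every sample collector is ready."""
    return all(rdy_signals)


# Four 32-bit samples forming one 128-bit bus data word (assumed values).
d_b = concatenate([0x11111111, 0x22222222, 0x33333333, 0x44444444], 32)
print(hex(d_b))
print(bus_ready([True, True, False, True]))
```

A single paused sample collector is thus enough to hold off the whole bus word, since every sample position must be valid before the word is transferred.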
Referring to
Referring to
The motor vehicle 91 includes an advanced driver assistance system (ADAS) 92 which includes sensors (not shown) and one or more memory systems 21, 71.
It will be appreciated that many modifications may be made to the embodiments hereinbefore described.
Number | Date | Country | Kind |
---|---|---|---|
14191961 | Nov 2014 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2015/073870 | 10/15/2015 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2016/071091 | 5/12/2016 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
4603348 | Yamada | Jul 1986 | A |
5008852 | Mizoguchi | Apr 1991 | A |
5684981 | Jones | Nov 1997 | A |
5765181 | Oberlin | Jun 1998 | A |
6604166 | Jana | Aug 2003 | B1 |
20100042759 | Srinivasan et al. | Feb 2010 | A1 |
20100145993 | Rakib et al. | Jun 2010 | A1 |
20110087821 | Seo et al. | Apr 2011 | A1 |
20110296078 | Khan et al. | Dec 2011 | A1 |
20130080739 | Kyo | Mar 2013 | A1 |
20140310496 | Eguro | Oct 2014 | A1 |
Number | Date | Country |
---|---|---|
WO 2009003115 | Dec 2008 | WO |
Entry |
---|
Song, et al., “Synthesis of Custom Interleaved Memory Systems,” IEEE Transactions on very large scale integration (VLSI) Systems, IEEE Service Center, vol. 8., No. 1, Feb. 1, 2000, pp. 74-83. |
Extended Search Report and Written Opinion of priority EP application No. 14191961.3 dated May 15, 2015, 13 pages. |
International Search Report and Written Opinion of corresponding International Application No. PCT/EP2015/073870 dated Dec. 1, 2015, 8 pages. |
Number | Date | Country | |
---|---|---|---|
20170329702 A1 | Nov 2017 | US |