This application claims the benefit, under 35 U.S.C. §365 of International Application PCT/EP2011/063619, filed Aug. 8, 2011, which was published in accordance with PCT Article 21(2) on Feb. 16, 2012 in English and which claims the benefit of European patent application No. 10305890.5, filed Aug. 13, 2010.
The invention relates to a method and to an apparatus for storing at least two data streams into an array of memories, or for reading at least two data streams from an array of memories, wherein the array is arranged such that the memories are accessed via multiple data or memory buses to each of which multiple memories are connected, and wherein only blocks of data can be programmed in, or read from, a memory at a time.
High speed mass storage devices built with non-volatile memory chips are suitable for recording and playing back concurrent data/video streams under real-time conditions, e.g. for video productions in film and broadcast studio environments. Higher spatial resolutions, higher frame rates and uncompressed multiple stream recording for 3D productions are increasing the requirements for storage media bandwidth and processing power. For some years, non-volatile memories such as NAND flash devices have been used for recording digital video. To fulfil the special storage requirements of digital video production with off-the-shelf NAND flash devices in mobile embedded storage media, special handling of the data and of the NAND flash memories is required.
Writing data to state-of-the-art NAND flash memories requires special processing of the data flow because of the internal architecture of NAND flash memories. Such non-volatile memory devices are organised as an array of programmable and readable memory pages, comprising data blocks of some kilobytes in size. If data are to be written into a NAND flash memory device, it is necessary to program a full page of some kilobytes in size (PAGESIZE). Additionally, the flash device needs a non-negligible processing time for writing such a page into its internal memory array. During this programming time no other read/write commands can be executed on that flash device. Therefore the memory bus resources connecting the flash device with its controller would be unused most of the time. To optimise utilisation of memory bus resources, it is known to connect multiple NAND flash devices to the same address/memory bus and to use them in an interleaved manner: the flash devices are processed one after the other, and while the first device is busy following a programming command, the memory bus resources are used to handle the other devices sharing the same memory bus. Manufacturers of NAND flash devices support this kind of processing by integrating multiple dies of a flash device in one integrated circuit (IC), sharing a common external memory bus. Therefore such interleaved processing is feasible on a single IC. Depending on the time required for the programming/reading operations, it is possible to choose the number of interleaved devices in such a way that the bandwidth of the controlling memory bus is used in an optimum manner. For example, current NAND flash devices may have an 8-bit memory bus as external interface that can be driven at 40 MHz. The memory bus resource has a full bandwidth of approximately 40 MB/s, but the NAND flash device is written by programming operations of 2 KB pages that may last up to 600 μs. This results in a sustained bandwidth of approximately 3.2 MB/s. In order to use the full 40 MB/s bandwidth of the memory bus resource, it would be necessary to connect 12 NAND flash devices to the memory bus and to use them in an interleaved manner.
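The interleaving arithmetic of the preceding example can be reproduced with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that one device cycle consists of transferring a page over the bus followed by the worst-case programming time. Under that assumption the per-device rate comes out at roughly 3 MB/s and the ratio of bus bandwidth to device bandwidth at about 12 to 13, which agrees with the approximate figures stated above.

```python
PAGE_BYTES = 2 * 1024        # 2 KB page, as in the example above
BUS_BYTES_PER_S = 40e6       # 8-bit memory bus driven at 40 MHz
PROGRAM_TIME_S = 600e-6      # worst-case page programming time


def sustained_bandwidth(page_bytes, bus_rate, program_time):
    """Bytes per second of a single device (assumed model:
    transfer one page over the bus, then wait for programming)."""
    transfer_time = page_bytes / bus_rate
    return page_bytes / (transfer_time + program_time)


def interleave_ratio(page_bytes, bus_rate, program_time):
    """How many such devices the bus could keep busy (not rounded)."""
    return bus_rate / sustained_bandwidth(page_bytes, bus_rate, program_time)


single = sustained_bandwidth(PAGE_BYTES, BUS_BYTES_PER_S, PROGRAM_TIME_S)
ratio = interleave_ratio(PAGE_BYTES, BUS_BYTES_PER_S, PROGRAM_TIME_S)
print(f"per-device rate: {single / 1e6:.2f} MB/s, interleave ratio: {ratio:.1f}")
```

The exact device count chosen in practice depends on the real timing of the devices used; the sketch only shows how the order of magnitude follows from page size, bus rate and programming time.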
To provide bandwidths higher than one memory bus can handle, data are written in parallel to multiple memory buses, whereby multiple flash pages on different flash devices are programmed concurrently. A corresponding structure of flash memories is shown in
This kind of flash memory arrangement has the advantage that almost unlimited bandwidths can be provided for flash storage media. But increasing bandwidth leads also to an increasing amount of data that need to be written coherently. Only after a block of Y*PAGESIZE of data is available, these data can be programmed to the corresponding NAND flash memory devices on all Y parallel memory buses. To guarantee a specific minimum read or write bandwidth for input/output data 10, it is necessary to read/write the data in an interleaved manner as mentioned above. That implies the need to read/write sequential blocks of a size X*Y*PAGESIZE.
When accessing only a single data stream or file, there is no problem in ensuring a sequential read/write or recording and playback, respectively. But in case multiple concurrent data streams are to be written to, or read from, the NAND flash array, some effort is required to comply with these conditions.
One solution could be to buffer the data of each data stream in an independent buffer, and to write data of the different streams consecutively to the flash memory array only after one buffer contains an amount of Y*PAGESIZE of data. However, such processing would scramble the memory pages of different data streams into sequential ‘interleaved blocks’, whereby the interleaving is guaranteed when the data streams are written to and read from the flash memory array concurrently. But if only a single data stream is to be read from the flash array, the interleaving is not ensured. Additionally, consecutive flash pages in a NAND flash memory are coupled to ‘Erase Blocks’, and if one separate data stream is to be erased it would be necessary to implement a wasteful procedure for preserving the data pages of the other data streams. For implementing a corresponding solution using double buffering, a buffer of size 2*Y*PAGESIZE*NUMBER_OF_STREAMS would be required.
To avoid scrambled data in the interleaved blocks and in the erasing blocks, it is necessary to handle the different data streams as independent data blocks in the flash array. For writing incoming data streams independently in the flash array while still fulfilling the interleaving requirements, the incoming data must be buffered until the buffer of one specific data stream has collected enough data for writing a full ‘interleaving block’ in the flash memory array. But that solution would require a large buffer size of 2*Y*X*PAGESIZE*NUMBER_OF_STREAMS.
A problem to be solved by the invention is to provide a high-speed real-time flash memory processing in which multiple incoming data streams can be concurrently written to, or read from, independent areas of a flash memory array using a minimum buffer size only, and wherein interleaving requirements are fulfilled for each stream independently. This problem is solved by the methods disclosed in claims 1 and 3. An apparatus that utilises this method is disclosed in claims 2 and 4.
According to the invention, the Y parallel memory buses are not used as strictly parallel memory buses, but as serial data lanes (i.e. memory buses). It is not necessary to buffer data until the amount of incoming data of one data stream suffices for writing corresponding data pages on all Y memory buses. Instead, data are written into the flash memory array as soon as one of the data streams has enough data buffered for writing a full ‘interleaving block’ on one memory bus. In combination with a smart buffer control it is possible to allocate and use a minimal number of small buffers in a flexible way. Data of the different data streams are also concurrently written to the memory buses of the flash memory array and, depending on the receiving time of the data, it is advantageously possible to handle storage, or replay, of the data streams in a more flexible and effective manner.
In principle, the inventive method is suited for storing or recording at least two data streams or files into an array of memories, wherein said array is arranged such that the memories are accessed via multiple memory buses to each of which multiple memories are connected, and wherein only blocks of data can be programmed in a memory at a time, said method including the steps:
a) collecting data from different ones of said data streams in corresponding different buffers until the amount of collected data for a current one of said data streams in a current buffer equals the size of a current one of said data blocks, wherein the number of buffers is different than the number of data streams;
b) storing the data of said current data block from that current buffer into a memory or memories connected to a current one of said memory buses, wherein the following buffered data block of the related data stream is later on stored into a memory or memories connected to a following one of said memory buses, the number of said following memory bus being increased or decreased, respectively, with respect to the number of the current memory bus;
c) repeatedly performing steps a) and b), also for the other ones of said data streams using other available ones of said buffers and other ones of said memory buses.
In principle the inventive apparatus is suited for storing or recording at least two data streams or files into an array of memories, wherein said array is arranged such that the memories are accessed via multiple memory buses to each of which multiple memories are connected, and wherein only blocks of data can be programmed in a memory at a time, said apparatus including:
In principle, the inventive method is suited for reading or replaying at least two data streams or files from an array of memories, wherein said array is arranged such that the memories are accessed via multiple memory buses to each of which multiple memories are connected, and wherein only blocks of data can be read from a memory at a time, said method including the steps:
a) reading data of a current data block from a memory or memories connected to a current one of said memory buses and storing them into a current buffer, wherein the following data block of the related data stream is later on read from a memory or memories connected to a following one of said memory buses, the number of said following memory bus being increased or decreased, respectively, with respect to the number of the current memory bus;
b) assembling data of different ones of said data streams from corresponding different buffers, wherein the number of buffers is different than the number of data streams;
c) repeatedly performing steps a) and b), also for the other ones of said data streams using other available ones of said buffers and other ones of said memory buses.
In principle the inventive apparatus is suited for reading or replaying at least two data streams or files from an array of memories, wherein said array is arranged such that the memories are accessed via multiple memory buses to each of which multiple memories are connected, and wherein only blocks of data can be read from a memory at a time, said apparatus including:
Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
Accessing multiple data streams or files in the flash memory array requires as much flexibility for data block access as possible. For example, the format of the multiple data streams of the data input or output can conform to the Ethernet protocol, whereas the data for storage in, or read from, the memory array must be addressed according to a different protocol or format. To provide high-bandwidth access to a NAND flash memory array with a minimum of buffer resources and a maximum of flexibility in data stream access, the invention deviates from the known strictly parallel handling of memory buses. Instead, the parallel memory buses are regarded as representing serial memory buses (i.e. lanes) that are fed, or read, by concurrent memory bus processing engines. The known parallelism is broken with respect to temporal direction because the processing of the memory buses no longer starts and ends at the same time instants. The known parallelism is also broken with respect to processing location because different memory buses can access different data streams in a different or flexible way.
Incoming data of different data streams, which can have differing data rates, are buffered in an independent FIFO buffer for each data stream. As soon as the amount of buffered data of one data stream reaches the size of one ‘interleaving block’ (X*PAGESIZE) on one memory bus, these already stored data are written at the actual position (i.e. the current position at which the data stream is to be written or its writing is to be continued) on the current memory bus for that data stream. Successive ‘interleaving blocks’ of that data stream are distributed in ascending order (or, as an alternative embodiment, in descending order) to the Y memory buses, i.e. a first data block of a particular data stream is written to memory bus 1, the second data block to e.g. memory bus 2, the third data block to e.g. memory bus 3, . . . , the Yth data block to e.g. memory bus Y, and the (Y+1)th data block again to memory bus 1, and so on. Thus, because of concurrent memory bus processing, parallelism in processing on the memory buses for a given data stream remains, in order to achieve the desired bandwidth for read/write access to that data stream. Advantageously, such a regular structure of memory bus accesses minimises the demands on a file system that is required for arranging data streams/files on a recording storage medium. This procedure supports the parallelism of data streams because, depending on the current incoming data, different memory buses can access different data streams concurrently.
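The circular distribution of successive interleaving blocks over the Y memory buses can be illustrated as follows. This is a minimal Python sketch; the function name `bus_for_block` and its parameters are hypothetical and do not appear in the description. Block indices are counted from 0 and memory buses from 1, as in the text.

```python
def bus_for_block(block_index, num_buses, start_bus=1, descending=False):
    """Return the 1-based memory bus number for the block_index-th
    interleaving block (0-based) of one data stream.

    start_bus is the bus that received the first block of the stream;
    descending selects the alternative embodiment."""
    step = -1 if descending else 1
    return (start_bus - 1 + step * block_index) % num_buses + 1


Y = 4  # example number of memory buses (assumed value)
ascending = [bus_for_block(k, Y) for k in range(6)]
alternative = [bus_for_block(k, Y, descending=True) for k in range(5)]
print(ascending)    # bus numbers for blocks 0..5, ascending order
print(alternative)  # bus numbers for blocks 0..4, descending order
```

Because the bus number follows directly from the block index and the starting bus, a file system only needs to record the starting bus and the block count of a stream in order to locate every block.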
The inventive processing can be implemented in a device as shown in the block diagram of
The processing of the received stream data in memory controller 11 is shown in
For such processing it is assumed that data of the different streams is received randomly and is written with separate regular structures into the memory array. To minimise the required buffer size X*PAGESIZE, the special properties of streaming data can be used. Given that the flash memory fabric is designed to achieve at least the write data rate of the sum of the data rates of the incoming data streams (whereby it is assumed that one memory bus can handle 1/Y of this data rate), one would need N*Y FIFO buffers for buffering the incoming data for N simultaneous streams and in addition N+1 FIFO buffers for bridging the time gap until the first buffer can be re-used. So, an amount of N*Y+N+1 buffers of size X*PAGESIZE is required, wherein N is the number of concurrent data streams, Y is the number of memory buses and X is the number of interleaved flash devices. That amount of buffers would be needed for processing the worst case: all data streams are to be written into the same memory buses concurrently. However, with the above described procedure it is additionally possible to reduce the number of buffers by the minimum number of freed buffers: as soon as the content of a buffer starts to be written into the flash memory devices, an amount of Y buffers is filled, until that buffer is freed. Assuming that an amount of N−1 buffers may need to wait (while another data stream is written) before their content can be written into the memory bus, an amount of
freed buffers will result.
Therefore, when using the processing described above, a total amount of
buffers of size X*PAGESIZE is required to guarantee availability of the full data rate under all conditions.
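For orientation, the buffer memory demanded by the approaches discussed so far can be compared numerically. The Python sketch below is an illustration under assumed example values (N=2, Y=4, X=12, PAGESIZE=2048 bytes); the function names are hypothetical. It evaluates only the expressions stated explicitly in the text, namely 2*Y*PAGESIZE*N, 2*Y*X*PAGESIZE*N, and the worst-case count of N*Y+N+1 buffers of size X*PAGESIZE before any reduction by freed buffers is applied.

```python
def scrambled_double_buffer_bytes(n, y, page):
    """Double buffering with streams scrambled page-wise: 2*Y*PAGESIZE*N."""
    return 2 * y * page * n


def independent_parallel_bytes(n, y, x, page):
    """Independent streams on strictly parallel buses: 2*Y*X*PAGESIZE*N."""
    return 2 * y * x * page * n


def serial_lane_worst_case_bytes(n, y, x, page):
    """Serial-lane processing, worst case before reduction:
    (N*Y + N + 1) buffers, each of size X*PAGESIZE."""
    return (n * y + n + 1) * x * page


# Assumed example values, chosen for illustration only.
N, Y, X, PAGESIZE = 2, 4, 12, 2048

scrambled = scrambled_double_buffer_bytes(N, Y, PAGESIZE)
parallel = independent_parallel_bytes(N, Y, X, PAGESIZE)
serial = serial_lane_worst_case_bytes(N, Y, X, PAGESIZE)
print(scrambled, parallel, serial)
```

With these example values the serial-lane worst case already needs less memory than the strictly parallel handling of independent streams, while still keeping each stream in its own interleaving blocks.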
In a further embodiment, N data streams having equal data rates are recorded concurrently.
Each one of these data streams has a maximum data rate of Y/N*DATARATE_OF_ONE_FLASH_BUS. Advantageously, it is possible to reduce the required number of buffers (i.e. FIFOs) to Y+N+1 by preventing data streams from being written concurrently into a single one of the memory buses. That is accomplished by starting the writing of the different data streams in every processing cycle on different memory buses. The processing described in connection with
For every received data stream i (step 31), controller 11 provides a reference value z_current=actual_fifo(i) to its current buffer. The memory bus to which the next data of stream i must be sent is registered in value actual_bus(i). In step 32, the stream i data are forwarded to the current buffer data input actual_fifo(i).datain as long as the corresponding flag actual_fifo(i).full is not set (step 33). As soon as buffer actual_fifo(i) is full, it is checked in step 37 whether or not data stream i is accessed for the first time. If true, the value actual_bus(i) is set in step 38 to the sum of values last_accessed_bus and Y/N. The buffer data are passed in step 34 to memory bus processing engine actual_bus(i). In step 35 the value last_accessed_bus is set to the value actual_bus(i), and value actual_bus(i) is incremented to actual_bus(i)+1 (or decremented to actual_bus(i)−1 in a different embodiment), in a circular manner among memory bus numbers MB1 to MBY. In step 36, value actual_fifo(i) is set to the number z_next of the next free buffer among buffer numbers 1 to Z, and in step 31 the following data block of stream i is received by controller 11.
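The control flow of steps 31 to 38 can be sketched in software as follows. This Python model is a simplified illustration, not the controller implementation: the class `WriteController` and its attribute `written` are hypothetical, the initial value 0 of last_accessed_bus and the integer division Y // N are assumptions, and a full buffer is treated as drained instantly by its memory bus processing engine, so the timing behaviour that determines the real buffer count is not modelled.

```python
from collections import deque


class WriteController:
    """Simplified model of the per-stream write control (steps 31 to 38)."""

    def __init__(self, num_streams, num_buses, num_buffers, block_size):
        self.n = num_streams
        self.y = num_buses
        self.block_size = block_size            # stands in for X*PAGESIZE
        self.free_fifos = deque(range(1, num_buffers + 1))
        self.fifo_data = {z: bytearray() for z in self.free_fifos}
        self.actual_fifo = {}                   # stream -> current buffer number
        self.actual_bus = {}                    # stream -> next bus number
        self.last_accessed_bus = 0              # assumed initial value
        self.written = []                       # (stream, bus, data) log

    def _wrap(self, bus):
        """Map any integer onto bus numbers 1..Y in a circular manner."""
        return (bus - 1) % self.y + 1

    def receive(self, stream, data):
        """Steps 31 to 33: buffer incoming data until the block is full."""
        data = bytes(data)
        while data:
            if stream not in self.actual_fifo:
                self.actual_fifo[stream] = self.free_fifos.popleft()
            z = self.actual_fifo[stream]
            buf = self.fifo_data[z]
            room = self.block_size - len(buf)
            buf += data[:room]                  # in-place extend of the FIFO
            data = data[room:]
            if len(buf) == self.block_size:     # flag actual_fifo(i).full
                self._flush(stream, z)

    def _flush(self, stream, z):
        if stream not in self.actual_bus:       # step 37: first access?
            # Step 38: offset the start bus by Y/N from the last used bus.
            self.actual_bus[stream] = self._wrap(
                self.last_accessed_bus + self.y // self.n)
        bus = self.actual_bus[stream]
        # Step 34: hand the block to the bus engine (modelled as a log entry).
        self.written.append((stream, bus, bytes(self.fifo_data[z])))
        self.last_accessed_bus = bus            # step 35
        self.actual_bus[stream] = self._wrap(bus + 1)
        self.fifo_data[z].clear()               # assumed instant drain
        self.free_fifos.append(z)
        self.actual_fifo[stream] = self.free_fifos.popleft()  # step 36


ctrl = WriteController(num_streams=2, num_buses=4, num_buffers=7, block_size=4)
for stream, chunk in [(0, b"aaaa"), (1, b"bbbb"), (0, b"aaaa"), (1, b"bbbb")]:
    ctrl.receive(stream, chunk)
schedule = [(s, b) for s, b, _ in ctrl.written]
print(schedule)
```

In this example run the two streams start on different memory buses and then each advance by one bus per block, so that consecutive blocks of the two streams do not target the same bus.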
For replay of the data stream or streams stored in memory devices MD1.1 to MDY.X, the corresponding inverse steps are carried out. The memory bus processing engine 13 serving memory bus actual_bus(i) receives the desired data from that memory bus and converts the incoming data format, timing or order of sequence into the data format required for output to the current buffer number actual_fifo(i). Besides controlling the memory bus processing engines and the buffers, controller 11 assembles the data output from the buffers into the data streams 10 in the desired format.
The invention facilitates read/write access for multiple data streams in real-time. An access faster than real-time to separate data streams is possible. It is still possible to read or replay the data streams with maximum bandwidth. It is still possible to delete data streams independently from each other.
The invention can be used in all block-based storage media that are organised in parallel data paths.
Number | Date | Country | Kind |
---|---|---|---|
10305890 | Aug 2010 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2011/063619 | 8/8/2011 | WO | 00 | 2/11/2013 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2012/019997 | 2/16/2012 | WO | A |
Number | Date | Country | |
---|---|---|---|
20130138875 A1 | May 2013 | US |