Non-volatile memory (NVM) provides a benefit to a user in that it can retain stored information, and allow that information to be retrieved, even after a power cycle. Non-volatile memory is typically used for secondary or tertiary storage as it provides lower performance, including higher latency and/or lower throughput, than volatile dynamic random access memory (DRAM).
Novel NVM technologies may improve performance by lowering latency, but may present ancillary challenges that must be addressed to use the technology properly and to harness its lower latency and/or higher throughput. Any integration with these novel NVM technologies must overcome these ancillary challenges while managing the lower latency and/or higher throughput of the device.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Using a field programmable gate array (FPGA) for integration with a lower latency NVM is disclosed. An FPGA is a gate array device that may be programmed “in the field”, after the gate array device has been manufactured. This is useful for integration with emerging technologies because an FPGA can be reprogrammed once deployed. The ability to reprogram is highly useful for emerging NVM device technologies: such technologies may not be fully or precisely characterized, and/or their characteristics may change through and after development and integration design.
Traditionally a software programmable controller system-on-chip (SOC) is used for NAND-based flash memory, for example a PMC Princeton PCIe NVMe controller. Such a software programmable controller uses an internal CPU with firmware to integrate 8-32 independent flash channels, each for a flash memory device, with a host interface, typically PCIe, SATA, and/or SAS. An external SDRAM buffer, typically using DDR3 protocol, may be integrated to provide throughput matching between the flash channel bus and the host interface bus.
Recent NVM technologies include higher throughput and/or lower latency memories. In one embodiment, a lower latency NVM includes next generation memory, for example a memory technology that is a transistorless and/or resistive-based memory.
In one embodiment, a higher throughput/lower latency NVM provides the following advantages over flash memory:
In one embodiment, a higher throughput/lower latency NVM may require differing procedures over traditional NAND-based flash NVM to handle:
Throughout this specification a memory is fully “bit-addressable” if it allows a bit to be rewritten without having to rewrite other bits, for example without the read-modify-write required by flash memory.
An FPGA controller is a novel approach to controlling an emerging NVM technology, providing new parallelizable logic as new physical phenomena are discovered about the device. For example, flash traditionally uses a 40-bit BCH error correction coding (ECC) which research has determined to be most effective for channel coding for the NAND flash media. For emerging NVM technology it is conceivable that not only ECC parameters (40-bit to 64-bit, e.g.) but also the ECC algorithms/logic structures themselves (BCH to Reed Solomon, e.g.) may be updated as device statistics and error correlation characterization evolve over time.
With at least fifty times the performance of traditional NAND-based flash, emerging NVM technology may exceed what existing software programmable controller SOC approaches can handle. For example, while a flash channel may be four bytes wide, an emerging NVM may be sixteen bytes wide. With up to 16 devices, each currently at, for example, 150 ns/byte, a controller may need to handle 6.6 MB/sec per device or 106 MB/sec per controller. For control module (102) with 36 memory modules (106) this results in a 3.5 GB/s throughput chassis. An FPGA-based controller can handle such high throughput by parallelizing logic on a per channel basis, for example.
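The throughput arithmetic above can be reproduced with a short calculation. This is a minimal sketch that assumes decimal units (1 MB = 10^6 bytes) and no protocol overhead; the constant and function names are illustrative and do not come from the specification. Under these assumptions, straightforward multiplication gives about 6.7 MB/s per device, about 107 MB/s per controller, and about 3.8 GB/s for 36 memory modules, so the figures quoted in the text appear to be rounded or to account for overhead that is not stated.

```python
# Figures taken from the text; all names here are illustrative.
NS_PER_BYTE = 150            # example per-device access time
DEVICES_PER_CONTROLLER = 16  # "up to 16 devices"
MODULES_PER_CHASSIS = 36     # "36 memory modules (106)"


def device_mb_per_s(ns_per_byte: float) -> float:
    """Bytes per second at the given access time, in decimal megabytes."""
    bytes_per_s = 1e9 / ns_per_byte
    return bytes_per_s / 1e6


per_device_mb_s = device_mb_per_s(NS_PER_BYTE)
per_controller_mb_s = per_device_mb_s * DEVICES_PER_CONTROLLER
per_chassis_gb_s = per_controller_mb_s * MODULES_PER_CHASSIS / 1000

print(f"per device:     {per_device_mb_s:.2f} MB/s")
print(f"per controller: {per_controller_mb_s:.2f} MB/s")
print(f"per chassis:    {per_chassis_gb_s:.2f} GB/s")
```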
The system comprises a control module (102) coupled to a host/user/client, a service module (104), and a plurality of memory modules (106a-106z). In one embodiment, up to 36 memory modules (106) may be used. The control module uses a high throughput bus to couple to the host, for example PCIe Gen 3 with between x4 and x96 lanes. Within the control module (102) there are one or more processors (114) that are external to the memory modules (106a)-(106z). An external processor (114) may have one or more processing cores. An external processor (114) may be coupled internally using a lower throughput bus, for example PCIe 2.0 with x1 lane.
The control module (102) is coupled by PCIe to a memory module (106z), which comprises: an FPGA controller (116); a non-volatile memory media (118), and an associated DDR4 buffer/cache (120). In one embodiment, a non-volatile memory media (118) may include next generation non-volatile memory.
In one embodiment, the FPGA (116) interfaces hostward via a PCIe bus, for example dual PCIe Gen3 x4, for a data rate up to 6 GB/s. The FPGA (116) interfaces using DDR4 with the SDRAM cache (120), which may serve as a settling time cache. The function of the settling time cache is to provide a host with a cache for reading within the settling time window. A typical settling time is 1 usec to 100 ms, so to be deep enough to store data for up to 100 ms and fast enough to absorb data at 6 GB/s, the settling time cache must be at least 6 GB/s×100 ms=600 MB in size with a 6 GB/s read/write bandwidth. A communication interface (210), for example an SDRAM controller logic block, within FPGA (116) is used to provide a high data transfer rate connection to the external RAM (120).
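The sizing rule above (cache size is at least host bandwidth multiplied by settling time) can be written out as follows. This is a minimal sketch; the function name and the choice of integer microseconds are assumptions made for illustration, and decimal units are assumed (1 MB = 10^6 bytes).

```python
def settling_cache_bytes(bandwidth_bytes_per_s: int, settling_time_us: int) -> int:
    """Minimum cache size, in bytes, needed to hold every write that
    arrives at full bandwidth during one settling time window.

    Integer arithmetic with ceiling division avoids floating point
    rounding and never under-sizes the cache.
    """
    numerator = bandwidth_bytes_per_s * settling_time_us
    return (numerator + 1_000_000 - 1) // 1_000_000


HOST_BANDWIDTH = 6_000_000_000  # 6 GB/s, from the text

# Settling time of 100 ms (100,000 us) gives the 600 MB figure.
print(settling_cache_bytes(HOST_BANDWIDTH, 100_000))
# Settling time of 1 us, the low end of the typical range.
print(settling_cache_bytes(HOST_BANDWIDTH, 1))
```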
In one embodiment, DDR4 is also used to interface each device (118a, 118b, 118c, . . . 118z) to the FPGA (116). With a potential mismatch between the DDR4 deviceward bus and PCIe hostward bus, one or more RAM buffers (212) may be used to handle the throughput mismatch between the respective busses. FPGA field programmable logic blocks are used to provide one or more access modules (214) which provide access functions to the NVM modules (118a-118z). FPGA field programmable logic blocks are used to provide one or more management modules (216) which provide management functions to the NVM modules (118a-118z).
In one embodiment, the management function for a module (216) is forward error correction and/or ECC. For example, a BCH algorithm used for flash memory is based on error pattern and statistical error correlations that may be inappropriate for emerging NVM technologies. Forward error correction and/or ECC algorithms/machines/modules may be designed and improved by updating the FPGA and/or module (216) even after being deployed in the field.
In one embodiment, the management function for a module (216) is wear leveling. Unlike flash memory which requires read-modify-write over blocks of flash, emerging NVM may enable fully bit-addressable rewrite, for write-in-place. The module (216) may include field programmable algorithms/modules for wear detection and wear migration that resolve wear bit-by-bit.
In one embodiment, the management function for a module (216) is tile replacement. Using an FPGA (116) allows emerging NVM technology vendors to insert proprietary modules (216) to indicate when memory tiles are failing and require replacement. A tile replacement module (216) may take this indication and provide tile migration.
In one embodiment, the management function for a module (216) is media scheduling control. For example, a start-gap algorithm may be used when emerging NVM technologies only require mild wear leveling, to avoid the higher complexity and lower performance of table-based methods for wear leveling. The start-gap algorithm uses a start register and a gap register to gently migrate read/write access so that access is spread out, for example uniformly, over time to avoid ‘hot spots’.
In one embodiment, the communication interface (210) and/or management function for module (216) includes a write settling time cache controller. For example, a settling period for an emerging NVM technology may increase from 1 usec to 100 ms. Using an FPGA (116) allows flexibility not only in caching algorithms but also in cache size, so that a new settling time cache characteristic can be adopted when new physical settling time characteristics are found.
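The start-gap approach can be illustrated in software. The specification does not give the details of the algorithm, so the following is a minimal sketch of one commonly described formulation: N logical lines are stored in N+1 physical lines, the unused line is the gap, and the gap moves by one line after a fixed number of writes. The class name, the `gap_interval` parameter, and the in-memory list standing in for the NVM media are all assumptions made for illustration.

```python
class StartGap:
    """Illustrative start-gap remapping over N logical lines."""

    def __init__(self, num_lines: int, gap_interval: int = 100):
        self.n = num_lines
        self.start = 0                 # start register
        self.gap = num_lines           # gap register: index of the unused line
        self.gap_interval = gap_interval
        self.cells = [None] * (num_lines + 1)  # stand-in for the NVM media
        self._writes = 0

    def physical(self, logical: int) -> int:
        """Map a logical line to its current physical line."""
        if not 0 <= logical < self.n:
            raise IndexError(logical)
        pa = (logical + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa

    def _move_gap(self) -> None:
        if self.gap == 0:
            # Wrap around: the last line moves into line 0 and start advances.
            self.cells[0] = self.cells[self.n]
            self.cells[self.n] = None
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # The line just below the gap moves up into the gap.
            self.cells[self.gap] = self.cells[self.gap - 1]
            self.cells[self.gap - 1] = None
            self.gap -= 1

    def write(self, logical: int, value) -> None:
        self.cells[self.physical(logical)] = value
        self._writes += 1
        if self._writes % self.gap_interval == 0:
            self._move_gap()

    def read(self, logical: int):
        return self.cells[self.physical(logical)]
```

In this sketch, repeated writes to a single logical line land on different physical lines over time, with no mapping table: only the two registers are consulted on each access.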
In one embodiment, the access module (214) comprises a new NVM interface protocol to succeed DDR4. In one embodiment, the access module (214) comprises a new host interface protocol to succeed PCIe/NVMe, for example a proprietary protocol such as NVMd. Having a flexibility to change the interface protocols or other management modules (216) in the field based on new physical insights for the emerging NVM technology associated with the NVM modules (118a-118z) may provide dynamic improvements to reliability and performance of the memory module (106).
In one embodiment, a read access to a given cell in the emerging NVM technology may affect the payload of the cell and/or one or more physically adjacent cells. For example, for next generation memory, cells may be adjacent in three dimensions, such that there are at least six near neighbors along the X-axis, Y-axis, and Z-axis. The phenomenon where reading a cell causes errors in the cell or its adjacent cells is termed throughout this specification as “read disturb”. Similarly, a write access to a given cell may affect all physically adjacent cells, and the phenomenon where writing a cell causes errors in the cell or its adjacent cells is termed throughout this specification as “write disturb”. The management module (216) may include algorithms/modules to minimize the deleterious effects of read disturb and write disturb.
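The three-dimensional adjacency described above can be sketched as follows. The specification does not describe a particular disturb mitigation algorithm, so the `DisturbTracker` class, its per-neighbor exposure count, and its refresh threshold are hypothetical and are offered only as one possible illustration. In this sketch a cell in the interior of the array has six near neighbors, and a cell on an edge or corner has fewer.

```python
from collections import Counter
from typing import Iterator, List, Tuple

Cell = Tuple[int, int, int]

# Unit steps along the X-axis, Y-axis, and Z-axis.
_STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def near_neighbors(cell: Cell, dims: Cell) -> Iterator[Cell]:
    """Yield the cells directly adjacent to `cell` inside an array of size `dims`."""
    x, y, z = cell
    for dx, dy, dz in _STEPS:
        nx, ny, nz = x + dx, y + dy, z + dz
        if 0 <= nx < dims[0] and 0 <= ny < dims[1] and 0 <= nz < dims[2]:
            yield (nx, ny, nz)


class DisturbTracker:
    """Hypothetical tracker: counts how often each cell's neighbors are accessed."""

    def __init__(self, dims: Cell, threshold: int):
        self.dims = dims
        self.threshold = threshold  # assumed refresh threshold, not from the text
        self.exposure = Counter()

    def record_access(self, cell: Cell) -> List[Cell]:
        """Record one access to `cell`; return neighbors that are due for a refresh."""
        due = []
        for neighbor in near_neighbors(cell, self.dims):
            self.exposure[neighbor] += 1
            if self.exposure[neighbor] >= self.threshold:
                due.append(neighbor)
                self.exposure[neighbor] = 0  # counter resets once a refresh is requested
        return due
```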
In step 304, a plurality of programmable logic blocks are programmed in the first configuration to perform one or both of the access function and the management function with respect to a plurality of non-volatile memory modules.
In step 402, an indication of a new value for an NVM device (118) characteristic is received. For example, the manufacturer of an NVM device (118), for higher statistical reliability, may change the specification of the settling time from 1 ms to 100 ms. In step 404, a new algorithm and/or parameter is determined to reflect the changed value. In the example above, the settling time cache size was previously 6 GB/s×1 ms=6 MB. This may have fit within SDRAM available within the FPGA (116). An increase of one hundred times for the settling period may result in using the external SDRAM (120), and reprogramming the FPGA (116) to configure logic blocks for a communication interface (210) to handle the 600 MB settling time cache.
In step 406, the FPGA device (116) is reprogrammed in place (i.e. in the field) to implement the new algorithm instead of the previously-implemented algorithm. In the example above, the FPGA (116) is reprogrammed to expand the settling time cache from 6 MB to 600 MB.
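The decision made in steps 402 through 406 can be sketched as a small planning routine. This is a minimal sketch under stated assumptions: the 8 MB on-chip capacity, the `CachePlan` type, and the function names are hypothetical and do not come from the specification, and the actual reprogramming of the FPGA (116) is outside the scope of the example.

```python
from dataclasses import dataclass

ON_CHIP_RAM_BYTES = 8_000_000  # hypothetical memory available within the FPGA


@dataclass(frozen=True)
class CachePlan:
    size_bytes: int
    placement: str  # "on_chip" or "external_sdram"


def plan_settling_cache(bandwidth_bytes_per_s: int,
                        settling_time_us: int,
                        on_chip_bytes: int = ON_CHIP_RAM_BYTES) -> CachePlan:
    """Step 404: derive the new cache parameters from the new settling time."""
    numerator = bandwidth_bytes_per_s * settling_time_us
    size = (numerator + 1_000_000 - 1) // 1_000_000  # ceiling division
    placement = "on_chip" if size <= on_chip_bytes else "external_sdram"
    return CachePlan(size, placement)


def needs_reprogramming(old: CachePlan, new: CachePlan) -> bool:
    """Step 406 is only required when the plan actually changes."""
    return old != new


# Step 402: the settling time specification changes from 1 ms to 100 ms.
old_plan = plan_settling_cache(6_000_000_000, 1_000)
new_plan = plan_settling_cache(6_000_000_000, 100_000)
print(old_plan, new_plan, needs_reprogramming(old_plan, new_plan))
```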
In step 502, instructions are received to program and/or reprogram one or more logic blocks on FPGA (116). In one embodiment, instructions are received via a file transmitted to the field by physical media (e.g. a hard drive or thumb drive) or over the internet (e.g. email, web address, ftp site, etc.).
In step 504, logic blocks in the FPGA (116) are implemented according to the instructions of step 502. In one embodiment, a hardware and/or software device is used to reconfigure the FPGA (116) via a logic block configuration bitstream, for example via an SD card, JTAG port, SPI interface, etc.
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.