As is known in the art, storage class memory (SCM) devices can include flash-type memory, such as NAND flash and other non-volatile memory technologies. Such devices are desirable in a wide range of applications. As is also known, a variety of bus standards and protocols can be used to transfer data between applications running on processors located on hosts and peripheral devices. It is desirable to continually improve the performance of computer systems that utilize significant amounts of memory.
In one aspect of the invention, a method comprises: providing application access to a flash device having page cache memory and storage class memory via a bus by mapping a user process virtual address space, wherein the process for the application resides on a host having a processor with direct cache-line access to the page cache memory, wherein the user process virtual address space includes at least a partial mapping of physical address windows for one or more separate flash devices.
The method can further include one or more of the following features: the bus comprises a PCIe bus, the bus comprises a memory channel, the mapping of a flash device is performed by a common device driver for all device mappings, a memory window for a full mapping corresponds to the full addressable size of a flash device, and/or direct access by the application of the page cache memory upon return from a page fault.
In another aspect of the invention, an article comprises: a non-transitory computer-readable medium having stored instructions that enable a machine to: provide application access to a flash device having page cache memory and storage class memory via a bus by mapping a user process virtual address space, wherein the process for the application resides on a host having a processor with direct cache-line access to the page cache memory, wherein the user process virtual address space includes at least a partial mapping of physical address windows for one or more separate flash devices.
The article can further include one or more of the following features: the bus comprises a PCIe bus, the bus comprises a memory channel, the mapping of a flash device is performed by a common device driver for all device mappings, a memory window for a full mapping corresponds to the full addressable size of a flash device, and/or instructions for direct access by the application of the page cache memory upon return from a page fault.
In a further aspect of the invention, a system comprises: a processor; and a memory coupled to the processor, wherein the processor and the memory are configured to: provide application access to a flash device having page cache memory and storage class memory via a bus by mapping a user process virtual address space, wherein the process for the application resides on a host having a processor with direct cache-line access to the page cache memory, wherein the user process virtual address space includes at least a partial mapping of physical address windows for one or more separate flash devices.
The system can further include one or more of the following features: the bus comprises a PCIe bus, the bus comprises a memory channel, a memory window for a full mapping corresponds to the full addressable size of a flash device, and/or direct access by the application of the page cache memory upon return from a page fault.
The systems and methods sought to be protected herein may be more fully understood from the following detailed description of the drawings, in which:
The phrases “computer,” “computing system,” “computing environment,” “processing platform,” “data memory and storage system,” and “data memory and storage system environment” as used herein with respect to various embodiments are intended to be broadly construed, so as to encompass, for example, private or public cloud computing or storage systems, or parts thereof, as well as other types of systems comprising distributed virtual infrastructure and those not comprising virtual infrastructure. In addition, while particular vendor configurations, terminology, and standards, e.g., PCIe, and the like, are used herein, it is understood that these are used to facilitate an understanding of the embodiments described herein and should not be construed to limit the scope of the invention.
The terms “application,” “program,” “application program,” and “computer application program” herein refer to any type of software application, including desktop applications, server applications, database applications, and mobile applications. The terms “application process” and “process” refer to an instance of an application that is being executed within a computing environment. As used herein, the terms “processing thread” and “thread” refer to a sequence of computer instructions which can execute concurrently (or in parallel) with one or more other such sequences.
The term “memory” herein refers to any type of computer memory accessed by an application using memory access programming semantics, including, by way of example, dynamic random access memory (DRAM) and memory-mapped files. Typically, reads or writes to underlying devices are performed by an operating system (OS), not the application. As used herein, the term “storage” refers to any resource that is accessed by the application via input/output (I/O) device semantics, such as read and write system calls. In certain instances, the same physical hardware device could be accessed by the application as either memory or as storage.
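By way of a non-limiting illustration, the distinction between memory semantics and storage semantics can be sketched in Python on an ordinary host. The sketch assumes a temporary file stands in for the physical device; the function name `write_then_read_both_ways` is hypothetical and is not taken from the described embodiments.

```python
import mmap
import os
import tempfile


def write_then_read_both_ways(payload: bytes):
    """Access one backing file first as storage, then as memory.

    A temporary file stands in for the hardware device (an assumption
    made only for this illustration). The payload must be non-empty.
    """
    fd, path = tempfile.mkstemp()
    try:
        # Storage semantics: explicit write and read system calls.
        os.write(fd, payload)
        os.lseek(fd, 0, os.SEEK_SET)
        via_storage = os.read(fd, len(payload))

        # Memory semantics: map the file, then load and store by indexing.
        with mmap.mmap(fd, len(payload)) as view:
            view[0:1] = view[0:1].upper()  # store directly into the mapping
            via_memory = view[:]           # load directly from the mapping
        return via_storage, via_memory
    finally:
        os.close(fd)
        os.remove(path)
```

In this sketch the same bytes are reached through both access styles, which mirrors the statement that one physical device may be accessed either as memory or as storage.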
As is understood by one of ordinary skill in the art, memory management in a computer system can include paging to store and retrieve data from secondary storage, i.e., external storage for use in main memory. The operating system retrieves data from secondary storage in blocks referred to as pages. Paging is useful in systems having virtual memory to allow the use of secondary storage for data that do not fit into physical random-access memory (RAM).
Paging is performed when a program tries to access pages that are not currently mapped to physical memory, which is referred to as a page fault. When a page fault occurs, in conventional systems the operating system must take control and handle the page fault transparently to the application generating the page fault. In general, the operating system determines the location of the data in secondary storage and obtains an empty page frame in physical memory to use as a container for the data. The requested data is then loaded into the available page frame and the page table is updated to refer to the new page frame. Control is returned to the program for transparently retrying the instruction that caused the page fault.
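The conventional page fault sequence described above can be illustrated with a simplified Python model. This is a sketch under stated assumptions rather than an operating system implementation: the 4096-byte page size, the `DemandPager` class, and the use of a dictionary as secondary storage are all hypothetical, and page eviction is deliberately left out.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


class DemandPager:
    """Toy model of demand paging; not an actual OS mechanism."""

    def __init__(self, secondary, num_frames):
        self.secondary = secondary            # virtual page -> bytes on secondary storage
        self.free_frames = list(range(num_frames))
        self.frames = {}                      # frame number -> bytearray in "physical memory"
        self.page_table = {}                  # virtual page -> frame number
        self.faults = 0

    def _handle_fault(self, vpage):
        # Locate the data, obtain an empty frame, load it, update the page table.
        self.faults += 1
        if not self.free_frames:
            raise MemoryError("no free frame; eviction is outside this sketch")
        frame = self.free_frames.pop()
        self.frames[frame] = bytearray(self.secondary[vpage])
        self.page_table[vpage] = frame

    def load(self, vaddr):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage not in self.page_table:
            self._handle_fault(vpage)         # fault handled transparently to the caller
        # Retry of the access that caused the fault.
        return self.frames[self.page_table[vpage]][offset]
```

A caller of `load` sees only the returned byte; whether a fault occurred is visible only through the `faults` counter, which reflects the transparency described above.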
Virtual memory is divided to provide a virtual address space having pages, i.e., blocks of contiguous virtual memory addresses. Page tables are used to translate the virtual addresses seen by the application into physical addresses used by the hardware to process instructions. Page table entries include a flag indicating whether the corresponding page is in physical memory so that for physical memory the page table entry contains the physical memory address at which the page is stored. If a referenced page table entry indicates that it is not currently in physical memory, a page fault exception is generated to be handled by the operating system.
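As a further illustration, the translation of a virtual address through a page table entry carrying a presence flag may be sketched as follows. The names `PageTableEntry`, `PageFault`, and `translate`, as well as the 4096-byte page size, are assumptions introduced only for this example.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size for this illustration


class PageFault(Exception):
    """Raised when the referenced page is not in physical memory."""


@dataclass
class PageTableEntry:
    present: bool = False  # flag: is the page currently in physical memory?
    frame: int = 0         # physical frame number, meaningful only if present


def translate(page_table, vaddr):
    """Translate a virtual address to a physical address or raise PageFault."""
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpage)
    if entry is None or not entry.present:
        raise PageFault(vpage)  # to be handled by the operating system
    return entry.frame * PAGE_SIZE + offset
```

For instance, with virtual page 2 present in frame 5, `translate(table, 2 * PAGE_SIZE + 16)` yields `5 * PAGE_SIZE + 16`, while an access to a page whose flag is clear raises `PageFault`.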
In one aspect of the invention, an adapter, which can be provided as a PCIe adapter, includes flash memory and a page cache, which can be provided as DRAM. Data transfer from flash to page cache on the adapter is localized in comparison to DRAM memory on a host running an application.
In embodiments, memory mapped regions are fronted by a page cache to which an application issues loads and stores. The page cache, which can be provided as DRAM memory, is located on a flash device, such as a PCIe SCM device, and given direct cache-line access from processors. With this configuration, page transfers between the SCM chips and page caches are localized on the PCIe adapter, reducing the PCIe bus utilization. On a page fault, the mapping and management of virtual to physical pages is still managed by an OS driver, which in turn cooperatively manages translation tables and cache evictions on the PCIe adapter. In one embodiment, upon return from a page fault, the application directly accesses data from the DRAM on the PCIe SCM device.
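The effect of locating the page cache on the adapter can be illustrated with a small accounting model in Python. The sketch assumes a 4096-byte page, a 64-byte cache line, and a least-recently-used eviction policy for clean pages; none of these values or the `AdapterPageCache` class are specified by the embodiments, and they are used here only to show which transfers cross the bus.

```python
from collections import OrderedDict

PAGE_SIZE = 4096   # assumed page size
CACHE_LINE = 64    # assumed processor cache-line size


class AdapterPageCache:
    """Toy model of a page cache located on the adapter next to the flash."""

    def __init__(self, flash, capacity_pages):
        self.flash = flash              # page number -> PAGE_SIZE bytes of flash
        self.capacity = capacity_pages
        self.cache = OrderedDict()      # adapter DRAM: page number -> bytes
        self.local_bytes = 0            # flash <-> adapter DRAM, stays on the adapter
        self.bus_bytes = 0              # adapter DRAM <-> host processor, crosses the bus

    def _fill(self, page):
        if page in self.cache:
            self.cache.move_to_end(page)
            return
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict least recently used (assumed policy)
        self.cache[page] = self.flash[page]
        self.local_bytes += PAGE_SIZE

    def load_line(self, addr):
        """Serve one cache-line fill request from the host."""
        page, offset = divmod(addr, PAGE_SIZE)
        self._fill(page)
        start = offset - offset % CACHE_LINE
        self.bus_bytes += CACHE_LINE
        return self.cache[page][start:start + CACHE_LINE]
```

In this model a page miss adds a full page to `local_bytes` but only one cache line to `bus_bytes`, which is the sense in which page transfers are localized on the adapter.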
As shown in
The adapter 406 includes a controller 410 to control access to chips 400, for example NAND chips, and a DRAM interface 412 to enable access to the DRAM 402. A memory controller logic module 414 provides large PCI memory windows using a management firmware module 416 and a dual core reduced instruction set computing (RISC) system 418, for example to control overall operation of the device 400. Pages in a ‘pending’ state cause the host to wait for PCIe completion. This allows the host to immediately map pages before they are up to date. The RISC processor 418 provides flash control, MMU TLB miss handling, flash conditioning, etc. The flash controller 410 reads/writes flash pages into DRAM 402 via the controller. Many pages can be loaded/flushed in parallel as needed.
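The behavior of a page in the ‘pending’ state can be approximated in software as follows. This is only an analogy under stated assumptions: a thread event stands in for the PCIe completion, a timer stands in for the flash controller loading the page, and the names `PendingPage` and `map_then_read` are hypothetical.

```python
import threading


class PendingPage:
    """A page that can be mapped before its contents are up to date."""

    def __init__(self):
        self._ready = threading.Event()  # stands in for the PCIe completion
        self._data = b""

    def complete(self, data):
        """Called by the (simulated) flash controller once the page is loaded."""
        self._data = data
        self._ready.set()

    def read(self, timeout=5.0):
        """Host-side access: waits while the page is still pending."""
        if not self._ready.wait(timeout):
            raise TimeoutError("page load did not complete")
        return self._data


def map_then_read(data, load_delay=0.05):
    page = PendingPage()  # mapped immediately; contents not yet valid
    loader = threading.Timer(load_delay, page.complete, args=(data,))
    loader.start()        # simulated background load from flash into DRAM
    try:
        return page.read()  # blocks until the load completes
    finally:
        loader.join()
```

The point of the sketch is the ordering: the mapping exists before the data arrives, and the first access simply waits rather than failing.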
With regard to a memory channel or other interface, as shown in
The full flash device mapping 504 and the region of flash mapping 506 are mapped into the process address space by a respective device driver 510a,b. The mappings are directly mapped within the physical address window of the PCI express root complex 512. The PCI Express Flash Device A 514, Device B 516, and Device C 518 are each configured to respond to a full 2 TB, for example, PCI express memory window. Each memory window corresponds to the actual size of each flash device. The flash devices 514, 516, 518 communicate via a bus/fabric 520.
The full flash device mapping 504 is a complete map of the entire PCI Express Flash Device A 514 into the process virtual address space. The region of flash mapping 506 is a partial map of a portion of PCI Express Flash Device C 518. The portion is typically a partition, or a region of the physical device.
An application load 524 or store 526 to flash device A mapping 504 is marked as cacheable memory 528 and causes the host to fill/flush full cache lines. The virtual address cache line fill/flush requests are validated/translated through the TLB/MMU 530 to a PCI Express physical address and forwarded over the PCI express bus 520 to the PCI express flash device A 514.
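The path from an application load to a cache-line fill request on the bus can be illustrated with simple address arithmetic. The sketch assumes a 64-byte cache line and a flat linear relationship between the mapping and the device window; an actual TLB/MMU translates page by page, so the function `line_fill_request` and all of its parameters are simplifications introduced only for this example.

```python
CACHE_LINE = 64  # assumed processor cache-line size


def line_fill_request(vaddr, map_base, window_base, window_size):
    """Return (bus_address, length) of the cache-line fill for one load.

    vaddr        virtual address touched by the application
    map_base     virtual address at which the device mapping starts
    window_base  physical address at which the device memory window starts
    window_size  size of the device memory window in bytes
    """
    offset = vaddr - map_base
    if not 0 <= offset < window_size:
        raise ValueError("address outside the mapped window")
    line_offset = offset - offset % CACHE_LINE  # align down to a full cache line
    return window_base + line_offset, CACHE_LINE
```

For example, a load at offset 0x123 into a mapping results in a 64-byte fill request at offset 0x100 of the device window, since full cache lines are filled regardless of how many bytes the application asked for.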
An application load 532 or store 534 to flash device C mapping 506 has substantially similar processing including fill/flush full cache lines 536 and TLB/MMU translation 538. However, the device mapping 506 is a partial device mapping. This demonstrates that devices can be fully mapped into a user's address space, or only partially mapped. A partition or a region within a device can be mapped by the user if desired.
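A rough host-side analogy to full and partial device mappings is available through ordinary file mapping. In the following Python sketch a temporary file stands in for a flash device, and each region of the file is filled with its own index so the mapped region can be recognized; the function name `map_full_and_partial` and the region layout are assumptions made only for illustration.

```python
import mmap
import os
import tempfile

GRANULE = mmap.ALLOCATIONGRANULARITY  # mapping offsets must be multiples of this


def map_full_and_partial(num_regions=4, region_index=2):
    """Map a stand-in "device" fully, and map one region of it partially."""
    fd, path = tempfile.mkstemp()
    try:
        # Fill each region with its own index so regions are distinguishable.
        for i in range(num_regions):
            os.write(fd, bytes([i]) * GRANULE)

        # Full mapping: length 0 maps the entire file.
        # Partial mapping: one region starting at an aligned offset.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as full, \
             mmap.mmap(fd, GRANULE, access=mmap.ACCESS_READ,
                       offset=region_index * GRANULE) as part:
            return len(full), len(part), part[:4]
    finally:
        os.close(fd)
        os.remove(path)
```

The partial mapping exposes only the selected region, starting at its own offset zero, which parallels mapping a partition or region of a device rather than the whole device.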
While embodiments of the invention are shown and described in conjunction with a PCIe adapter, it is understood that other bus standards can be used without departing from the scope of the claimed invention.
Processing may be implemented in hardware, software, or a combination of the two. Processing may be implemented in computer programs executed on programmable computers/machines that each includes a processor, a storage medium or other article of manufacture that is readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and one or more output devices. Program code may be applied to data entered using an input device to perform processing and to generate output information.
The system can perform processing, at least in part, via a computer program product (e.g., in a machine-readable storage device) for execution by, or to control the operation of, data processing apparatus (e.g., a programmable processor, a computer, or multiple computers). Each such program may be implemented in a high level procedural or object-oriented programming language to communicate with a computer system. However, the programs may be implemented in assembly or machine language. The language may be a compiled or an interpreted language and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network. A computer program may be stored on a storage medium or device (e.g., CD-ROM, hard disk, or magnetic diskette) that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer. Processing may also be implemented as a machine-readable storage medium, configured with a computer program, where upon execution, instructions in the computer program cause the computer to operate. Processing may be performed by one or more programmable processors executing one or more computer programs to perform the functions of the system. All or part of the system may be implemented as special purpose logic circuitry (e.g., an FPGA (field programmable gate array) and/or an ASIC (application-specific integrated circuit)).
All references cited herein are hereby incorporated herein by reference in their entirety. A non-transitory machine-readable medium may include but is not limited to a hard drive, compact disc, flash memory, non-volatile memory, volatile memory, magnetic diskette and so forth but does not include a transitory signal per se.
Having described certain embodiments, which serve to illustrate various systems and methods sought to be protected herein, it will now become apparent to those of ordinary skill in the art that other embodiments incorporating these concepts, structures, and techniques may be used. Accordingly, it is submitted that the scope of the patent should not be limited to the described embodiments but rather should be limited only by the spirit and scope of the following claims.
The present application claims the benefit of U.S. Provisional Patent Application No. 62/004,163, filed on May 28, 2014, which is incorporated herein by reference.
Number | Date | Country |
---|---|---|
62004163 | May 2014 | US |