To analyze relatively large data sets (often called “big data”), computer systems are being built with increasingly large main memories. One type of memory is a volatile memory, such as a Dynamic Random Access Memory (DRAM). A volatile memory loses its content in the event of a power loss. Moreover, the memory cells of certain volatile memories, such as the DRAM, must be refreshed periodically to avoid data loss. Another type of memory is a non-volatile memory (NVM), which retains its data in the event of a power loss. The memory cells of an NVM retain their stored data without being refreshed.
A computer system may employ measures to protect data associated with applications executing on the system from being exposed to internal or external adversaries. One approach to protect data from one application from being visible to another application includes clearing, or “zeroing,” units of memory (pages of memory, for example) before the units are allocated to a new application. In this manner, the computer system may zero a given memory unit by writing zeros to all of the addressable locations of the unit. Due to the zeroing, the newly-allocated units of memory do not contain data traces left behind by other applications to which the units were previously allocated.
NVMs are increasingly being used as replacements for volatile memories. As examples, NVMs include flash memories, memristors, phase change memories, ferroelectric random access memories (F-RAMs) and magnetoresistive random access memories (MRAMs), to name a few. In general, an NVM may have advantages over a volatile memory. For example, the NVM may be more scalable than a volatile memory, thereby providing a higher storage density. Other advantages may be that NVM cells are not refreshed (and therefore consume no refresh power); the NVM does not lose its content upon power loss; and the NVM allows for the potential of persistent data.
A potential challenge, however, with using zeroing to protect application data in an NVM-based computer system is that the NVM may have a relatively large write latency (i.e., an NVM device may take a relatively longer time to store data, as compared to a volatile memory device). Therefore, for example, zeroing an NVM page may consume more time than zeroing a page of volatile memory. Another potential challenge in zeroing NVM is that an NVM cell may be written a finite number of times before the cell is no longer usable. Therefore, the above-described zeroing approach may potentially impact the lifetime of the NVM.
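To make the cost of conventional zeroing concrete, the following is a minimal sketch that counts the cache line-sized writes needed to zero one page. The page size (4096 bytes), the cache line size (64 bytes) and the `CountingMemory` class are illustrative assumptions and are not taken from the description above.

```python
# Sketch of conventional zeroing: every cache line of the page is written.
# PAGE_SIZE and CACHE_LINE_SIZE are assumed values, not taken from the text.
PAGE_SIZE = 4096        # bytes per page (assumption)
CACHE_LINE_SIZE = 64    # bytes per cache line (assumption)


class CountingMemory:
    """Toy byte-addressable memory that counts cache line-sized writes."""

    def __init__(self, size):
        # Fill with 0xFF to stand in for stale data left by a prior owner.
        self.cells = bytearray(b"\xff" * size)
        self.line_writes = 0

    def write_line(self, address, data):
        if len(data) != CACHE_LINE_SIZE:
            raise ValueError("data must be exactly one cache line")
        self.cells[address:address + CACHE_LINE_SIZE] = data
        self.line_writes += 1


def zero_page_conventionally(memory, page_address):
    """Zero a page by writing zeros to each of its cache lines."""
    zeros = bytes(CACHE_LINE_SIZE)
    for offset in range(0, PAGE_SIZE, CACHE_LINE_SIZE):
        memory.write_line(page_address + offset, zeros)


memory = CountingMemory(2 * PAGE_SIZE)
zero_page_conventionally(memory, PAGE_SIZE)   # zero the second page only
writes_per_page = memory.line_writes
```

Under these assumed sizes, each zeroing of a page costs one write per cache line of the page, and each of those writes incurs the NVM write latency and counts against cell endurance.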
Example implementations are disclosed herein in which a region of memory, such as a page, may be effectively zeroed without actually writing zeros to the memory. More specifically, in accordance with example implementations, a computer system includes a memory controller that manages access to a memory of the system based on initialization state indicators that are stored in a table of the memory controller. Using this approach, the memory controller may effectively initialize a region of the memory by updating the corresponding indicator(s) in the local table, instead of by, for example, writing zeros to the region.
More specifically, in accordance with example implementations, the memory controller maintains and uses a Zero Tracking Table (ZTT) for purposes of tracking regions of the memory that are zeroed. In accordance with example implementations, the computer system may use the ZTT to track and manage the zeroed status of regions of the memory, which correspond to cache line-aligned memory boundaries (called “cache line regions” herein). As described herein, the memory controller may also use the ZTT to zero out a page of memory (containing multiple cache line regions), without actually writing zeros or any other data to the memory. In this manner, instead of accessing the memory to zero out a page, the memory controller updates the ZTT so that the ZTT stores data that represents that the cache lines of the page have been zeroed.
Moreover, in accordance with example implementations, when a requestor submits a read request to read data from a cache line region, which the ZTT indicates is zeroed, the memory controller furnishes a cache line-sized block of zeros to the requestor, without actually accessing the memory. When a requestor writes to a zeroed cache line region, the memory controller updates the ZTT so that the ZTT stores data that represents that the cache line region is no longer zeroed. It is noted that the written cache line might have the value of zero, but, in accordance with example implementations, the cache line region is still marked in the ZTT as being no longer zeroed.
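The zero, read and write behavior described above may be sketched as a small software model. This is an illustration under stated assumptions, not the disclosed hardware: the class name `ToyMemoryController`, the use of a Python set as a stand-in for the ZTT, and the 4096-byte page and 64-byte cache line sizes are all hypothetical.

```python
PAGE_SIZE = 4096        # bytes per page (assumption)
CACHE_LINE_SIZE = 64    # bytes per cache line (assumption)
LINES_PER_PAGE = PAGE_SIZE // CACHE_LINE_SIZE


class ToyMemoryController:
    """Software model of a controller that tracks zeroed cache line regions."""

    def __init__(self, num_pages):
        # 0xAA stands in for stale data left behind by a previous application.
        self.memory = bytearray(b"\xaa" * (num_pages * PAGE_SIZE))
        self.zeroed_lines = set()   # stand-in for the ZTT (assumed structure)
        self.memory_reads = 0
        self.memory_writes = 0

    def zero_page(self, page_address):
        """Mark every cache line region of the page as zeroed; no memory write."""
        if page_address % PAGE_SIZE != 0:
            raise ValueError("page address must be page-aligned")
        first = page_address // CACHE_LINE_SIZE
        self.zeroed_lines.update(range(first, first + LINES_PER_PAGE))

    def read_line(self, address):
        """Return zeros for a zeroed region; otherwise read the memory."""
        line = address // CACHE_LINE_SIZE
        if line in self.zeroed_lines:
            return bytes(CACHE_LINE_SIZE)
        self.memory_reads += 1
        start = line * CACHE_LINE_SIZE
        return bytes(self.memory[start:start + CACHE_LINE_SIZE])

    def write_line(self, address, data):
        """Store the data and mark the region as no longer zeroed."""
        if len(data) != CACHE_LINE_SIZE:
            raise ValueError("data must be exactly one cache line")
        line = address // CACHE_LINE_SIZE
        # The data is not inspected: a write of all zeros also clears the status.
        self.zeroed_lines.discard(line)
        start = line * CACHE_LINE_SIZE
        self.memory[start:start + CACHE_LINE_SIZE] = data
        self.memory_writes += 1


controller = ToyMemoryController(num_pages=2)
controller.zero_page(0)
first_read = controller.read_line(128)               # furnished from the table
controller.write_line(128, bytes(CACHE_LINE_SIZE))   # write of all zeros
second_read = controller.read_line(128)              # now read from the memory
```

In this model, zeroing the page and the first read touch the memory zero times; only the explicit write and the read that follows it are counted as memory accesses.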
As a more specific example,
As depicted in
The memory controller 130 controls the flow of data into and out of the memory 120 in response to requests 140 (read requests, write requests, zero page requests, and so forth) that are provided by requestors of the physical machine 100. As an example, a requestor may be a processor 112 that executes instructions associated with the operating system 152 to cause the processor 112 to submit a read, write or zero request 140. A requestor may also be an entity other than a processor 112, such as a direct memory access (DMA) controller, a graphics controller, and so forth.
In general, the memory controller 130 may receive the requests 140 through signaling that occurs over one or multiple communication links of the physical machine 100, such as a communication link to one or multiple processors, a Peripheral Component Interconnect (PCI)-Express bus, a Direct Media Interface, and so forth. The memory controller 130 may communicate responses 142 to the requests 140 over the same communication links.
For a request 140 that involves writing data in or reading data from the memory 120, the memory controller 130 provides signals to a memory bus 144 that is coupled to the memory 120. For example, to write data to the memory 120, the memory controller 130 provides control signals that identify the bus operation as being a write operation, address signals that represent an address of the memory 120 in which the data is to be stored and data signals that represent the data. The memory 120 responds by storing the data in the memory cells associated with the address.
To read data from the memory 120, the memory controller 130 provides control signals to the memory bus 144, such as signals that identify the bus operation as being a read operation and address signals that represent a physical address of the memory 120 from which the data is to be retrieved. The memory 120 responds by providing data signals to the memory bus 144, which represent the data stored in the memory cells associated with the address.
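The write and read bus operations described above can be modeled loosely as messages that carry control, address and data fields. The following sketch is only an abstraction of that signaling; `BusOp`, `BusTransaction`, `ToyMemory` and the `length` field are hypothetical names introduced for illustration and do not correspond to any particular bus standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BusOp(Enum):
    """Stand-in for the control signals that identify the bus operation."""
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class BusTransaction:
    op: BusOp                      # control signals
    address: int                   # address signals
    data: Optional[bytes] = None   # data signals (write operations only)
    length: int = 0                # bytes requested (read operations only)


class ToyMemory:
    """Toy memory that responds to bus transactions."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def respond(self, txn):
        if txn.op is BusOp.WRITE:
            if txn.data is None:
                raise ValueError("a write transaction needs data")
            end = txn.address + len(txn.data)
            self.cells[txn.address:end] = txn.data   # store at the address
            return None
        # Read: return the data stored in the cells at the address.
        return bytes(self.cells[txn.address:txn.address + txn.length])


bus_memory = ToyMemory(64)
bus_memory.respond(BusTransaction(BusOp.WRITE, 16, data=b"\x01\x02"))
read_back = bus_memory.respond(BusTransaction(BusOp.READ, 16, length=2))
```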
In accordance with example implementations, the memory controller 130 may be an integrated circuit (IC). Moreover, in accordance with example implementations, the memory controller 130 may be part of an IC that contains a bridge (a north bridge, for example) that is separate from the processors 112. In accordance with further example implementations, the memory controller 130 may be part of a CPU semiconductor package that contains one or multiple processors 112.
In accordance with some implementations, the memory controller 130 has access to a local memory 135 that stores a Zero Tracking Table (ZTT) 134, which stores data that represents which cache line regions 123 and which pages 122 of the memory 120 are to be treated as being zeroed. Depending on the particular implementation, the local memory 135 may be a volatile memory or a non-volatile memory; and in accordance with some implementations, the local memory 135 may be part of an IC that also contains the memory controller 130.
As a more specific example, in accordance with example implementations, to generate the zero page request 140-1, one or multiple processors 112 may execute machine executable instructions that cause a user level process to pass a virtual address to a kernel of the operating system 152 using a system call; and in response to the system call, the operating system kernel may write the physical address of the page to be zeroed to a memory-mapped input/output (I/O) register 131 of the memory controller 130. It is noted that such a mechanism may be used, in lieu of having applications directly write to the register 131, as such application access may introduce a security vulnerability.
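The request path described above (a user level process passes a virtual address through a system call, and the kernel writes the physical page address to the register 131) may be sketched as follows. The names `ToyControllerRegister`, `ToyKernel` and `sys_zero_page`, the dictionary-based page table and the 4096-byte page size are assumptions made for illustration; they do not represent an actual operating system interface.

```python
PAGE_SIZE = 4096   # bytes per page (assumption)


class ToyControllerRegister:
    """Stand-in for the memory-mapped I/O register of the memory controller."""

    def __init__(self):
        self.zero_requests = []   # physical page addresses written so far

    def write(self, physical_page_address):
        self.zero_requests.append(physical_page_address)


class ToyKernel:
    """Only the kernel holds a reference to the register, not applications."""

    def __init__(self, page_table, register):
        self.page_table = page_table   # virtual page number -> physical page number
        self.register = register

    def sys_zero_page(self, virtual_address):
        """Hypothetical system call: translate, then write to the register."""
        virtual_page = virtual_address // PAGE_SIZE
        if virtual_page not in self.page_table:
            raise ValueError("address is not mapped for this process")
        physical_page_address = self.page_table[virtual_page] * PAGE_SIZE
        self.register.write(physical_page_address)
        return physical_page_address


register = ToyControllerRegister()
kernel = ToyKernel(page_table={5: 42}, register=register)
zeroed_physical = kernel.sys_zero_page(5 * PAGE_SIZE + 100)
```

Routing the request through the kernel lets the kernel reject addresses that are not mapped for the calling process, which is one way to read the security rationale given above.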
The zero page request 140-1 may be generated by a requestor other than a requestor associated with a processor 112, and the zero page request 140-1 may be generated by executing instructions other than instructions associated with an operating system, in accordance with further example implementations.
In response to the zero page request 140-1, the memory controller 130 updates the ZTT 134, as indicated at reference numeral 210 in
For a subsequent read request that targets a cache line region 123 of the zeroed page 122-1, the memory controller 130 selectively accesses the memory 120, based on whether the ZTT 134 indicates that the cache line region 123 has been written after being zeroed. In this manner, in accordance with example implementations, when a write occurs to a zeroed cache line region 123, the memory controller 130 updates the ZTT 134 to change the corresponding indicator to reflect that the region 123 should no longer be treated as being zeroed (even though the write may be a write of all zeros to the cache line region 123).
More specifically,
Referring to
Referring to
In accordance with further example implementations, a page may be initialized before being allocated to an application with a predetermined data pattern other than a pattern of all zeros (a pattern of all ones, a certain predetermined pattern of ones and zeros, and so forth). Moreover, in accordance with further example implementations, the memory controller may initialize regions of the memory other than pages (units of multiple pages, for example). In accordance with further example implementations, the memory controller may track regions of the memory other than cache line boundary-aligned regions.
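The generalization to other data patterns and other region sizes may be sketched in the same spirit. In the following illustration, the table furnishes a configurable fill pattern rather than zeros, and a single call initializes a unit spanning multiple pages. The class `PatternTrackingTable`, its method names and the 64-byte cache line size are hypothetical.

```python
CACHE_LINE_SIZE = 64   # bytes per cache line (assumption)


class PatternTrackingTable:
    """Tracks regions that are to be treated as holding a fixed pattern."""

    def __init__(self, pattern):
        if len(pattern) == 0 or CACHE_LINE_SIZE % len(pattern) != 0:
            raise ValueError("pattern length must evenly divide a cache line")
        # Precompute one cache line filled with the repeated pattern.
        self.fill = pattern * (CACHE_LINE_SIZE // len(pattern))
        self.initialized = set()   # cache line indices in the initialized state

    def initialize_region(self, first_line, line_count):
        """Mark a run of cache lines as initialized without writing memory."""
        self.initialized.update(range(first_line, first_line + line_count))

    def mark_written(self, line):
        """A write ends the initialized state of the line."""
        self.initialized.discard(line)

    def read_line(self, line, read_from_memory):
        """Furnish the pattern for initialized lines; otherwise read memory."""
        if line in self.initialized:
            return self.fill
        return read_from_memory(line)


def read_from_memory(line):
    # Stand-in for a real memory access; returns recognizable dummy data.
    return bytes([line % 256]) * CACHE_LINE_SIZE


table = PatternTrackingTable(b"\xff")   # a pattern of all ones
table.initialize_region(0, 128)         # two 64-line pages treated as one unit
before_write = table.read_line(70, read_from_memory)
table.mark_written(70)
after_write = table.read_line(70, read_from_memory)
```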
Thus, referring to
Referring to
Referring to
Thus, in accordance with example implementations, in response to receiving a zero page request, the memory controller 130 may update the corresponding entry 500 of the ZTT 134 to clear all of the cache line bit indicators 510 to represent that all of the cache line regions 123 of the page 122 have been zeroed. As writes occur to a given page, the memory controller 130 may, in accordance with example implementations, update the corresponding cache line bit indicators 510 to set the corresponding indicators 510 (i.e., store corresponding one bits in the indicators 510) to indicate that the cache line regions 123 are no longer zeroed. Therefore, in accordance with example implementations, the page entry 500 serves as an indicator to indicate or represent whether an associated page is zeroed or not; and the bit indicator 510 serves as an indicator to indicate or represent whether an associated cache line region is zeroed or not.
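The bit encoding described above (a cleared bit represents a zeroed cache line region, and a set bit represents a region that is no longer zeroed) may be sketched with an integer bitmask. The class `ZttEntry`, the `page_number` field, the count of 64 cache line regions per page, and the choice to start with all bits set are assumptions for illustration; the description above does not fix these details.

```python
LINES_PER_PAGE = 64   # cache line regions per page (assumption)


class ZttEntry:
    """One page entry holding one bit indicator per cache line region."""

    def __init__(self, page_number):
        self.page_number = page_number   # assumed field identifying the page
        # Assumption: before any zero page request, no region is treated as zeroed.
        self.bits = (1 << LINES_PER_PAGE) - 1

    def zero_page(self):
        """Clear every bit indicator: all regions are treated as zeroed."""
        self.bits = 0

    def mark_written(self, line_index):
        """Set the bit indicator: the region is no longer zeroed."""
        self.bits |= 1 << line_index

    def is_line_zeroed(self, line_index):
        return ((self.bits >> line_index) & 1) == 0

    def is_page_zeroed(self):
        return self.bits == 0


entry = ZttEntry(page_number=7)
entry.zero_page()
page_zeroed_after_request = entry.is_page_zeroed()
entry.mark_written(3)   # a write to the fourth cache line region of the page
```

With 64 regions per page, the indicators of one page fit in a single 64-bit word, so a zero page request reduces to storing one zero word in the table under these assumptions.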
The ZTT 134 may have a different format and may contain data arranged in a different fashion than that depicted in
In accordance with some implementations, the memory controller 130 invalidates zeroed cache line memory regions. For example, in accordance with some implementations, a zero page request may be followed with the execution of PCOMMIT and SFENCE instructions. It is assumed for this approach that the address range of the register 131 (
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2015/053308 | 9/30/2015 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2017/058218 | 4/6/2017 | WO | A |
Number | Date | Country |
---|---|---|
20180121122 A1 | May 2018 | US |