This application claims the benefit of Korean Patent Application No. 10-2014-0002082, filed on Jan. 7, 2014, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
The present disclosure relates to a memory system, and more particularly, to a micro-journaling for a file system using a non-volatile memory.
In the near future, non-volatile memories are expected to replace current main memories in mainstream computer systems. A memory system is organized as a hierarchy, from a register file in a processor through a cache, a main memory, and a secondary storage, to provide fast and large memory, and this hierarchy has been well established for years. Until recently, non-volatile memories lagged behind dynamic random access memory (DRAM) in read/write access times and durability. However, newer non-volatile memories show performance similar to that of DRAM in access times and endurance. The non-volatile memories can permanently store data even after power-off. When the non-volatile memories are employed as main memories in memory hierarchy designs, file systems should be changed to embrace the non-volatile memories. In detail, reliability of a file system based on a non-volatile memory needs to be better secured.
The present disclosure provides a system using micro-journaling in a file system based on a non-volatile memory, and a method of recovering the system.
According to an aspect of the inventive concept, a system includes: a central processing unit (CPU) for controlling operations of the system, including a CPU cache; and a main memory for performing micro-journaling, wherein the micro-journaling includes a commit operation for flushing data of the CPU cache to a user space of the main memory, and the main memory is a non-volatile memory where a file system resides.
The main memory may be any one of a spin transfer torque magnetic random access memory (STT-MRAM), a resistance random access memory (ReRAM), an MRAM, and a ferroelectric random access memory (FeRAM).
The system may further include a storage device for storing data processed in the system.
The storage device may be used for swapping file data extracted from a virtual memory in the main memory.
The micro-journaling may use the user space of the main memory as a data log space.
The micro-journaling may include a checkpoint operation for marking an update of a file write by a system call during recovery rewrites of the micro-journaling.
The checkpoint operation may use an input/output (I/O) vector, the I/O vector having a pointer to a base virtual address of source data and a length of data for an unfinished file write caused by the sudden power-off of the system.
The checkpoint operation may use a page directory, the page directory accessing a top level of a page table for the file write.
The micro-journaling may perform an atomic and ordered file write on the file system through the commit operation.
The micro-journaling may transactionally update a source data according to a system call as a whole.
The micro-journaling may be used for rebooting after a sudden power-off of the system to recover the file system.
According to another aspect of the inventive concept, a method of recovering a system includes: rebooting the system after a sudden power-off; restoring a page table to a user space of a nonvolatile main memory; and rerunning a file write that was stopped during the sudden power-off, by micro-journaling in the main memory.
According to another aspect of the inventive concept, a system includes: a central processing unit (CPU); and a non-volatile main memory for copying a code executed by the CPU, driving a file system, and performing a micro-journaling, wherein the micro-journaling includes: a commit operation for flushing data of the CPU cache to a user space of the main memory, a checkpoint operation for marking an update of the file write per page unit by a system call, and a recording operation of a data log of the user space.
Exemplary embodiments of the present inventive concepts will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
Hereinafter, the present disclosure will be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the present inventive concepts are shown. Like reference numerals in the drawings denote like elements.
This inventive concept may, however, be embodied in many different forms and should not be construed as limited to the exemplary embodiments set forth herein.
The terms used in the present specification are merely used to describe particular embodiments, and are not intended to limit the inventive concept. An expression used in the singular encompasses the expression of the plural, unless it has a clearly different meaning in the context. In the present specification, it is to be understood that the terms such as “including” or “having,” etc., are intended to indicate the existence of the features, numbers, steps, actions, components, parts, or combinations thereof disclosed in the specification, and are not intended to preclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof may exist or may be added.
It will be understood that when an element is referred to as being “connected” or “coupled” to or “on” another element, it can be directly connected or coupled to or on the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected” or “directly coupled” to another element, there are no intervening elements present.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. Unless indicated otherwise, these terms are only used to distinguish one element from another. For example, a first chip could be termed a second chip, and, similarly, a second chip could be termed a first chip without departing from the teachings of the disclosure.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this inventive concept belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list.
Referring to
The CPU 110 controls an overall operation of the system 100. The CPU 110 may perform a command corresponding to a code by executing the code copied in the main memory 120. The CPU 110 may perform various computing functions, such as certain calculations or tasks. According to one or more embodiments, the CPU 110 may include one processor core (single core) or a plurality of processor cores (multi-core). For example, the CPU 110 may include a dual-core, a quad-core, or a hexa-core. According to one or more embodiments, the CPU 110 may further include an internal or external cache memory.
Not only a code executed by the CPU 110 may be copied in the main memory 120, but also data processed according to a command may be stored in the main memory 120. The main memory 120 may drive a plurality of pieces of software or firmware. For example, the main memory 120 may drive an operating system (OS), an application, a file system, a memory manager, and an input/output (I/O) driver.
The OS may control software or hardware resources of the system 100, and may control program execution by the CPU 110. The application denotes any one of various application programs executed in the system 100. The file system may systematize a file or data when the file or data is stored in a storage region, such as the main memory 120 or the storage device 130. The file system may provide address information according to a write or read command to the storage device 130. The file system may be used according to a certain OS executed in the system 100. The memory manager may control a memory access operation performed by the main memory 120 or the storage device 130. The I/O driver may transmit information between the system 100 and various peripheral devices or a network, such as the Internet.
The storage device 130 may be a data storage device based on a flash memory. The storage device 130 may include, for example, a flash memory, a controller, and a buffer memory. The storage device 130 may be, for example, a memory card device, a solid state drive (SSD), an advanced technology attachment (ATA) bus device, a serial advanced technology attachment (SATA) bus device, a multimedia card device, a secure digital (SD) device, a memory stick device, a hybrid drive device, or a universal serial bus (USB) flash device.
The flash memory may be connected to the controller through an address or data bus. The flash memory may be divided into a data region and a meta region. General user data or main data may be stored in the data region, and metadata (for example, mapping information according to a flash translation layer (FTL)) required to drive the flash memory or the storage device 130, aside from the user data, may be stored in the meta region.
The controller may transmit and receive data to and from the flash memory or buffer memory through the address or data bus. The controller may include a mapping manager including an FTL and a page map table, and a local memory used to drive the mapping manager. The FTL enables the flash memory to be efficiently used. The FTL converts a logical address provided by the CPU 110 to a physical address usable by the flash memory.
The FTL manages such an address conversion through a map table. The map table shows logical addresses and physical addresses corresponding to the logical addresses. The map table may have a different size according to a mapping unit, and may use any one of various mapping methods. A page map table may be a map table in a page unit and may be used to convert a logical address number (LAN) to a physical page number (PPN).
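The page-unit address conversion described above may be illustrated by the following minimal sketch. It is not the controller's actual implementation; the class name `PageMapFTL`, the free-page list, and the policy of placing every write on a fresh physical page are assumptions made only for illustration, and garbage collection is not modeled.

```python
class PageMapFTL:
    """Toy page-unit map table: logical page number -> physical page number."""

    def __init__(self, num_physical_pages):
        self.map_table = {}                                 # logical -> physical
        self.free_pages = list(range(num_physical_pages))   # unused physical pages
        self.flash = {}                                     # physical page -> data

    def write(self, logical_page, data):
        # Assumption: pages are never overwritten in place. Each write takes
        # a fresh physical page and the map table is redirected to it.
        if not self.free_pages:
            raise RuntimeError("no free physical page (garbage collection not modeled)")
        physical_page = self.free_pages.pop(0)
        self.flash[physical_page] = data
        self.map_table[logical_page] = physical_page
        return physical_page

    def read(self, logical_page):
        # Convert the logical page to its current physical page, then read it.
        return self.flash[self.map_table[logical_page]]


ftl = PageMapFTL(num_physical_pages=4)
ftl.write(7, b"old")
ftl.write(7, b"new")   # same logical page, redirected to a new physical page
```

In this sketch the second write leaves the first physical page stale, which is why a real FTL also needs reclamation of invalid pages.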
The main memory 120 may be realized by using a non-volatile memory. A magnetic random access memory (MRAM) that is one of the non-volatile memories is a magnetoresistance-based non-volatile memory. The MRAM is different from a volatile RAM in several ways. Since the MRAM is non-volatile, the MRAM may maintain data stored therein even when a memory device is turned off.
Generally, a non-volatile RAM is slower than the volatile RAM, but the MRAM has read and write response times comparable to those of the volatile RAM. Unlike typical RAM technologies storing data as electric charges, the MRAM stores data according to magnetoresistance components. Generally, magnetoresistance components include two magnetic layers, wherein each magnetic layer has magnetization.
The MRAM is a non-volatile memory for reading and writing data by using a magnetic tunnel junction (MTJ) pattern including two magnetic layers and an insulating film between the magnetic layers. A resistance value of the MTJ pattern may vary according to a magnetization direction of the magnetic layer, and data may be programmed or erased by using a difference between resistance values.
In the MRAM using a spin transfer torque (STT) phenomenon, the magnetization direction of the magnetic layer switches according to spin transfer of electrons when a spin polarized current is supplied in one direction. The magnetization direction of one magnetic layer (pinned layer) is fixed and the magnetization direction of the other magnetic layer (free layer) may be switched according to a magnetic field generated by a program current.
The magnetic field of the program current may arrange the magnetization directions of the two magnetic layers to be parallel or anti-parallel. When the magnetization directions are parallel, resistance between the two magnetic layers is in a low (0) state. When the magnetization directions are anti-parallel, the resistance between the two magnetic layers is in a high (1) state. The switching of the magnetization direction of the free layer and the high or low state of the resistance between the magnetic layers enable the MRAM to perform write and read operations.
Although the MRAM is non-volatile and provides a quick response time, an MRAM cell has a scaling limitation and is sensitive to write disturbance. The program current applied to switch the state of the resistance between the magnetic layers is generally high. Thus, when a plurality of cells are arranged in one MRAM array, the program current applied to one cell may cause the magnetization of the free layer of an adjacent cell to change. Such write disturbance may be reduced by using an STT phenomenon.
A typical STT-MRAM includes an MTJ device. The MTJ device is a magnetoresistance data storage device including two magnetic layers (a pinned layer and a free layer) and an insulating layer between the magnetic layers.
A program current generally flows through the MTJ device. The pinned layer polarizes an electron spin of the program current, and the spin-polarized electron current passes through the MTJ device to generate a torque. The spin-polarized electron current applies the torque on the free layer to interact with the free layer.
When the current density of the spin-polarized electron current passing through the MTJ device is higher than a threshold switching current density, the torque is enough to switch a magnetization direction of the free layer. Accordingly, the magnetization direction of the free layer may be parallel or anti-parallel to the pinned layer, and resistance between the magnetic layers is changed.
Because the free layer is switched by the spin-polarized electron current, the STT-MRAM does not require an external magnetic field. Moreover, scaling is improved according to a program current reduction along with a cell size reduction, and write disturbance is prevented. In addition, the STT-MRAM may have a high tunnel magnetic resistance ratio and allows a high ratio between high and low states, and thus improves a reading operation in a magnetic domain.
The MRAM is a universal memory device having the low cost and high capacity characteristics of a dynamic random access memory (DRAM), the high speed operation characteristic of a static random access memory (SRAM), and the non-volatile characteristic of a flash memory.
The main memory 120 may be realized by using the STT-MRAM. According to one or more embodiments, the main memory 120 may be realized as a resistance random access memory (ReRAM), an MRAM, a ferroelectric random access memory (FeRAM), or a similar memory thereof.
The main memory 120 is a non-volatile memory, but may have a page cache like a conventional volatile memory used in a virtual memory system. Like a cache of the CPU 110, the page cache is used for copying a portion of data from the storage device 130 into the main memory 120, updating the copied data, and writing back the updated data to the storage device 130 by page unit. The page unit in the virtual memory system is a unit for paging that is different from the physical page used as the write/read unit of the flash memory described in
Since the main memory 120 is realized as a non-volatile memory, micro journaling supporting transactional write system calls in a file system based on a non-volatile memory is suggested to support reliability of the file system. The micro journaling allows data update of all processor caches to be recorded in non-volatile memories before power-off when a write system call is invoked. The micro journaling generates micro journals in a kernel space in the non-volatile main memory during the write system call. In order to improve the micro-journaling, data may be flushed within a specified address range. The micro journaling may be resumed even when a system is suddenly turned off during the write system calls.
Before describing the micro-journaling, traditional journaling and shadow-paging for ensuring reliability of a file system will be described. The traditional journaling and shadow-paging ensure the reliability of the file system when a main memory is a DRAM.
Referring to
Since the system may fail or crash during a write operation performed on the storage devices, journaling separately keeps records of updates in the journal area. For example, commit operations occur when two pages are ready to be written thereon. In a first commit operation, pages PA-1 and PB-1 are committed to the storage device. Then, pages PC-1 and PA-2 are committed in a second commit operation. A page PB-2 is not committed yet.
The file system is independently updated to reflect written pages one by one from the journal area. While updating the file system, a checkpoint is marked to indicate which pages in the journal area are successfully updated to the file system.
Even if the system fails before completely updating a page, the system may rewrite the page from the journal area to the file system. In terms of the file system, the journal area already has duplicate pages. Thus, any updates committed to the journal area may be reliably reflected to the file system. For a page cache, only pages that are successfully written to the journal area are regarded as pages written to the file system. Thus, the journaling has an inherent overhead that performs write operations on the storage twice, i.e., one on the journal area and the other on the file system.
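The commit, checkpoint, and rewrite sequence of the traditional journaling described above may be sketched as follows. This is a simplified model under stated assumptions, not an actual file system implementation: the class `JournalingStore`, its method names, and the use of an integer as the checkpoint mark are hypothetical. The page labels follow the example in the text.

```python
class JournalingStore:
    """Toy model: every page is written twice, to the journal and to the file system."""

    def __init__(self):
        self.journal = []        # committed (page, content) records, in commit order
        self.file_system = {}    # page -> content
        self.checkpoint = 0      # number of journal records already applied

    def commit(self, pages):
        # First write: the pages go to the journal area.
        self.journal.extend(pages)

    def checkpoint_one(self):
        # Second write: copy one journaled page to the file system, then mark it.
        page, content = self.journal[self.checkpoint]
        self.file_system[page] = content
        self.checkpoint += 1

    def recover(self):
        # After a failure, rewrite every committed record past the checkpoint.
        while self.checkpoint < len(self.journal):
            self.checkpoint_one()


store = JournalingStore()
store.commit([("PA", "PA-1"), ("PB", "PB-1")])   # first commit operation
store.commit([("PC", "PC-1"), ("PA", "PA-2")])   # second commit operation
store.checkpoint_one()                           # PA-1 reaches the file system
# A failure here loses PB-2, which was never committed.
store.recover()
```

After recovery the file system holds PA-2, PB-1, and PC-1, which shows that only committed pages are guaranteed to be reflected.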
A file operation write1 updates three pages PA, PB, and PC. Before a page is written on an original page PA-0, a duplication of the original page PA-0 is prepared as a shadow page P′A-0. Then, content of the shadow page P′A-0 is updated to a page PA-1. An inode structure in a kernel contains metadata of a file. An inode is also referred to as an index node. Mapping information, i.e., one of metadata, is represented with an address space structure of a file. When the shadow page P′A-0 is successfully written, mapping information in metadata (address space) is atomically changed to designate the updated shadow page P′A-0. A pointer of a page PA-0 is changed to a page PA-1. Then, the original page PA-0 is released. In addition, metadata on a file size and allocated page numbers are also updated. For remaining pages, the same shadow-paging mechanism is performed to update the in-memory file system.
Since updates to shadow pages and metadata may be still in a CPU cache, cache lines containing such data may be properly flushed to a main memory for file system integrity. An order of flushes between data and metadata must be preserved. After cache lines for data pages are all flushed to the main memory, cache lines for metadata are flushed. Shadow-paging has an inherent overhead of page duplication for shadow pages, which, in principle, imposes the same overhead as journaling.
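The shadow-paging steps and the required flush order (data first, metadata second) may be illustrated by the following sketch. The class `ShadowPagedFile`, the frame numbering, and the `flush_log` list that stands in for cache line flushes are assumptions introduced only for illustration.

```python
class ShadowPagedFile:
    """Toy in-memory file whose pages are updated through shadow copies."""

    def __init__(self, pages):
        self.frames = dict(enumerate(pages))           # frame number -> content
        self.address_space = list(range(len(pages)))   # page index -> frame number
        self.next_frame = len(pages)
        self.flush_log = []                            # order of simulated flushes

    def write_page(self, index, new_content):
        old_frame = self.address_space[index]
        shadow_frame = self.next_frame
        self.next_frame += 1
        # 1. Duplicate the original page into a shadow page.
        self.frames[shadow_frame] = self.frames[old_frame]
        # 2. Update the shadow page and flush the data first.
        self.frames[shadow_frame] = new_content
        self.flush_log.append(("data", shadow_frame))
        # 3. Switch the mapping to the shadow page, then flush the metadata.
        self.address_space[index] = shadow_frame
        self.flush_log.append(("metadata", index))
        # 4. Release the original page.
        del self.frames[old_frame]

    def read_page(self, index):
        return self.frames[self.address_space[index]]


f = ShadowPagedFile(["PA-0", "PB-0", "PC-0"])
f.write_page(0, "PA-1")
```

Because the mapping is switched only after the data flush is recorded, a reader of the file never observes a mapping that points to unflushed data in this model.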
Since file systems in normal operating systems are implemented with consideration of latency between a volatile main memory and a storage device, systems with nonvolatile main memory may need a new file system design.
Referring to
In the file system, a latency time of file accesses may be reduced. Since files are initially maintained in the non-volatile main memory 120, file data may be transmitted more rapidly from the first access. Conventional file systems have block-level device layers to interact with a storage device to achieve reliability of the file systems. In the file system according to the exemplary embodiment of the present inventive concepts, a page cache in the non-volatile main memory 120 instead of a block level device layer between a volatile storage device and a non-volatile storage device may achieve reliability and transactional write of the file system. When infrequently used file data is selected as pages to be evicted from a virtual memory system, the storage device 130 may store the evicted pages by a swapping operation in a swap space in the storage device 130.
An operation method of the file system for achieving reliability and transactional write may be much lighter than that of a file system using a general volatile main memory. Conventional file systems use a heavy reliability technique with journaling, which writes journal logs on a storage device. On the other hand, the file system according to an exemplary embodiment of the present inventive concepts may not require a reliability mechanism to preserve consistency between the non-volatile main memory 120 and the storage device 130. Despite the non-volatility of the main memory 120, a reliability mechanism between the CPU 110 and the non-volatile main memory 120 is still desirable.
To help ensure the reliability between the CPU 110 and the non-volatile main memory 120, each file operation should be atomic and ordered. The micro journaling according to an exemplary embodiment of the present inventive concepts includes flushing cache lines in a specific order to help ensure a transactional write, and recording a micro-journal.
The transactional write may achieve a recording operation that satisfies all four elements, i.e., atomicity that achieves a completely finished or completely unfinished state of a recording process, consistency about whether content and meta information of data to be stored match those of data actually applied to a file, isolation that ensures non-interference of another recording operation during recording, and durability that ensures that data, once stored, is permanently recorded.
Persistency of the file system may be supported by modifying booting operations of operating systems. As the file system is built upon page caches, kernel structures such as page tables and inodes are maintained during power-offs and reboots, without being lost. According to an embodiment of the present inventive concepts, when the system restarts the operating system, user data and information about a file system, which are pre-stored in the nonvolatile main memory, may be used such that the file system may have a better performance after reboots. Here, the information about a file system may include a super block that stores states of the file system as a whole. The super block may include, for example, a type, name and total size, management block size, mount location, used size, unused size, and connection information of a root directory of the file system.
When the system is suddenly turned off, a reboot sequence needs to restore the page tables and access the inodes and journal logs on a virtual address space. Thus, all contents in the page tables may be the latest, and page tables updated in dirty cache lines of a CPU cache may be all stored in a nonvolatile memory before the system is turned off. Here, a dirty cache line means a cache line including data that is only updated in a CPU cache and not yet written in a storage device. Also, in order to prevent the system from being turned off before the dirty cache lines of the CPU cache are all updated in the nonvolatile memory, page tables may be arranged in uncacheable areas.
For a write system call, data in a user space is transferred to the write system call as a source for a write operation. The data resides in the user address space and has a very valuable role in micro journaling for a non-volatile memory system. Since the data is retained in the non-volatile memory even after power-off and reboot, source data for a write system call may be permanent once any dirty cache lines of the data are properly flushed to the non-volatile main memory 120. Thus, the data in a user space may be used as a journaling log, so there is no need to duplicate the data of the main memory in the storage device against a sudden power-off as in a conventional file system. The micro journal in a kernel space may have all the information to track the data in the user space, so that the micro journaling is a much lighter way to secure reliability and transactional write of a file system compared to the conventional journaling.
Once committed data are regarded as data recorded in a file system 124, checkpoint operations are performed to assure that all the committed data are properly written on the file system 124. If the total write operation requested by the write system call is not completed during the checkpoint operation due to a sudden power-off of the system, the committed data may be rewritten at the next reboot. This is the reason why journaling logs and the records necessary for rewriting data on the file system are kept in the non-volatile area. Thus, at the beginning of a write system call, the micro journaling flushes CPU cache lines related to the source data in the user space 122 and generates a micro journal in the kernel space 124 according to the source data stored in the non-volatile memory 120. During a checkpoint operation, the micro journaling updates the kernel space 124 by moving the source data from the user space 122 to the kernel space 124. After the kernel space is updated during the checkpoint operation, the micro journal related to the write system call in the kernel space 124 may be deleted or invalidated.
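The sequence of commit, checkpoint, and rewrite at reboot described above may be modeled by the following sketch. It is only an illustration under stated assumptions: the class `MicroJournalingMemory`, the dictionary fields of the journal, the `crash_after` parameter that simulates a sudden power-off, and the tiny page size are all hypothetical, and the cache flush is modeled simply as storing the source data in a persistent buffer.

```python
PAGE_SIZE = 4  # tiny page size so the example stays readable (assumption)


class MicroJournalingMemory:
    """Toy non-volatile main memory with a user space, file pages, and a micro journal."""

    def __init__(self):
        self.user_space = bytearray()   # persistent source data, used as the data log
        self.file_pages = {}            # file system pages in the kernel space
        self.journal = None             # micro journal, also persistent

    def commit(self, data):
        # Flushing the CPU cache is modeled by storing the data in user_space.
        self.user_space = bytearray(data)
        self.journal = {"base": 0, "length": len(data), "pages_done": 0}

    def checkpoint(self, crash_after=None):
        j = self.journal
        total = -(-j["length"] // PAGE_SIZE)            # ceiling division
        while j["pages_done"] < total:
            if crash_after is not None and j["pages_done"] == crash_after:
                return False                             # simulated sudden power-off
            start = j["base"] + j["pages_done"] * PAGE_SIZE
            end = min(start + PAGE_SIZE, j["base"] + j["length"])
            self.file_pages[j["pages_done"]] = bytes(self.user_space[start:end])
            j["pages_done"] += 1                         # mark progress per page
        self.journal = None                              # invalidate the micro journal
        return True

    def recover(self):
        # At reboot, a surviving micro journal means a file write was interrupted.
        if self.journal is not None:
            self.checkpoint()


mem = MicroJournalingMemory()
mem.commit(b"ABCDEFGHIJ")
finished = mem.checkpoint(crash_after=1)   # power fails after one page is written
mem.recover()                              # the remaining pages are rewritten
```

No copy of the source data is made for the journal in this model; the journal only records where the data is and how far the write has progressed, which reflects why the approach is lighter than journaling to a storage device.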
The user space 122 may be an area to store updated data by processes invoked by the file system. The kernel space 124 may be an area for managing the file system by the operating system (OS) and may include an inode, a page table, various kinds of data structure, scheduler, process information, and a data structure for memory system management. The kernel space 124 may have a cache area for paging of a virtual memory system. The inode may include a micro-journal.
The storage device 130 may include a swap space to swap pages between the non-volatile main memory 120 and the storage device 130.
Referring to
The page directory 614 of the micro journal 610 is designed to be input to the control register CR3 in the CPU 110, and to point to an entry of the page directory 614. The micro-journaling searches the page directory 614 and the page table 620 to find a physical address of updated data. A pointer in the page table 620, which is stored in the entry of the page directory 614, outputs a physical address 630 of a page that is updated source data.
A page directory offset and a page table offset are provided by the base virtual address 617. The first portion of the base virtual address 617 corresponds to the page directory offset and the second portion of the base virtual address 617 corresponds to the page table offset. The base virtual address 617 may include a third portion as an offset of a physical page table (not shown) to find the physical location of the source data 300.
The page directory 614 and the page table 620 may be located in a kernel space and the page directory may be located in the top-level of the area for storing the page table 620.
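The division of a base virtual address into its portions may be illustrated as follows. The sketch assumes a 32-bit address with a 10-bit page directory offset, a 10-bit page table offset, and a 12-bit offset within a 4 KB page, which is a common two-level layout but is not specified in this disclosure. The nested dictionaries standing in for the page directory and page table are likewise hypothetical.

```python
def split_virtual_address(vaddr):
    """Split a 32-bit virtual address into (directory offset, table offset, page offset)."""
    directory_offset = (vaddr >> 22) & 0x3FF   # first portion: top 10 bits
    table_offset = (vaddr >> 12) & 0x3FF       # second portion: middle 10 bits
    page_offset = vaddr & 0xFFF                # third portion: low 12 bits
    return directory_offset, table_offset, page_offset


def translate(page_directory, vaddr):
    """Walk the two levels and return the physical address of the source data."""
    directory_offset, table_offset, page_offset = split_virtual_address(vaddr)
    page_table = page_directory[directory_offset]   # entry of the page directory
    frame_base = page_table[table_offset]           # entry of the page table
    return frame_base + page_offset


# One mapping: directory entry 1 -> table entry 2 -> physical frame at 0x5000.
page_directory = {1: {2: 0x5000}}
physical = translate(page_directory, 0x00402034)
```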
The page directory 614, e.g., one field in the micro journal 610, is used to access the page table 620 while the micro journal 610 is generated. The micro journal 610 may also contain a predetermined field for checkpoint information 616. The checkpoint information 616 is used to track a page level progress of generic file write for the write system call. The checkpoint information 616 may indicate a completion of data write by page unit during the checkpoint operation.
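The fields of the micro journal described above may be pictured as a small record. The following sketch is one possible layout, not the layout of the embodiment: the type names, the field types, the 4 KB page size, and the choice to store the checkpoint information as a count of completed pages are assumptions.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size in bytes


@dataclass
class IOVector:
    base_virtual_address: int   # start of the source data in the user space
    length: int                 # number of bytes to be written


@dataclass
class MicroJournal:
    page_directory: int         # address of the top level of the page table
    io_vector: IOVector
    checkpoint_info: int = 0    # pages already written to the file system

    def total_pages(self):
        # Number of virtual memory pages touched by the I/O vector.
        if self.io_vector.length <= 0:
            return 0
        first = self.io_vector.base_virtual_address // PAGE_SIZE
        last = (self.io_vector.base_virtual_address + self.io_vector.length - 1) // PAGE_SIZE
        return last - first + 1

    def mark_page_done(self):
        # Record the completion of one page write during the checkpoint operation.
        self.checkpoint_info += 1

    def is_complete(self):
        return self.checkpoint_info >= self.total_pages()


journal = MicroJournal(page_directory=0x1000,
                       io_vector=IOVector(base_virtual_address=0x402F00, length=0x300))
pages = journal.total_pages()   # the range crosses one page boundary
```

A write of 0x300 bytes starting at 0x402F00 ends at 0x4031FF, so it touches two pages even though it is shorter than one page.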
When data in a user space is a part of a stored file, the data in the user space may be transferred to a write system call. The micro-journaling performs commit operations to move the data of the dirty cache lines to the non-volatile main memory. The commit operation according to an embodiment of the inventive concept includes flushing of a CPU cache and generating of a micro-journal. By using a cache line flush instruction, the flushing of the CPU cache may selectively move the data of the dirty cache lines to the non-volatile main memory. Upon the completion of the all necessary cache line flushes, committing of the file write may be completed. Although the file system is not updated yet after the commit operation, persistent data is already written in the nonvolatile main memory. Thus, the file write may be redone during a recovery process even if the system is suddenly turned off before the completion of the file write, thereby stably completing the file write.
When the file write starts according to the write system call, an address of source data is written on a micro-journal. To commit data from the CPU cache, the micro-journaling uses memory barriers and cache line flush instructions. Through the commit operation, the micro-journaling performs an atomic and ordered file write on the file system.
Referring to
In the micro-journaling, the checkpoint operation 710 is performed after source data is written from a user space of a page cache to a kernel space. A target of a checkpoint may be user data and metadata related to the user data. The checkpoint represents a progress per page write for a write system call. During the checkpoint, information about the checkpoint is stored in a checkpoint information field 616 (
A micro journal may be generated per process, and when several processes are simultaneously performed, several micro journals may simultaneously exist in a page cache.
In the micro-journaling, the source data for write system calls may be committed for successive virtual memory pages, not by a page unit. For example, all cache lines of a CPU related to all virtual addresses in the range that starts at a base virtual address and spans a data length may be committed by using an I/O vector of a micro-journal. In one exemplary embodiment, if the cache lines of the CPU are already committed and there are no more dirty cache lines in a CPU cache, additional recording is not performed on the nonvolatile main memory 120.
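The selection of cache lines covered by an I/O vector may be sketched as follows. The 64-byte cache line size, the function names, and the use of a set of line addresses to represent dirty cache lines are assumptions made for illustration; an actual commit would issue cache line flush instructions and memory barriers rather than manipulate a set.

```python
CACHE_LINE_SIZE = 64  # bytes; a common value, assumed here


def cache_lines_to_flush(base_virtual_address, length):
    """Return the aligned address of every cache line overlapping [base, base + length)."""
    if length <= 0:
        return []
    first = base_virtual_address - (base_virtual_address % CACHE_LINE_SIZE)
    last = base_virtual_address + length - 1
    return list(range(first, last + 1, CACHE_LINE_SIZE))


def commit(dirty_lines, base_virtual_address, length):
    """Flush only the dirty lines in the range and return the lines that were written."""
    flushed = []
    for line in cache_lines_to_flush(base_virtual_address, length):
        if line in dirty_lines:
            dirty_lines.discard(line)   # the line is now clean
            flushed.append(line)
    return flushed


dirty = {128, 512}                      # line 512 lies outside the range below
first_commit = commit(dirty, 100, 100)  # covers lines 64, 128, and 192
second_commit = commit(dirty, 100, 100) # nothing in the range is dirty any more
```

The second commit writes nothing, which corresponds to the case in which the cache lines are already committed and no additional recording is performed.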
Referring to
The system boot-up module 810 operates a booting operation after power is supplied to the system 800. The system boot-up module 810 may include a fresh boot module 812, a normal boot module 814, and a recovery boot module 816. Since the fresh boot module 812 initializes all kernel structures and page caches of the system 800, the fresh boot module 812 may initialize the system 800 to a factory state and generate an initial snapshot to be stored in the main memory 120. The fresh boot module 812 may boot the system 800 by using the initial snapshot stored in the non-volatile main memory 120.
When it is determined that there is no system error during a booting process, the normal boot module 814 may boot the system 800 by using a snapshot generated when the system 800 is shut down and stored in the non-volatile main memory 120. The system boot-up module 810 may determine a system error by referring to a system error signal received from the system operating module 820.
When it is determined that there is a system error by referring to the system error signal, the recovery boot module 816 may request a system error state module 822 for booting including recovery. Also, the recovery boot module 816 may receive system error information according to a sudden power-off of the system 800 from the system error state module 822, and perform recovery boot 850.
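The choice among the three boot modules may be summarized by the following sketch. The function name, the boolean inputs, and the rule that a fresh boot is used when no shutdown snapshot exists are assumptions; the text states only that a system error leads to the recovery boot and that the absence of an error leads to the normal boot.

```python
def select_boot_mode(system_error, has_shutdown_snapshot):
    """Pick a boot path from the system error signal and the snapshot state."""
    if system_error:
        # A sudden power-off was reported: boot while resuming the file write.
        return "recovery"
    if has_shutdown_snapshot:
        # No error: boot from the snapshot stored in the non-volatile main memory.
        return "normal"
    # Assumption: with no snapshot, initialize kernel structures and page caches.
    return "fresh"


mode_after_crash = select_boot_mode(system_error=True, has_shutdown_snapshot=True)
mode_after_shutdown = select_boot_mode(system_error=False, has_shutdown_snapshot=True)
mode_first_power_on = select_boot_mode(system_error=False, has_shutdown_snapshot=False)
```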
The system operating module 820 may include the system error state module 822 and a system normal state module 824. The system error state module 822 may store the system error information generated in the system 800, and may transmit the system error information to the system boot-up module 810 during booting.
The system normal state module 824 may store system state information when the system 800 normally operates. Also, the system normal state module 824 may receive a reboot request from the system error state module 822 when the recovery boot 850 is performed and the system 800 is normally recovered. Accordingly, the system normal state module 824 may update the stored system state information and request the system shutdown module 830 to shut down the system 800.
According to the exemplary embodiments of the present inventive concepts, a boot mechanism may include shutdown and wake-up of the system 800, and may be realized based on an advanced configuration and power interface (ACPI) protocol. For example, the system 800 may shut down by using a suspend-to-RAM or suspend-to-disk having a snapshot in an ACPI protocol.
The system 800 according to an exemplary embodiment may shut down by performing the suspend-to-RAM instead of the suspend-to-disk in response to a normal system shutdown request. The system 800 may perform the suspend-to-RAM to store various types of information required for booting during the shutdown in the non-volatile main memory 120, and then turn off the system 800. The suspend-to-disk is an operation of transferring various types of information required for booting during the shutdown to a storage device before the system 800 is turned off. At reboot, a normal boot is possible after a power-down performed by a suspend-to-RAM operation instead of a suspend-to-disk operation because update information still remains in the non-volatile main memory 120.
The main memory 120 may include a boot region 841, a snapshot 842 used during a suspend state of the system 800, a user space 843 according to processes, a file system page 844, a page table 845 for managing a virtual memory, and a micro journal 846 for a reliable file write.
Normal boot means booting of the system 800 when there is no system error, and may be completed by performing a wake-up-from-RAM in the ACPI protocol. Since latest update information of the system 800 according to an exemplary embodiment of the present inventive concepts is stored in the non-volatile main memory 120, the normal booting is possible even by performing wake-up referring to the non-volatile main memory 120. As such, the suspend-to-RAM may be considered as a normal power-off and a wake-up-from-RAM may be a normal boot after complete power-off.
The recovery boot 850 means booting of the system 800 while recovering the system 800 by resuming a file write operation when the system 800 suddenly stops. The recovery boot 850 operates by using various types of information stored in the non-volatile main memory 120, and initial booting after the sudden stop of the system 800 may be set to be the recovery boot 850.
The recovery boot 850 may perform a general initialization operation of an OS in operation 851, and enables a file write operation to be normally completed by resuming the file write operation in operation 852. In operation 852, the file write operation is completed in the file system page 844 by using data in the user space 843 by referring to the micro journal 846 and the page table 845 before being recovered to a snapshot. Then, a state of the file system is recovered by using the snapshot 842 in operation 853, and the recovered file system is stored in a register in operation 854, and then the system 800 is restarted by requesting a reboot operation 855.
A recovery mechanism may be included as a part of a boot step. The recovery mechanism is used to reboot the system 800 after the sudden power-off of the system 800, and file writes for the file system recovery may be redone at this time. To perform writes left unfinished due to the sudden power-off, the I/O vector 612 in the micro journal 846 may be scanned and related inodes may be looked up to identify unfinished page-unit write operations to the file system. For redoing writes, data to be copied from the user space 843 may be searched for by using a base virtual address stored in the I/O vector 612. Since the micro journal 846 contains a pointer of a page directory copied to the control register, the recovery process may search source data of file writes. The file system may be restored across system shutdowns and reboots. Related kernel structures including inodes and page caches may be designed to be placed in a specific memory zone so as not to be mixed with kernel structures to initialize for reboot. Thus, the recovery file writes may proceed after the file system is restored for use.
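As a non-limiting illustration, redoing an unfinished write from a journaled I/O vector may be sketched as follows. The 4096-byte page size, the pages_done field, and the dictionary standing in for file system pages are assumptions of the sketch. User space is indexed directly by virtual address, so the inode lookup and the page directory translation are omitted.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass
class IOVector:
    base: int        # base virtual address of the source data in user space
    length: int      # length of the data for the write system call
    pages_done: int  # pages already checkpointed (hypothetical field)


def redo_write(user_space: bytes, iov: IOVector, file_pages: dict) -> int:
    """Copy the pages that were not checkpointed; return how many were redone."""
    total_pages = -(-iov.length // PAGE_SIZE)  # ceiling division
    redone = 0
    for page in range(iov.pages_done, total_pages):
        start = iov.base + page * PAGE_SIZE
        end = min(start + PAGE_SIZE, iov.base + iov.length)
        file_pages[page] = user_space[start:end]
        iov.pages_done = page + 1  # checkpoint after each page
        redone += 1
    return redone


# Sample run: page 0 was written before the power-off, pages 1 and 2 were not.
user_space = bytes(range(256)) * 40
iov = IOVector(base=100, length=9000, pages_done=1)
file_pages = {0: user_space[100:100 + PAGE_SIZE]}
redone = redo_write(user_space, iov, file_pages)
```

Because the checkpoint is advanced after every page, a second power-off during recovery would leave the journal pointing at the next page still to be copied.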
Referring to
The micro-journaling may include the commit operation for flushing data in the CPU cache to the user space in the non-volatile main memory 120, the checkpoint operation performed in a page unit while the file write operation is performed through a system call, and data logs of the user space. Also, the micro-journaling may further include the I/O vector 612 and the page directory 614, wherein the I/O vector 612 has a pointer to a base address 617 of source data 630 and a length of data 616 for a system call for an unfinished file write caused by the sudden power-off of the system. The consistency and reliability of the file system may be maintained through the recovery process using the micro-journaling.
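As a non-limiting illustration, the forward path of a journaled file write interrupted by a power loss may be sketched as follows. The journal fields, the PowerLoss exception, and the fail_after parameter are hypothetical. The source data is treated as already flushed from the CPU cache to user space, so the commit step is reduced to recording the write in the journal.

```python
PAGE_SIZE = 4096  # assumed page size


class PowerLoss(Exception):
    """Stands in for a sudden power-off of the system."""


def journaled_write(src: bytes, file_pages: dict, journal: dict, fail_after=None):
    # Commit (simplified): record the intended write before touching the file.
    journal.update(length=len(src), pages_done=0, active=True)
    total = -(-len(src) // PAGE_SIZE)  # ceiling division
    for page in range(total):
        if fail_after is not None and page == fail_after:
            raise PowerLoss(f"power lost before page {page}")
        file_pages[page] = src[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]
        journal["pages_done"] = page + 1  # checkpoint in a page unit
    journal["active"] = False  # write finished; the entry is no longer needed


# Sample run: power is lost after two of three pages are written.
journal, pages = {}, {}
try:
    journaled_write(b"x" * 10000, pages, journal, fail_after=2)
except PowerLoss:
    pass  # the journal now describes the unfinished write
```

After the simulated power loss the journal still marks the write as active with two pages checkpointed, which is the information a recovery process would need to resume the write.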
Referring to
The MRAM 12 includes a control logic and command decoder 14 that receives a plurality of command signals and clock signals from an external device, such as a memory controller, via a control bus. The command signals include, for example, a chip select signal CS_n, a write enable signal WE_n, a column address strobe (CAS) signal CAS_n, and a row address strobe (RAS) signal RAS_n. The clock signals include, for example, a clock enable signal CKE and complementary clock signals CK_t and CK_c. Here, _n denotes an active low signal, and _t and _c denote a signal pair. The command signals CS_n, WE_n, RAS_n, and CAS_n may be driven by a logic value corresponding to a predetermined command, such as a read command or a write command.
The control logic and command decoder 14 includes mode registers 15 providing a plurality of operation options of the MRAM 12. The mode registers 15 may program various functions, features, and modes of the MRAM 12. The mode registers 15 may control a burst length, a read burst type, column address strobe (CAS) latency, a test mode, delay-locked loop (DLL) reset, write recovery and read command-to-precharge command features, and DLL use during precharge power down. The mode registers 15 may store data for controlling DLL enable/disable, output drive intensity, additive latency (AL), write leveling enable/disable, termination data strobe (TDQS) enable/disable, and output buffer enable/disable. The mode registers 15 may store data for controlling CAS write latency (CWL), dynamic termination, and write cyclic redundancy check (CRC).
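As a non-limiting illustration, decoding a command from the active-low command signals may be sketched as follows. The encoding is borrowed from the conventional SDRAM command truth table and is an assumption. The description here states only that the signals are driven to a logic value corresponding to a predetermined command.

```python
# Assumed encoding (conventional SDRAM style), keyed by (RAS_n, CAS_n, WE_n).
COMMANDS = {
    (0, 1, 1): "ACTIVATE",
    (1, 0, 1): "READ",
    (1, 0, 0): "WRITE",
    (0, 1, 0): "PRECHARGE",
    (0, 0, 1): "REFRESH",
    (0, 0, 0): "MODE_REGISTER_SET",
    (1, 1, 1): "NOP",
}


def decode_command(cs_n: int, ras_n: int, cas_n: int, we_n: int) -> str:
    """Return the command name for one sampled set of command signals."""
    if cs_n == 1:
        # Chip select is active low, so a high level means the chip is not addressed.
        return "DESELECT"
    return COMMANDS.get((ras_n, cas_n, we_n), "RESERVED")
```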
The mode registers 15 may store data for controlling a multi-purpose register (MPR) location function of the MRAM 12, an MPR operation function, a gear down mode, a per MRAM addressing (PDA) mode, and an MPR read format. The mode registers 15 may store data for controlling a power down mode of the MRAM 12, reference voltage (Vref) monitoring, a CS-to-command/address latency mode, a read preamble training (RPT) mode, a read preamble function, and a write preamble function. The mode registers 15 may store data for controlling a command and address (CA) parity function of the MRAM 12, a CRC error state, a CA parity error state, an on-die termination (ODT) input buffer power down function, a data mask (DM) function, a write data bus inversion (DBI) function, and a read DBI function. The mode registers 15 may store data for controlling a VrefDQ training value of the MRAM 12, a VrefDQ training range, VrefDQ training enable, and tCCD timing.
The control logic and command decoder 14 latches and decodes a command applied in response to the complementary clock signals CK_t and CK_c. The control logic and command decoder 14 generates a sequence of the clocking and control signals by using internal blocks for performing a function of an applied command.
The MRAM 12 further includes an address buffer 16 for receiving row, column, and bank addresses A0 through A17, BA0, and BA1, and bank group addresses BG0 and BG1 from the memory controller through an address bus. The address buffer 16 receives a row address, a bank address, and a bank group address applied to a row address multiplexer 17 and a bank control logic 18.
The row address multiplexer 17 applies the row address received from the address buffer 16 to a plurality of address latch and decoders 20. The bank control logic 18 activates the address latch and decoders 20 corresponding to the bank address BA1:BA0 and the bank group signal BG1:BG0 received from the address buffer 16.
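As a non-limiting illustration, selecting one bank from the bank group signal BG1:BG0 and the bank address BA1:BA0 may be sketched as follows. The linear numbering of sixteen banks, four per bank group, is an assumption of the sketch.

```python
def select_bank(bg1: int, bg0: int, ba1: int, ba0: int) -> int:
    """Map two bank group bits and two bank address bits to a bank index 0..15."""
    bank_group = (bg1 << 1) | bg0   # BG1:BG0 as a 2-bit number
    bank = (ba1 << 1) | ba0         # BA1:BA0 as a 2-bit number
    return bank_group * 4 + bank    # assumed layout: four banks per bank group
```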
The activated address latch and decoders 20 apply various signals to corresponding memory banks 21 so as to activate rows of memory cells corresponding to decoded row addresses. Each of the memory banks 21 includes a memory cell array including a plurality of memory cells. Data stored in the memory cells of the activated rows is detected and amplified by sense amplifiers 22.
A column address is applied to an address bus after row and bank addresses are applied to the address bus. The address buffer 16 applies the column address to a column address counter and latch 19. The column address counter and latch 19 latches the column address, and applies the latched column address to a plurality of column decoders 23. The bank control logic 18 activates the column decoders 23 corresponding to the received bank address and bank group address, and the activated column decoders 23 decode the column address.
According to an operation mode of the MRAM 12, the column address counter and latch 19 directly applies the latched column address to the column decoders 23, or applies a column address sequence starting with a column address provided by the address buffer 16 to the column decoders 23. The column decoders 23 activated in response to the column address from the column address counter and latch 19 apply decode and control signals to I/O gating and DM logic 24. The I/O gating and DM logic 24 accesses memory cells corresponding to the column addresses decoded from the rows of memory cells activated in the accessed memory banks 21.
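As a non-limiting illustration, a column address sequence generated from a starting column address may be sketched as follows. The wrap-around ordering within a burst-aligned block is only one possible ordering and is an assumption, since the ordering is not specified here.

```python
def burst_sequence(start_col: int, burst_length: int) -> list:
    """Column addresses for one burst; burst_length must be a power of two."""
    mask = burst_length - 1
    base = start_col & ~mask  # first column of the burst-aligned block
    # Count upward from the start column and wrap inside the aligned block.
    return [base | ((start_col + i) & mask) for i in range(burst_length)]
```

For example, a burst of length 4 starting at column 5 visits columns 5, 6, 7, and then wraps to column 4.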
According to a read command of the MRAM 12, data is read from the addressed memory cells and is transferred to a read latch 25 through the I/O gating and DM logic 24. The I/O gating and DM logic 24 provides N-bit data to the read latch 25, and the read latch 25 applies, for example, four N/4-bit data words to a multiplexer 26.
The MRAM 12 may have an N pre-fetch architecture corresponding to a burst length N in each memory access. For example, the MRAM 12 may have a 4n pre-fetch architecture retrieving 4 pieces of n bit data. The MRAM 12 may be an x4 memory device that provides and receives 4-bit data per edge. Also, the MRAM 12 may have an 8n pre-fetch. When the MRAM 12 has a 4n pre-fetch and an x4 data width, the I/O gating and DM logic 24 provides 16 bits to the read latch 25 and 4 pieces of 4-bit data to the multiplexer 26.
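As a non-limiting illustration, the arithmetic of the 4n pre-fetch with an x4 data width may be sketched as follows. The function names and the bit-string representation are hypothetical.

```python
def prefetch_bits(prefetch: int, data_width: int) -> int:
    """Bits fetched per access: pre-fetch depth times data width."""
    return prefetch * data_width


def split_prefetch(bits: str, prefetch: int) -> list:
    """Split the fetched bits into equal data words, one per pre-fetch slot."""
    width = len(bits) // prefetch
    return [bits[i * width:(i + 1) * width] for i in range(prefetch)]


# 4n pre-fetch, x4 device: 16 bits reach the read latch as 4 words of 4 bits.
words = split_prefetch("0001001000110100", 4)
```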
A data driver 27 sequentially receives N/4-bit data from the multiplexer 26. Also, the data driver 27 receives data strobe signals DQS_t and DQS_c from a strobe signal generator 28, and receives a delayed clock signal CKDEL from a DLL 29. A data strobe (DQS) signal is used by an external device, such as the memory controller, for synchronized reception of read data during a read operation.
In response to the delayed clock signal CKDEL, the data driver 27 sequentially outputs received data to a data terminal DQ according to a corresponding data word. Each data word is output on one data bus by being synchronized to rising and falling edges of the applied clock signals CK_t and CK_c. A first data word is output at a time according to the programmed CAS latency (CL) after a read command. Also, the data driver 27 outputs the data strobe signals DQS_t and DQS_c having rising and falling edges synchronized to the rising and falling edges of the clock signals CK_t and CK_c.
During a write operation of the MRAM 12, the external device, such as the memory controller, applies, for example, N/4-bit data words to the data terminal DQ, and applies a DQS signal and a corresponding DM signal on a data bus. A data receiver 35 receives each data word and related DM signals, and applies the related DM signals to input registers 36 clocked to the DQS signal.
The input registers 36 latch a first N/4-bit data word and a related DM signal in response to the rising edge of the DQS signal, and latch a second N/4-bit data word and a related DM signal in response to the falling edge of the DQS signal. The input registers 36 provide 4 latched N/4-bit data words and related DM signals to a write first in first out (FIFO) and driver 37 in response to the DQS signal. The write FIFO and driver 37 receives an N-bit data word.
A data word is clocked out in the write FIFO and driver 37, and is applied to the I/O gating and DM logic 24. The I/O gating and DM logic 24 transmits a data word to memory cells addressed in the accessed memory banks 21 upon receiving a DM signal. The DM signal selectively masks predetermined bits or a predetermined bit group from among data words to be written on addressed memory cells.
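As a non-limiting illustration, a write with data masking may be sketched as follows. Treating a DM value of 1 as "masked" and applying one DM value per data word are assumptions of the sketch.

```python
def masked_write(stored: list, incoming: list, dm: list) -> list:
    """Return the cell contents after a write; masked positions keep the old data."""
    return [old if masked else new
            for old, new, masked in zip(stored, incoming, dm)]


# Four 4-bit words arrive; the second and fourth words are masked.
result = masked_write([0xA, 0xB, 0xC, 0xD], [0x1, 0x2, 0x3, 0x4], [0, 1, 0, 1])
```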
Referring to
The memory cell 30 may include a cell transistor CT and the MTJ device 40. In one memory cell 30, a drain of the cell transistor CT is connected to a pinned layer 43 of the MTJ device 40. A free layer 41 of the MTJ device 40 is connected to the bit line BL0, and a source of the cell transistor CT is connected to the source line SL0. A gate of the cell transistor CT is connected to the word line WL0.
The MTJ device 40 may be replaced by a resistive device, such as a phase change random access memory (PRAM) using a phase change material, a resistive random access memory (RRAM) using a variable resistance material, such as a complex metal oxide, or an MRAM using a magnetic material. Materials forming the resistive devices change a resistance value according to size and/or direction of a current or voltage, and have non-volatile features of maintaining the resistance value even when the current or voltage is blocked.
The word line WL0 is enabled by a row decoder 20 and is connected to a word line driver 32 driving a word line select voltage. The word line select voltage activates the word line WL0 so as to read or write a logic state of the MTJ device 40.
The source line SL0 is connected to a source line circuit 34. The source line circuit 34 receives an address signal and a read/write signal, and generates a source line select signal in the selected source line SL0 based on the received address signal and the read/write signal. A ground reference voltage is provided to the unselected source lines SL1 through SLn.
The bit line BL0 is connected to a column select circuit including the I/O gating and DM logic 24 and driven by column select signals CSL0 through CSLM. The column select signals CSL0 through CSLM are selected by a column decoder 23. For example, the selected column select signal CSL0 turns on a column select transistor in the column select circuit 24 and selects the bit line BL0. A logic state of the MTJ device 40 is read from the bit line BL0 through the sense amplifier 22. Alternatively, a write current applied through the data driver 27 is transmitted to the bit line BL0 and is written on the MTJ device 40.
Referring to
The MTJ device 40 may include a free layer 41, a pinned layer 43, and a tunnel layer 42 therebetween. A magnetization direction of the pinned layer 43 is fixed, and a magnetization direction of the free layer 41 may be parallel to or anti-parallel to the magnetization direction of the pinned layer 43 according to written data. In order to fix the magnetization direction of the pinned layer 43, for example, an anti-ferromagnetic layer (not shown) may be further included.
In order to perform a write operation of the STT-MRAM cell 30, a logic high voltage is applied to the word line WL0 to turn on the cell transistor CT. A program current, i.e., a write current, provided by a write/read bias generator 45 is applied to the bit line BL0 and the source line SL0. A direction of the write current is determined by a logic state of the MTJ device 40.
In order to perform a read operation of the STT-MRAM cell 30, a logic high voltage is applied to the word line WL0 to turn on the cell transistor CT, and a read current is applied to the bit line BL0 and the source line SL0. Accordingly, a voltage is developed at two ends of the MTJ device 40, sensed by the sense amplifier 22, and compared with a reference voltage from a reference voltage generator 44 to determine a logic state of the MTJ device 40. Accordingly, data stored in the MTJ device 40 may be determined.
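As a non-limiting illustration, the read decision made by comparing the voltage across the MTJ device with a reference voltage may be sketched as follows. The resistance, current, and reference values are hypothetical, and the mapping of the higher-resistance state to logic 1 is an assumption.

```python
def sense(read_current_a: float, mtj_resistance_ohm: float, v_ref: float) -> int:
    """Compare the voltage developed across the MTJ with the reference voltage."""
    v_mtj = read_current_a * mtj_resistance_ohm  # Ohm's law
    return 1 if v_mtj > v_ref else 0


# Hypothetical values: 2 kOhm in the parallel state, 4 kOhm in the anti-parallel state.
READ_CURRENT = 20e-6  # 20 microamperes
V_REF = 0.06          # placed between the two expected voltages (0.04 V and 0.08 V)
parallel_bit = sense(READ_CURRENT, 2000.0, V_REF)
anti_parallel_bit = sense(READ_CURRENT, 4000.0, V_REF)
```

Placing the reference voltage midway between the two expected voltages gives equal margin for both stored states in this sketch.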
Referring to
Referring to
In the exemplary embodiment of the present inventive concepts, the free and pinned layers 41 and 43 of the MTJ device 40 are shown as horizontal magnetic devices, but alternatively, the free and pinned layers 41 and 43 may be vertical magnetic devices.
Referring to
When a second write current IWC2 is applied from the pinned layer 43 to the free layer 41, electrons having a spin opposite to the pinned layer 43 return back to the free layer 41 and apply a torque. Accordingly, the free layer 41 is magnetized anti-parallel to the pinned layer 43. In other words, the magnetization direction of the free layer 41 in the MTJ device 40 may be changed by STT.
Referring to
The tunnel layer 52 may have a thickness that is smaller than a spin diffusion distance. The tunnel layer 52 may include a non-magnetic material. For example, the tunnel layer 52 may include at least one selected from the group consisting of magnesium (Mg), titanium (Ti), aluminum (Al), magnesium-zinc (MgZn), a magnesium-boron (MgB) oxide, a Ti nitride, and a vanadium (V) nitride.
The pinned layer 53 may have a magnetization direction fixed by the anti-ferromagnetic layer 54. Also, the pinned layer 53 may include a ferromagnetic material. For example, the pinned layer 53 may include at least one selected from the group consisting of CoFeB, Fe, Co, Ni, Gd, Dy, CoFe, NiFe, MnAs, MnBi, MnSb, CrO2, MnOFe2O3, FeOFe2O3, NiOFe2O3, CuOFe2O3, MgOFe2O3, EuO, and Y3Fe5O12.
The anti-ferromagnetic layer 54 may include an anti-ferromagnetic material. For example, the anti-ferromagnetic layer 54 may include at least one selected from the group consisting of PtMn, IrMn, MnO, MnS, MnTe, MnF2, FeCl2, FeO, CoCl2, CoO, NiCl2, NiO, and Cr.
Since the free layer 51 and the pinned layer 53 of the MTJ device 50 are each formed of a ferromagnetic material, a stray field may be generated at an edge of the ferromagnetic material. The stray field may decrease magnetoresistance or increase resistance magnetism of the free layer 51. Moreover, the stray field affects a switching characteristic, thereby forming asymmetrical switching. Accordingly, a unit for decreasing or controlling a stray field generated by the ferromagnetic material in the MTJ device 50 may be used.
Referring to
Referring to
In order to realize the MTJ device 70 having a vertical magnetization direction, the free layer 71 and the pinned layer 73 may be formed of a material having high magnetic anisotropy energy. Examples of the material having high magnetic anisotropy energy include an amorphous rare-earth element alloy, a thin film such as (Co/Pt)n or (Fe/Pt)n, and a superlattice material having an L10 crystalline structure. For example, the free layer 71 may be an ordered alloy, and may include at least any one of Fe, Co, Ni, palladium (Pd), and platinum (Pt). Alternatively, the free layer 71 may include at least any one of a Fe-Pt alloy, a Fe-Pd alloy, a Co-Pd alloy, a Co-Pt alloy, a Fe-Ni-Pt alloy, a Co-Fe-Pt alloy, and a Co-Ni-Pt alloy. The alloys above may be, for example, Fe50Pt50, Fe50Pd50, Co50Pd50, Co50Pt50, Fe30Ni20Pt50, Co30Fe20Pt50, or Co30Ni20Pt50 in terms of stoichiometric composition.
The pinned layer 73 may be an ordered alloy, and may include at least any one of Fe, Co, Ni, Pd, and Pt. For example, the pinned layer 73 may include at least any one of a Fe-Pt alloy, a Fe-Pd alloy, a Co-Pd alloy, a Co-Pt alloy, a Fe-Ni-Pt alloy, a Co-Fe-Pt alloy, and a Co-Ni-Pt alloy. These alloys may be, for example, Fe50Pt50, Fe50Pd50, Co50Pd50, Co50Pt50, Fe30Ni20Pt50, Co30Fe20Pt50, or Co30Ni20Pt50 in terms of stoichiometric composition.
Referring to
When magnetization directions of the first and second pinned layers 81 and 85 are fixed to opposite directions, magnetic forces by the first and second pinned layers 81 and 85 substantially counterbalance. Accordingly, the dual MTJ device 80 may perform a write operation by using a smaller current than a general MTJ device.
Since the dual MTJ device 80 provides higher resistance during a read operation by the second tunnel layer 84, an accurate data value may be obtained.
Referring to
Here, when magnetization directions of the first and second pinned layers 91 and 95 are fixed in opposite directions, magnetic forces by the first and second pinned layers 91 and 95 substantially counterbalance. Accordingly, the dual MTJ device 90 may perform a write operation by using a smaller current than a general MTJ device.
The STT-MRAM may be used as a main memory of a system. Since the STT-MRAM is byte-addressable and is capable of permanently storing data, a double duplication process of data performed to increase reliability of a file system may be omitted. Also, since the STT-MRAM is used in the micro-journaling for recording logging information and checkpointing during file writes, the file system may be recovered when the system is suddenly turned off.
While the inventive concept has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood that various changes in form and details may be made therein without departing from the spirit and scope of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2014-0002082 | Jan 2014 | KR | national |