A computing device may execute an operating system. The operating system may read and write data stored using a journaling file system.
Certain examples are described in the following detailed description and in reference to the drawings, in which:
A computing device may comprise a processor, such as a central processing unit (CPU). The CPU may execute an operating system (OS). The OS stores data on one or more storage devices using a file system. The file system defines an organization and method of writing data to the storage device so that the OS can reliably read data from, and write data to, the one or more storage devices.
Most modern file systems implement file system journaling. In a file system that implements file system journaling (a journaling file system), when the OS receives a request to modify the file system (i.e. a write request), the OS writes an entry to a journal of the file system comprising the operations that need to be performed for the operation to fully complete. After the journal entry has been written, the OS executes the write operation by replaying the write stored in the journal entry. Thus, each journal entry is associated with one or more operations that have not yet committed to the file system.
Once the operations specified in the journal entry have successfully been committed to the file system, the OS deletes the associated journal entry. Journaling file systems are useful in the event of a power failure, hardware failure, or system crash. In such events, a write operation, which may comprise multiple sub-operations, may not fully complete. That is, some, but not all of the operations comprising the write operation may complete. A write operation that does not fully complete may leave the file system in a corrupt state.
With a journaling file system, if the OS detects that a write was in progress but did not fully complete, the OS re-attempts the write by reading the journal entry associated with the incomplete write. Based on the data stored in the journal entry, the OS replays the operations indicated by the journal entry to complete the write. In this manner, a journaling file system may prevent incomplete writes from corrupting the file system by enabling the OS to replay the incomplete write based on the journal entry associated with the write.
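The journaling and recovery behavior described above can be illustrated with a minimal sketch. It is a toy in-memory model, not an implementation of any particular file system: the class name JournalingFS, its methods, and the crash_before_commit flag are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class JournalingFS:
    """Toy model (hypothetical): pages and journal are in-memory containers."""
    pages: dict = field(default_factory=dict)    # page id -> data
    journal: list = field(default_factory=list)  # pending (page id, data) entries

    def write(self, page_id, data, crash_before_commit=False):
        # 1. Record the intended operation in the journal first.
        self.journal.append((page_id, data))
        if crash_before_commit:
            return  # simulated crash: the entry survives, the page is untouched
        # 2. Commit by replaying the entry, then delete the entry.
        self._replay_all()

    def recover(self):
        # After a crash, any entry still in the journal is an incomplete write.
        self._replay_all()

    def _replay_all(self):
        while self.journal:
            page_id, data = self.journal[0]
            self.pages[page_id] = data  # apply the journaled operation
            self.journal.pop(0)         # delete the entry only after it commits


fs = JournalingFS()
fs.write(108, "A")
fs.write(108, "B", crash_before_commit=True)  # write is journaled, never applied
before = dict(fs.pages)                       # state observed after the "crash"
fs.recover()                                  # replay completes the write
```

In this sketch the entry is deleted only after the page has been updated, so a crash at any point leaves either a replayable journal entry or a fully committed page.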
A journaling file system may create a journal entry for each pending write operation. Thus, a downside to file system journaling is that each write operation that causes the file system to write a journal entry incurs additional write overhead. More particularly, a journaling file system may incur twice as many writes (one write for creating a journal entry, and another write when the operations in the journal entry are replayed to actually write data to the file system) as compared to a non-journaling file system.
The techniques of this disclosure enable an operating system to reduce the number of writes to a file system, thereby improving the performance of the file system. More particularly, an OS as described herein may determine when there are multiple pending writes to the same page of the file system. The OS may determine whether there are multiple writes pending to the same page based on generation counters stored in the file system journal and in the file system page.
The generation counters may indicate a number of writes that are pending for, or have committed to, the page. If the OS determines that the generation counter value stored in the journal entry for the page differs from the generation counter stored in the page, the OS determines that additional writes are pending for the page, and therefore, that the results of any earlier-pending writes will be overwritten and can therefore be skipped. Skipping execution of the write operations increases file system write performance because fewer replays of writes from a journal entry will occur when there are multiple writes pending for a particular page.
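The write overhead discussed above can be made concrete with a short arithmetic sketch. The function name count_device_writes is hypothetical, and the sketch assumes a simplified cost model that is not stated in the description: each journal entry creation and each replay counts as exactly one write, and every request is journaled before any entry is replayed.

```python
def count_device_writes(target_pages):
    """Return (non_journaling, journaling, journaling_with_skip) write counts.

    target_pages: page identifier for each write request, in arrival order.
    Assumption: every request is journaled before any entry is replayed.
    """
    n = len(target_pages)
    non_journaling = n   # one write per request
    journaling = 2 * n   # one journal write plus one replay per request
    # With skipping, only the last pending write to each page is replayed.
    journaling_with_skip = n + len(set(target_pages))
    return non_journaling, journaling, journaling_with_skip


# Three pending writes to page 108 and one to a second, hypothetical page 109.
counts = count_device_writes([108, 108, 108, 109])
```

Under these assumptions, the saving grows with the number of pending writes that target the same page, and there is no saving when every pending write targets a different page.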
Journaling file system 106 may comprise any file system that stores journal entries associated with an operation for modifying data stored in journaling file system 106, and that replays each entry to execute the modifying operation. In various examples, journaling file system 106 may execute in user space, or as part of an operating system kernel. In some examples, journaling file system 106 may comprise a package or a module of OS 104. In some examples, journaling file system 106 may comprise a virtual file system, which may be associated with one or more virtual machines.
Storage device 110 is illustrated as a single storage device for the purposes of example. However, in some examples, storage device 110 may comprise multiple storage devices, a storage array, a storage area network (SAN), one or more virtual storage devices, or any combination thereof. In some examples, storage device 110 may comprise a plurality of blocks. Each block may comprise a logically addressable unit of storage device 110 to which data can be written. Journaling file system 106 may write data to a single block, or to a plurality of blocks. A plurality of blocks is referred to herein as a page. Page 108 is an example of a page. It should be understood that storage device 110 comprises a plurality of pages.
OS 104 may receive a request to write data to a page of data, e.g. page 108, of storage device 110. Responsive to receiving a write request, OS 104 passes the write request to journaling file system 106. Responsive to journaling file system 106 receiving a write request, journaling file system 106 may create a journal entry associated with the write request. In the event that the write request does not complete, OS 104 may replay the write request from the journal entry to successfully complete the write, as described above.
In the example of
Journaling file system 106 may receive a second write request for page 108. Responsive to receiving the second write request, journaling file system 106 may create a second journal entry (not pictured) corresponding to the second write request. OS 104 may determine that the first write request and the second write request are bound for the same page 108 based on an address indicated by each write request. Each journal entry may indicate the data to be written to page 108.
As will be described herein in greater detail, journaling file system 106 may determine, based on data stored in journal entry 112 and data stored in page 108, that second pending write 118 will occur after first pending write 116. Based on the determination that second pending write 118 will execute after first pending write 116, OS 104 may determine that second pending write 118 will overwrite the data stored in page 108 and that OS 104 may skip execution of first pending write 116.
Thus, computing device 100 represents an example computing device in which processor 102 executes OS 104. Processor 102 determines, based on page 108 of journaling file system 106 and corresponding journal entry 112 associated with first pending write 116, whether a second pending write 118 is pending for page 108, wherein the second pending write 118 will occur after first pending write 116. Responsive to determining that second pending write 118 will occur after first pending write 116, processor 102 may skip execution of first pending write 116.
In the example of
As described above, responsive to receiving a write request (e.g. first pending write 116), journaling file system 106 may create a journal entry, e.g. journal entry 112. Each journal entry may comprise a counter, e.g. counter 202. In some examples, counter 202 may indicate a number of writes pending to page 108. In various examples, counter 202 may comprise a generation counter. The generation counter may indicate how many times the page has been modified or a number of writes pending for the page.
In some examples, OS 104 may store a copy of page 108 in memory. Before creating a journal entry for a write operation, e.g. journal entry 112 for first pending write 116, journaling file system 106 reads a value of counter 204 from the in-memory copy of page 108. Journaling file system 106 increments counter 202 and counter 204 to indicate that first pending write 116 will modify page 108.
In the example of
During the creation of the second journal entry associated with second pending write 118, journaling file system 106 increments counter 204, which is stored in the in-memory copy of page 108, as well as the counter stored in the second journal entry associated with second pending write 118.
In this example, after journaling file system 106 has incremented counter 204 responsive to receiving the second write request, the value of counter 202 associated with first pending write operation 116 will be less than the value of counter 204. The value of the counter stored in the second journal entry associated with second pending write 118 will be equal to the value of counter 204.
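The counter values described above can be traced with a minimal sketch of journal entry creation. The Page and JournalEntry classes and the create_journal_entry function are hypothetical names introduced for illustration, and the initial counter value of zero is an assumption, since the description does not specify one.

```python
from dataclasses import dataclass


@dataclass
class Page:
    counter: int = 0  # plays the role of counter 204 (in-memory copy of the page)
    data: str = ""


@dataclass
class JournalEntry:
    counter: int      # plays the role of counter 202
    data: str


def create_journal_entry(page, data):
    # Increment the page counter and stamp the new entry with the incremented
    # value, so that the newest entry always matches the page counter.
    page.counter += 1
    return JournalEntry(counter=page.counter, data=data)


page_108 = Page()
entry_112 = create_journal_entry(page_108, "first")      # first pending write 116
second_entry = create_journal_entry(page_108, "second")  # second pending write 118
```

After both calls, the counter in entry_112 is less than the page counter, while the counter in second_entry equals it, which matches the relationship described for counters 202 and 204.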
When journaling file system 106 reads a journal entry to replay a pending write operation, OS 104 compares the value of the counter stored in the journal entry with the counter stored in the in-memory copy of the page associated with the journal entry. In the example of
In various examples, method 300 may be performed by hardware, software, firmware, or any combination thereof. Other suitable systems and/or computing devices may be used as well. Method 300 may be implemented in the form of executable instructions stored on at least one machine-readable storage medium of the system and executed by at least one processor of the system. Alternatively or in addition, method 300 may be implemented in the form of electronic circuitry (e.g., hardware). In alternate examples of the present disclosure, one or more blocks of method 300 may be executed substantially concurrently or in a different order than shown in
Method 300 may start at block 302, at which point processor 102 may cause operating system 104 to determine, based on a page (e.g. page 108) of a journaling file system (e.g. journaling file system 106) and a journal entry (e.g. journal entry 112) of the file system associated with a first pending write (e.g. first pending write 116), whether a second write (e.g. second pending write 118) is pending for the page, wherein the second write will occur after the first pending write.
At block 304, responsive to determining that second pending write 118 will occur after first pending write 116, OS 104 may skip execution of first pending write 116.
Alternatively or in addition, method 400 may be implemented in the form of electronic circuitry (e.g., hardware). In alternate examples of the present disclosure, one or more blocks of method 400 may be executed substantially concurrently or in a different order than shown in
In various examples, method 400 may start at block 402, at which block processor 102 may cause operating system 104 to determine based on a counter (e.g. counter 204) stored in a page (e.g. page 108) of a journaling file system (e.g. journaling file system 106) and a counter (e.g. counter 202) stored in a corresponding journal entry (e.g. journal entry 112) of the file system associated with a first pending write (e.g. first pending write 116), whether a second pending write (e.g. second pending write 118) will occur after the first pending write. In various examples counters 202 and 204 may comprise generation counters. The generation counters may indicate a number of writes that are pending for the page.
At decision block 404, OS 104 may determine whether the second write is pending for the page, wherein the second pending write will occur after the first pending write. If OS 104 determines that the second write is not pending for the page (“NO” branch of decision block 404), OS 104 may execute block 408. Otherwise (“YES” branch of decision block 404), OS 104 may execute block 406. At block 406, OS 104 may skip execution of the first pending write (e.g. first pending write 116). At block 408, OS 104 may execute the first pending write. In some examples, to determine whether the second pending write is pending for the page, OS 104 may determine whether the second pending write will overwrite data from the first pending write.
In some examples, to determine whether the second pending write is pending for the page, OS 104 may compare the value of the counter of the journal entry (e.g. counter 202) and the value of the counter of the page (e.g. counter 204). OS 104 may execute block 406 and skip execution of the write responsive to determining that counters 202 and 204 are not equal. OS 104 may execute block 408 and execute the first pending write (e.g. first pending write 116) responsive to determining that counters 202 and 204 are equal.
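The decision at block 404 and its two outcomes can be sketched as a replay loop. This is an illustrative model under stated assumptions rather than the claimed method itself: the function name replay, the tuple layout of the journal entries, and the dictionaries standing in for the pages and their counters are all hypothetical.

```python
def replay(journal, page_counters, pages):
    """Replay (page_id, entry_counter, data) entries; return the skipped count.

    page_counters maps a page id to the page's generation counter.
    pages maps a page id to the data committed to that page.
    """
    skipped = 0
    for page_id, entry_counter, data in journal:
        if entry_counter != page_counters[page_id]:
            skipped += 1           # block 406: a later write is pending, so skip
        else:
            pages[page_id] = data  # block 408: counters are equal, so execute
    return skipped


pages = {}
journal = [(108, 1, "first"), (108, 2, "second"), (109, 1, "other")]
skipped = replay(journal, {108: 2, 109: 1}, pages)
```

Here the first entry for page 108 is skipped because its counter differs from the page counter, while the second entry and the entry for the hypothetical page 109 are executed.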
Processor 510 may be one or more central processing units (CPUs), microprocessors, and/or other hardware devices suitable for retrieval and execution of instructions stored in machine-readable storage medium 520. In the particular example shown in
Machine-readable storage medium 520 may be any electronic, magnetic, optical, or other physical storage device that stores executable instructions. Thus, machine-readable storage medium 520 may be, for example, Random Access Memory (RAM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disc, and the like. Machine-readable storage medium 520 may be disposed within system 500, as shown in
Referring to
Responsive to determining that the second pending write will occur after the first pending write, processor 510 may execute write skip instructions 524. Write skip instructions 524, when executed by a processor (e.g., 510), may cause system 500 to skip execution of the first pending write.
Processor 610 may be one or more central processing units (CPUs), microprocessors, and/or other hardware devices suitable for retrieval and execution of instructions stored in machine-readable storage medium 620. In the particular example shown in
Machine-readable storage medium 620 may be any electronic, magnetic, optical, or other physical storage device that stores executable instructions. Thus, machine-readable storage medium 620 may be, for example, Random Access Memory (RAM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disc, and the like. Machine-readable storage medium 620 may be disposed within system 600, as shown in
Referring to
Processor 610 may execute counter determination instructions 624, which, when executed, cause processor 610 to determine whether the second write is pending for the page and will occur after the first pending write based on a generation counter of the journal entry and a generation counter of the page. The generation counter of the journal entry and the generation counter of the page may indicate a number of writes that are pending for the page. In some examples, the generation counter may indicate a number of writes that will commit to the page.
Responsive to determining that the second pending write will occur after the first pending write (e.g. based on the counters), processor 610 may execute write skip instructions 626. Write skip instructions 626, when executed by a processor (e.g., 610), may cause system 600 to skip execution of the first pending write responsive to determining that the generation counter of the journal entry and the generation counter of the page are not equal.
Responsive to determining that no second pending write will occur after the first pending write (e.g. that the generation counter of the journal entry and the generation counter of the page are equal), processor 610 may execute write execution instructions 628. Write execution instructions 628, when executed by a processor (e.g., 610), may cause system 600 to execute the first pending write.