1. Field of the Invention
The present application relates generally to computer systems and software. More particularly, the present application relates to computer operating systems with virtual memory.
2. Description of the Background Art
Computer systems typically include a processor and a main memory. The main memory functions as the physical working memory of the computer system, where data is stored that has been or will be used by the processor and other system components.
In computer systems that implement “virtual memory,” software programs executing on the computer system reference main memory through the use of virtual addresses. A memory management unit (“MMU”) translates each virtual address specified by a software program instruction to a physical address that is passed to the main memory in order to retrieve the requested data. The use of virtual memory permits the size of programs to greatly exceed the size of the physical main memory and provides flexibility in the placement of programs in the main memory.
Implementing a virtual memory system requires establishing a correspondence between virtual address space and physical address space in the main memory. A common technique for establishing this correspondence is to divide both the virtual address space and the physical address space into contiguous blocks called pages. Each page has a virtual page number address in virtual address space that corresponds to the physical page number address of the page in physical address space.
For each access to main memory, a virtual page number address in virtual address space is translated into the corresponding physical page number address in physical address space, and a page offset within the physical page is appended to the physical page number address. Thus, the virtual address subdivided into a Virtual Page Number Address:Page Offset is translated into a physical address consisting of Physical Page Number Address:Page Offset. The physical address is then used to access main memory. Translation of the virtual page number address into its corresponding physical page number address occurs through the use of page tables stored in physical main memory.
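To make the Virtual Page Number Address:Page Offset arrangement concrete, the following minimal C sketch (illustrative only, assuming 4-kilobyte pages and a placeholder in place of a real page-table lookup) splits a virtual address into a virtual page number and a page offset, translates the page number, and appends the offset to form the physical address:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                            /* assumed 4 KB pages: 2^12 bytes */
#define PAGE_OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

/* Placeholder page-table lookup: maps a virtual page number to a
 * physical page number.  A real page table lives in main memory and
 * is walked by the MMU or the operating system. */
static uint64_t lookup_physical_page(uint64_t vpn)
{
    return vpn + 100;                            /* illustrative translation */
}

int main(void)
{
    uint64_t vaddr = 0x12345678;

    uint64_t vpn    = vaddr >> PAGE_SHIFT;       /* virtual page number  */
    uint64_t offset = vaddr & PAGE_OFFSET_MASK;  /* page offset (kept)   */
    uint64_t ppn    = lookup_physical_page(vpn); /* page-table lookup    */
    uint64_t paddr  = (ppn << PAGE_SHIFT) | offset;

    printf("virtual 0x%llx -> physical 0x%llx\n",
           (unsigned long long)vaddr, (unsigned long long)paddr);
    return 0;
}
```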
In order to reduce the total number of page table main memory accesses required per virtual-to-physical address translation, one or more translation-lookaside buffers (TLBs) are often provided in the MMU. A TLB is a cache-like memory, typically implemented in Static Random Access Memory (“SRAM”) and/or Content Addressable Memory (“CAM”), that holds virtual page number address to physical page number address translations that have recently been fetched from the page table in physical main memory. Access to a TLB entry holding an output physical page number address corresponding to an input virtual page number address obviates the need for, and is typically significantly faster than, access to the page table in main memory. Hence, TLB accesses reduce the overall average time required to perform the steps of a virtual-to-physical address translation.
If the TLB does not contain the requested translation (i.e., a TLB “miss” occurs) then the MMU initiates a search of page tables stored in main memory for the requested virtual page number address. A TLB miss handler then loads the physical page number address referenced by the virtual page number address into the TLB, where it may be available for subsequent fast access should translation for the same input virtual page number address be required at some future point.
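A simplified, direct-mapped TLB illustrating this hit/miss behavior can be sketched in C as follows. This is a minimal illustration, not an actual MMU design; the page-table walk is reduced to a placeholder:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 16                   /* assumed TLB capacity */

struct tlb_entry {
    bool     valid;
    uint64_t vpn;                        /* virtual page number  */
    uint64_t ppn;                        /* physical page number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Placeholder for a walk of the page tables in main memory. */
static uint64_t page_table_walk(uint64_t vpn)
{
    return vpn + 100;                    /* illustrative translation */
}

/* Translate a virtual page number, consulting the TLB first. */
static uint64_t translate(uint64_t vpn)
{
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];  /* direct-mapped slot */

    if (e->valid && e->vpn == vpn)
        return e->ppn;                   /* TLB hit: no page-table access */

    /* TLB miss: walk the page tables in main memory, then cache the
     * translation so a later access to the same page hits in the TLB. */
    e->ppn   = page_table_walk(vpn);
    e->vpn   = vpn;
    e->valid = true;
    return e->ppn;
}

int main(void)
{
    printf("vpn 5 -> ppn %llu (miss)\n", (unsigned long long)translate(5));
    printf("vpn 5 -> ppn %llu (hit)\n",  (unsigned long long)translate(5));
    return 0;
}
```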
One solution to reduce translation lookaside buffer misses is to use larger page sizes so that the same physical main memory can be described by many fewer virtual page number addresses. TLB misses for a system with large page sizes are much less likely. For example, if the small page sizes are such that physical main memory can be mapped into a total of 64 pages while the TLB can only hold 16 virtual-to-physical page translations, then a random TLB access will miss 75% of the time. Alternatively, if the virtual memory system is implemented with large page sizes such that physical main memory can be mapped into a total of 32 pages while the TLB can still hold 16 virtual-to-physical page translations, then a random TLB access will miss only 50% of the time.
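The quoted figures follow from a simple relation: if each random access is equally likely to touch any of N pages and the TLB covers E of them, the expected miss rate is 1 - E/N. With E = 16 entries, N = 64 small pages gives a miss rate of 1 - 16/64 = 75%, while N = 32 large pages gives 1 - 16/32 = 50%.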
However, large page sizes result in more complex hardware to access the page offset within the physical page and also increase unused space within the pages (due to internal fragmentation). For this reason, high-performance processors generally allow any of a plurality of page sizes to be selected for different purposes.
It is highly desirable to improve performance of computer systems and software. More particularly, it is highly desirable to improve performance of computer operating systems with virtual memory.
Main memory 106 generally comprises physical memory in the form of random access memory (RAM) that is internal to the computer system. Since the amount of main memory 106 is typically limited, disk storage 102 is used for additional data storage. Disk storage 102 refers to an external mass storage device, typically one or more disk drives, which store computer-readable data.
Prior to being accessed by the main processor, a file is generally copied into main memory 106. For example, when a binary program file is to be executed, it is typically copied from disk storage 102 into main memory 106.
A file cache 104 may be utilized to speed up access to files in the disk storage 102. The file cache 104 is typically implemented using RAM and configured to store segments of active files in anticipation of future requests. The file cache 104 speeds access to files that are used by multiple users or multiple applications. Requests to access files are diverted to check the file cache 104 prior to accessing the disk storage 102. If the data requested is already in the file cache 104, then there is a cache hit, and it is not necessary to access the relatively slow disk storage 102. On the other hand, if the data requested is not in the file cache 104, then there is a cache miss. In the event of a cache miss, the requested data may be read from the disk storage 102 into the file cache 104, and then transferred from the file cache 104 into main memory 106.
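The hit/miss flow just described can be illustrated with a small, self-contained C sketch. The cache structure, slot count, and the disk_read stand-in below are simplifications assumed for illustration and are not an actual file cache implementation:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 4
#define SEG_SIZE    64

/* One cached file segment, keyed by (path, offset). */
struct cache_slot {
    bool valid;
    char path[64];
    long offset;
    char data[SEG_SIZE];
};

static struct cache_slot cache[CACHE_SLOTS];

/* Stand-in for a read from the relatively slow disk storage. */
static void disk_read(const char *path, long off, char *buf)
{
    snprintf(buf, SEG_SIZE, "<data of %s @ %ld>", path, off);
}

/* Read one file segment, checking the file cache before going to disk. */
static void read_segment(const char *path, long off, char *buf)
{
    struct cache_slot *s = &cache[(unsigned long)off % CACHE_SLOTS];

    if (s->valid && s->offset == off && strcmp(s->path, path) == 0) {
        memcpy(buf, s->data, SEG_SIZE);  /* cache hit: no disk access */
        return;
    }

    disk_read(path, off, buf);           /* cache miss: go to disk ... */
    s->valid  = true;                    /* ... and populate the cache */
    s->offset = off;
    snprintf(s->path, sizeof s->path, "%s", path);
    memcpy(s->data, buf, SEG_SIZE);
}

int main(void)
{
    char buf[SEG_SIZE];
    read_segment("/bin/prog", 0, buf);   /* miss: reads from "disk" */
    read_segment("/bin/prog", 0, buf);   /* hit: served from cache  */
    printf("%s\n", buf);
    return 0;
}
```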
At the execution of the binary file, the operating system kernel reads a program header table of the file from disk storage 102 and may also bring in a further portion of the binary file, as shown in the second block 204. If the portion read overlaps with the text segment of the binary, then the text is unlikely to get large pages later when the binary executes, as shown in the third block 206.
At the execution of the binary file, the operating system kernel again reads a program header table and a further portion of the file from disk storage 102, as shown in the second block 224. As shown in the third block 226, this may result in previously established mapped text pages (by the first instance of the binary) being fragmented or demoted to smaller page sizes.
As discussed above, execution performance may be slowed by binary text fragmentation and unnecessary reads to executable header data. Solutions to the aforementioned problems are described below.
Applicants have determined certain circumstances and problems which hinder the performance of program execution in at least some computer operating systems. These circumstances and problems may lead to binary text fragmentation and unnecessary reads of executable header data.
For example, reasons leading to binary text fragmentation may include the following. First, compiling and/or changing an attribute of a binary file will typically bring text pages into the file cache, where the text pages may have a much smaller page size than the preferred text page size which is desired when the binary executes. Since these smaller pages of the binary are already in the file cache, the kernel may not bother to set up a fresh large page and bring in binary data from disk storage. Second, the first execution (the first instance) of a binary typically does a read for the program header table which may bring in a portion of the binary with a smaller page size, and the presence of the smaller page size may prevent text from getting a larger page later during execution. Third, at the first execution (the first instance) of the binary, the text segment is mapped into the process address space. Executing the binary a second time (the second instance) results in reading the program header table from the binary file on disk storage. This may cause the already established mapped text pages (by the first instance of the binary) to get fragmented.
In accordance with an embodiment of the invention, per the third block 306, program header table data (that was just read) is stored in cache memory and associated with the binary file. As shown in the fourth block 308, reading the program header table from the disk storage 102 often results in the file cache 104 being populated with text at smaller page sizes than a preferred text page size attributed to the binary file.
In accordance with an embodiment of the invention, a flush procedure is then applied per the fifth block 310 to pages in the file cache 104 that correspond to the binary file. An embodiment of the flush procedure is described further below in relation to
If valid program header table data is cached, then the kernel bypasses reading the program header table from the binary file in the disk storage 102. Instead, the kernel retrieves the program header table data from the cache, as indicated in the third block 326. Advantageously, this prevents previously mapped text pages from being fragmented as a result of smaller text pages being read into the file cache 104 from the disk storage 102. Such fragmentation of previously mapped text pages is discussed further below in relation to
On the other hand, if valid program header table data is not cached, then, as shown in the fifth block 330, the kernel goes ahead and reads the program header table from the binary file in the disk storage 102. In accordance with an embodiment of the invention, per the sixth block 332, program header table data (that was just read) is stored in cache memory and associated with the binary file. As shown in the seventh block 334, reading the program header table from the disk storage 102 often results in the file cache 104 being populated with text at smaller page sizes than a preferred text page size attributed to the binary file.
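The flow of blocks 326 through 336 may be sketched in C as follows. All structure and routine names below are assumptions made for illustration, with the kernel internals reduced to stubs:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-ins for kernel internals; these are not the
 * actual kernel interfaces. */
struct pht_cache_entry {
    bool valid;
    char binary[64];
    char data[64];                       /* cached program header table */
};

static struct pht_cache_entry pht_cache; /* one slot, for brevity */

static void read_pht_from_disk(const char *binary, char *out)
{
    /* In a real kernel this read may also populate the file cache
     * with text pages smaller than the preferred text page size. */
    snprintf(out, 64, "<PHT of %s>", binary);
}

static void flush_binary_text_pages(const char *binary)
{
    printf("flushing cached text pages of %s\n", binary);
}

/* Obtain the program header table for a binary, preferring the cache. */
static const char *get_program_header_table(const char *binary)
{
    struct pht_cache_entry *e = &pht_cache;

    if (e->valid && strcmp(e->binary, binary) == 0)
        return e->data;                  /* bypass the disk read entirely */

    /* No valid cached copy: read from disk, cache the table, then
     * flush small text pages so text can get large pages later. */
    read_pht_from_disk(binary, e->data);
    snprintf(e->binary, sizeof e->binary, "%s", binary);
    e->valid = true;
    flush_binary_text_pages(binary);
    return e->data;
}

int main(void)
{
    get_program_header_table("/bin/prog");  /* first exec: disk + flush */
    get_program_header_table("/bin/prog");  /* second exec: cache hit   */
    return 0;
}
```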
In accordance with an embodiment of the invention, a flush procedure is then applied per the eighth block 336 to pages in the file cache 104 that correspond to the binary file. An embodiment of the flush procedure is described further below in relation to
In the first block 502 of the flush algorithm, a determination is made as to the preferred text page size for the binary file. A binary file may have internal characteristics, including preferred page sizes (page size hints) for its text and data segments. These internal characteristics may be set or changed, for example, by using a “chatr” (change attribute) command in the HP-UX operating system. The preferred text page size may be determined from the program header table (i.e. in the executable header).
In the second block 504, a determination is made as to whether the preferred text page size for the binary is greater than or equal to a first threshold size. The first threshold size may be a predetermined size, for example, 256 kilobytes. The predetermined size may be pre-set depending on system parameters, such as the size of the file cache and other parameters.
If the preferred text page size is greater than or equal to the first threshold size, then the process goes on to the third block 506 where all text pages which correspond to the binary file are flushed (i.e. removed or invalidated) from the file cache 104. Advantageously, flushing the file cache of these text pages allows text to get larger page sizes later during execution of the binary. Thereafter, the process returns to the calling routine, as indicated by the fourth block 508.
On the other hand, if the preferred text page size is less than the first threshold size, then the process goes on to the fifth block 510 where a determination is made as to whether the preferred text page size for the binary is greater than or equal to a second threshold size, where the second threshold size is a fraction of the first threshold size. Like the first threshold size, the second threshold size may be a predetermined size, for example, 64 kilobytes. The predetermined size may be pre-set depending on system parameters, such as the size of the file cache and other parameters.
If the preferred text page size is less than the second threshold size, then the process goes on to the sixth block 512 and makes the determination not to flush the file cache 104. For example, if the second threshold size is 64 kilobytes, then the decision not to flush would be made if the preferred text page size is less than 64 kilobytes. In that case, the preferred text page size is small enough such that flushing the file cache 104 is unnecessary or is unlikely to be of significant benefit. Thereafter, the process returns to the calling routine, as indicated by the fourth block 508.
On the other hand, if the preferred text page size is greater than or equal to the second threshold size, then the process goes on to the seventh block 514 where a determination is made as to whether the number of pages of the binary file already in the file cache 104 is greater than a predetermined threshold fraction of the total pages of the binary file. The predetermined threshold fraction may be, for example, one quarter or some other fraction.
If more than the threshold fraction of pages of the binary file are already in the file cache 104, then the process goes back to the third block 506 where all text pages which correspond to the binary file are flushed (i.e. removed or invalidated) from the file cache 104. Advantageously, flushing the file cache of these text pages makes it much more likely for text to get larger page sizes later during execution of the binary. Thereafter, the process returns to the calling routine, as indicated by the fourth block 508.
On the other hand, if no more than the threshold fraction of pages of the binary file are already in the file cache 104, then the process goes to the sixth block 512 and makes the determination not to flush the file cache 104. Thereafter, the process returns to the calling routine, as indicated by the fourth block 508.
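Putting blocks 502 through 514 together, the flush decision may be sketched as a single routine in C. The sketch below is illustrative only; the threshold values are the example figures given above and in practice would be pre-set according to system parameters:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Example figures from the description; tunable to system parameters. */
#define FIRST_THRESHOLD     (256 * 1024) /* block 504: 256 kilobytes */
#define SECOND_THRESHOLD    (64 * 1024)  /* block 510: 64 kilobytes  */
#define CACHED_FRACTION_NUM 1            /* block 514: one quarter   */
#define CACHED_FRACTION_DEN 4

/* Decide whether the binary's text pages should be flushed from the
 * file cache, following blocks 502 through 514 of the flush algorithm. */
static bool should_flush(size_t preferred_text_page_size,
                         size_t pages_in_cache, size_t total_pages)
{
    /* Blocks 504/506: large preferred page size -> always flush. */
    if (preferred_text_page_size >= FIRST_THRESHOLD)
        return true;

    /* Blocks 510/512: small preferred page size -> flushing is of
     * little benefit, so do not flush. */
    if (preferred_text_page_size < SECOND_THRESHOLD)
        return false;

    /* Block 514: intermediate size -> flush only if more than a
     * quarter of the binary's pages are already in the file cache. */
    return pages_in_cache * CACHED_FRACTION_DEN >
           total_pages * CACHED_FRACTION_NUM;
}

int main(void)
{
    printf("%d\n", should_flush(256 * 1024, 0, 100)); /* 1: flush    */
    printf("%d\n", should_flush(16 * 1024, 90, 100)); /* 0: no flush */
    printf("%d\n", should_flush(64 * 1024, 30, 100)); /* 1: flush    */
    return 0;
}
```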
For example, in
Subsequently, during a second execution instance, a second virtual text page V2 may be read from disk. The second virtual page V2 may be a smaller page relating to the second execution instance. For example, V2 may be a virtual page which is only 8 kilobytes in size. Due to the aforementioned memory mapping restriction, however, V1 and V2 cannot both be mapped to P1, because V1 and V2 have different page sizes. As a result, the kernel resolves this problem by fragmenting V1 into smaller text page fragments.
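As a toy illustration of this fragmentation, assuming for concreteness that V1 is a 64-kilobyte page and V2 is an 8-kilobyte page, the single large mapping would be demoted into eight 8-kilobyte fragments:

```c
#include <stdio.h>

#define LARGE_PAGE (64 * 1024)           /* assumed size of V1 */
#define SMALL_PAGE ( 8 * 1024)           /* assumed size of V2 */

int main(void)
{
    unsigned long v1_base = 0x40000000UL;    /* hypothetical base of V1 */

    /* Demote V1: replace its single large mapping with a series of
     * small mappings so that it can coexist with the smaller page V2. */
    for (unsigned long off = 0; off < LARGE_PAGE; off += SMALL_PAGE)
        printf("fragment at vaddr 0x%lx, size %d bytes\n",
               v1_base + off, SMALL_PAGE);

    return 0;
}
```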
In one particular instance, V1 may include header data from the executable file as mapped by the first execution instance, and V2 may include header data from the executable file as mapped by the second execution instance. If V2 has a smaller text page size than V1 (as shown in
In the above description, numerous specific details are given to provide a thorough understanding of embodiments of the invention. However, the above description of illustrated embodiments of the invention is not intended to be exhaustive or to limit the invention to the precise forms disclosed. One skilled in the relevant art will recognize that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures or operations are not shown or described in detail to avoid obscuring aspects of the invention. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize.
These modifications can be made to the invention in light of the above detailed description. The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification and the claims. Rather, the scope of the invention is to be determined by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.