This relates generally to using processing units to handle page faults that arise in specialized devices, such as graphics processing units.
A page fault is an interrupt that occurs when software attempts to read from or write to a virtual memory location that is marked as “not present,” or when a page permission attribute prohibits the attempted access. Virtual memory systems maintain such status information about every page in a virtual memory address space. Each page is either mapped onto a physical address or is “not present” in physical memory. For example, when a read or write to an unmapped virtual address is detected, or when page access permissions are violated, the device “page walker” generates a page fault interrupt. The operating system (OS) page fault handler responds to the page fault by swapping data in from disk to system memory, or by allocating a new page (“copy on write”), and then updating the status information in the page table.
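The fault conditions and handler responses above can be modeled in a few lines. The following is a minimal Python sketch under stated assumptions: 4 KiB pages, a dictionary standing in for the page table, and allocation of a fresh frame standing in for both swapping in from disk and copy on write. `PTE`, `PageTable`, `PageFault`, `handle_page_fault`, and `access` are hypothetical names, not an operating system API.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, Optional

PAGE_SIZE = 4096  # assumed 4 KiB pages


class PageFault(Exception):
    """Raised when an access hits a not-present page or violates its permissions."""

    def __init__(self, vaddr: int, write: bool, reason: str) -> None:
        super().__init__(f"page fault at {vaddr:#x}: {reason}")
        self.vaddr = vaddr
        self.write = write
        self.reason = reason


@dataclass
class PTE:
    """Per-page status information: presence, write permission, physical frame."""
    present: bool = False
    writable: bool = True
    frame: Optional[int] = None


@dataclass
class PageTable:
    entries: Dict[int, PTE] = field(default_factory=dict)

    def translate(self, vaddr: int, write: bool) -> int:
        """Act like a page walker: return a physical address or raise PageFault."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        pte = self.entries.get(vpn)
        if pte is None or not pte.present:
            raise PageFault(vaddr, write, "not present")
        if write and not pte.writable:
            raise PageFault(vaddr, write, "write to read-only page")
        return pte.frame * PAGE_SIZE + offset


def handle_page_fault(table: PageTable, fault: PageFault, frames: Iterator[int]) -> None:
    """Toy OS handler: map a frame for a missing page, or do copy on write."""
    pte = table.entries.setdefault(fault.vaddr // PAGE_SIZE, PTE())
    if not pte.present:
        pte.frame = next(frames)  # stands in for swapping the page in from disk
        pte.present = True
    elif fault.write and not pte.writable:
        pte.frame = next(frames)  # copy on write: give the writer a private frame
        pte.writable = True


def access(table: PageTable, vaddr: int, write: bool, frames: Iterator[int]) -> int:
    """Translate, resolving at most one fault and then retrying the access."""
    try:
        return table.translate(vaddr, write)
    except PageFault as fault:
        handle_page_fault(table, fault, frames)
        return table.translate(vaddr, write)
```

The retry after the handler returns mirrors how the faulting instruction is re-executed once the status information has been updated.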
To avoid the possibility of page faults in graphics processing units, graphics processing units are generally constrained to using pinned memory. This means that a page in use by the graphics processing unit is pre-allocated and cannot be swapped out to disk or remapped to a new location in system memory.
In conventional systems, the central processing unit and the graphics processing unit use separate page tables. The operating system manages the host page table used by the central processing unit, and a graphics processing unit driver manages the page table used by the graphics processing unit. The graphics processing unit driver copies data from user space into driver memory for processing on the graphics processing unit. Complex data structures must be repacked into arrays in which pointers are replaced by offsets. The overhead of copying and repacking limits graphics processing units to applications in which data is represented as arrays. Thus, graphics processing units may be of limited value in some applications, including those that involve complex data structures such as databases.
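To make the repacking overhead concrete, here is a hypothetical Python sketch (the names `Node`, `repack`, and `walk` are illustrative only) that copies a pointer-linked list into a flat array of records whose links are array offsets, the form a separately managed driver buffer requires. It assumes the list is acyclic. With a shared virtual memory model, the device could instead follow the original `next` references directly and skip this copy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Host-side node linked by an ordinary object reference (a 'pointer')."""
    value: int
    next: Optional["Node"] = None


def repack(head: Optional[Node]) -> List[Tuple[int, int]]:
    """Copy an acyclic linked list into (value, next_offset) records.

    Each pointer becomes the index of its target record; -1 marks the end.
    """
    nodes: List[Node] = []
    index: Dict[int, int] = {}
    node = head
    while node is not None:  # first pass: assign every node an array slot
        index[id(node)] = len(nodes)
        nodes.append(node)
        node = node.next
    # second pass: rewrite each pointer as an offset into the array
    return [(n.value, index[id(n.next)] if n.next is not None else -1) for n in nodes]


def walk(packed: List[Tuple[int, int]], start: int = 0) -> List[int]:
    """Device-side traversal that follows offsets instead of pointers."""
    out: List[int] = []
    i = start if packed else -1
    while i != -1:
        value, i = packed[i]
        out.append(value)
    return out
```

Both passes touch every node, so the cost grows with the size of the structure on every offload, in addition to the copy itself.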
In some embodiments, graphics processing applications may use complex data structures, such as databases, through a shared virtual memory model that does not require pinning of shared memory. Pinning shared virtual memory reduces the operating system's ability to manage system memory. In some embodiments, unpinned shared virtual memory may be used on the graphics processing unit even though there is no guarantee that a page used by the graphics processing unit is present in system memory.
The graphics processing unit driver propagates page faults on the graphics processing unit to a shadow thread on the host central processing unit. The host then emulates each page fault as if it had occurred on the central processing unit, triggering the operating system to resolve the fault on behalf of the graphics processing unit.
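The propagation path can be sketched as a toy model under several assumptions: Python threads stand in for the GPU worker and the host shadow thread, a set of present page numbers stands in for the shared page table, and adding a page to that set stands in for the host OS page fault handler. The driver and GPU operating system hops are collapsed into two queues, and all names are hypothetical.

```python
import queue
import threading
from typing import List, Optional, Set, Tuple

PAGE_SIZE = 4096  # assumed page size


class SharedAddressSpace:
    """Toy model of one unpinned virtual address space shared by CPU and GPU."""

    def __init__(self) -> None:
        self.present: Set[int] = set()  # virtual page numbers currently backed
        self.lock = threading.Lock()

    def gpu_try_access(self, vaddr: int) -> bool:
        """GPU side: report a miss instead of resolving it."""
        with self.lock:
            return vaddr // PAGE_SIZE in self.present

    def cpu_access(self, vaddr: int) -> None:
        """CPU side: a miss is handled by the host OS, modeled as mapping the page."""
        with self.lock:
            self.present.add(vaddr // PAGE_SIZE)


def shadow_thread(space: SharedAddressSpace,
                  faults: "queue.Queue[Optional[int]]",
                  resumes: "queue.Queue[int]") -> None:
    """Host proxy: replay each propagated GPU fault on the CPU, then allow a resume."""
    while True:
        vaddr = faults.get()     # blocking read: sleep until a fault arrives
        if vaddr is None:        # shutdown sentinel
            return
        space.cpu_access(vaddr)  # emulate the fault as if it occurred on the CPU
        resumes.put(vaddr)


def gpu_worker(space: SharedAddressSpace,
               faults: "queue.Queue[Optional[int]]",
               resumes: "queue.Queue[int]",
               addresses: List[int]) -> int:
    """Touch each address, propagating faults to the host; return the fault count."""
    n_faults = 0
    for vaddr in addresses:
        while not space.gpu_try_access(vaddr):
            n_faults += 1
            faults.put(vaddr)    # propagate the fault to the shadow thread
            resumes.get()        # stall until the host reports it resolved
    return n_faults


def run(addresses: List[int]) -> Tuple[int, List[int]]:
    space = SharedAddressSpace()
    faults: "queue.Queue[Optional[int]]" = queue.Queue()
    resumes: "queue.Queue[int]" = queue.Queue()
    proxy = threading.Thread(target=shadow_thread, args=(space, faults, resumes))
    proxy.start()
    n_faults = gpu_worker(space, faults, resumes, addresses)
    faults.put(None)
    proxy.join()
    return n_faults, sorted(space.present)
```

In this sketch the GPU side never resolves a fault itself; it only retries the access after the host has replayed it, which is the sense in which the central processing unit acts as a proxy.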
While the term graphics processing unit is used in the present application, it should be understood that the graphics processing unit may or may not be a separate integrated circuit. The present invention is applicable to situations where the graphics processing unit and the central processing unit are integrated into one integrated circuit.
In addition, while an example relating to graphics processing is given herein, in other embodiments, the same page fault handling techniques may be used in other specialized processing units, such as video processing cards and input/output devices. In general, the page fault handling techniques may be used with any device that may experience page faults and that is accompanied by a processor that can act as a proxy to resolve those page faults. As used herein, a processor or processing unit may be a processor, controller, or coprocessor.
Referring to
As shown in
The graphics processing unit 18 includes, in user level 12, the gthread 28, which exchanges control and exception messages with the operating system 30. A gthread is user code that runs on the graphics processing unit, sharing virtual memory with the parent thread running on the central processing unit. The operating system 30 may be a relatively small operating system, running on the graphics processing unit, that is responsible for graphics processing unit exceptions. It is small relative to the host operating system 24, as one example.
User applications 20 are any user processes that run on the central processing unit 16. The user applications 20 spawn threads on the graphics processing unit 18.
An eXtended Threaded Library or XTL is an extension to create and manage user threads on the graphics processing unit. This library creates the shadow thread for each gthread.
User applications offload computations to the graphics processing unit using an extension of a traditional multithreaded model such as:
The gthread, or worker thread, created on the graphics processing unit shares virtual memory with the parent thread. It behaves like a regular thread in that all standard inter-process synchronization mechanisms, such as mutexes and semaphores, can be used. At the same time, a new shadow thread is created on the host central processing unit 16. This shadow thread acts as a proxy for exception handling and for synchronization between threads on the central processing unit and the graphics processing unit.
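The pairing of each gthread with a host shadow thread might look like the following sketch. `xtl_create` and `GThreadHandle` are hypothetical stand-ins for an XTL-style call (the library's actual API is not given here), and an ordinary Python thread stands in for code executing on the graphics processing unit. The usage shows two workers sharing a list with the parent and serializing their updates with a standard mutex.

```python
import queue
import threading
from typing import Any, Callable, List, Optional


class GThreadHandle:
    """Hypothetical pairing of one gthread with its host-side shadow thread."""

    def __init__(self, target: Callable[..., None], args: tuple) -> None:
        # Channel on which a driver would deliver exception messages (assumed).
        self.messages: "queue.Queue[Optional[Any]]" = queue.Queue()
        self.shadow = threading.Thread(target=self._shadow_loop, daemon=True)
        # An ordinary thread stands in for code running on the GPU.
        self.worker = threading.Thread(target=target, args=args)

    def _shadow_loop(self) -> None:
        # Sleep in a blocking read until a message arrives; None means shut down.
        # Actual exception handling is omitted from this sketch.
        while self.messages.get() is not None:
            pass

    def start(self) -> None:
        self.shadow.start()
        self.worker.start()

    def join(self) -> None:
        self.worker.join()
        self.messages.put(None)
        self.shadow.join()


def xtl_create(target: Callable[..., None], *args: Any) -> GThreadHandle:
    """Hypothetical XTL-style call: spawn a gthread together with its shadow thread."""
    handle = GThreadHandle(target, args)
    handle.start()
    return handle


def worker(shared: List[int], lock: threading.Lock, start: int) -> None:
    """Device-side work that writes into memory owned by the parent thread."""
    for i in range(start, start + 3):
        with lock:  # a standard mutex, as with any regular thread
            shared.append(i)


# Usage: the parent shares a list and a mutex with two gthreads.
shared: List[int] = []
lock = threading.Lock()
handles = [xtl_create(worker, shared, lock, s) for s in (0, 10)]
for h in handles:
    h.join()
```

Creating the shadow thread inside the same call that spawns the worker keeps the one-to-one pairing invisible to the application, which is consistent with the library creating a shadow thread for each gthread.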
In some embodiments, the parent thread, the host shadow thread and the graphics processing unit worker threads may share unpinned virtual memory as shown in
Referring to
The graphics processing unit operating system 30 initially receives a page fault as indicated by the word “exception” and the corresponding arrow in
At 50, the shadow thread performs a blocking read, suspending other activities until the page fault is resolved. The shadow thread 22 then receives the page fault data. At diamond 52, the shadow thread checks whether the page is faulty. If it is, the shadow thread reproduces the same access to the faulting address, as indicated at block 54. If it is not, the flow bypasses block 54 and goes to block 58 to check for other exceptions. The blocking read is then released at 56.
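The shadow thread's message loop (blocks 50 through 58) might be sketched as follows. The message encoding (`ExceptionMsg` and its `kind` strings), the `touch` callback that replays the access on the CPU, and the `other` callback that handles non-page-fault exceptions and chooses the reply are all assumptions for illustration. In a native implementation, `touch` would perform the actual load or store to the faulting address so that the host OS page fault handler runs.

```python
import queue
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

RESUME_EXECUTION = "RESUME_EXECUTION"


@dataclass
class ExceptionMsg:
    """Hypothetical exception record forwarded by the driver."""
    kind: str            # "page_fault" or another exception type (assumed encoding)
    vaddr: int = 0       # faulting virtual address, for page faults
    write: bool = False  # whether the faulting access was a store


def shadow_loop(from_driver: "queue.Queue[Optional[ExceptionMsg]]",
                to_driver: "queue.Queue[str]",
                touch: Callable[[int, bool], None],
                other: Callable[[ExceptionMsg], str]) -> None:
    """Serve exceptions for one gthread until a None shutdown sentinel arrives."""
    while True:
        msg = from_driver.get()          # 50: blocking read, sleep until a message
        if msg is None:
            return
        if msg.kind == "page_fault":     # 52: is the page faulty?
            touch(msg.vaddr, msg.write)  # 54: replay the same access on the CPU
            reply = RESUME_EXECUTION
        else:
            reply = other(msg)           # 58: check for and handle other exceptions
        to_driver.put(reply)             # report to the driver, then block again (56)


# Usage: one page fault, one other exception (its reply string is hypothetical).
touched: List[Tuple[int, bool]] = []
inbox: "queue.Queue[Optional[ExceptionMsg]]" = queue.Queue()
outbox: "queue.Queue[str]" = queue.Queue()
for m in (ExceptionMsg("page_fault", 0x2000, True), ExceptionMsg("illegal_instruction"), None):
    inbox.put(m)
shadow_loop(inbox, outbox,
            lambda vaddr, write: touched.append((vaddr, write)),
            lambda msg: "TERMINATE")
```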
The host operating system 24 handles the page fault in the page fault handler 42. Effectively, the host operating system is tricked into handling the exception for the graphics processing unit. The translation lookaside buffer (TLB) may then be flushed at 44. A check at diamond 46 determines whether the page fault is good, i.e., fixed, in which case the shadow thread 22 is advised. Otherwise, a bad page fault is indicated at 48, which may, for example, result in an error.
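The host-side steps (42 through 48) can be modeled as below. This is a toy sketch that assumes a set of pages the process may legitimately map, a dictionary page table, and a dictionary TLB that is flushed wholesale; `HostOS`, `resolve_for_gpu`, and `BadPageFault` are hypothetical names. A real system may invalidate only the affected TLB entries rather than clearing the whole cache.

```python
from typing import Dict, Set

PAGE_SIZE = 4096  # assumed page size


class BadPageFault(Exception):
    """A fault the OS cannot fix (48), e.g. an address outside every valid mapping."""


class HostOS:
    """Toy host OS: a page table, a TLB cache, and the pages the process may map."""

    def __init__(self, valid_pages: Set[int]) -> None:
        self.valid_pages = valid_pages
        self.page_table: Dict[int, int] = {}  # virtual page number -> physical frame
        self.tlb: Dict[int, int] = {}         # cached translations
        self.next_frame = 0

    def page_fault_handler(self, vaddr: int) -> bool:
        """42: map the page if legal; 44: flush the TLB; 46: report whether fixed."""
        vpn = vaddr // PAGE_SIZE
        if vpn in self.valid_pages and vpn not in self.page_table:
            self.page_table[vpn] = self.next_frame  # stands in for swap-in or allocation
            self.next_frame += 1
        self.tlb.clear()
        return vpn in self.page_table


def resolve_for_gpu(host: HostOS, vaddr: int) -> str:
    """Run the host handler on the GPU's behalf and report the outcome."""
    if host.page_fault_handler(vaddr):
        return "resolved"                               # advise the shadow thread
    raise BadPageFault(f"bad page fault at {vaddr:#x}")  # 48: e.g. report an error
```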
The shadow thread 22 sends the page fault resolved message (i.e., RESUME EXECUTION) to the driver 26. The shadow thread then sleeps, waiting in another blocking read 56 for the next message from the driver.
The driver 26 receives the resume execution message from the shadow thread and sends a PassGPUCommand to the operating system 30 as indicated by the block 64. The message may include the opcode to resume execution with no data. The operating system 30 marks the thread as ready for execution, as indicated at 68, and returns from the exception by sending a resume message to the gthread 28.
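The last hop, from the driver back to the faulted gthread, might be sketched as follows. `PassGPUCommand` is the message named above, but its fields, the `Opcode` enumeration, and the `GpuOS` and `Driver` classes are hypothetical, and a `threading.Event` stands in for the gthread's return from the exception.

```python
import threading
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Optional


class Opcode(Enum):
    RESUME_EXECUTION = auto()  # hypothetical opcode; its encoding is not specified


@dataclass
class PassGPUCommand:
    thread_id: int
    opcode: Opcode
    data: Optional[bytes] = None  # a resume command carries no data


class GpuOS:
    """Toy GPU-side OS that parks faulted gthreads and resumes them on command."""

    def __init__(self) -> None:
        self.state: Dict[int, str] = {}
        self.wakeups: Dict[int, threading.Event] = {}

    def park(self, tid: int) -> threading.Event:
        """Record a faulted gthread and return the event it waits on."""
        self.state[tid] = "faulted"
        self.wakeups[tid] = threading.Event()
        return self.wakeups[tid]

    def on_command(self, cmd: PassGPUCommand) -> None:
        if cmd.opcode is Opcode.RESUME_EXECUTION:
            self.state[cmd.thread_id] = "ready"     # 68: mark the thread ready
            self.wakeups.pop(cmd.thread_id).set()  # return from the exception


class Driver:
    def __init__(self, gpu_os: GpuOS) -> None:
        self.gpu_os = gpu_os

    def on_shadow_message(self, tid: int, msg: str) -> None:
        """Translate the shadow thread's reply into a PassGPUCommand (64)."""
        if msg == "RESUME_EXECUTION":
            self.gpu_os.on_command(PassGPUCommand(tid, Opcode.RESUME_EXECUTION))


# Usage: a parked gthread wakes once the driver relays the resume message.
gpu_os = GpuOS()
driver = Driver(gpu_os)
resume_event = gpu_os.park(7)
gthread = threading.Thread(target=resume_event.wait)
gthread.start()
driver.on_shadow_message(7, "RESUME_EXECUTION")
gthread.join(timeout=1.0)
```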
The computer system 130, shown in
In the case of a software implementation, the pertinent code may be stored in any suitable semiconductor, magnetic, or optical memory, including the main memory 132 (as indicated at 139) or any available memory within the graphics processor. Thus, in one embodiment, the code to perform the sequences of
The graphics processing techniques described herein may be implemented in various hardware architectures. For example, graphics functionality may be integrated within a chipset. Alternatively, a discrete graphics processor may be used. As still another embodiment, the graphics functions may be implemented by a general purpose processor, including a multicore processor.
References throughout this specification to “one embodiment” or “an embodiment” mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation encompassed within the present invention. Thus, appearances of the phrase “one embodiment” or “in an embodiment” are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be instituted in suitable forms other than the particular embodiment illustrated, and all such forms may be encompassed within the claims of the present application.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.