Embodiments described herein disclose the use of transactional memory for platform firmware execution regimes in multi-thread or multi-core processing systems. Systems and methods provide concurrent processing for the System Management Mode (SMM) of a multi-core microprocessor or highly parallel processing system using transactional memory (TM). Embodiments allow highly concurrent, contention-free execution of SMM code through the use of hardware and/or software transactional memory to allow multi-thread processing on shared data structures, memory locations, locks, and other shared data resources. The SMI occupancy time can be reduced by parallelizing the SMM flows and using hardware or software transactional memory structures to ensure that lock contention among the parallel flows does not impact task dispatching. Embodiments of the TM-implemented SMM code mitigate lock contention in highly parallel technologies, to the advantage of platform designs with highly parallel firmware/SMM flows.
In one embodiment, executable content in the form of a plurality of software drivers or similar code is loaded into the System Management Mode (SMM) of a microprocessor of the Intel® 32-bit family (i.e., an IA-32 processor), or the native mode of an Itanium™-based processor with a PMI signal activation, and concurrently executed on multiprocessor computer systems that employ IA-32 and Itanium-based processors. SMM represents one type of execution environment for platform firmware, and other types of firmware execution regimes are also possible.
The state of execution of code in IA-32 SMM is initiated by an SMI signal, and that in Itanium processors is initiated by a PMI signal; for simplicity, both will generally be referred to as SMM. The mechanism allows for multiple drivers, possibly written by different parties, to be installed for SMM operation. An agent that registers the drivers runs in the EFI (Extensible Firmware Interface) boot-services mode (i.e., the mode prior to operating system launch) and is composed of a CPU-specific component that binds the drivers and a platform component that abstracts chipset control of the xMI (PMI or SMI) signals. The APIs (application program interfaces) providing these sets of functionality are referred to as the SMM Base and SMM Access Protocol, respectively.
In conventional SMM implementations, SMM space is often locked by the platform software/firmware/BIOS via hardware mechanisms before handing off control; this grants firmware the ability to abstract the control and security of this binding. In contrast, the software abstraction via the SMM Access protocol provided by embodiments of the disclosed system obviates the need for users of this facility to know and understand the exact hardware mechanism, thus allowing drivers to be portable across many platforms.
Embodiments of the concurrency mechanisms for SMM described herein include the following features: a library in SMM for the drivers' usage, including an I/O access abstraction and memory allocation services; a means to communicate with drivers and applications executing in non-SMM mode; an optional parameter for periodic activation at a given frequency; a means to authenticate the drivers on load into SMM; the ability to close the registration capability; the ability to run in a multi-processor environment where many processors receive the xMI activation. Embodiments further include a transactional memory for sharing stored resources and mediating shared resource accesses among different requesting processes or threads.
In an optional mode, the EFI SMM base protocol driver may scan various firmware volumes to identify any drivers that are designated for servicing xMI events via SMM. In one embodiment, these drivers are identified by their file type, such as exemplified by a “DRIVER7.SMH” file 25 corresponding to an add-on driver 7. During the installation of the EFI SMM base protocol driver, an SMM Nub 24 is loaded into transactional memory (TM) 26, which can comprise an SMM-only memory space. The SMM Nub 24 is responsible for coordinating all activities while control is transferred to SMM, including providing an SMM library 28 to event handlers that includes PCI and I/O services 30, memory allocation services 32, and configuration table registration 34.
Registration of an SMM event handler is the first operation in enabling the handler to perform a particular xMI event servicing function it is designed to perform. An SMM event handler comprises a set of code (i.e., coded machine instructions) that when executed by a system processor (CPU) performs an event service function in a manner similar to an interrupt service routine. Typically, each SMM event handler will contain code to service a particular hardware component or subsystem, or a particular class of hardware. For example, SMM event handlers may be provided for servicing errors caused by the system's real time clock, I/O port errors, PCI device errors, etc. In general, there may be some correspondence between a given driver and an SMM event handler. However, this is not a strict requirement, as the handlers may comprise a set of functional blocks extracted from a single driver file or object.
When the event handler for legacy driver 1 is registered, it is loaded into TM 26 as a legacy handler 36. A legacy handler is an event handler that is generally provided with the original system firmware and represents the conventional mechanism for handling an xMI event. As each add-on SMM event handler is registered in block 22, it is loaded into an add-on SMM event handler portion 38 of TM 26; once all of the add-on event handlers are loaded, add-on SMM event handler portion 38 comprises a set of event handlers corresponding to add-on drivers 2-7, as depicted by a block 42. In addition, as each SMM event handler is registered, it may optionally be authenticated in a block 44 to ensure that the event handler is valid for use with the particular processor and/or firmware for the computer system. For example, an encryption method that implements a digital signature and public key may be used. As SMM event handlers are registered, they are added to a list of handlers 46 stored in a heap 47 maintained by SMM Nub 24.
Once all of the legacy and add-on SMM event handlers have been registered and loaded into TM 26 and proper configuration data (metadata) is written to SMM Nub 24, the TM is locked, precluding registration of additional SMM event handlers. The list of handlers is also copied to a handler queue 48, which may be stored in heap 47 and accessed by SMM Nub 24 or stored directly in SMM Nub 24. The system is now ready to handle various xMI events via SMM.
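The registration, optional authentication, and locking sequence described above can be sketched in C as follows. This is only an illustrative model under stated assumptions: the type and function names (handler_registry, registry_register, registry_lock, auth_fn), the handler signature, and the fixed capacity are hypothetical and do not come from the described embodiments or from any EFI interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLERS 8                         /* hypothetical capacity */

typedef int (*smm_handler_fn)(void *context);  /* hypothetical handler signature */
typedef bool (*auth_fn)(smm_handler_fn fn);    /* hypothetical authentication hook */

typedef struct {
    smm_handler_fn list[MAX_HANDLERS];   /* list of registered handlers */
    smm_handler_fn queue[MAX_HANDLERS];  /* handler queue, filled at lock time */
    size_t count;
    size_t queue_len;
    bool locked;                         /* true once registration is closed */
} handler_registry;

void registry_init(handler_registry *r)
{
    assert(r != NULL);
    r->count = 0;
    r->queue_len = 0;
    r->locked = false;
}

/* Returns true if the handler was added; false if registration is closed,
 * the list is full, or the optional authentication check rejects it. */
bool registry_register(handler_registry *r, smm_handler_fn fn, auth_fn auth)
{
    if (r->locked || fn == NULL || r->count >= MAX_HANDLERS)
        return false;
    if (auth != NULL && !auth(fn))
        return false;
    r->list[r->count++] = fn;
    return true;
}

/* Close registration and copy the handler list to the handler queue. */
void registry_lock(handler_registry *r)
{
    for (size_t i = 0; i < r->count; i++)
        r->queue[i] = r->list[i];
    r->queue_len = r->count;
    r->locked = true;
}

/* Usage example with placeholder handlers. */
static int demo_handler_a(void *context) { (void)context; return 0; }
static int demo_handler_b(void *context) { (void)context; return 0; }
static bool reject_all(smm_handler_fn fn) { (void)fn; return false; }

size_t registry_demo(void)
{
    handler_registry r;
    registry_init(&r);
    registry_register(&r, demo_handler_a, NULL);        /* accepted */
    registry_register(&r, demo_handler_b, reject_all);  /* fails authentication */
    registry_register(&r, demo_handler_b, NULL);        /* accepted */
    registry_lock(&r);
    registry_register(&r, demo_handler_a, NULL);        /* refused: locked */
    return r.queue_len;                                 /* two handlers queued */
}
```

In this sketch the lock is a simple flag; in the described system the locking of the TM would be enforced by the platform rather than by a software field.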
As shown in
Transactional Memory systems offer an alternative method to lock-based synchronization, and are typically implemented to be lock-free. Transactions are executed as a series of reads and writes to shared memory, which logically occur at a single instant in time. Using TM, every thread completes its modifications to shared memory without regard to the activities of other threads, and read/write operations are recorded in a log. Changes to shared memory for an entire transaction are validated and committed if other threads have not concurrently made changes. A transaction may be aborted, which causes all of its prior changes to be rolled back (undone). If a transaction cannot be committed due to conflicting changes, it is typically aborted and re-executed from the beginning until it succeeds. In general, when using TM, no thread needs to wait for access to a resource, and different threads can simultaneously modify different parts of a data structure that would be protected under the same lock. Through the use of the transactional memory, the SMI occupancy time can be reduced by parallelizing the SMM flows and using the transactional memory to ensure that lock contention among the parallel flows does not impact task dispatching. TM generally features the ability to be implemented on top of cache-coherence protocols and provides transactions with the properties of atomicity (all-or-nothing) and serializability (one-at-a-time order).
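The read, validate, commit, and re-execute cycle described above can be illustrated with a small software sketch in C11. It is a deliberate simplification and not the mechanism of the described embodiments: a single version counter guards two shared words, each transaction works on private copies, and a commit succeeds only if no other commit has occurred since the snapshot was taken. Because a committer briefly marks the counter odd while it writes, this sketch is not strictly lock-free. All names (shared_pair, txn, txn_begin, txn_commit, transfer) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_uint version;  /* even = stable, odd = a commit is in progress */
    atomic_int  a;
    atomic_int  b;
} shared_pair;

typedef struct {
    unsigned snapshot;    /* version observed when the transaction began */
    int a, b;             /* private tentative copies (the transaction log) */
} txn;

void pair_init(shared_pair *s, int a, int b)
{
    assert(s != NULL);
    atomic_init(&s->version, 0u);
    atomic_init(&s->a, a);
    atomic_init(&s->b, b);
}

/* Take a consistent snapshot of the shared data. */
static void txn_begin(shared_pair *s, txn *t)
{
    for (;;) {
        unsigned v = atomic_load(&s->version);
        if (v & 1u)
            continue;                       /* a commit is in flight; re-read */
        t->a = atomic_load(&s->a);
        t->b = atomic_load(&s->b);
        if (atomic_load(&s->version) == v) {
            t->snapshot = v;
            return;
        }
    }
}

/* Validate and publish; returns false (abort) if another commit intervened. */
static bool txn_commit(shared_pair *s, const txn *t)
{
    unsigned expected = t->snapshot;
    if (!atomic_compare_exchange_strong(&s->version, &expected, expected + 1u))
        return false;                       /* conflict: tentative copies are dropped */
    atomic_store(&s->a, t->a);
    atomic_store(&s->b, t->b);
    atomic_store(&s->version, t->snapshot + 2u);
    return true;
}

/* Move `amount` from a to b, re-executing from the beginning until the
 * commit succeeds. Returns the number of attempts. */
int transfer(shared_pair *s, int amount)
{
    int attempts = 0;
    txn t;
    do {
        txn_begin(s, &t);
        t.a -= amount;
        t.b += amount;
        attempts++;
    } while (!txn_commit(s, &t));
    return attempts;
}

/* Usage example: a transaction whose snapshot is stale must fail to commit. */
int stm_demo(void)
{
    shared_pair s;
    txn stale;
    pair_init(&s, 10, 0);
    txn_begin(&s, &stale);                  /* snapshot taken first */
    transfer(&s, 3);                        /* another transaction commits */
    stale.a = 100;                          /* tentative change, never published */
    bool committed = txn_commit(&s, &stale);
    return (!committed && atomic_load(&s.a) == 7 && atomic_load(&s.b) == 3) ? 1 : 0;
}
```

The stale transaction's change is never visible in shared memory, which is the all-or-nothing behavior the paragraph attributes to TM.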
In one embodiment, the use of TM mechanisms can be implemented using one or more instructions defined by the compiler. The following code segment illustrates sample code that can implement an HTM-based access, according to an embodiment.
One potential problem with the multi-processor configuration of
Unlike software locking schemes that involve the storage of semaphore data and software exchanges to set/reset the semaphore, the linked list mechanism 55 within the transactional memory 26 allows access to shared resources in an automatically sequential manner that is analogous to hardware buffer accesses. If multiple processes try to access the same resource, access is granted to the first process, and the other processes retry using standard memory access cycles. This mechanism potentially saves a great deal of time over software locking methods, which require a delay until the semaphore corresponding to the accessed resource is cleared.
In one embodiment, the transactional memory 202 is implemented as shared system memory that is accessed through Application Program Interfaces (APIs) by the processors (e.g., CPUs 1 and 2). There can be various different access methods corresponding to different APIs. One such method is a Load-Transactional (LT) method in which the value of a shared memory location is read into a register. A second method is a Load-Transactional-Exclusive (LTX) method in which the value is read into a register, and there is an indication that the location read is likely to be updated soon. A third method is a Store-Transactional (ST) method in which a value is tentatively written from a register to a shared memory location, and this value becomes visible to the other processors only when the transaction successfully commits.
Different APIs can also be used to manipulate a transaction state. A transaction, T, is successfully committed only if there are no memory-access conflicts. That is, no other transaction has written locations read or written by T, and no other transaction has read locations written by T. An abort operation causes all of the transaction's updates to be discarded. A validate operation returns the current status of T (i.e., whether T has aborted or not), and discontinues the transaction if it has aborted.
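The access methods (LT, LTX, ST) and the state operations (commit, abort, validate) described above can be modeled in software as shown below. This is a single-threaded sketch under stated assumptions, not the described implementation: each shared word carries a version number, a transaction logs the version it first observed, and validation fails if any logged location has since been changed by another commit. A real implementation would need the commit step itself to be atomic. The function names (tx_lt, tx_ltx, tx_st, tx_validate, tx_commit, tx_abort) and the sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MEM_WORDS 4   /* hypothetical size of the shared region */
#define MAX_LOG   4   /* hypothetical per-transaction log capacity */

typedef struct {
    int      value[MEM_WORDS];
    unsigned version[MEM_WORDS];  /* incremented by every committed write */
} shared_mem;

typedef struct {
    size_t   addr;
    unsigned seen;       /* version observed on first access */
    int      tentative;  /* private value, invisible to other transactions */
    bool     written;
} log_entry;

typedef struct {
    shared_mem *mem;
    log_entry   log[MAX_LOG];
    size_t      n;
    bool        aborted;
} txn;

void tx_begin(txn *t, shared_mem *m) { t->mem = m; t->n = 0; t->aborted = false; }

/* Abort: discard every tentative update. */
void tx_abort(txn *t) { t->n = 0; t->aborted = true; }

static log_entry *tx_entry(txn *t, size_t addr)
{
    assert(addr < MEM_WORDS);
    if (t->aborted)
        return NULL;
    for (size_t i = 0; i < t->n; i++)
        if (t->log[i].addr == addr)
            return &t->log[i];
    if (t->n == MAX_LOG) {            /* log overflow aborts the transaction */
        tx_abort(t);
        return NULL;
    }
    log_entry *e = &t->log[t->n++];
    e->addr = addr;
    e->seen = t->mem->version[addr];
    e->tentative = t->mem->value[addr];
    e->written = false;
    return e;
}

/* LT: read a shared location; the transaction sees its own tentative writes. */
int tx_lt(txn *t, size_t addr)
{
    log_entry *e = tx_entry(t, addr);
    return e ? e->tentative : 0;
}

/* LTX: as LT, plus a hint that an update will follow (unused in this model). */
int tx_ltx(txn *t, size_t addr) { return tx_lt(t, addr); }

/* ST: tentative write, visible to others only after a successful commit. */
void tx_st(txn *t, size_t addr, int value)
{
    log_entry *e = tx_entry(t, addr);
    if (e) { e->tentative = value; e->written = true; }
}

/* Validate: false if the transaction has aborted or a logged location changed. */
bool tx_validate(txn *t)
{
    if (t->aborted)
        return false;
    for (size_t i = 0; i < t->n; i++)
        if (t->mem->version[t->log[i].addr] != t->log[i].seen) {
            tx_abort(t);
            return false;
        }
    return true;
}

/* Commit: publish tentative writes only if no conflict was detected. */
bool tx_commit(txn *t)
{
    if (!tx_validate(t))
        return false;
    for (size_t i = 0; i < t->n; i++)
        if (t->log[i].written) {
            t->mem->value[t->log[i].addr] = t->log[i].tentative;
            t->mem->version[t->log[i].addr]++;
        }
    t->n = 0;
    return true;
}

/* Usage example: t2 reads a location that t1 then commits a write to. */
bool tm_api_demo(void)
{
    shared_mem m = { {5, 0, 0, 0}, {0, 0, 0, 0} };
    txn t1, t2;
    tx_begin(&t1, &m);
    tx_begin(&t2, &m);
    tx_st(&t1, 0, tx_ltx(&t1, 0) + 1);   /* t1: tentative 5 -> 6 */
    int seen_by_t2 = tx_lt(&t2, 0);      /* t2 still sees committed 5 */
    bool c1 = tx_commit(&t1);            /* no conflict: succeeds */
    tx_st(&t2, 1, seen_by_t2);
    bool c2 = tx_commit(&t2);            /* location 0 changed: fails */
    return seen_by_t2 == 5 && c1 && !c2 && m.value[0] == 6 && m.value[1] == 0;
}
```

Because tentative writes stay private until commit, a transaction that read a location later written by a committing transaction is the one that fails validation in this model.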
As described in embodiments shown herein, the transactional memory mechanism moves critical section management from software to the hardware or data structure. The composing of critical sections on each CPU does not require orchestration by software, but is instead managed by an STM algorithm or the cache/virtual memory subsystem of the HTM.
As shown in
In one embodiment, the system includes a transactional cache to hold the transactional data. For this, each transactional operation (i.e., LT, LTX, ST) caches two copies of the line in the transactional cache. A “committed” copy contains the last committed data, and a “tentative” copy contains the data modified by the transaction. An abort discards all tentative copies, and a commit marks all tentative copies as the latest committed copies. The system also implements a cache coherency protocol to allow two types of access rights, exclusive and non-exclusive, to a location (shared resource). Before a processor P can read from a shared location L, it must acquire non-exclusive access to L. Before a second processor Q can write to L, it must acquire exclusive access to L. In a read-write conflict, the protocol aborts either the first processor's or the second processor's transaction. Interrupt signals and overflow conditions can also abort the current transaction.
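The two-copy behavior of the transactional cache can be illustrated with a short C model. It is a sketch under simplifying assumptions, not the hardware design: each line holds a committed value and a tentative value, an abort drops the tentative copies, and a commit promotes them. The coherency protocol and access rights are not modeled, and all names (tx_line, tx_cache, cache_store, cache_load, cache_commit, cache_abort) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINES 4   /* hypothetical cache size */

typedef struct {
    int  committed;  /* last committed data */
    int  tentative;  /* data as modified by the current transaction */
    bool dirty;      /* tentative copy holds an uncommitted write */
} tx_line;

typedef struct {
    tx_line line[CACHE_LINES];
} tx_cache;

void cache_init(tx_cache *c)
{
    for (size_t i = 0; i < CACHE_LINES; i++) {
        c->line[i].committed = 0;
        c->line[i].tentative = 0;
        c->line[i].dirty = false;
    }
}

/* Transactional store: only the tentative copy changes. */
void cache_store(tx_cache *c, size_t i, int value)
{
    assert(i < CACHE_LINES);
    c->line[i].tentative = value;
    c->line[i].dirty = true;
}

/* Transactional load: the running transaction sees its own tentative data. */
int cache_load(const tx_cache *c, size_t i)
{
    assert(i < CACHE_LINES);
    return c->line[i].dirty ? c->line[i].tentative : c->line[i].committed;
}

/* Commit: every tentative copy becomes the latest committed copy. */
void cache_commit(tx_cache *c)
{
    for (size_t i = 0; i < CACHE_LINES; i++)
        if (c->line[i].dirty) {
            c->line[i].committed = c->line[i].tentative;
            c->line[i].dirty = false;
        }
}

/* Abort: every tentative copy is discarded. */
void cache_abort(tx_cache *c)
{
    for (size_t i = 0; i < CACHE_LINES; i++)
        if (c->line[i].dirty) {
            c->line[i].tentative = c->line[i].committed;
            c->line[i].dirty = false;
        }
}

/* Usage example: an aborted write vanishes, a committed write persists. */
bool tx_cache_demo(void)
{
    tx_cache c;
    cache_init(&c);
    cache_store(&c, 0, 42);
    bool sees_own_write = (cache_load(&c, 0) == 42);
    cache_abort(&c);
    bool rolled_back = (cache_load(&c, 0) == 0);
    cache_store(&c, 1, 7);
    cache_commit(&c);
    cache_abort(&c);   /* nothing tentative remains, so committed data survives */
    return sees_own_write && rolled_back && c.line[1].committed == 7;
}
```

Keeping both copies in the line is what makes abort cheap in this model: rollback only clears a flag per line and needs no separate undo log.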
Requests to the shared resources are performed by handlers that translate xMI requests from system firmware/BIOS elements, such as sensors (e.g., temperature, voltage, etc.), hardware components (I/O ports, etc.), processes (e.g., power-up, etc.), and so on, into corresponding requests for access to shared resources. As shown in
As discussed above, SMM Nub 24 is responsible for coordinating activities while the processors are operating in SMM. The various functions and services provided by one embodiment of SMM Nub 24 are graphically depicted in
SMM Nub 24 provides a set of services to the various event handlers through SMM library 28, including PCI and I/O services 30, memory allocation services 32, and configuration table registration services 34. In addition, SMM Nub 24 provides several functions that are performed after the xMI event is serviced. If the computer system implements a multiprocessor configuration, these processors are freed by a function 148. A function 150 restores the machine state of the processor(s), including floating point registers, if required. Finally, a function 152 is used to execute RSM instructions on all of the processors in a system.
A monitor 318 is included for displaying graphics and text generated by software programs that are run by the personal computer and which may generally be displayed during the POST (Power-On Self Test) and other aspects of firmware load/execution. A mouse 320 (or other pointing device) is connected to a serial port (or to a bus port) on the rear of processor chassis 302, and signals from mouse 320 are conveyed to motherboard 308 to control a cursor on the display and to select text, menu options, and graphic components displayed on monitor 318 by software programs executing on the personal computer. In addition, a keyboard 322 is coupled to the motherboard for user entry of text and commands that affect the running of software programs executing on the personal computer.
Personal computer 300 also optionally includes a compact disk-read only memory (CD-ROM) drive 324 into which a CD-ROM disk may be inserted so that executable files and data on the disk can be read for transfer into the memory and/or into storage on hard drive 306 of personal computer 300. If the base BIOS firmware is stored on a re-writeable device, such as a flash EPROM, machine instructions for updating the base portion of the BIOS firmware may be stored on a CD-ROM disk or a floppy disk and read and processed by the computer's processor to rewrite the BIOS firmware stored on the flash EPROM. Updateable BIOS firmware may also be loaded via network 314.
Although the present embodiments have been described in connection with a preferred form of practicing them and modifications thereto, those of ordinary skill in the art will understand that many other modifications can be made within the scope of the claims that follow. Accordingly, it is not intended that the scope of the described embodiments in any way be limited by the above description, but instead be determined entirely by reference to the claims that follow.
For example, embodiments can be implemented for use on a variety of different multiprocessing systems using different types of CPUs, such as Itanium Processors, and so on. Furthermore, although embodiments have been described for the use of transactional memory with SMM code, it should be understood that aspects can cover the use of transactional memory with any type of execution environment for platform firmware, and can cover any runtime modes, such as 16-bit, 32-bit, 64-bit, 128-bit, or more. Embodiments could also be directed to use as a multiprocessor driver, that is, for general boot-time, pre-OS, firmware flows.
For the purposes of the present description, the term “processor” or “CPU” refers to any machine that is capable of executing a sequence of instructions and should be taken to include, but not be limited to, general purpose microprocessors, special purpose microprocessors, application specific integrated circuits (ASICs), multi-media controllers, digital signal processors, and micro-controllers, etc.
The memory associated with system 100, including TM 26, may be embodied in a variety of different types of memory devices adapted to store digital information, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), and/or double data rate (DDR) SDRAM or DRAM, and also non-volatile memory such as read-only memory (ROM). Moreover, the memory devices may further include other storage devices such as hard disk drives, floppy disk drives, optical disk drives, etc., and appropriate interfaces. The system may include suitable interfaces to interface with I/O devices such as disk drives, monitors, keypads, a modem, a printer, or any other type of suitable I/O devices.
Aspects of the methods and systems described herein may be implemented as functionality programmed into any of a variety of circuitry, including programmable logic devices (“PLDs”), such as field programmable gate arrays (“FPGAs”), programmable array logic (“PAL”) devices, electrically programmable logic and memory devices and standard cell-based devices, as well as application specific integrated circuits. Implementations may also include microcontrollers with memory (such as EEPROM), embedded microprocessors, firmware, software, etc. Furthermore, aspects may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types. Of course the underlying device technologies may be provided in a variety of component types, e.g., metal-oxide semiconductor field-effect transistor (“MOSFET”) technologies like complementary metal-oxide semiconductor (“CMOS”), bipolar technologies like emitter-coupled logic (“ECL”), polymer technologies (e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures), mixed analog and digital, etc.
While the term “component” is generally used herein, it is understood that “component” includes circuitry, components, modules, and/or any combination of circuitry, components, and/or modules as the terms are known in the art.
The various components and/or functions disclosed herein may be described using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media) and carrier waves that may be used to transfer such formatted data and/or instructions through wireless, optical, or wired signaling media or any combination thereof. Examples of transfers of such formatted data and/or instructions by carrier waves include, but are not limited to, transfers (uploads, downloads, e-mail, etc.) over the Internet and/or other computer networks via one or more data transfer protocols.
Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list; all of the items in the list; and any combination of the items in the list.
The above description of illustrated embodiments is not intended to be exhaustive or limited by the disclosure. While specific embodiments of, and examples for, the systems and methods are described herein for illustrative purposes, various equivalent modifications are possible, as those skilled in the relevant art will recognize. The teachings provided herein may be applied to other systems and methods, and not only for the systems and methods described above. The elements and acts of the various embodiments described above may be combined to provide further embodiments. These and other changes may be made to methods and systems in light of the above detailed description.
In general, in the following claims, the terms used should not be construed to be limited to the specific embodiments disclosed in the specification and the claims, but should be construed to include all systems and methods that operate under the claims. Accordingly, the methods and systems are not limited by the disclosure, but instead the scope is to be determined entirely by the claims. While certain aspects are presented below in certain claim forms, the inventors contemplate the various aspects in any number of claim forms. Accordingly, the inventors reserve the right to add additional claims after filing the application to pursue such additional claim forms for other aspects as well.