The present invention relates generally to memory operations in microprocessors for both RISC (load-store architecture) and CISC (memory array architecture) type computers. Specifically, a method and mechanism for moving the contents of a register file belonging to an execution mode both to and from memory are described.
Many modern high-performance microprocessors offer a programming model that supports multiple execution modes or multiple execution states. For example, application programs or software processes running in a multitask operating system environment may run or execute in dedicated execution modes. Different execution modes or execution states may have a variety of different privilege levels.
In a multitask environment, the operating system shares the processor among the various processes which may execute in different execution states. This processor sharing is implemented by switching between processes and execution states. For example, each process is allocated a fixed time period by the operating system, and the operating system then switches to another process or execution state. This switching is also known as context switching.
Each process operates on a fixed set of registers within the processor architecture. Referring to
A context switch is implemented by swapping out the register contents or register file for the current process or execution state, and swapping in the register file associated with the next process or execution state. A process or execution list is scheduled and the current register file is swapped with a shadow register file when a context switch occurs. The current register file is saved into a shadow register file, and the next register file is loaded from a shadow register file into the register set or register structure for the next process or execution state.
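The conventional swap described above can be sketched in software as follows. This is only an illustrative model, not the described hardware: the `Cpu` class, the `context_switch` function, the dictionary of saved register files keyed by a process identifier, and the 16-register file size are all assumptions made for the example.

```python
NUM_REGS = 16  # assumed register-file size (R0-R15)


class Cpu:
    """Hypothetical model: one active register file plus saved shadow copies."""

    def __init__(self):
        self.regs = [0] * NUM_REGS  # active register file
        self.shadow = {}            # hypothetical: shadow register files keyed by process id


def context_switch(cpu, current_pid, next_pid):
    """Swap out the entire active register file and swap in the next one."""
    # Save every register of the current process, whether or not it changed.
    cpu.shadow[current_pid] = list(cpu.regs)
    # Restore every register of the next process (all zeros if it never ran).
    cpu.regs = list(cpu.shadow.get(next_pid, [0] * NUM_REGS))


cpu = Cpu()
cpu.regs[3] = 42
context_switch(cpu, "A", "B")  # process A is swapped out, B starts with a clean file
context_switch(cpu, "B", "A")  # A's complete register file is restored
```

Note that the whole register file is copied in both directions on every switch, even when only a few registers are of interest.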
A dedicated cache may be used to store a register file. However, the disadvantage of this approach is that extra cache hardware is required. U.S. Patent Application Publication No. 2003/0051124A1 to Dowling entitled “Virtual Shadow Registers and Virtual Register Windows” describes multiple register sets controlled by a dedicated hardware circuit to perform a fast register set save and restore operation. However, additional interface circuitry must be built into the processor core, and an entire register file or register set is selected and switched for each operation.
Typically, when a particular execution mode is running or operating, all of the operations that are performed on the application registers belong to a particular execution state. However, it may be useful to allow certain processes, operations, or other execution states to access or manipulate a register file or selected registers belonging to a particular execution state. Context switch latency lengthens the execution time of a process because the processor performs no useful work on behalf of the process while the switch is in progress. What is needed is an ability to allow one or more execution states to flexibly operate on a register file, or on selected registers within a register file, that belongs to another execution state without requiring a task switch that swaps an entire register file.
An exemplary embodiment of the present invention provides at least one additional instruction to a processor's instruction set. The instruction either loads data (content) from memory into a shadow register, or stores data from a shadow register to memory. The instruction may load several shadow registers with content found in a contiguous memory space, or store the content from several shadow registers to a contiguous data space.
One advantage of the present invention is an overall speed improvement for task or context switching for multitask operating systems. A microprocessor system is not required to switch execution states or execution modes before copying the contents of a register file to memory. Also, instead of switching tasks or copying an entire register set, a single instruction may identify a single shadow register or register range associated with an inactive execution state and either copy the content of one or more shadow registers to a memory location (or range), or copy a memory location (or range) to one or more shadow registers. Additionally, the instruction or method may be used for debugging purposes where the content of one or several register sets or register files may be copied to memory.
An exemplary core processor will normally load, decode, and execute instructions. An exemplary processor system or computing system contains the core processor and other functional units such as memory, for example, a RAM or a cache memory. The architecture of the core processor is configured to support shadow registers.
A single instruction, included in a processor's instruction set, loads data (content) related to another execution state from memory into at least one shadow register, or a single instruction stores data (content) related to another execution state from a shadow register to memory. The instruction may load several shadow registers with content found in a contiguous memory space, or store the content from several shadow registers to a contiguous data space. The instruction may support an optional parameter that specifies which register bank or shadow register file to load data into (or store data from), or that specifies a particular shadow register, multiple shadow registers, or a shadow register range. The instruction may also support an optional parameter specifying the size of the data to be loaded or stored, the size of a register, and, when the memory data are shorter than the register size, whether the data should be zero extended or sign extended. The instruction may optionally be restricted to operate only in privileged modes.
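The zero-extension and sign-extension option for memory data shorter than a register can be illustrated with a small sketch. The function name `extend`, its parameters, and the 32-bit default register width are assumptions chosen for the example; the description does not fix any of them.

```python
def extend(value, data_bits, reg_bits=32, signed=False):
    """Widen a data_bits-wide memory value to a reg_bits-wide register value.

    Hypothetical helper: zero extension fills the upper register bits with 0,
    sign extension fills them with copies of the data's most significant bit.
    """
    value &= (1 << data_bits) - 1            # keep only the bits read from memory
    if signed and value & (1 << (data_bits - 1)):
        value -= 1 << data_bits              # reinterpret as a negative number
    return value & ((1 << reg_bits) - 1)     # express as an unsigned register image


zero_ext = extend(0x80, 8)               # 0x00000080
sign_ext = extend(0x80, 8, signed=True)  # 0xFFFFFF80
```

A byte with its top bit set therefore yields different register contents depending on which extension the optional parameter selects, while a byte with its top bit clear is unaffected.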
Two additional instructions are added to the core processor. Each instruction provides an improved and flexible method or mechanism to speed up the transfer of information between different execution states, for example, execution states that are controlled by an operating system. Also, implementation of each processor instruction minimizes the amount of circuitry added to the processor core in comparison to adding dedicated cache, adding multiple hardware registers, or adding dedicated multiplexer or select circuitry.
A first instruction, executed by an active process or active execution state, loads the content of an identified or designated memory location or multiple memory locations to a shadow register file that corresponds with an inactive execution state. A second instruction, executed by an active process or active execution state, stores the content of an identified or designated shadow register or shadow register range that corresponds with another inactive execution state to an identified or designated memory location or multiple memory locations. Generally, a pointer register, within the register set belonging to the current execution state, is identified. The pointer register contains a memory location where data or content from a designated shadow register will be stored, or from which the data or content of the addressed memory will be loaded into a designated shadow register.
Referring to
A processor architecture provides shadow registers (not shown in
A first exemplary instruction (for example, Load Multiple Registers for Task Switch, or LDMTS) loads the content of a memory location into a specified shadow register file. The register file, for example, may be in a hardware memory accessed by activating a multiplexer or decode circuit, within a memory range in addressed memory, or in a cache memory. Referring to
An exemplary first instruction name, syntax, and pseudo code is listed below.
LDMTS—Load Multiple Registers for Task Switch
Description: Loads the consecutive words pointed to by Rp into the registers specified in the instruction. The target registers reside in the Application Register Context, regardless of which context the instruction is called from. If the opcode field [++] is set, an optional write-back of the updated pointer value may be performed.
Loadaddress ← Rp;
for each register Ri specified in Reglist16: RiAPP ← *(Loadaddress++);
if Opcode[++] is set: Rp ← Loadaddress;
The single load LDMTS instruction loads any consecutive words pointed to by an identified register pointer (Rp), from the register set associated with a current active execution state, into identified shadow registers (Reglist16) associated with an inactive execution state. In one embodiment, the target shadow registers may reside in an Application Register Context that is controlled by an operating system, regardless of which context the LDMTS instruction is called from. In another embodiment, the program counter (PC) may be loaded, resulting in a jump to the loaded value. Also, parameters may be set to perform a variety of alternate operations; for example, if the opcode field [++] (bit 25) is set, an optional write-back of an updated pointer value may be performed.
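The load behavior described above may be modeled in software roughly as follows. This is a minimal sketch under stated assumptions, not the instruction's definitive semantics: the function name `ldmts`, the 4-byte word size, the representation of memory as a dictionary keyed by address, the encoding of Reglist16 as a 16-bit mask, and the ascending order in which selected registers are filled are all assumptions made for illustration.

```python
WORD = 4  # assumed word size in bytes


def ldmts(active_regs, shadow_regs, memory, rp, reglist16, write_back=False):
    """Software model of a load-multiple into shadow registers.

    active_regs: register set of the currently executing state (holds Rp)
    shadow_regs: register set of the inactive state (the load targets)
    memory:      dict mapping word addresses to values (hypothetical model)
    reglist16:   bit i set means shadow register Ri is loaded (assumed encoding)
    """
    load_address = active_regs[rp]
    for i in range(16):                      # assumed order: R0 first
        if reglist16 & (1 << i):
            shadow_regs[i] = memory[load_address]
            load_address += WORD
    if write_back:                           # models the optional [++] write-back
        active_regs[rp] = load_address


active = [0] * 16
shadow = [0] * 16
active[12] = 0x100                           # R12 of the active state serves as Rp
mem = {0x100: 11, 0x104: 22, 0x108: 33}
ldmts(active, shadow, mem, rp=12, reglist16=0b111, write_back=True)
```

In this model only the pointer register of the active state is touched, and only when write-back is requested; the remaining active registers are unchanged, and no switch of execution state takes place.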
Referring to
A second exemplary instruction (for example, Store Multiple Registers for Task Switch or STMTS) stores the content of a specified shadow register file into a specified memory location. The register file, for example, may be in a hardware memory accessed by activating a multiplexer or decode circuit, within a memory range in addressed memory, or in a cache memory. Referring to
An exemplary second instruction name, syntax, and pseudo code is listed below.
STMTS—Store Multiple Registers for Task Switch
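Description: Stores the registers specified to the consecutive memory locations pointed to by Rp. The registers specified reside in the application context. If the opcode field [−−] is set, an optional write-back of the updated pointer value may be performed.
Storeaddress ← Rp;
if Opcode[−−] is set:
    for each register Ri specified in Reglist16: *(−−Storeaddress) ← RiAPP;
    Rp ← Storeaddress;
else:
    for each register Ri specified in Reglist16: *(Storeaddress++) ← RiAPP;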
The single store STMTS instruction stores the content of identified shadow registers (Reglist16) associated with an inactive execution state to consecutive memory locations pointed to by an identified register pointer (Rp) from the register set associated with the current active execution state. In one embodiment, all the specified registers reside in the application context that is controlled by an operating system. In an alternate embodiment, parameters may be set to perform a variety of alternate operations. For example, if the opcode field [−−] (bit 25) is set, a series of store operations is performed while decrementing a memory address pointer, and the memory address pointer may optionally be written back. In another embodiment, when the opcode field [−−] (bit 25) is cleared, the memory address pointer is incremented and no write-back is performed.
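The two store modes described above may be sketched in software as follows. As before, this is an illustrative model only: the function name `stmts`, the 4-byte word size, the dictionary used as memory, the bit-mask encoding of Reglist16, and the order in which the selected registers are visited are assumptions and are not taken from the description.

```python
WORD = 4  # assumed word size in bytes


def stmts(active_regs, shadow_regs, memory, rp, reglist16,
          decrement=False, write_back=False):
    """Software model of a store-multiple from shadow registers.

    decrement=True models the opcode field [--] being set: the address is
    decremented before each store, and Rp may optionally be written back.
    decrement=False models the field being cleared: the address is
    incremented after each store and Rp is never written back.
    """
    store_address = active_regs[rp]
    selected = [i for i in range(16) if reglist16 & (1 << i)]  # assumed order
    if decrement:
        for i in selected:
            store_address -= WORD
            memory[store_address] = shadow_regs[i]
        if write_back:
            active_regs[rp] = store_address
    else:
        for i in selected:
            memory[store_address] = shadow_regs[i]
            store_address += WORD


active = [0] * 16
shadow = [0] * 16
shadow[0], shadow[1] = 5, 6                  # content of the inactive state
active[12] = 0x200                           # R12 of the active state serves as Rp
mem = {}
stmts(active, shadow, mem, rp=12, reglist16=0b11)
```

The decrementing mode with write-back behaves like a push onto a descending stack addressed by Rp, which is one plausible use when an operating system saves selected registers of an inactive state.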
Referring to
For the load instruction, the content of a memory location range (multiple memory locations) may be read and copied to a shadow register file. For the store instructions, the content of a shadow register file (multiple shadow register content) may be copied to a memory location range. For example, referring to
Exemplary embodiments of additional instructions to a processor's instruction set that load data from memory into a shadow register, or that store data (content) from a shadow register to memory, are presented. The instruction may load several shadow registers with content found in a contiguous memory space, or store the content from several shadow registers to a contiguous data space. Those of skill in the art will recognize that the invention can be practiced with modification and alteration within the spirit and scope of the appended claims, and many other embodiments will be apparent to those of skill in the art upon reading and understanding the description presented herein. For example, alternate op-codes, naming conventions, and syntax may be used. Also, although the operations are performed by a single instruction, the single instruction may be co-executed with other instructions in a pipelined processor system. The instructions may be implemented by or included in the instruction set of RISC, CISC, or other processor types. The number of registers or types of registers may vary. For example, each register file may contain 16 registers (R0-R15) having a program counter (PC) residing in R15. Although shadowing of application registers is described, other register types such as supervisor or interrupt register sets may also be included as targets for the described instructions. In addition, other architectures or processor implementations that support shadow registers may be used. Therefore, the description is to be regarded as illustrative instead of limiting.