The present disclosure generally relates to systems and methods for processing memory access instructions, such as load and store instructions, in a processor-based system. More specifically, the present disclosure relates to processor hardware pipeline configurations that enable, for a memory access operation requested by a register-operand based virtual machine, extraction of the memory location to be accessed and performance of the requested memory access operation (e.g., load or store) on the extracted memory location in a single pass through the pipeline.
A virtual machine (VM), sometimes referred to as a process VM or application
VM, runs as an application inside an operating system (OS) and supports a process. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware or OS, and allows a program to execute in the same way on any platform. Thus, a VM generally provides a high-level abstraction—that of a high-level programming language. Such VMs may be implemented using an interpreter; and performance comparable to compiled programming languages may be achieved in some instances by the use of just-in-time (JIT) compilation, for example.
VMs may be implemented using a stack-based model (such as the Java Virtual Machine (JVM)) or a register-based model. In a stack-based model, most instructions implicitly operate on values at the top of the stack and replace those values with the result. Such stack-based VMs may also have “load” and “store” instructions that read from and write to arbitrary memory locations. Like all other instructions, the “load” and “store” instructions in stack-based VMs need no operands because they take the memory address from the top of the stack. Register-based models, on the other hand, generally employ a register-based addressing scheme in which a VM Instruction (VmI) of a program executing on the VM includes virtual register operands. The VmI is in turn loaded into a target processor hardware register “Rm” for further processing, leading to execution of the VmI using a sequence of target processor instructions (TpI).
Dalvik, as one example, is a register-based virtual machine that runs a Java-like platform on Android mobile devices. Dalvik runs applications that have been converted into a compact Dalvik Executable (.dex) format suitable for systems constrained in terms of memory and processor speed. A tool called dx is used to convert Java .class files into the .dex format. Multiple classes may be included in a single .dex file. Duplicate strings and other constants in multiple class files are included only once in the .dex output to conserve space. Java bytecode is also converted into an alternate instruction set used by the Dalvik VM. An uncompressed .dex file can be a few percent smaller in size than a compressed .jar (Java Archive) derived from the same .class files.
As further shown, the memory 102 includes data 105 that may be accessed by the VM program 104. For instance, loads and stores may be performed for reading data from a referenced memory location (e.g., from SRC1 and SRC2) and/or for writing data to a referenced memory location (e.g., Dest). A register-operand based interpreter of the VM may be employed, where the interpreter determines (computes) the addresses of the referenced operands (e.g., SRC1, SRC2, Dest), a step that may be referred to as “address generation” or “address determination.” Generally, the address is determined by extracting the bit-fields for the operands in VmI 110 (now available in processor register Rm) and then applying an affine transformation. An affine function of the extracted bit-field (for the operand) provides the index where the actual data for the particular operand is stored in memory. Hence, an affine function of the operand field is generally the offset in the “base+offset” addressing mode of LOAD operations used to load the operand value from memory.
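For purposes of illustration only, the following C sketch models this address generation in software. The function names and the shift-by-C scaling are illustrative assumptions chosen to mirror the pseudocode used later in this disclosure, not any particular processor's instruction set.

#include <stdint.h>

/* Shift-and-mask form of the "extract" operation: pull a 'width'-bit
 * operand field starting at bit 'offset' out of the VM instruction word
 * held in processor register Rm (width assumed between 1 and 31). */
static inline uint32_t extract_bits(uint32_t rm, unsigned width, unsigned offset)
{
    return (rm >> offset) & ((1u << width) - 1u);
}

/* Affine address generation: the extracted field, scaled by a constant
 * (here a left shift by C, e.g. log2 of the size of a VM register slot),
 * is the offset added to a base register in "base+offset" addressing. */
static inline uint32_t operand_address(uint32_t rbase, uint32_t rm,
                                       unsigned width, unsigned offset,
                                       unsigned c)
{
    return rbase + (extract_bits(rm, width, offset) << c);
}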
Conventionally, memory access operations (LOADS and STORES) for register operand based addressing on a VM are each two-pass operations. That is, to perform a LOAD of a value from a referenced memory location or to perform a STORE of a value to a referenced memory location by a register-based VM, two separate instructions are required to be executed.
For example, suppose that a program 104 executing on the VM 103 includes in its sequence of instructions, a Dalvik ADD operation, such as shown in the exemplary register Rm of
As can be seen above, the memory access operations of loading the referenced value SRC1, loading the referenced value SRC2, and storing the computed result of the ADD operation (RD) to the referenced Dest memory location each require two instructions, and thus two passes through the processor's pipeline.
Ins1→R1=extract(Rm, width1, offset1);
Ins2→Rx=load(Rbase+(R1<<C));
Ins3→R2=extract(Rm, width2, offset2);
Ins4→Ry=load(Rbase+(R2<<C));
The acronyms used for the pipeline stages in
IF=Instruction fetch,
D & RR=Decode and Register read,
DC=Data computation,
RW=Register Write,
ADDRC=Address Computation, and
MA=Memory Access.
In this example, data computation and address computation share the same stage because they use the same pipeline. MA is an additional stage used by load/store instructions.
The illustrated instruction includes two LOAD operations, one to load operand
Rx and one to load operand Ry. As shown in
Ins1→R1=extract(Rm, width1, offset1);
Ins2→Rx=load(Rbase+(R1<<C));
Ins3→R2=extract(Rm, width2, offset2);
Ins4→Ry=load(Rbase+(R2<<C));
As shown in
Various types of fused operations have been proposed in the past. For example, many fused operations have been proposed for performing computations, such as fusing operations together for performing addition, multiplication, or other computations in a single “fused” operation. However, while various types of fused operations have been proposed in the prior art, the opportunity to fuse address extraction (for a register-based operand) and memory access (e.g., LOAD or STORE) into an operation that is performed in a single pass of a processor pipeline has gone unrecognized.
The present disclosure generally relates to systems and methods for processing memory access instructions, such as load and store instructions, in a processor-based system. Certain aspects of the present disclosure relate more specifically to processor hardware pipeline configurations for enabling efficient performance of memory access instructions, such as a pipeline configuration that enables, for a memory access operation requested by a register-operand based virtual machine, extraction of a memory location to be accessed and performance of the requested memory access operation (e.g., load or store) on the extracted memory location, with both performed in a single pass through the pipeline.
In one aspect of the present disclosure, a method to address and access an electronic memory includes receiving a microprocessor instruction in a processor having a hardware pipeline. The method also includes, responsive to the microprocessor instruction, computing a memory address to access and executing a memory access operation on the computed address, both in a single pass through the hardware pipeline.
In yet another aspect, a method for accessing an electronic memory includes, responsive to a register-operand based virtual machine (VM) instruction for performing a memory access operation, performing, in a single pass through a processor's hardware pipeline, both derivation of a memory address from a bit-field value extracted from a processor register (combined with other registers or constants) and a memory access operation at the derived memory address.
In one aspect of the present disclosure, a method for performing a memory access operation by a register-operand based virtual machine is provided. The method includes receiving a single instruction to perform (i) an address extraction operation to extract an address and (ii) a memory access operation on the extracted address. The method also includes executing the single instruction to extract the memory address and to perform the memory access operation on the extracted memory address. In certain embodiments, the executing includes performing the address extraction operation and the memory access operation in a single pass through a processor's hardware pipeline.
In another aspect of the present disclosure, a system has an electronic memory; and a register-operand based virtual machine having an address mode enabling use of a single instruction to perform (i) an address extraction operation for extracting an address and (ii) a memory access operation on the extracted address. The system also has a processor having a defined instruction pipeline, configured to enable the address extraction operation and the memory access operation to be performed in a single pass through the instruction pipeline.
In yet another aspect, a system has a processor having a hardware pipeline configured to perform, in a single pass through the hardware pipeline, both computation of a memory address of a virtual machine (VM) register, which first involves extracting a bit-field value from a VM instruction already present in a processor register, and access to the computed memory address.
For a more complete understanding of the present disclosure, reference is now made to the following description taken in conjunction with the accompanying drawings.
As described above, conventional memory access operations for register operand based addressing on a virtual machine (VM) are each two-pass operations. Using two instructions, however, degrades performance and wastes power. It is therefore desirable to design a system in which a load or store can be completed with a single instruction.
Certain embodiments of the present disclosure enable a single-pass instruction to be implemented for performing address extraction (for a register-based architecture) and a memory access operation. For instance, in certain embodiments, an instruction and a processor hardware pipeline configuration are provided that enable extraction of an address to be accessed and a memory access operation (LOAD or STORE) on the extracted address to be performed by a register-operand based VM in a single pass through the processor's pipeline. Various exemplary pipeline configurations that may be implemented for a processor to enable such single-pass address extraction and memory access operation for a register-based instruction are disclosed.
According to one embodiment, the following exemplary single-instruction address extraction and memory access operations for performing a load or a store may be employed:
Rx=LOAD(Rb+extract(Rm, width, offset));
STORE(Rb+extract(Rm, width, offset))=Rx.
Thus, the determination (or “extraction”) of the memory location to be accessed and the performance of the memory access operation (e.g., load or store) may be combined into a single instruction, and those operations (address extraction and memory access of the extracted address) may be performed in a single pass through the processor's hardware pipeline. As described further, in certain embodiments not only are the extract and memory access operations combined, whereas without the concepts presented herein they are conventionally performed separately by two independent target processor instructions, but a processor hardware pipeline configuration may also be implemented that enables the new instruction (i.e., the combined extraction and memory access operations) to be performed in a single pass through the pipeline.
Thus, in accordance with one embodiment, the above-mentioned Dalvik ADD operation, as one example, may be reduced to:
I. R2=LOAD(Rb+extract(Rm, width1, offset1)) ** single-pass LOAD **
II. R3=LOAD(Rb+extract(Rm, width2, offset2)) ** single-pass LOAD **
III. R1=R2 ADD R3
IV. STORE(Rb+extract(Rm, width3, offset3))=R1 ** single-pass STORE **
As can be seen above, the memory address extract operation may be embedded within the memory access operation, thereby enabling a single instruction to perform an address extraction and a memory access of the extracted address for a register-operand based VM. Further, as discussed below, various pipeline configurations may be employed for enabling the single VM instruction to be performed in a single pass through the pipeline.
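For purposes of illustration only, the following C sketch models the behavior of the fused operations and the reduced ADD sequence above. The names fused_load, fused_store, and vm_add and the operand field positions are hypothetical, and the VM register file is modeled as an in-memory array so that the shift-by-C scaling is implicit in the array indexing.

#include <stdint.h>
#include <stdio.h>

/* Behavioral model of the fused operations: address extraction is embedded
 * in the memory access, so each call below corresponds to a single
 * instruction (and a single pipeline pass) rather than two. */
static uint32_t fused_load(const uint32_t *rb, uint32_t rm,
                           unsigned width, unsigned offset)
{
    /* Rx = LOAD(Rb + extract(Rm, width, offset)) */
    return rb[(rm >> offset) & ((1u << width) - 1u)];
}

static void fused_store(uint32_t *rb, uint32_t rm,
                        unsigned width, unsigned offset, uint32_t rx)
{
    /* STORE(Rb + extract(Rm, width, offset)) = Rx */
    rb[(rm >> offset) & ((1u << width) - 1u)] = rx;
}

/* The reduced Dalvik-style ADD: two single-pass loads, the add, and one
 * single-pass store. The field widths/offsets are placeholders. */
static void vm_add(uint32_t *rb, uint32_t rm)
{
    uint32_t r2 = fused_load(rb, rm, 8, 8);    /* SRC1 field      */
    uint32_t r3 = fused_load(rb, rm, 8, 0);    /* SRC2 field      */
    uint32_t r1 = r2 + r3;                     /* R1 = R2 ADD R3  */
    fused_store(rb, rm, 8, 16, r1);            /* Dest field      */
}

int main(void)
{
    uint32_t vm_regs[256] = {0};
    vm_regs[1] = 40;  vm_regs[2] = 2;          /* SRC1 = v1, SRC2 = v2 */
    uint32_t rm = (3u << 16) | (1u << 8) | 2u; /* Dest = v3            */
    vm_add(vm_regs, rm);
    printf("v3 = %u\n", vm_regs[3]);           /* prints: v3 = 42      */
    return 0;
}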
An exemplary implementation of a pipeline configuration is shown as a pipeline 400 in
A logical representation of the operations performed in stage 402 is shown in block-diagram form as operational block 402A. That is, block 402A illustrates the operations performed in stage 402 of the pipeline 400: a decode 416 occurs, after which the register Rx is read 418 and the register Ry is read 420. Similarly, operational block 403A shows the logical operations performed in stage 403 and is described in more detail below.
In this exemplary pipeline configuration 400, a generic “extract bits” operation 410 (for extracting a memory location for a referenced operand) may be implemented in various stages of the pipeline, either fully as a single-stage operation 410 or split into a first stage of bit extraction 412 and a second stage of bit extraction 414. The integration of this new functionality into the pipeline benefits register-operand based virtual machines.
Choosing a location of the Extract Bits functionality in different decode/address computation stages is a matter of design choice offering different advantages. A number of embodiments for placement of the Extract Bits functionality are described below.
In one embodiment, the Extract Bits functionality of operational block 410 is integrated as new functionality into the address-computation stage 403 of the processor pipeline 400. The Extract Bits operation may be inserted before the shifting block 422 in the address-computation stage 403, as shown by line 430. In this embodiment one of the fields of the VM instruction is extracted, shifted by a constant C (in block 422), and added to a hardware register Rx (in operational block 424). The result is the address of the virtual register in memory, which is then accessed for the load or store. The exemplary embodiment described above would support extraction of any bit field from the VM instruction, as shown in the exemplary implementation of
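As a software analogy only, the following C sketch walks through that datapath; the block numbers in the comments refer to the operational blocks described above, and the generic shift-and-mask extraction merely stands in for whatever hardware realizes block 410.

#include <stdint.h>

/* Software walk-through of the address-computation path: Extract Bits
 * (block 410), logical shift left by constant C (block 422), and addition
 * of the base held in the hardware register (block 424). The result is
 * the memory address of the referenced virtual register. */
static uint32_t address_compute(uint32_t rm,     /* VM instruction word    */
                                uint32_t rbase,  /* base hardware register */
                                unsigned width, unsigned offset, unsigned c)
{
    uint32_t field   = (rm >> offset) & ((1u << width) - 1u); /* block 410 */
    uint32_t scaled  = field << c;                            /* block 422 */
    uint32_t address = rbase + scaled;                        /* block 424 */
    return address;  /* passed on to the MA stage for the load or store */
}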
If the fields of the VM instruction are fixed-width and at fixed locations, then the Extract Bits operation can be just a set of multiplexers, as shown in
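For purposes of illustration only, the following C sketch captures such a mux-style extraction in software; the four fixed 8-bit fields are a hypothetical layout, not a specific VM encoding.

#include <stdint.h>

/* When the VM instruction has only fixed-width fields at fixed positions,
 * "extract" degenerates into a multiplexer: a small select code chooses
 * which pre-positioned field to forward. */
static inline uint32_t extract_fixed(uint32_t rm, unsigned sel)
{
    switch (sel) {                           /* sel acts as the mux select */
    case 0:  return  rm         & 0xFFu;     /* field at bits [7:0]   */
    case 1:  return (rm >>  8)  & 0xFFu;     /* field at bits [15:8]  */
    case 2:  return (rm >> 16)  & 0xFFu;     /* field at bits [23:16] */
    default: return (rm >> 24)  & 0xFFu;     /* field at bits [31:24] */
    }
}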
In one embodiment a modified Extract Bits functionality of operational block 410 is integrated into the existing logical shift left operation 422. Incorporating Extract Bits functionality with the logical shift left allows for an implementation with fewer gates and faster cycle time. In this embodiment the modified Extract Bits functionality may only support a limited set of widths and offsets, examples of which are shown in
In another embodiment, the Extract Bits functionality of operational block 410 is integrated as new functionality into the decode & register read stage 402 of the processor pipeline 400. The Extract Bits operation may be inserted after the Read Ry block 420 in the decode & register read stage 402, as shown by line 432. This implementation may be used if it is determined that there is slack in the decode stage 402 of a given system, such that the stage can still meet cycle timing with the extract operation in the path.
In another embodiment the Extract Bits functionality of operational block 410 is split into two pipeline stages, a first stage 412 and a second stage 414. For example, if two levels of multiplexing are needed to extract bits in a given system implementation, the first level might happen in the decode stage 402 and the second in the address-compute stage 403. Similarly, when the extract-bits operation is implemented by a cascade of two shifters (LSL, followed by LSR), as in the example shown in
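For purposes of illustration only, the following C sketch shows the two-shifter form of the extraction, assuming a 32-bit instruction word with width between 1 and 32 and offset + width no greater than 32; the two shifts correspond to the portions that may be performed in successive pipeline stages.

#include <stdint.h>

/* Extract-bits realized as a cascade of two shifters (LSL followed by LSR).
 * The first shift may be performed in one pipeline stage and the second in
 * the next, as described above. */
static inline uint32_t extract_two_shifts(uint32_t rm, unsigned width, unsigned offset)
{
    uint32_t stage1 = rm << (32u - (offset + width)); /* LSL: discard bits above the field */
    return stage1 >> (32u - width);                   /* LSR: discard bits below the field */
}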
In another embodiment the address computation stage 403 may be split into two stages where the first address computation stage contains the Extract Bits functionality 410 and the second address computation stage contains the logical functionality shown in block 403A. This alternative may be desirable for a highest-speed implementation. In this configuration, the first address computation stage may include the shift operation block 422. This configuration may be useful to balance the two stages of address computation.
The above implementations configure the processor's hardware pipeline to support both extracting a memory location for a referenced operand and accessing (performing a load or store on) the extracted memory location in a single pass through the pipeline. For example, each conventional two-instruction load sequence shown above (Ins1/Ins2 and Ins3/Ins4) may be replaced by a single fused instruction:
Ins5→Rx=load(Rbase+(extract(Rm, width1, offset1)<<C));
Ins6→Ry=load(Rbase+(extract(Rm, width2, offset2)<<C)).
The acronyms used for the pipeline stages in
IF=Instruction fetch,
D & RR=Decode and Register read,
RW=Register Write,
ADDRC=Address Computation, and
MA=Memory Access.
Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
In one or more exemplary designs, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code means in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Although the present disclosure and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the technology of the disclosure as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present disclosure. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.