The present invention relates to embedded processor cores and is particularly concerned with power savings at lower clock rates by implementing multiple width, merged architecture processing units.
Most modern electronic devices, from cell phones to DVD players to high-speed computers, rely extensively on embedded processor cores to provide the flexibility and function for an increasingly complex environment. As functional complexity increases, the embedded cores are required to provide increased processing power. Traditionally, this increase is accomplished by creating a wider processor core that processes more data per instruction, by increasing the clock rate to process more instructions per unit of time, or by a combination of both techniques.
Power dissipation for embedded processors is important, especially for battery-powered devices like cell phones and tablet computers. Every time a switching element within a design switches, power is dissipated. Designs that switch faster, with a higher clock rate, dissipate more power per unit of time. Designs that switch more elements every clock cycle dissipate more power per clock cycle. Therefore, processor internal power dissipation is a function of the number of switching elements (Clock Fan Out) and the number of clock cycles per unit of time (Clock Frequency). When processing power is increased by increasing the clock frequency, the internal power dissipation increases correspondingly due to the increased number of switches per unit of time. When processing power is increased by implementing a wider core, with more switching elements per clock, the internal power dissipation increases correspondingly due to the increased clock fan out. Power is dissipated through the clock fan out tree even when the switching element does not switch.
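The relationship described above can be illustrated with a minimal sketch. It assumes, for illustration only, that clock tree dissipation is strictly proportional to both the clock fan out and the clock frequency; the function name, the per-element energy constant, and the numbers are hypothetical and are not taken from this disclosure.

```python
def clock_tree_power(fan_out, clock_hz, energy_per_toggle_j):
    """Return clock tree power in watts under a simple proportional model.

    fan_out: number of switching elements driven by the clock
    clock_hz: clock cycles per second
    energy_per_toggle_j: assumed energy per element per clock cycle, in joules
    """
    return fan_out * clock_hz * energy_per_toggle_j


# Illustrative values only; these are not measurements of any device.
ENERGY_PER_TOGGLE_J = 1e-12

base = clock_tree_power(10_000, 50e6, ENERGY_PER_TOGGLE_J)
faster = clock_tree_power(10_000, 100e6, ENERGY_PER_TOGGLE_J)  # clock rate doubled
wider = clock_tree_power(20_000, 50e6, ENERGY_PER_TOGGLE_J)    # fan out doubled
narrow = clock_tree_power(2_500, 50e6, ENERGY_PER_TOGGLE_J)    # a quarter of the fan out
```

Under this assumed model, doubling either the clock rate or the fan out doubles the dissipation, while clocking only a quarter of the elements at the same rate cuts it to a quarter.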
Traditional power saving processor designs have reduced power by turning off the clock until an interrupt event happens that requires the processor to process some data. Many times this entails just a register check and does not require the full processing capability of the processor; however, the full clock fan out must still be switched, which consumes the full power of the processor.
A second solution is to completely shut off the clock fan out tree and the PLL (Phase Locked Loop) that drives it. This typically requires a significant amount of time to restart the clock and is not conducive to multiple starts and stops in a short period of time.
There is thus demonstrated a need for an improved embedded processor architecture.
It is therefore an object of the present invention to provide an improved embedded processor architecture.
Other objects and advantages of the present invention will become obvious to the reader and it is intended that these objects and advantages are within the scope of the present invention. To the accomplishment of the above and related objects, this invention may be embodied in the form illustrated in the accompanying drawings. Attention is called to the fact, however, that the drawings are illustrative only, and that changes may be made in the specific construction illustrated and described within the scope of this disclosure.
In accordance with an aspect of the present invention there is provided a multiple width, merged architecture, embedded processor core which is compatible with multiple sets of industry standard instructions. The core has two distinct modes within one load/store context. A reduced width mode is used for low power “book keeping” instructions. A wider, faster, and higher power mode is used for required processing.
The multi-width embedded core is a synthesizable core using an industry standard instruction set and architecture. The architecture is a traditional load/store architecture. The memory containing the instruction codes and the data is tightly coupled to the processor and is accessed on separate 32 bit buses.
Referring to
Referring to
Referring to
When additional processing is needed, 32 bit mode can be entered either by calling a 32 bit mode function as defined in the interrupt table, or by setting the 32 bit mode bit, which causes the processor to jump to the code address location defined by the DPTR register in the processor register space.
32 bit mode is the higher power mode. The 32 bit mode instructions are compatible with an industry standard 16 bit instruction set for 32 bit processors. In this mode all instructions are 16 bits wide, and 32 bits of working memory or ALU are accessed at a time. The internal working memory is directly available to the 32 bit mode instructions by overlapping a moveable, lower 8 bit register window on top of the working memory. This memory window allows the 32 bit register window to access all of the 256 byte working memory. This mode accesses the internal working memory four 8 bit bytes at a time.
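A moveable register window over the working memory can be modeled as a minimal sketch. Several details here are assumptions made for illustration and are not stated in this disclosure: the window is taken to hold eight 32 bit registers, the window base is taken to be aligned to four bytes, and bytes are combined in little-endian order. The class and method names are hypothetical.

```python
class WorkingMemory:
    """Toy model of a 256 byte working memory with a moveable 32 bit register window."""

    SIZE = 256        # bytes of working memory, as described in the text
    WINDOW_REGS = 8   # assumption: eight registers in the window
    REG_BYTES = 4     # each 32 bit register spans four 8 bit bytes

    def __init__(self):
        self.mem = bytearray(self.SIZE)
        self.window_base = 0

    def move_window(self, base):
        """Place the window at byte address `base` (assumed to be 4-byte aligned)."""
        last_valid = self.SIZE - self.WINDOW_REGS * self.REG_BYTES
        if base % self.REG_BYTES != 0 or not 0 <= base <= last_valid:
            raise ValueError("window base out of range or misaligned")
        self.window_base = base

    def _offset(self, reg):
        if not 0 <= reg < self.WINDOW_REGS:
            raise IndexError("register index outside the window")
        return self.window_base + reg * self.REG_BYTES

    def read_reg32(self, reg):
        """32 bit mode access: read four bytes as one register (little-endian assumed)."""
        off = self._offset(reg)
        return int.from_bytes(self.mem[off:off + self.REG_BYTES], "little")

    def write_reg32(self, reg, value):
        """32 bit mode access: write one register as four bytes."""
        off = self._offset(reg)
        self.mem[off:off + self.REG_BYTES] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def read_byte(self, addr):
        """8 bit mode access: read a single byte of the same memory."""
        return self.mem[addr]

    def write_byte(self, addr, value):
        """8 bit mode access: write a single byte of the same memory."""
        self.mem[addr] = value & 0xFF
```

Because both access widths operate on the same byte array, a value written through the window in 32 bit mode is visible byte by byte in 8 bit mode, and moving the window base lets the same eight registers reach any part of the 256 bytes.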
When the 32 bit processing has completed, the processor is returned to 8 bit mode by the software. The software can return to 8 bit mode in one of two ways. If 32 bit mode was entered through a hardware or software interrupt, a “return from interrupt” instruction disables 32 bit mode; processing continues from the vector table and then returns to the calling location. If 32 bit mode was entered by setting the 32 bit mode bit in the status register, the software clears the 32 bit mode bit. This allows the processor to continue processing 8 bit instructions at the 8 bit program counter location.
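The two entry paths and their matching return paths can be summarized in a small state model. This is a sketch under stated assumptions rather than a description of the hardware: the class, attribute, and method names are hypothetical, the intermediate step through the vector table on return from interrupt is omitted, and the 8 bit program counter is simply assumed to be preserved while 32 bit mode is active.

```python
class ModeController:
    """Toy model of switching between 8 bit and 32 bit mode."""

    def __init__(self, dptr=0):
        self.mode = 8            # current width mode: 8 or 32
        self.pc8 = 0             # 8 bit mode program counter (assumed preserved)
        self.pc32 = None         # address being executed while in 32 bit mode
        self.dptr = dptr         # DPTR: entry address used by the mode bit path
        self.entered_by = None   # "interrupt", "mode_bit", or None

    def enter_by_interrupt(self, handler_addr):
        """Enter 32 bit mode by calling a function listed in the interrupt table."""
        self._enter(handler_addr, "interrupt")

    def enter_by_mode_bit(self):
        """Enter 32 bit mode by setting the mode bit; execution jumps to DPTR."""
        self._enter(self.dptr, "mode_bit")

    def return_from_interrupt(self):
        """Leave 32 bit mode after an interrupt entry."""
        self._leave("interrupt")

    def clear_mode_bit(self):
        """Leave 32 bit mode after a mode bit entry."""
        self._leave("mode_bit")

    def _enter(self, addr, how):
        if self.mode != 8:
            raise RuntimeError("already in 32 bit mode")
        self.mode, self.pc32, self.entered_by = 32, addr, how

    def _leave(self, expected):
        if self.mode != 32 or self.entered_by != expected:
            raise RuntimeError("return path does not match entry path")
        # Execution resumes at the preserved 8 bit program counter.
        self.mode, self.pc32, self.entered_by = 8, None, None
```

The model pairs each return path with its own entry path, which mirrors the text: an interrupt entry ends with a return from interrupt, and a mode bit entry ends when software clears the bit.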
Software program development for the core requires two industry standard compilers. The software is developed in an industry standard high level language such as C++. At compile time the source code is pre-processed into two separate groups, 8 bit and 32 bit, as defined by pragmas in the code. Each set of high level code is routed to the appropriate compiler. At link time the two sets of code objects are linked to different locations within the instruction memory. The different sets of code are then used as described above.
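The pre-processing step can be sketched as a simple pass that partitions source lines by pragma. The pragma spellings `#pragma width8` and `#pragma width32`, the function name, and the choice of 8 bit as the default group are all hypothetical; this disclosure does not specify the pragma syntax or the tooling.

```python
def split_by_width(source, default="8"):
    """Partition source text into 8 bit and 32 bit groups using marker pragmas.

    Lines following '#pragma width8' go to the 8 bit group and lines following
    '#pragma width32' go to the 32 bit group, until the next marker. Both
    pragma names are illustrative assumptions.
    """
    groups = {"8": [], "32": []}
    current = default
    for line in source.splitlines():
        marker = line.strip()
        if marker == "#pragma width8":
            current = "8"
        elif marker == "#pragma width32":
            current = "32"
        else:
            groups[current].append(line)
    return groups


example_source = """\
#pragma width8
void poll_status(void);
#pragma width32
void run_filter(void);
"""

example_groups = split_by_width(example_source)
# example_groups["8"] would be handed to the 8 bit compiler and
# example_groups["32"] to the 32 bit compiler in a real tool flow.
```

Each resulting group would then be passed to its own compiler, and the resulting objects linked to separate regions of instruction memory as described above.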
The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the present invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiment was chosen and described in order to best explain the principles of the present invention and its practical application, to thereby enable others skilled in the art to best utilize the present invention and various embodiments with various modifications as are suited to the particular use contemplated.