I. Field
The present disclosure generally relates to digital signal processors and devices that use such processors. More particularly, the disclosure relates to digital signal processor register files.
II. Description of Related Art
Advances in technology have resulted in smaller and more powerful personal computing devices. For example, there currently exist a variety of portable personal computing devices, including wireless computing devices, such as portable wireless telephones, personal digital assistants (PDAs), and paging devices that are small, lightweight, and easily carried by users. More specifically, portable wireless telephones, such as cellular telephones and IP telephones, can communicate voice and data packets over wireless networks. Further, many such wireless telephones include other types of devices that are incorporated therein. For example, a wireless telephone can also include a digital still camera, a digital video camera, a digital recorder, and an audio file player. Also, such wireless telephones can include a web interface that can be used to access the Internet. As such, these wireless telephones include significant computing capabilities.
Typically, as these devices become smaller and more powerful, they become increasingly resource constrained. For example, the screen size, the amount of available memory and file system space, and the amount of input and output capabilities may be limited by the small size of the device. Further, the battery size, the amount of power provided by the battery, and the life of the battery are also limited. One way to increase the battery life of the device is to reduce the amount of time that a digital signal processor within the device is idle while the device is powered on.
Accordingly, it would be advantageous to provide an improved digital signal processor for use in portable communication devices.
A processor device is disclosed and includes a memory and a sequencer that is responsive to the memory. The sequencer can support very long instruction word (VLIW) instructions and superscalar instructions. The processor device further includes a first instruction execution unit responsive to the sequencer, a second instruction execution unit responsive to the sequencer, a third instruction execution unit responsive to the sequencer, and a fourth instruction execution unit responsive to the sequencer. Further, the processor device includes a plurality of register files and each of the plurality of register files includes a plurality of registers. The plurality of register files are coupled to the sequencer and coupled to the first instruction execution unit, the second instruction execution unit, the third instruction execution unit, and the fourth instruction execution unit.
In a particular embodiment, each of the plurality of register files is a unified non-partitioned register file. Further, in a particular embodiment, each of the plurality of register files is a single file that includes at least sixteen data registers. In another particular embodiment, each of the plurality of register files includes thirty-two registers and each of the thirty-two registers includes thirty-two bits. In yet another particular embodiment, each of the plurality of register files includes at least one data operand and at least one address operand.
Further, in still another particular embodiment, the plurality of register files comprises six register files. Also, the memory within the processor device includes six instruction caches and each instruction cache is associated with one of the six register files. In a particular embodiment, the memory includes six instruction queues. Each instruction queue is associated with a single instruction cache within the memory and each instruction queue is coupled to the sequencer.
In yet another particular embodiment, at least one of the instruction execution units is a data shifting unit and another of the instruction execution units is a multiply and accumulate unit. In another particular embodiment, at least one of the instruction execution units is a load unit that retrieves data from the register file. Further, in another particular embodiment, at least one of the instruction execution units is a load and store unit that has an interface to the register file to receive data from the register file and to write data to the register file. In another particular embodiment, the sequencer is coupled to the memory via a sixty-four bit bus and the sequencer is configured to retrieve instructions having a length of thirty-two bits.
In another embodiment, a method of operating a digital signal processor is disclosed and includes fetching an instruction from an instruction cache and accessing a unified non-partitioned register file associated with the instruction cache. The unified non-partitioned register file includes one or more data operands and one or more address operands. The method further includes retrieving a data operand or an address operand associated with the instruction from the unified non-partitioned register file.
In yet another embodiment, a multithreaded processor device is disclosed and includes a memory, a sequencer responsive to the memory, and a plurality of instruction execution units responsive to the sequencer. The multithreaded processor device further includes a first unified non-partitioned register file that includes a first plurality of registers. The first unified non-partitioned register file is coupled to the memory and coupled to each of the plurality of instruction execution units. Also, the first unified non-partitioned register file supports execution of a program instruction of a first program thread and the first unified non-partitioned register file includes at least one data operand and at least one address operand. Additionally, the multithreaded processor device includes a second unified non-partitioned register file that includes a second plurality of registers. The second unified non-partitioned register file is coupled to the memory and is coupled to each of the plurality of instruction execution units. Further, the second unified non-partitioned register file supports execution of a program instruction of a second program thread and the second unified non-partitioned register file includes at least one data operand and at least one address operand.
In still another embodiment, a portable communication device is disclosed and includes a digital signal processor. The digital signal processor includes a memory, a sequencer that is responsive to the memory, at least one instruction execution unit that is responsive to the sequencer, and a plurality of unified non-partitioned register files that are coupled to the memory and that are coupled to the at least one instruction execution unit. Each of the plurality of unified non-partitioned register files includes at least one data operand and at least one address operand.
In yet still another embodiment, an audio file player is disclosed and includes a digital signal processor, an audio coder/decoder (CODEC) that is coupled to the digital signal processor, a multimedia card that is coupled to the digital signal processor, and a universal serial bus (USB) port that is also coupled to the digital signal processor. The digital signal processor includes a memory, a sequencer that is responsive to the memory, at least one instruction execution unit that is responsive to the sequencer, and a unified non-partitioned register file that is coupled to the memory and that is coupled to the at least one instruction execution unit. The unified non-partitioned register file includes at least one data operand and at least one address operand.
In still yet another embodiment, a processor device is disclosed and includes means for fetching an instruction from an instruction cache and means for accessing a unified non-partitioned register file associated with the instruction cache. The unified non-partitioned register file includes one or more data operands and one or more address operands. Further, the processor device includes means for retrieving at least one of the data operands or at least one of the address operands associated with the instruction.
An advantage of one or more embodiments disclosed herein can include using multiple resources multiple times for different processor threads.
Another advantage can include substantially simplified access to the data operands and address operands.
Still another advantage can include substantially reducing problems that are associated with multiple software programs requiring access to register files for data operands and address operands.
Other aspects, advantages, and features of the present disclosure will become apparent after review of the entire application, including the following sections: Brief Description of the Drawings, Detailed Description, and the Claims.
The aspects and the attendant advantages of the embodiments described herein will become more readily apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings wherein:
In a particular embodiment, the memory 102 includes a first instruction cache 122, a second instruction cache 124, a third instruction cache 126, a fourth instruction cache 128, a fifth instruction cache 130, and a sixth instruction cache 132. During operation, the instruction caches 122, 124, 126, 128, 130, 132 can be accessed independently of each other by the sequencer 104. Additionally, in a particular embodiment, each instruction cache 122, 124, 126, 128, 130, 132 includes a plurality of instructions, instruction steering data for each instruction, and instruction pre-decode data for each instruction.
As illustrated in
During operation, the sequencer 104 can fetch instructions from each instruction cache 122, 124, 126, 128, 130, 132 via the instruction queue 134. In a particular embodiment, the sequencer 104 fetches instructions from the instruction queues 136, 138, 140, 142, 144, 146 in order from the first instruction queue 136 to the sixth instruction queue 146. After fetching an instruction from the sixth instruction queue 146, the sequencer 104 returns to the first instruction queue 136 and continues fetching instructions from the instruction queues 136, 138, 140, 142, 144, 146 in order.
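The fetch order described above can be sketched as a simple round-robin over six queues. This is a minimal illustration only: the function name `round_robin_fetch`, the use of Python deques, and the rule that an empty queue simply forfeits its slot are assumptions made for the sketch and are not stated in the disclosure.

```python
from collections import deque
from itertools import cycle

NUM_QUEUES = 6  # six instruction queues, per the text


def round_robin_fetch(queues, slots):
    """Visit `slots` queue slots in fixed order (first to sixth, then wrap).

    Returns a list of (queue_index, instruction) pairs in fetch order.
    Assumption: an empty queue forfeits its slot for that visit.
    """
    fetched = []
    order = cycle(range(len(queues)))
    for _ in range(slots):
        idx = next(order)
        if queues[idx]:
            fetched.append((idx, queues[idx].popleft()))
    return fetched


# Hypothetical instruction labels: thread t, instruction n.
queues = [deque(f"t{t}_i{n}" for n in range(2)) for t in range(NUM_QUEUES)]
trace = round_robin_fetch(queues, 8)
```

After the sixth queue is visited, the seventh slot returns to the first queue and picks up that queue's next instruction, which mirrors the wrap-around behavior in the paragraph above.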
In a particular embodiment, the sequencer 104 operates in a first mode as a 2-way superscalar sequencer that supports superscalar instructions. Further, in a particular embodiment, the sequencer also operates in a second mode that supports very long instruction word (VLIW) instructions. In particular, the sequencer can operate as a 4-way VLIW sequencer. In a particular embodiment, the first instruction execution unit 108 can execute a load instruction, a store instruction, and an arithmetic logic unit (ALU) instruction. The second instruction execution unit 110 can execute a load instruction and an ALU instruction. Also, the third instruction execution unit can execute a multiply instruction, a multiply-accumulate instruction (MAC), an ALU instruction, a program redirect construct, and a transfer register (CR) instruction.
As depicted in
During operation of the digital signal processor 100, instructions are fetched from the memory 102 by the sequencer 104, sent to designated instruction execution units 108, 110, 112, 114, and executed at the instruction execution units 108, 110, 112, 114. Further, one or more operands are retrieved from the general register 116, e.g., from one of the unified register files 148, 150, 152, 154, 156, 158, and used during the execution of the instructions. The results at each instruction execution unit 108, 110, 112, 114 can be written to the general register 116, i.e., to one of the unified register files 148, 150, 152, 154, 156, 158.
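As a rough software model of the operand flow just described, the sketch below keeps one flat register bank per thread and has an instruction read its operands from, and write its result to, the bank selected by a thread index. The thirty-two 32-bit registers follow one of the embodiments in the summary; the class name `UnifiedRegisterFile`, the helper `execute_add`, and the wrap-around on overflow are illustrative assumptions, not the disclosed hardware.

```python
NUM_THREADS = 6      # six unified register files, per the text
NUM_REGS = 32        # thirty-two registers (one described embodiment)
MASK32 = 0xFFFFFFFF  # each register holds thirty-two bits


class UnifiedRegisterFile:
    """One flat bank of registers; any register may hold data or an address."""

    def __init__(self):
        self.regs = [0] * NUM_REGS

    def read(self, index):
        return self.regs[index]

    def write(self, index, value):
        # Assumption: values wrap to 32 bits on write.
        self.regs[index] = value & MASK32


register_files = [UnifiedRegisterFile() for _ in range(NUM_THREADS)]


def execute_add(thread_id, dst, src_a, src_b):
    """Hypothetical ALU add: operands and result use the thread's own file."""
    rf = register_files[thread_id]
    rf.write(dst, rf.read(src_a) + rf.read(src_b))


register_files[2].write(1, 0xFFFFFFFF)
register_files[2].write(2, 2)
execute_add(2, dst=3, src_a=1, src_b=2)
```

Because each thread indexes its own bank, the add above changes register 3 only in the third register file and leaves the other five files untouched.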
Referring to
In a particular embodiment, one or more instructions can be associated with the unified non-partitioned register file 200. Further, during the execution of each instruction, the unified non-partitioned register file 200 associated with each instruction can be accessed via the four read ports 206, 208, 210, 212 and the three write ports 214, 216, 218. However, due to the interleaved multithreading method described below, more than four operands for the instruction can be retrieved from the unified non-partitioned register file 200 via the four data read ports 206, 208, 210, 212.
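One way to picture how more than four operands can pass through four read ports is to treat the ports as a per-cycle budget. The sketch below is a hedged model under that assumption: the class `PortedRegisterFile`, the `next_cycle` method, and the choice to raise an error on over-subscription are hypothetical and are used only to make the port limits visible.

```python
class PortedRegisterFile:
    """Register file with a per-cycle budget of read and write ports."""

    READ_PORTS = 4   # four read ports, per the text
    WRITE_PORTS = 3  # three write ports, per the text

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs
        self.reads_used = 0
        self.writes_used = 0

    def next_cycle(self):
        # Assumption: every port becomes available again each clock cycle.
        self.reads_used = 0
        self.writes_used = 0

    def read(self, index):
        if self.reads_used >= self.READ_PORTS:
            raise RuntimeError("no free read port this cycle")
        self.reads_used += 1
        return self.regs[index]

    def write(self, index, value):
        if self.writes_used >= self.WRITE_PORTS:
            raise RuntimeError("no free write port this cycle")
        self.writes_used += 1
        self.regs[index] = value


rf = PortedRegisterFile()
operands = [rf.read(i) for i in range(4)]      # first cycle: four reads
rf.next_cycle()
operands += [rf.read(i) for i in range(4, 6)]  # later cycle: two more reads
```

Six operands are retrieved in total even though no single cycle uses more than four read ports.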
Referring now to
At block 324, also during the decode clock cycle 308, the sequencer begins a full decode for the instruction. The full decode performed by the sequencer occurs within the second portion of the decode clock cycle 308 and the first portion of the register file access clock cycle 310.
During the register file access clock cycle 310, at block 326, the sequencer generates an instruction virtual address (IVA). Thereafter, at block 328, the sequencer performs a page check in order to determine the physical address page associated with a virtual address page number. Moving to the first execution clock cycle 312, at block 330, the sequencer performs an instruction queue lookup. At block 332, the sequencer accesses an instruction cache a first time and retrieves a first double-word for the instruction. In a particular embodiment, each instruction includes three double-words, e.g., a first double-word, a second double-word, and a third double-word. At block 334, during the first execution clock cycle 312, the sequencer aligns the double-word coming from the instruction cache.
Continuing to the second execution clock cycle 314, the sequencer accesses the instruction cache a second time in order to retrieve the second double-word for the instruction at block 336. Next, at block 338, the sequencer aligns the double-word retrieved from the instruction cache.
Proceeding to the third execution clock cycle 316, the sequencer accesses the instruction cache a third time in order to retrieve a third double-word at block 342. After the sequencer accesses the instruction cache the third time, the sequencer aligns the third double-word, at block 344.
As illustrated in
At block 356, during the second execution clock cycle 314, a data translation look-aside buffer (DTLB) performs an address translation for the first virtual address in order to generate a first physical address. Still within the second execution clock cycle 314, at block 358, the sequencer performs a tag check.
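The translation and tag check in this cycle can be illustrated with a small software model. The disclosure does not give page size, cache geometry, or lookup structure, so everything numeric here is an assumption chosen for the sketch: 4 KiB pages, 32-byte lines, 128 sets, a direct-mapped tag array, and dictionaries standing in for the DTLB and the tag store.

```python
PAGE_BITS = 12   # assumption: 4 KiB pages
LINE_BITS = 5    # assumption: 32-byte cache lines
INDEX_BITS = 7   # assumption: 128 sets, direct-mapped


def translate(dtlb, virtual_address):
    """Map a virtual address to a physical address via a page-number lookup.

    `dtlb` maps virtual page numbers to physical page numbers; a missing
    entry raises KeyError, which stands in for a DTLB miss.
    """
    vpn = virtual_address >> PAGE_BITS
    offset = virtual_address & ((1 << PAGE_BITS) - 1)
    return (dtlb[vpn] << PAGE_BITS) | offset


def tag_check(tags, physical_address):
    """Return True when the stored tag for the indexed set matches (a hit)."""
    index = (physical_address >> LINE_BITS) & ((1 << INDEX_BITS) - 1)
    tag = physical_address >> (LINE_BITS + INDEX_BITS)
    return tags.get(index) == tag


dtlb = {0x12: 0x80}   # hypothetical mapping: virtual page 0x12 -> physical page 0x80
tags = {0x1A: 0x80}   # hypothetical tag array contents
physical = translate(dtlb, 0x12345)
hit = tag_check(tags, physical)
```

With these made-up contents, the virtual address 0x12345 translates to 0x80345 and the tag check reports a hit.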
Moving to the third execution clock cycle 316, the sequencer accesses a data cache static random access memory (SRAM) in order to read data out of the SRAM, at block 360. Also, within the third execution clock cycle 316, at block 362, the sequencer updates the register file associated with the instruction a first time via a first data write port. In a particular embodiment, the sequencer updates the register file with the results of a post-increment address. Next, during the writeback clock cycle 318, at block 364, a load aligner shifts data to align the data within the double-word. At block 366, also within the writeback clock cycle 318, the sequencer updates the register file for the instruction a second time via the first data write port with data loaded from the cache.
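The two register file updates in the load routine, first the post-incremented address and then the aligned load data, can be sketched as follows. This is an illustrative model only: little-endian byte order, naturally aligned accesses, zero extension of the loaded value, and a memory modelled as a list of 64-bit double-words are all assumptions, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def load_align(double_word, address, size):
    """Shift the addressed bytes of a 64-bit double-word down to bit 0.

    Assumptions: little-endian byte order, naturally aligned access,
    zero extension (no sign extension is modelled).
    """
    offset = address % 8
    if size not in (1, 2, 4, 8) or offset % size:
        raise ValueError("unsupported size or unaligned access")
    return (double_word >> (8 * offset)) & ((1 << (8 * size)) - 1)


def load_post_increment(regs, memory, dst, addr_reg, increment, size):
    """Load from the address in `addr_reg`, then bump that register."""
    address = regs[addr_reg]
    double_word = memory[address // 8]               # read the double-word
    regs[addr_reg] = (address + increment) & MASK32  # first update: new address
    regs[dst] = load_align(double_word, address, size)  # second update: data


memory = [0x1122334455667788]  # one hypothetical double-word of data
regs = [0] * 32
regs[1] = 4                    # register 1 holds the address operand
load_post_increment(regs, memory, dst=2, addr_reg=1, increment=4, size=4)
```

Note that register 1 serves as an address operand and register 2 as a data operand within the same register list, which is the property the unified register file is meant to provide.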
As depicted in
Proceeding to the second execution clock cycle 314, during the store routine, at block 378, the data translation look-aside buffer (DTLB) translates the previously generated virtual address for the instruction into a physical address. At block 380, within the second execution clock cycle 314, the sequencer performs a tag check. Also, during the second execution clock cycle 314, at block 382, a store aligner aligns a store data to the appropriate byte, half-word, or word boundary within a double-word before writing the data to the data cache. Moving to the third execution clock cycle 316, at block 384, the sequencer updates the data cache static random access memory. Then, at block 386, the sequencer updates the register file for the instruction a third time via a second data write port with the results of executing the instruction during the third execution clock cycle 316.
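The store aligner step can be pictured as merging a narrow value into the correct byte lanes of a 64-bit double-word. The sketch below assumes little-endian byte order and naturally aligned byte, half-word, and word stores; the function name `store_align` and its signature are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def store_align(double_word, address, size, value):
    """Place `value` (1, 2, or 4 bytes) at the addressed offset of a double-word.

    Assumptions: little-endian byte order and naturally aligned stores.
    Bytes outside the stored lanes are left unchanged.
    """
    offset = address % 8
    if size not in (1, 2, 4) or offset % size:
        raise ValueError("unsupported size or unaligned store")
    shift = 8 * offset
    lane_mask = ((1 << (8 * size)) - 1) << shift
    return (double_word & ~lane_mask & MASK64) | ((value << shift) & lane_mask)


# Store the half-word 0xABCD at byte offset 2 of an existing double-word.
merged = store_align(0x1122334455667788, address=2, size=2, value=0xABCD)
```

Only bits 16 through 31 of the double-word change in this example; the remaining six bytes keep their previous contents.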
As illustrated in
Proceeding to the first execution clock cycle 312, at block 396, data retrieved during the fifth register file access and the sixth register file access is sent to a 64-bit shifter, a vector unit, and a sign/zero extender. Also, during the first execution clock cycle 312, at block 398, the data from the shifter, the vector unit, and the sign/zero extender is multiplexed.
Moving to the second execution clock cycle 314, the multiplexed data from the shifter, the vector unit, and the sign/zero extender is sent to an arithmetic logic unit, a count leading zeros unit, or a comparator at block 400. At block 402, the data from the arithmetic logic unit, the count leading zeros unit, and the comparator is multiplexed at a single multiplexer. After the data is multiplexed, the shifter shifts the multiplexed data in order to multiply the data by 2, 4, 8, etc. at block 404 during the third execution clock cycle 316. Then, at block 406, the output of the shifter is saturated. During the writeback clock cycle 318, at block 408, the register file for the instruction is updated a fourth time via a third write data port.
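The shift-then-saturate step at blocks 404 and 406 can be illustrated with a short sketch. The disclosure does not state the saturation width or signedness, so signed 32-bit saturation is an assumption here, as are the function names.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def saturate32(value):
    """Clamp an integer to the signed 32-bit range (assumed width)."""
    return max(INT32_MIN, min(INT32_MAX, value))


def scale_and_saturate(value, shift):
    """Multiply by 2**shift using a left shift, then saturate the result."""
    return saturate32(value << shift)


small = scale_and_saturate(3, 2)                # 3 * 4 fits, no clamping
positive = scale_and_saturate(0x40000000, 1)    # overflows, clamps high
negative = scale_and_saturate(-0x60000000, 1)   # overflows, clamps low
```

Saturating rather than wrapping keeps an overflowing result pinned at the nearest representable value, which is the usual reason a saturation stage follows a scaling shift in signal processing datapaths.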
In a particular embodiment, as illustrated in
In a particular embodiment, the digital signal processor 424 utilizes interleaved multithreading to process instructions associated with program threads necessary to perform the functionality and operations needed by the various components of the portable communication device 420. For example, when a wireless communication session is established via the wireless antenna 442, a user can speak into the microphone 438. Electronic signals representing the user's voice can be sent to the CODEC 434 to be encoded. The digital signal processor 424 can perform data processing for the CODEC 434 to encode the electronic signals from the microphone 438. Further, incoming signals received via the wireless antenna 442 can be sent to the CODEC 434 by the wireless controller 440 to be decoded and sent to the speaker 436. The digital signal processor 424 can also perform the data processing for the CODEC 434 when decoding the signal received via the wireless antenna 442.
Further, before, during, or after the wireless communication session, the digital signal processor 424 can process inputs that are received from the input device 430. For example, during the wireless communication session, a user may be using the input device 430 and the display 428 to surf the Internet via a web browser that is embedded within the memory 432 of the portable communication device 420. The digital signal processor 424 can interleave various program threads that are used by the input device 430, the display controller 426, the display 428, the CODEC 434 and the wireless controller 440, as described herein, to efficiently control the operation of the portable communication device 420 and the various components therein. Many of the instructions associated with the various program threads are executed concurrently during one or more clock cycles. As such, the power and energy consumption due to wasted clock cycles is substantially decreased.
Referring to
As further illustrated in
In a particular embodiment, as depicted in
Referring to
As further depicted in
In a particular embodiment, as indicated in
As further depicted in
As shown in
In a particular embodiment, as indicated in
Referring to
As further depicted in
In a particular embodiment, as indicated in
With the configuration of structure disclosed herein, a digital signal processor operating in an interleaved multithreaded environment is provided with six unified non-partitioned register files that are associated with six instruction caches within a memory. Each unified non-partitioned register file is dedicated to one of the six instruction caches. Further, six processor threads can be established using the register files, the instruction caches, the sequencer, and the four instruction execution units. One or more of the instruction execution units can be used for one or more of the six processor threads. As such, some resources can be used multiple times for different processor threads.
Additionally, each of the unified non-partitioned register files is configured to include data operands and address operands. The use of unified non-partitioned register files substantially simplifies access to the data operands and address operands stored therein. Further, the use of the unified non-partitioned register files can substantially reduce problems associated with multiple software programs that require access to the register files for the data operands and the address operands.
Those of skill would further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, PROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or user terminal.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features as defined by the following claims.