The present disclosure relates, in general, to data processing systems and, more specifically, to register files with embedded shift and parallel write capability.
Processing an instruction at a processor may include stages such as fetch (to get the instruction), decode (to break down the instruction into the operation and the operands, e.g., Operand A plus Operand B), operand retrieval from the register file, execution of the instruction, and write back of the result (e.g., the sum of Operand A plus Operand B).
General-purpose processors provide defined logic blocks (arithmetic logic units (ALUs), multiply-and-accumulate units (MACs or MACUs), etc.) for performing arithmetic and logical operations. Data to be processed by these logic blocks resides within a register file coupled to the logic blocks. In an exemplary operation, two operands are read from the register file and a result is written back to the register file. Therefore, the operations are generally relegated to selecting source and destination register addresses, as well as performing a logical or arithmetic function. Even very simple logical operations may use both the register file and the ALU, rendering both unavailable for other tasks. In addition, because most register files allow only a single register entry to be written at a time, tasks that use the same data for two calculations might require two copies of the same data. An additional clock cycle may be needed to copy the data from one location to the other, preventing the ALU or MAC hardware from performing additional instructions during that clock cycle.
According to some aspects of the disclosure, an apparatus includes a register file having a logical circuit. The register file is configured to perform one or more logical operations. The logical operations are performed in conjunction with the logical circuit in response to the register file receiving a register file control instruction.
According to some aspects of the disclosure, a method includes receiving a register file control instruction. The method may also include performing one or more logical operations. The one or more logical operations are performed in conjunction with a logical circuit of a register file in response to the register file receiving the register file control instruction.
According to some aspects of the disclosure, an apparatus includes means for storing information in a processor including a logical circuit. The storing means is configured to perform one or more logical operations. The one or more logical operations are performed in conjunction with the logical circuit in response to the storing means receiving a control instruction. The apparatus also has means for processing results of the logical operation. The processing means is coupled to the storing means.
According to some aspects of the disclosure, an apparatus includes a memory and one or more processors coupled to the memory. The processor(s) is configured to receive a register file control instruction. The processor(s) is further configured to perform one or more logical operations. The logical operations are performed in conjunction with a logical circuit of a register file in response to the register file receiving the register file control instruction.
According to some aspects of the disclosure, a computer program product includes a computer-readable medium having non-transitory program code recorded thereon. The program code includes program code to receive a register file control instruction. The program code also includes program code to perform one or more logical operations. The one or more logical operations are performed in conjunction with a logical circuit of a register file in response to the register file receiving the register file control instruction.
Additional features and advantages of the disclosure will be described below. It should be appreciated by those skilled in the art that this disclosure may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the teachings of the disclosure as set forth in the appended claims. The novel features, which are believed to be characteristic of the disclosure, both as to its organization and method of operation, together with further objects and advantages, will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present disclosure.
For a more complete understanding of the present teachings, reference is now made to the following description taken in conjunction with the accompanying drawings.
FIGS. 2C(i), 2C(ii) and 2C(iii) are exemplary block diagrams of structures of a register file system for implementing shifting according to some aspects of the disclosure.
FIGS. 2D(i), 2D(ii) and 2D(iii) are exemplary block diagrams of structures of a register file system for implementing bi-directional shifting according to some aspects of the disclosure.
FIGS. 2E(i) and 2E(ii) are exemplary block diagrams of structures of a register file system for implementing cascaded shifting according to some aspects of the disclosure.
The detailed description set forth below, in connection with the appended drawings, is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form in order to avoid obscuring such concepts.
Data to be operated on by the logic blocks generally resides within the register file 112 coupled to the logic blocks (ALUs 102, MACs 104, etc.), in which two input data are read from the register file 112 and one result is written back to the register file 112. The register file 112 may be separate from but coupled to a data memory 114 via a memory interface 116. The register file 112 provides high speed memory storage for the microprocessor 100. The register file 112 may include general purpose registers for storing the input data and results for the MAC 104 and ALU 102. The ALU 102 may be coupled to the register file 112 to provide arithmetic computations for data stored in the register file 112. The register file 112 may include output ports 118, 120, 122 and 124 and input ports 126 and 128. In other aspects, the register file 112 may have any number of input and output ports.
For the MAC 104, a multiplier receives and multiplies two input data from the output ports 122 and 124 of the register file 112 and provides an output to the input port 128. The ALU 102 receives two inputs from the output ports 118 and 120 of the register file 112 and provides an output to the input port 128. The register file 112 can also receive data at an input port 126 from the data memory 114 via the memory interface 116.
The control instructions for the memory interface 116 may be independent of the ALU/MAC control instructions. The logical operations associated with the memory interface 116 are also independent of the ALU 102 or MAC 104 of the microprocessor 100. In some aspects, the microprocessor 100 may be configured to statically or pseudo-statically control data blocks of one or more register files 112, instead of receiving, every clock cycle, instructions associated with the ALU 102 or MAC 104 that control the operation of the one or more register files 112. For example, the register file 112 may not need to receive new instructions every cycle. The register file 112 can be controlled on an instruction-by-instruction basis, but can also be controlled by a less frequent control mechanism. For example, a particular instruction may indicate that the register file be set to a fixed state that is independent of future instructions until the configuration is changed.
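The static control described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name, the `mode` latch, and the two mode names are hypothetical and do not appear in the disclosure. The model shows a configuration that is set once and then remains in effect on every later clock cycle until it is changed.

```python
class StaticallyControlledRegisterFile:
    """Toy model: the configured mode persists until it is reconfigured."""

    def __init__(self, num_entries):
        self.entries = [0] * num_entries
        self.mode = "hold"  # hypothetical configuration latch

    def configure(self, mode):
        # Issued rarely; the setting stays valid on all later cycles.
        if mode not in ("hold", "shift_right"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def clock(self):
        # Called every cycle; no per-cycle instruction is supplied.
        if self.mode == "shift_right":
            self.entries = [entry >> 1 for entry in self.entries]


rf = StaticallyControlledRegisterFile(num_entries=2)
rf.entries = [0b1000, 0b0110]

rf.configure("shift_right")   # one control instruction
for _ in range(2):            # two cycles, no further instructions
    rf.clock()
after_two_cycles = list(rf.entries)

rf.configure("hold")          # configuration changed
rf.clock()
after_hold = list(rf.entries)
```

In this model the entries shift on each of the two cycles following the single `configure` call and stop changing once the mode is set back to `"hold"`.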
The register file 112 may be configured with embedded shift and parallel write capability based on independent control from the control unit. For example, the register file 112 can receive command instructions that allow all entries of the register file 112 or a subset of the register file 112 to perform functions, e.g., shifting, in parallel with functions dependent on the ALU 102 and/or MAC 104 control instructions. The functions may include write-back of results from the ALU 102 or the MAC 104. Therefore, simple logical operations, for example, can be performed without interaction with the ALU 102, MAC 104, or other logical/arithmetic-function blocks on the microprocessor 100. Additionally, within the same context, the register file 112 can save its input data to multiple locations simultaneously. These independently controlled functions of the register file 112 allow the ALU 102 and MAC 104 to perform other instructions simultaneously.
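As a rough illustration of the parallel write and embedded shift described above, the following sketch models one clock cycle in which some entries shift, the same input data is written to several entries at once, and an ALU result is written back, all in the same cycle. The function name and its parameters are assumptions made for illustration, not part of the disclosure.

```python
def clock_cycle(entries, shift_targets=(), write_data=None, write_targets=(),
                alu_result=None, alu_dest=None):
    """Return the next register-file state after one modeled clock cycle."""
    nxt = list(entries)
    # Embedded shift: selected entries shift right without using the ALU.
    for i in shift_targets:
        nxt[i] = entries[i] >> 1
    # Parallel write: the same input data lands in several entries at once.
    for i in write_targets:
        nxt[i] = write_data
    # ALU write-back proceeds in the same cycle.
    if alu_dest is not None:
        nxt[alu_dest] = alu_result
    return nxt


regs = [0b1100, 0, 0, 0]
regs = clock_cycle(
    regs,
    shift_targets=[0],        # entry 0 shifts inside the register file
    write_data=0xAB,
    write_targets=[1, 2],     # two copies written simultaneously
    alu_result=3 + 4,         # the ALU is free to compute something else
    alu_dest=3,
)
```

After this single modeled cycle, entry 0 holds the shifted value, entries 1 and 2 both hold `0xAB`, and entry 3 holds the ALU result.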
Each logical operation of the register file 112 can include a different control command or instruction. In some aspects, control instructions may be based on a static configuration implemented on the register file 112 that may be valid for a period of time or controlled through a simple sequence. The independent operations of the register file 112 may be implemented based on a control instruction, a time-multiplexed instruction or a discrete multiplexed function. For example, a register file logical operation may be configured to execute once out of every X clock cycles, where X represents a number of clock cycles, e.g., 8. In some aspects, the command or control instruction can be in the form of data, e.g., an operand, configured to provide control sent to the memory interface 116, e.g., to select lines of a multiplexer. In other aspects, control may be implemented as an instruction opcode. Configuration flip-flops or latches may also control the select lines on multiplexers to control the inputs to the register file 112 during static or pseudo-static control of the register file 112.
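The time-multiplexed case above, in which an operation executes once out of every X clock cycles, can be sketched as follows. The function name, the choice of a right shift as the operation, and the choice to fire on the last cycle of each period are illustrative assumptions.

```python
def run_time_multiplexed(value, cycles, period):
    """Shift `value` right once out of every `period` cycles; return a trace."""
    history = []
    for cycle in range(cycles):
        # Assumption: the operation fires on the last cycle of each period.
        if cycle % period == period - 1:
            value >>= 1
        history.append(value)
    return history


# X = 8, as in the example given in the text.
trace = run_time_multiplexed(0b1000, cycles=16, period=8)
```

Over 16 modeled cycles with a period of 8, the value changes exactly twice, at cycles 7 and 15, and is held on all other cycles.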
In some aspects, the register implementation of the microprocessor 100 can be implemented in, e.g., very long instruction word (VLIW) processors. The VLIW architecture is suitable for this register implementation because it is based on a predictable data stream and may include several parallel processing data paths. The register implementation in conjunction with a microprocessor allows other operations that are generally performed by other parallel data path components such as an ALU 102, a MAC 104 or a shifter to be performed by the register file 112. Such an implementation allows the parallel data path components to perform other computations. Therefore, for the same clock speed of the overall microprocessor hardware, the register implementation yields an improved computational efficiency.
In general, processor design minimizes the number of instruction bits used to describe ALU 102, MAC 104, or other functional unit operations, with none left over for flexible control of the register file. A control path based on the control instructions from the control unit may allow for static or pseudo-static control of substantially all functional units in parallel with control associated with the ALU 102 and the MAC 104. Therefore, the number of bits to control the register file 112 need not adhere to the same limits and optimizations of the general processor. In other words, the register file 112 does not need an instruction every clock cycle. Rather, instruction control bits representing “1-of-many” options fetched every clock cycle are replaced with parallel control bits from the control logic 110, for example, which are not generally updated during the execution of the algorithm (thus, statically or pseudo-statically configured). The above solution results in a higher performance processor with higher hardware utilization efficiency that may reduce the number of “overhead” instructions and reduce constraints on bandwidth for controlling the data path. In particular, with a small amount of additional hardware, in conjunction with independent control from the control unit, basic functions can be implemented to offload some of the microprocessor's tasks to the register file 112.
The register file system in
Bit reversing on the data can occur during a write operation or a read operation (
The directional movement of the data through the register file can be either to the left (i.e., left shifting), to the right (i.e., right shifting), and/or left-in but right-out (i.e., rotation). In some aspects, the directional movement of data through the register file includes both left and right shifting within the same register thereby making it bidirectional (
Each word 301, 330 includes several single-bit D-type data latches or flip flops 304-314 and 334-344 connected together in a serial or daisy-chain arrangement. Each word 301, 330 also includes 2:1 multiplexers 316-326 and 346-356 coupled to the input of each data latch/flip flop 304-314 and 334-344. Thus, the output from one data latch/flip flop 304-314 and 334-344 becomes the input of the 2:1 multiplexer 316-326 and 346-356 associated with the next latch 304-314 and 334-344 and so on. The input to each 2:1 multiplexer 316-326 and 346-356 also includes new register file (RF) data input RF_in[0], RF_in[1], RF_in[7], RF_in[8], RF_in[14], RF_in[15]. The 2:1 multiplexer selects between either the new data or the output from a previous data latch 304-314 and 334-344.
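The daisy-chain arrangement just described can be modeled bit by bit. In the sketch below, each list element stands for one flip flop (leftmost bit first), and each flip flop's 2:1 multiplexer picks either the new register file input bit or the output of the flip flop to its left. The names `mux2`, `clock_word`, `load` and `head_bit` are hypothetical; `head_bit` simply stands in for whatever drives the leftmost multiplexer.

```python
def mux2(select_new, new_bit, prev_bit):
    """2:1 multiplexer: new data when select_new is true, else previous output."""
    return new_bit if select_new else prev_bit


def clock_word(state, rf_in, load, head_bit):
    """Advance one word by one clock cycle.

    state, rf_in: lists of bits, leftmost (most significant) bit first.
    load: shared select line; True loads rf_in, False shifts the chain.
    head_bit: bit presented to the leftmost multiplexer when shifting.
    """
    # Output of the neighbor to the left, sampled before the clock edge.
    prev_outputs = [head_bit] + state[:-1]
    return [mux2(load, rf_in[i], prev_outputs[i]) for i in range(len(state))]


empty = [0, 0, 0, 0]
loaded = clock_word(empty, rf_in=[1, 0, 1, 1], load=True, head_bit=0)
shifted = clock_word(loaded, rf_in=[0, 0, 0, 0], load=False, head_bit=0)
```

With the select line set to load, the word takes the new data in one cycle; with it set to shift, every bit moves one position toward the rightmost flip flop.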
Although a 2:1 multiplexer is shown for each data latch 304-314 and 334-344, aspects of the disclosure are not limited to a specific size of a multiplexer. For example, the size of the multiplexers may vary depending on the function implemented by the register file. The multiplexer or multiplexing logic may be integrated in the memory interface 116 or may be independent but coupled to the memory interface 116.
The leftmost 2:1 data multiplexor 316, 346 receives input from a 5:1 multiplexor 302, 332, rather than input directly from a previous latch. Thus, the leftmost multiplexor 316, 346 outputs either new data from RF_in[15] (register file input bit 15) or the output from the 5:1 multiplexor 302, 332.
The 5:1 multiplexor 302, 332 controls input to the leftmost 2:1 multiplexor 316, 346 to change the function of the register file. For example, data already in the flip flops 304-314 and 334-344 can be shifted from left to right toward the least significant bit (LSB) in one clock cycle (as seen in FIG. 2C(ii)). In particular, the latches or flip flops 304, 334, for example, shift their output, Q, to flip flops 306, 336 (respectively) via the 2:1 multiplexor 318, 348. The latches or flip flops 306, 336, for example, shift their output, Q, to flip flops 308, 338 (respectively) via the 2:1 multiplexor 320, 350 and so on. A “0” or a “1” (depending on the function implemented by the register file) can be inserted at the MSB location (i.e., at the leftmost flip flops 304, 334), which would otherwise be left void by the right shift of the data previously in the MSB. The “0” or “1” is selected as a dummy bit for the flip flops 304, 334.
By selecting other inputs from the 5:1 multiplexors 302, 332, other functions can be achieved. For example, when the most significant bit (MSB) is selected at the 5:1 multiplexor 302, 332, the shift register performs right shifts with the value at the MSB (i.e., leftmost flip flops 304, 334) fed back into the input of the leftmost flip flops 304, 334. In particular, leftmost flip flops 304, 334 retain their value. This implementation is a right shift with sign extension. Thus, the selection of a “0”, “1” or MSB at the 5:1 multiplexor 302, 332 results in shifting functions corresponding to those illustrated with respect to FIG. 2C(ii), for example.
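The three selections discussed so far (“0”, “1” and MSB) differ only in the bit fed to the leftmost flip flop. A minimal sketch, assuming a word is represented as a list of bits with the MSB first and using the hypothetical selector strings `"0"`, `"1"` and `"msb"`:

```python
def shift_right(state, head_select):
    """Right-shift a word by one bit; head_select picks the incoming MSB."""
    if head_select == "0":
        head = 0            # zero fill
    elif head_select == "1":
        head = 1            # one fill
    elif head_select == "msb":
        head = state[0]     # MSB fed back: right shift with sign extension
    else:
        raise ValueError(f"unknown selection: {head_select}")
    return [head] + state[:-1]


negative = [1, 0, 1, 1, 0, 0, 1, 0]
positive = [0, 1, 1, 0, 0, 1, 0, 0]

zero_filled = shift_right(negative, "0")
one_filled = shift_right(positive, "1")
sign_extended_neg = shift_right(negative, "msb")
sign_extended_pos = shift_right(positive, "msb")
```

With the MSB selected, the leftmost bit keeps its value, so a word whose MSB is 1 stays "negative" and a word whose MSB is 0 stays "positive" after the shift.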
Similar to the right shift implementation of FIG. 2C(i), when an LSB is selected at the 5:1 multiplexor 302, 332, data already in the flip flops 304-314 and 334-344 is shifted from left to right toward the LSB in one clock cycle. However, in this case, the output of the LSB (i.e., data currently in the rightmost flip flops 314, 344) is selected at the 5:1 multiplexor 302, 332 and subsequently input to the leftmost latch 304, 334. This circular shifting implementation corresponds to the shifting illustrated in FIG. 2C(iii) where data from the LSB is fed into the MSB.
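At the integer level, the circular shift described above amounts to a rotate right by one bit. The sketch below assumes a 16-bit word, consistent with the RF_in[15:0] inputs mentioned earlier; the function name and the `width` parameter are illustrative.

```python
def rotate_right(value, width=16):
    """Rotate `value` right by one bit within a `width`-bit word."""
    mask = (1 << width) - 1
    value &= mask
    lsb = value & 1                      # output of the rightmost flip flop
    # The LSB is fed back into the MSB position; the rest shifts right.
    return (value >> 1) | (lsb << (width - 1))


once = rotate_right(0x0001)
mixed = rotate_right(0x8001)

# Rotating by the full word width returns the original value.
full_turn = 0xBEEF
for _ in range(16):
    full_turn = rotate_right(full_turn)
```

Unlike the fill-based shifts, no bit is lost: the bit leaving the LSB position reappears at the MSB position.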
When a previous least significant bit (LSB) (“prev LSB”) is selected at the 5:1 multiplexors 302, 332, the register file system can perform circular shifting on concatenated entries in the register file, to achieve a function corresponding to the function shown in FIG. 2E(ii). In this implementation, the flip flop 304, for example, shifts its output to the flip flop 306 and so on until the output data is shifted to the final flip flop 314 in the chain of the first row 301. The output of the final flip flop 314 is circulated to the prev LSB input of the 5:1 multiplexor 332 and subsequently to the flip flop 334 via the 2:1 multiplexor 346. The flip flop 334 then shifts its output to flip flop 336 and so on until the output data is shifted to the final flip flop 344 in the chain. The process continues until the output data is shifted to the rightmost flip flop (LSB) at the end of the last word or row (not shown) of the register file system. The output of this rightmost flip flop of the last word or row is fed back into the 5:1 multiplexor 302 (i.e., prev LSB) and subsequently to the leftmost flip flop 304 of the first word or row 301.
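The cascaded circular shift can be sketched by treating each word as a list of bits (MSB first) and feeding each word's leftmost position from the LSB of the preceding word, with the first word fed from the last. The function name is hypothetical, and the two 4-bit words are chosen only to keep the example short.

```python
def cascaded_rotate_right(words):
    """Rotate the concatenation of all words right by one bit.

    words: list of bit lists, each with the MSB first. Every word takes its
    new MSB from the LSB of the previous word; the first word wraps around
    to the LSB of the last word.
    """
    # Sample every LSB before the clock edge so all words update together.
    lsbs = [word[-1] for word in words]
    nxt = []
    for i, word in enumerate(words):
        prev_lsb = lsbs[i - 1]  # index -1 wraps to the last word when i == 0
        nxt.append([prev_lsb] + word[:-1])
    return nxt


rows = [[1, 0, 0, 1], [0, 1, 1, 0]]
after_one = cascaded_rotate_right(rows)

# Rotating by the total number of bits restores the original contents.
restored = rows
for _ in range(8):
    restored = cascaded_rotate_right(restored)
```

Read as a single 8-bit value, `10010110` becomes `01001011` after one cycle, which matches a one-bit rotation of the concatenated entries.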
In one configuration, the apparatus includes means for storing information in a processor including a logical circuit. In one aspect of the disclosure, the information storing means may be the register file 112, the register file 200, the register 560 and/or the register file system 300 configured to perform the functions recited by the information storing means. The apparatus may also include processing means. The processing means may be the microprocessor 100. In another aspect, the aforementioned means may be a module or any apparatus configured to perform the functions recited by the aforementioned means.
Referring to
In a particular aspect, an input device 530 and a power supply 544 are coupled to the system-on-chip device 522. Moreover, in a particular aspect, as illustrated in
It should be noted that although
Although specific circuitry has been set forth, it will be appreciated by those skilled in the art that not all of the disclosed circuitry is required to practice the disclosed embodiments. Moreover, certain well known circuits have not been described, to maintain focus on the disclosure.
The methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in hardware, firmware, software, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the methodologies may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. Any machine or computer readable medium tangibly embodying instructions may be used in implementing the methodologies described herein. For example, software code may be stored in a memory and executed by a processor. When executed by the processor, the executing software code generates the operational environment that implements the various methodologies and functionalities of the different aspects of the teachings presented herein. Memory may be implemented within the processor or external to the processor.
As used herein, the term “memory” refers to any type of long term, short term, volatile, nonvolatile, or other memory and is not to be limited to any particular type of memory or number of memories, or type of media upon which memory is stored.
The machine or computer readable medium that stores the software code defining the methodologies and functions described herein includes physical computer storage media. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. As used herein, disk and/or disc includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer readable media.
In addition to storage on computer readable medium, instructions and/or data may be provided as signals on transmission media included in a communication apparatus. For example, a communication apparatus may include a transceiver having signals indicative of instructions and data. The instructions and data are configured to cause one or more processors to implement the functions outlined in the claims.
Although the present teachings and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the technology of the teachings as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular aspects of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized according to the present teachings. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.