The instant disclosure relates to processors. More specifically, portions of this disclosure relate to supporting multiple modes of operation within a processor.
It is often desirable for a processor to support multiple data wordlengths. For example, when 16-bit processors were released, it was desirable for them to also operate on 8-bit data. Likewise, when 32-bit processors were released, it was desirable for them to operate on 16-bit data. However, these were desktop processors, which generally had no power or size constraints. Thus, the solutions for supporting multiple wordlengths described below, which may be implemented in desktop processors, are not ideal for mobile applications.
One conventional solution for supporting multiple data wordlengths is to have separate datapaths for each of the possible wordlengths. For example, when there are two possible wordlengths, 16 bits and 24 bits, two datapaths may be constructed in the physical processor, and each datapath may be activated when data of its wordlength is processed.
Another conventional solution is to configure a processor with a single datapath having different operational modes to switch between different wordlengths.
Both of the conventional solutions described above with reference to
Shortcomings mentioned here are only representative and are included simply to highlight that a need exists for improved electrical components, particularly for processors employed in consumer-level devices, such as mobile phones. Embodiments described herein address certain shortcomings but not necessarily each and every one described here or known in the art.
In certain embodiments, multiple data wordlengths may be supported by a processor through a single datapath and/or a single set of registers. For example, the processor may have multiple modes, wherein each mode operates on data of a different wordlength. In one embodiment, the processor may have two modes: a first, low-precision mode for processing, e.g., 16-bit wordlengths, and a second, high-precision mode for processing, e.g., 24-bit wordlengths. In this embodiment, the processor may have registers and datapaths matching the widest wordlength, e.g., 24 bits.
Regardless of the number of operating modes, data of any supported wordlength shorter than the wordlength of the registers and datapath may be left-aligned within the registers and datapath. Left alignment may allow saturation detection in the processor to be performed by examining the same saturation point regardless of the wordlength of the data being operated on. Thus, in the example embodiment above, a processor with 24-bit registers and datapaths may operate on high-precision data that occupies the entire wordlength of the registers and datapath but, when operating in low-precision mode, may left-align the 16-bit data in the 24-bit registers and datapaths such that the eight least significant bits are zeros. Left-aligning the data and holding the least significant bits at zero during low-precision operation may reduce power consumption in the processor. Although left alignment is described, the data may be either left-aligned or right-aligned, depending on operation of the processor, so that the low-precision data occupies the more (or the most) significant bits.
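The following Python sketch illustrates why left alignment permits a single saturation point. It assumes signed two's-complement data, a 24-bit register, and 16-bit low-precision data; the helper names are hypothetical. Because a left-aligned 16-bit value equals the 16-bit value multiplied by 2^8, a sum of two such values leaves the 16-bit range exactly when the aligned sum leaves the 24-bit range, so one overflow check can serve both modes.

```python
# Hypothetical model: a 24-bit register holding either 24-bit data or
# 16-bit data left-aligned in the upper bits (signed two's complement).
REG_BITS = 24
LOW_BITS = 16
PAD = REG_BITS - LOW_BITS          # 8 LSBs are zero in low-precision mode
MASK = (1 << REG_BITS) - 1


def to_signed(raw: int, bits: int = REG_BITS) -> int:
    """Interpret a raw bit pattern as a two's-complement integer."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


def left_align16(x: int) -> int:
    """Return the 24-bit pattern holding signed 16-bit x in its upper 16 bits."""
    if not -(1 << 15) <= x < (1 << 15):
        raise ValueError("x must fit in 16 signed bits")
    return (x << PAD) & MASK


def overflows(value: int) -> bool:
    """The single saturation check used in both modes: does value fit in 24 bits?"""
    return not -(1 << (REG_BITS - 1)) <= value < (1 << (REG_BITS - 1))
```

For example, adding the left-aligned 16-bit values 0x7FFF and 1 gives 0x800000, which the 24-bit check flags as overflow, just as 0x7FFFFF + 1 is flagged in 24-bit mode.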
In some embodiments, the processor may support a special saturation mode that sets the lower bits to zero when a configuration register bit or an instruction bit is set. For example, an indication of operating mode may be received by the processor through a configuration register or a bit in a received instruction. The processor may switch operating mode and process data based on the received indication. The received indication may, for instance, specify operation in either a second, high-precision mode (in which data has a second wordlength) or a first, low-precision mode (in which data has a first wordlength that is shorter than the second wordlength). In low-precision mode, the processor may set a certain number of lower bits of the registers and datapaths to zero. Processing of data in the first mode and the second mode may use the same datapath within the processor. Further, when saturation is detected while operating in the low-precision mode, the processor may clear the least significant bits. In some embodiments, the processor may be configured to clear the least significant bits whenever it executes certain operations that may cause those bits to be set. Thus, the least significant bits may remain zeros during low-precision operation, reducing power consumption in the processor.
Although processor operation is described herein, the term processor may refer to any logic device capable of saturation. For example, processor may refer to a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a microprocessor, an image processor, a co-processor, a network processor, and/or an audio processor. The processor may include one or more cores, wherein the cores may be identical or heterogeneous. Further, the processor may include other integrated functionality, such as dedicated video decoding, audio decoding, encryption circuitry, and/or peripheral bus interfaces.
According to one embodiment, an apparatus may include a processor capable of saturation and configured to process data in at least a low-precision mode and a high-precision mode, wherein a register size of the processor matches a data size in the high-precision mode. The processor may be configured to perform steps including processing the data in the low-precision mode as data aligned to the more significant bits (for example, as left-aligned data in some embodiments) and/or detecting saturation during the processing of the data, wherein the same saturation point is examined whether the processor is operating in the low-precision mode or the high-precision mode. In certain embodiments, the processor may be configured to process 16-bit data when the processor is operating in low-precision mode and to process 24-bit data when the processor is operating in high-precision mode. In certain embodiments, the processor may be a digital signal processor (DSP). In certain embodiments, the low-precision mode may be used to execute control applications, provide compatibility with code originally written for a different processor, and/or process low-fidelity audio, and the high-precision mode may be used to execute high-precision arithmetic and/or process high-fidelity audio.
In some embodiments, the processor may be configured to perform steps including clearing one or more least significant bits (LSBs) not in use during operation of the processor upon detecting saturation during the processing of data, clearing the one or more LSBs in hardware, clearing the one or more LSBs in response to a received instruction, and/or clearing one or more LSBs after pre-determined operations are performed during the processing of the low-precision data in the first mode. Other operations may not require clearing the LSBs after execution. For example, add, subtract, and multiply operations on input data whose LSBs are all zero will generally produce results whose LSBs are also zero.
According to another embodiment, a method may include receiving an indication of whether data is low-precision data or high-precision data; processing the data in the low-precision mode as data aligned to the more significant bits (for example, as left-aligned data in some embodiments); and/or detecting saturation during the processing of the data, wherein the same saturation point is examined whether the processor is operating in the low-precision mode or the high-precision mode. In some embodiments, the indication may be received by reading a configuration register of a processor. In certain embodiments, the low-precision mode processes 16-bit or 32-bit data and the high-precision mode processes 24-bit or 48-bit data.
In some embodiments, the method may further include clearing one or more least significant bits (LSBs) upon detecting saturation during the processing of data, clearing the one or more LSBs in processor hardware, clearing the one or more LSBs in response to a received instruction, and/or clearing one or more LSBs after pre-determined operations are performed during the processing of the data. In certain embodiments, the method may be performed by a digital signal processor that processes 16-bit or 32-bit data when operating in the low-precision mode and 24-bit or 48-bit data when operating in the high-precision mode.
In certain embodiments, an apparatus may include a digital signal processor (DSP). The DSP may include multiply-accumulate circuitry configured to process data in the low-precision mode as data aligned to the more significant bits (for example, as left-aligned data in some embodiments) and/or to detect saturation during the processing of the data, wherein the same saturation point is examined whether the processor is operating in the low-precision mode or the high-precision mode. In some embodiments, the multiply-accumulate circuitry may include a first set of registers; a multiplier coupled to the first set of registers and configured to receive two operands from the first set of registers; an adder coupled to the multiplier and configured to receive a result of a multiplication operation on the two received operands; and/or an accumulation register coupled to the adder and configured to accumulate a value. The multiplier may be configured to operate on both low-precision data in low-precision mode and high-precision data in high-precision mode.
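A minimal sketch of this LSB behavior, assuming 16-bit data left-aligned in a 24-bit word (eight unused LSBs) and Python integers standing in for the datapath. An arithmetic right shift is used here only as a hypothetical example of an operation that can set the unused LSBs; the text does not name specific operations.

```python
# Assumption: 16-bit data left-aligned in a 24-bit word, so 8 LSBs are unused.
PAD = 8
LSB_MASK = (1 << PAD) - 1


def lsbs_clear(value: int) -> bool:
    """True if the unused low-precision LSBs are all zero."""
    return (value & LSB_MASK) == 0


def clear_lsbs(value: int) -> int:
    """Force the unused LSBs to zero (works for negative values too)."""
    return value & ~LSB_MASK


a, b = 0x12 << PAD, -0x05 << PAD       # left-aligned low-precision operands
print(lsbs_clear(a + b), lsbs_clear(a - b), lsbs_clear(a * b))  # True True True
halved = (0x13 << PAD) >> 1            # arithmetic shift moves a 1 into bit 7
print(lsbs_clear(halved), hex(clear_lsbs(halved)))              # False 0x900
```

Add, subtract, and full-width multiply keep the low bits at zero, whereas the shift does not, which is the kind of case where a clearing step would be applied.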
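The multiply-accumulate structure described above can be modeled roughly as follows. This is a hypothetical Python sketch rather than the circuit itself: the eight-entry register file, the 48-bit accumulator width, and the saturating accumulate are assumptions chosen for illustration. The same `multiply` method serves both modes; left-aligned 16-bit operands simply carry zero LSBs.

```python
from dataclasses import dataclass, field

ACC_BITS = 48   # assumed accumulator width (hypothetical)


def saturate(value: int, bits: int) -> int:
    """Clamp value to the signed range of the given bit width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


@dataclass
class Mac:
    """Hypothetical model: register file -> multiplier -> adder -> accumulator."""
    regs: list[int] = field(default_factory=lambda: [0] * 8)
    acc: int = 0

    def multiply(self, i: int, j: int) -> int:
        # One multiplier for both modes: operands are used as-is, whether they
        # are full 24-bit words or left-aligned 16-bit words.
        return self.regs[i] * self.regs[j]

    def step(self, i: int, j: int) -> None:
        # a + (b x c) -> a, saturated to the accumulator width
        self.acc = saturate(self.acc + self.multiply(i, j), ACC_BITS)
```

The product of two left-aligned 16-bit operands carries 16 zero LSBs, so the accumulator's low bits stay zero in low-precision mode unless saturation forces the accumulator to its maximum, which is the case the LSB-clearing step described above addresses.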
In some embodiments, the DSP may also be configured to clear one or more least significant bits (LSB) upon detecting saturation during the processing of data, clear the one or more least significant bits (LSB) in processor hardware, clear the one or more least significant bits (LSB) in response to a received instruction, and/or clear one or more least significant bits (LSB) after pre-determined operations are performed during the processing of the data.
According to one embodiment, a computer program product may include a non-transitory computer readable medium having code for performing the steps of receiving an indication of whether data is low-precision data of a first wordlength or high-precision data of a second wordlength that is longer than the first wordlength; processing the data as data aligned to most significant bits in the low-precision mode; and/or detecting saturation during the processing of the data, wherein the same saturation point is examined whether the data is low-precision data or high-precision data.
According to another embodiment, a method of processing data of two different wordlengths in a processor with a single datapath that supports the two different wordlengths may include processing first data in a first mode having a first wordlength using a datapath of a processor; and/or processing second data in a second mode having a second wordlength that is longer than the first wordlength using the datapath of the processor. The step of processing the first data in the first mode may include processing the first data as data aligned to most significant bits of the datapath, such as when the first data is left-aligned in the datapath.
According to a further embodiment, an apparatus may include a processor comprising a datapath for processing data, wherein the processor processes first data of a first wordlength in a first mode using the datapath, and wherein the processor processes second data of a second wordlength longer than the first wordlength in a second mode using the datapath, and wherein the processor processes the first data in the first mode as data aligned to most significant bits of the datapath.
The foregoing has outlined rather broadly certain features and technical advantages of embodiments of the present invention in order that the detailed description that follows may be better understood. Additional features and advantages will be described hereinafter that form the subject of the claims of the invention. It should be appreciated by those having ordinary skill in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same or similar purposes. It should also be realized by those having ordinary skill in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. Additional features will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended to limit the present invention.
For a more complete understanding of the disclosed system and methods, reference is now made to the following descriptions taken in conjunction with the accompanying drawings.
In a two-mode embodiment, data transmitted along the datapath 306 may be formatted as shown in data 310 and 312. Data 310 may illustrate low-precision data transmitted over datapath 306, and data 312 may illustrate high-precision data transmitted over datapath 306. “Higher bits” described herein may refer to bits of more significance, or bits that are left-aligned in a big endian computer system. “Lower bits” described herein may refer to bits of less significance, or bits that are right-aligned in a little endian computer system. The high-precision data 312 occupies all bits in the datapath 306. The low-precision data 310 occupies fewer than all bits in the datapath 306 and is left-aligned, such that the data is stored in the most significant bits (MSBs) of the datapath 306, leaving the least significant bits (LSBs) unused. Although the N2 LSBs of data 310 may be depicted as blank, in an implementation they may hold zeros, which have no impact on the value stored in the N1 MSBs. The least significant N2 bits may thus be set to zero during operation in low-precision mode, because those bits do not affect the value represented by data 310. Setting these lower bits to zero may reduce power consumption by the circuitry 308 when processing the low-precision data. Further, setting these lower bits to zero may prevent bit toggles in the lower bits from propagating to higher bits, where they could cause arithmetic errors and higher power consumption. Although two-mode operation is described, the processor 302 may support additional modes of operation to support additional wordlengths.
When operations are performed on values contained in the data, such as the low-precision data, the values may reach a saturation point, that is, the largest possible value that can be stored in a given number of bits. Saturation may be detected by the processor and handled to prevent arithmetic errors, such as overflow. The left alignment of data may allow saturation detection in the processor to be performed by examining the same saturation point regardless of the wordlength of the data being operated on.
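To illustrate the power argument, the sketch below counts bit toggles at each bit position across a stream of left-aligned low-precision words. Switching activity is used here only as a rough proxy for dynamic power, and the helper names and sample values are hypothetical. With N1 = 16 and N2 = 8, the N2 LSB positions never toggle.

```python
def left_align(value: int, n_total: int, n1: int) -> int:
    """Place a signed n1-bit value in the n1 MSBs of an n_total-bit word."""
    n2 = n_total - n1
    return (value << n2) & ((1 << n_total) - 1)


def toggle_counts(words: list[int], n_total: int) -> list[int]:
    """Count 0<->1 transitions at each bit position between consecutive words."""
    counts = [0] * n_total
    for prev, cur in zip(words, words[1:]):
        changed = prev ^ cur
        for bit in range(n_total):
            counts[bit] += (changed >> bit) & 1
    return counts


N, N1 = 24, 16
stream = [left_align(v, N, N1) for v in (123, -4567, 32000, -1, 0)]
counts = toggle_counts(stream, N)
print(counts[: N - N1])   # the N2 = 8 LSB positions: [0, 0, 0, 0, 0, 0, 0, 0]
```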
Referring back to
One example of saturation detection is shown in the following source code that may be executed by the processor 302 for 16-bit and 24-bit modes of operation:
In the example above, “in” and “out” may denote memory locations 24 bits in width, such as one of the registers 304, and “S16” may denote a configuration bit, such as bit 312A, indicating a mode of operation for the processor 302. In the code above, when saturation is detected, the configuration bit S16 is examined: if the S16 bit is set, indicating the 16-bit mode of operation, the memory location is saturated such that the low bits remain zero. If saturation is detected and the configuration bit S16 is not set, indicating the 24-bit mode of operation, the memory location is saturated with all bits set to one. Example input values to the code above are listed in Table 1 below along with the corresponding output of the code.
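As a hedged illustration of the behavior just described (not the original listing), the following Python sketch models the 16-bit/24-bit saturation step. Assumptions: `in_val` (renamed because `in` is a Python keyword) is an intermediate result that may exceed 24 bits, the return value is the raw 24-bit pattern written to `out`, data is signed two's complement, "all bits set to one" refers to the magnitude bits (0x7FFFFF), and negative saturation produces 0x800000 in both modes, a case the text does not discuss.

```python
# Hypothetical sketch of the described 16-/24-bit saturation behavior.
MAX24 = (1 << 23) - 1        # 0x7FFFFF
MIN24 = -(1 << 23)           # bit pattern 0x800000


def saturate24(in_val: int, s16: bool) -> int:
    """Return the raw 24-bit pattern written to 'out'.

    in_val: intermediate result that may exceed 24 bits (assumption).
    s16:    configuration bit; True selects 16-bit (low-precision) mode.
    """
    if in_val > MAX24:
        # Positive saturation: in 16-bit mode the 8 LSBs stay zero;
        # in 24-bit mode all magnitude bits are set to one.
        out = 0x7FFF00 if s16 else MAX24
    elif in_val < MIN24:
        # Negative saturation (assumed): 0x800000 already has zero LSBs.
        out = MIN24
    else:
        out = in_val
    return out & 0xFFFFFF
```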
Another example of saturation detection is shown in the following source code that may be executed by the processor 302 for 32-bit and 48-bit modes of operation:
Example input values to the code above are listed in Table 2 below along with the corresponding output of the code.
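Assuming the 32-bit/48-bit case follows the same pattern as the 16-bit/24-bit case, a width-parameterized sketch might look like this. `saturate_left_aligned` and its parameters are hypothetical, and the same two's-complement and negative-saturation assumptions apply.

```python
def saturate_left_aligned(in_val: int, n_total: int, n_low: int, low_mode: bool) -> int:
    """Saturate in_val to an n_total-bit signed word and return the raw bit pattern.

    In low_mode, data is n_low bits wide and left-aligned, so the
    (n_total - n_low) LSBs of a saturated result are kept at zero.
    """
    max_v = (1 << (n_total - 1)) - 1
    min_v = -(1 << (n_total - 1))
    pad_mask = (1 << (n_total - n_low)) - 1
    if in_val > max_v:
        out = (max_v & ~pad_mask) if low_mode else max_v
    elif in_val < min_v:
        out = min_v              # assumed; the LSBs of min_v are already zero
    else:
        out = in_val
    return out & ((1 << n_total) - 1)


print(hex(saturate_left_aligned(1 << 47, 48, 32, True)))   # 0x7fffffff0000
```

The same function with `n_total=24, n_low=16` reproduces the 16-bit/24-bit behavior sketched for the previous example.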
The processor 302 may determine the appropriate mode of operation by receiving information from an application executing on the processor 302. In one embodiment, the processor 302 may include a configuration register 312 in which one configuration bit 312A may be set to zero or one to toggle the processor 302 between two modes of operation. In processors with more than two modes of operation, additional bits in the configuration register 312 may be used to indicate which of the multiple modes of operation should be executed. The configuration bit 312A may be set during execution of an application. In another embodiment, the processor 302 may implement different instructions for operations in different modes of operation. For example, the processor 302 may receive a “MULT1” instruction directing multiplication in a first mode of operation, such as multiplying two 16-bit values, and may receive a “MULT2” instruction directing multiplication in a second mode of operation, such as multiplying two 24-bit values.
The registers 304 may be configured to support the multiple possible wordlengths of the different modes of operation. For example, the registers 304 may have a wordlength matching the width of datapath 306, which is the largest wordlength of the various modes of operation possible within processor 300. When the two modes of operation are 16-bit and 24-bit, for instance, the registers 304 may have a wordlength of 24 bits, and low-precision values may be packed into the 24-bit registers. By storing data of multiple wordlengths in the same registers 304, the processor may include less circuitry and thus may support a higher maximum clock speed and, in turn, faster operation.
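Both ways of indicating the mode can be sketched in Python as follows. The bit position of configuration bit 312A, its polarity (1 selecting the low-precision mode), and the operand range checks are assumptions for illustration; `MULT1` and `MULT2` are the example instruction names from the text, and the helper functions are hypothetical.

```python
CFG_MODE_BIT = 0   # hypothetical position of configuration bit 312A


def mode_from_config(config_reg: int) -> str:
    """Assumed polarity: bit set -> low-precision mode, clear -> high-precision."""
    return "low" if (config_reg >> CFG_MODE_BIT) & 1 else "high"


def execute(opcode: str, a: int, b: int) -> int:
    """Hypothetical dispatch of mode-specific multiply instructions."""
    widths = {"MULT1": 16, "MULT2": 24}   # first mode: 16-bit, second: 24-bit
    if opcode not in widths:
        raise ValueError(f"unsupported opcode {opcode!r}")
    width = widths[opcode]
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError(f"{opcode} operands must fit in {width} signed bits")
    return a * b
```

With a configuration bit, the mode persists across instructions until software changes it; with per-instruction opcodes, each instruction carries its own mode and no state needs to be tracked.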
A method of operating a processor to support multiple modes of operation is shown in
One operation that may be performed by the processor in block 504 of
a+(b×c)→a,
where a is the value stored in the accumulation register 608, and b and c are operands retrieved from the registers 602 through datapath 610. The multiply-accumulate (MAC) operation described with reference to
In one embodiment, the multiplier 604 may process data received through datapath 610 in the same way regardless of the wordlength of the data. For example, when 24-bit data is received, the multiplier 604 may multiply the operands to obtain a result, and when 16-bit data with all lower bits set to zero is received, the multiplier 604 may multiply the operands in the same manner. In contrast, conventional multipliers may divide operands into pieces, multiply the pieces separately, and sum the partial products. For example, a conventional multiplier may divide a 24-bit word into a 16-bit portion and an 8-bit portion, perform multiplication using the 16-bit and 8-bit portions separately, and sum the results. This division allows the conventional multiplier to support 16-bit arithmetic when it receives a 16-bit word instead of a 24-bit word. In some embodiments, the multiplier 604 may not divide operands into portions when performing multiplication or other arithmetic operations.
The processor embodiments described above may be useful in any computing device to reduce power consumption, reduce heat dissipation, decrease size, and reduce cost. One particularly advantageous embodiment may include integrating the processor described in the various embodiments above into a mobile device.
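The contrast between a split multiplier and an undivided multiplier can be checked numerically. In this hypothetical Python sketch, the conventional-style path splits one 24-bit operand into a 16-bit upper portion and an 8-bit lower portion and sums the shifted partial products, while the undivided path multiplies directly. Both yield the same product; for a left-aligned 16-bit operand the lower portion is zero, so its partial product contributes nothing.

```python
def split_multiply(x: int, w: int) -> int:
    """Conventional-style sketch: split 24-bit w into a 16-bit upper portion
    and an 8-bit lower portion, multiply each, and sum the partial products."""
    upper = w >> 8          # arithmetic shift keeps the sign
    lower = w & 0xFF        # unsigned low byte, so w == upper * 256 + lower
    return ((x * upper) << 8) + x * lower


def single_multiply(x: int, w: int) -> int:
    """Undivided multiplier: the full operand is used as-is in either mode."""
    return x * w
```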
The schematic flow chart diagram of
If implemented in firmware and/or software, functions described above may be stored as one or more instructions or code on a computer-readable medium. Examples include non-transitory computer-readable media encoded with a data structure and computer-readable media encoded with a computer program. Computer-readable media include physical computer storage media. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise random access memory (RAM), read-only memory (ROM), electrically-erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), floppy disks and Blu-ray discs. Generally, disks reproduce data magnetically, and discs reproduce data optically. Combinations of the above should also be included within the scope of computer-readable media.
In addition to storage on computer readable medium, instructions and/or data may be provided as signals on transmission media included in a communication apparatus. For example, a communication apparatus may include a transceiver having signals indicative of instructions and data. The instructions and data are configured to cause one or more processors to implement the functions outlined in the claims.
Although the present disclosure and certain representative advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. For example, although digital signal processors (DSPs) are described throughout the detailed description, aspects of the invention may be applied to the design of other processors, such as graphics processing units (GPUs) and central processing units (CPUs). Further, although ones (1s) and zeros (0s) are given as example bit values throughout the description, the function of ones and zeros may be reversed without change in operation of the processor described in embodiments above. For example, a one value in a configuration register may be used to indicate either a first mode of operation or a second mode of operation without change in the operation of the processor. Additionally, although 16-bit and 24-bit modes are described for a processor, the processor may support different wordlengths and/or additional wordlengths. For example, a processor may support 32-bit wordlength as a low-precision mode and 48-bit wordlength as a high-precision mode. As one of ordinary skill in the art will readily appreciate from the present disclosure, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized. 
Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.