Traditionally, in a device such as a digital camera, hardwired processors or software programs are used to execute instructions needed for image processing functions. While hardwire processors execute instructions quickly, they present a number of problems in the development of digital cameras. For example, once hardwired operations are fixed, the operations cannot be altered unless the hardwired processor is redesigned. In an area such as digital camera where designers must quickly create new instruction types to keep up with the current demand of customers for new and enhanced functionality, having to redesign the processor may increase development time and costs for a new digital camera. Additionally, if a problem is discovered in the design of a hardwired processor after the production of a digital camera, the problem cannot be fixed without replacing the hardwired processor in all of the digital cameras.
In an effort to provide flexibility, some digital camera use software programs to execute instructions. The flexibility of software comes at the price of speed due the time necessary to run the software program. Therefore, it is desirable to have a new processor for a device such as a digital camera that is able to execute instructions at an acceptable speed and provides flexibility to be able to add new instruction types to meet the current demands of customers.
Accordingly, the present invention relates to using a central processing unit having a micro-code engine within a digital camera. In one embodiment, a central processing unit with an embedded micro-code engine comprises a system memory capable of storing an instruction; at least one central processing unit (“CPU”) execution unit electrically coupled with the system memory to read the instruction stored in the system memory; and at least one micro-code engine electrically coupled with the at least one CPU execution unit to receive commands and instruction parameters. The at least one micro-code engine is operative to execute micro-code programs and operate in synchronization with the CPU execution unit to execute the instruction.
In another embodiment, a digital camera having a central processing unit with an embedded micro-code engine comprises a system memory capable of storing an instruction, at least one CPU execution unit electrically coupled with the system memory, and at least one micro-code engine electrically coupled with the CPU execution unit. The at least one CPU execution unit comprises an instruction decoder electrically coupled with the system memory to receive the instruction stored in the system memory and decode the instruction; a parameter fetching unit electrically coupled with the instruction decoder to receive instruction parameters; an arithmetic logic unit (“ALU”) execution unit electrically coupled with the parameter fetching unit to receive the instruction parameters and perform a logic operation; and a write back unit electrically coupled with the ALU execution unit to receive a result of the logic operation and electrically coupled with the system memory to write the result to said system memory. Through the electrical coupling with the CPU execution unit, the at least one micro-code engine is electrically coupled with the parameter fetching unit to receive commands and instruction parameters. The CPU execution unit and the at least one micro-code engine operate in synchronization to execute the instruction.
In yet another embodiment, a central processing unit comprises a fixed execution unit, a programmable execution unit, and a controller. The fixed execution unit is operative to perform a first plurality of functions and the programmable execution unit is operative to perform a second plurality of functions. The controller is electrically coupled with the fixed execution unit and the programmable execution unit. Typically, the controller receives an instruction from a memory. In response, the controller determines what functions are needed to perform the instruction from the first plurality of functions and the second plurality of functions. The controller generates a signal to at least one of the fixed execution unit and the programmable execution unit indicating the functions needed to perform the instruction.
The fixed execution unit comprises a first input coupled to the controller and operative to receive the signal, and a plurality of discrete logic elements coupled with the first input. Each of the plurality of discrete logic elements are interconnected with at least another of the plurality of discrete logic elements. In response to the signal, the fixed execution unit implements at least one of the first plurality of functions.
The programmable execution unit comprises a second input coupled to the controller and operative to receive the signal; a micro-code memory operative to store a plurality of micro-code programs, each of which is operative to implement at least one of the second plurality of functions; a micro-code execution unit coupled with the micro-code memory that is capable of executing each of the plurality of micro-code programs; and a micro-code controller coupled with the second input and the micro-code execution unit that is operative to cause the micro-code execution unit to execute at least one of the plurality of micro-code programs in response to the signal from the controller.
In another embodiment, a method for performing an instruction within a central processing unit comprises receiving an instruction from a memory coupled with a controller; determining a first function of at least one of a first plurality of functions capable of being performed by a fixed execution unit and a second plurality of functions capable of being performed by a programmable execution unit; generating a signal to at least one of said fixed execution unit and said programmable execution unit to perform said first function; determining in said fixed execution unit which, if any, of said first plurality of functions to execute in response to said signal; executing at least one of said first plurality of functions by a plurality of discrete logic elements to generate a first result in response to determining said fixed execution engine should execute at least one of said first plurality of functions; determining in said programmable execution unit which, if any, of said second plurality of functions to execute in response to said signal; and executing at least one micro-code program to implement at least one of said second plurality of functions to generate a second result in response to determining said programmable execution engine should execute at least one of said plurality of functions.
a is a diagram of one embodiment of a linear shift register before a shift operation;
b is a diagram of the linear shift register of
The system memory 102 may be any type of memory capable of storing a program/object code instruction and micro-code, and may include intermediate memories such as cache memories. In one embodiment, the CPU execution unit 104 is a hardware unit, or other fixed execution unit, hardwired to execute various operations and issue various commands to the micro-code engine 106. Typically, the hardware unit contains a plurality of discrete logic elements, wherein each of the plurality of discrete logic elements is interconnected with at least one other of the plurality of discrete logic elements to perform a plurality of functions.
The micro-code engine 106 is a device such as a processor, or other programmable execution unit, capable of running micro-code programs to execute various operations within a higher-level processor, i.e. the micro-code engine 106 implements one or more of the higher level object code instructions available for use to a programmer. A micro-code engine 106, as will be described in more detail below, may be utilized in addition to or in place of hardwired circuits and logic for implementing the functionality of a higher level central processing unit.
The system memory 102, CPU execution unit 104, and micro-code engine 106 may all be located on a single integrated circuit or may be discrete components. In one embodiment, at least the CPU execution unit 104 and the micro-code engine 106 are located on the same integrated circuit.
In other embodiments, the CPU execution unit 104, and the micro-code engine 106 may all be located on a field programmable gate array or on discrete field programmable gate arrays. In alternative embodiments, it is possible for a central processing unit to have multiple system memory units 102, CPU execution units 104, or micro coded engines 106 which may all be located on a single device such as an integrated circuit or a field programmable gate array, or the units may be located on discrete components.
In general, the system memory 102 stores program/object code instructions for the CPU execution unit 104 and, in one embodiment, micro-code for the micro-code engine 106. The CPU execution unit 104, acting as a controller, reads and decodes each instruction in the system memory 102. Based on each decoded instruction, the CPU execution unit 104 determines the functions needed to perform the instruction from a plurality of hardwired functions available to be performed by the arithmetic logic unit (“ALU”) execution unit 112 of the CPU execution unit 112 and a plurality of micro-code functions available to be performed by the micro-code engine 106. Typically, the CPU execution unit 104 generates a signal comprising at least one parameter to indicate what, if any, functions in the ALU execution unit 112 and the micro-code engine 106 are needed to be executed to implement the instruction. In response to receiving the signal, the ALU execution unit 112 and the micro-code engine 106 perform their operations, and typically, return a result to the CPU execution unit 104. The result may comprise an action, a second signal, and/or a result parameter. In response to receiving the result, the CPU execution unit 104 may store the result in system memory 102 or determine a second function from the first and second plurality of functions for at least one of the ALU execution unit 112 and the micro-code engine 106 to perform.
Micro-code is the lowest-level programming that may directly control a microprocessor. Typically, micro-code is not program-addressable but is capable of being modified, e.g. existing micro-code may be modified to correct defects or enhance performance, or additional micro-code programs can be added to the micro-code engine 106, such as to add new instructions types available for operations. In one embodiment, micro-code programs are added to the micro-code engine 106 by downloading the micro-code programs through the CPU execution unit 104. In alternative embodiments, micro-code programs are added to the micro-code engine 106 by downloading the micro-code programs to the micro-code engine 106 directly.
Typically, the micro-code engine 106 is able to execute micro-code programs that may include a plurality of micro-instructions or may implement one or more functions/instructions of the micro-code engine 106. In one embodiment, micro-code programs are used to emulate operations which would be typically hardwired in a processor. Processors with hardwired operations normally execute operations more quickly than a processor using a micro-code engine, but at the cost of system flexibility. Once a processor is hardwired, using devices such as logic gates or transistors to perform operations, the processor cannot execute new instruction types without redesigning the hardwired operations of the processor. Such redesigning is costly, time consuming and may introduce design errors into the overall processor design.
Micro-code engines 106 allow for reduced silicon area in comparison to hardwired operations in a processor and provide flexibility by allowing a user to write additional micro-code programs to execute new instruction types or modify existing micro-code programs to correct defects or enhance performance. For example, if new instruction types are needed for a new algorithm after the initial design of the system, a new micro-code program can simply be written and downloaded to the micro-code engine 106 thereby altering the operation of the processor. Further, if defects or performance issues are discovered in the operation of the processor, the micro-code may be altered or augmented to correct the problem or enhance performance.
Micro-code engines 106 also alleviate the need for a separate digital signal processor (“DSP”). In order to accelerate image processing, some systems use a separate DSP with a completely discrete CPU having built-in image processing functions. Accelerating image processing using a separate DSP comes at the cost of increased silicon requirements, and a lack of communication between the control CPU and the separate DSP. Micro-code engines, as disclosed herein, may be used in place of, or to augment, the DSP.
In one embodiment of a CPU having a micro-code engine, the CPU execution unit 104 is electrically coupled with the system memory 102 such that the CPU execution unit 104 can read instructions stored in the system memory 102 or write data to the system memory 102. The CPU execution unit 104 generally includes an instruction decoder 108, a parameter fetching unit 110, an Arithmetic Logic Unit (“ALU”) execution unit 112, and a write back unit 114. Preferably, the instruction decoder 108, parameter fetching unit 110, ALU execution unit 112, and write back unit 114 are electrically coupled with each other so that the parameters of each instruction can easily be passed between the different units of the CPU execution unit 104.
The micro-code engine 106 is also electrically coupled 115 with the CPU execution unit 104 such that the CPU execution unit 104 can treat the micro-code engine 106 as a hardware assisted instruction. Typically, the CPU execution unit 104 and the micro-code engine 106 are arranged in a parallel fashion where both are capable of operating substantially concurrently while executing the same or different functions. In alternative embodiments, the CPU execution unit 104 and the micro-code engine 106 may also be arranged in a serial or pipeline fashion or any other arrangement known in the art. Through the electrical coupling 115, the CPU execution unit 104 controls which instructions the micro-code engine 106 will execute, and how and when the micro-code engine 106 will execute each instruction. Specifically, the CPU execution unit 104 typically controls what micro-code and instruction parameters the micro-code engine 106 receives and what data the micro-code engine 106 may pass to the CPU execution unit 104 or the system memory 102.
Typically, the micro-code engine 106 includes a micro-code execution unit 116 and a micro-code memory 118. The micro-code execution unit 116 and the micro-code memory 118 are typically electrically coupled with each other so that the micro-code execution unit 116 can read instruction parameters or micro-code stored in the micro-code memory 118, or write data to the micro-code memory 118.
During operation, the CPU execution unit 104 reads a program/object code instruction stored in the system memory 102. Typically, the instruction decoder 108 within the CPU execution unit 104 acts as a controller to examine the instruction set and determine what operations are required to execute the various instructions contained therein. Depending on the necessary operations for a given instruction, the CPU execution unit 104 may use the hardwired operations 113 of the CPU execution unit 104 to execute the instruction, the CPU execution unit 104 may actuate the micro-code engine 106 to execute the instruction, or the CPU execution unit 104 may use the hardwired operations 113 of the CPU execution unit 104 to execute a portion of the instruction while the micro-code engine 106 executes another portion of the instruction.
If the CPU execution unit 104 uses the hardwired operations 113 of the CPU execution unit 104 to execute the instruction, the parameter fetching unit 110 typically passes the instruction parameters to the ALU execution unit 112. The ALU execution unit 104 performs the necessary operations under the direction of the hardwired logic and the write back unit 114 records the result of the instruction in the system memory 102, a register, or other storage.
If the CPU execution unit 104 actuates the micro-code engine 106 to execute the instruction, the parameter fetching unit 110 typically passes the instruction parameters, any necessary micro-code if not already loaded, and any other commands to the micro-code engine 106. The micro-code execution unit 116, acting as a micro-code controller, receives the information from the CPU execution unit 104 and reads any additional information that may be stored in the micro-code memory 118. The micro-code execution unit 116 executes the operation, i.e. the micro-code program which implements the instruction, and records the result of the instruction in the micro-code memory 118 or otherwise passes the result to the CPU execution unit 104. If necessary, the write back unit 114 then records the result of the instruction in the system memory 102, a register, or other storage. In one embodiment, while the micro-code engine 106 is processing the operation, the CPU execution unit 104 is free to perform other operations. In alternate embodiments, the CPU execution unit 104 waits for the micro-code engine 106 to complete the operation before performing other actions.
If the CPU execution unit 104 uses the hardwired operations 113 of the CPU execution unit 104 to execute a portion of the instruction while the micro-code engine 106 executes another portion of the instruction, the parameter fetching unit 110 passes the instruction parameters to the ALU execution unit 112 and the micro-code execution unit 116. The ALU execution unit 112 and the micro-code execution unit 116 execute their operations in parallel or in series depending on the algorithm, and the result of the instruction is written to the system memory 102, a register, or other storage. In one embodiment, the ALU execution unit 112 may factor the output of the micro-code engine 106 into the final result.
In another embodiment, the CPU execution unit 104 and the micro-code engine 106 are able to perform operations on either the same or different data, at the same time or at different times. In order to ensure the CPU execution unit 104 and micro-code engine 106 perform operations in the proper order, hand shake signals 119 may be implemented between the CPU execution unit 104 and the micro-code engine 106. Hand shake signals 119 are signals such as WAIT signals or other status indicators passed between at least two units of a processor to ensure that an algorithm is executed in proper order. Through the hand shake signals 119, the CPU execution unit 104 and the micro-code engine 106 are able to pass signals so that the CPU execution unit 104 and the micro-code engine 106 may operate synchronously. While the hand shake signals 119 permit the CPU execution unit 104 and micro-code engine 106 to operate synchronously with respect to each other, these signals 119 may also permit signaling between the CPU execution unit 104 and the micro-code engine 106 so as to allow either CPU execution unit 104, the micro-code engine 106 or both to internally operate synchronously or asynchronously.
In yet another embodiment, shown in
The additional hand shake signals 221 between the CPU execution unit 204 and the micro-code engine 206 allow the CPU execution unit 204 and the micro-code engine 206 to operate synchronously in more complex executions. The additional hand shake signals allow the micro-code engine 206 and the CPU execution unit 204 to communicate with each other as compared to other embodiments where only the CPU execution unit 204 issued commands to the micro-code engine 206. The ability for both the CPU execution unit 204 and the micro-code engine 206 to pass hand shake signals allows the CPU execution unit 204 and the micro-code engine 206 to act in a peer-to-peer fashion rather than in a master-slave fashion.
A central processing unit having a micro-code engine according to the disclosed embodiments can be used in devices such as a digital camera to implement and/or accelerate image processing, in addition to or in place of a separate DSP. A micro-code engine allows a digital camera manufacturer to change micro-code programs within the micro-code engine to correct problems, improve functions, implement proprietary image processing algorithms, or add features in reaction to market driven desires for camera functions without the cost of redesigning the camera hardware. Due to the flexibility of a micro-code engine, a digital camera manufacturer can change the micro-code programs, and therefore the digital camera functions, at any time during or after the design and manufacture of the camera, even after the purchase of a camera.
In one embodiment of a digital camera having a central processing unit and micro-code engine as disclosed, the micro-code engine executes one or more micro-programs which implement specific operations for compressing pixel data generated by the camera's image sensor. In another embodiment, the micro-code engine executes one or more micro-programs which implement specific operations for demosaicing the pixel data generated by the camera's image sensor where the image sensor utilizes a color filter array, such as a Bayer pattern color filter array.
In another embodiment of a central processing unit having a micro-code engine, shown in
Typically, the micro-code execution unit 302 is electrically coupled with the micro-code memory 304 such that the micro-code execution unit 302 can both read an instruction or micro-code stored in the micro-code memory 304 and write data to the micro-code memory 304. Preferably, additional micro-code can be added to the micro-code memory 304 at any time as new instruction types become available for operations.
In one embodiment, shown in
As shown in
The shift-able window 408 provides the ability to continuously execute an algorithm on the continuous data being serially input into the linear shift registers 406 by shifting data from a new section of the linear shift registers 406 into the shift-able window 408 after the micro-code execution unit 302 (
Simply shifting only the new data into the shift-able window 408, without re-aligning the old data, accelerates image processing by increasing the efficiency of the micro-code engine 300 (
Previously, to accelerate image processing in digital cameras, designers have used a shift-able window made through software or a shift-able window made through hardwired operations. The shift-able window made through software provides flexibility, but lacks the speed of hardwired operations. The hardwired operations execution image processing functions quickly, but at the cost of flexibility due to the fact the size and shape of the shift-able window are fixed. Increasing the efficiency of a processor using a linear shift register implementing a shift-able window compensates for the tradeoff between speed and flexibility, thereby providing a window operation that can both quickly execute operations for an algorithm and is flexible to accommodate future changes, e.g. new instructions or algorithms that require a new window size or shape.
a and 6b show one embodiment of a shift-able window 604, using the mapping of
In another embodiment, shown in
In general, data, such as pixel data, is constantly serially input into the series of shift-able registers 704 from a device such as a CCD of a digital camera or by any intermediate means such as Direct Memory Access (“DMA”) or by loading through the CPU core. As data shifts through the series of shift-able registers 704, the logic devices 706 calculate a result based on the current operands present in the shift-able window. This operation can be a filter operation or an interpretation based on data currently within the window, or any other logically, algorithmically useful operation. After the algorithm is complete on the current operands, a shift operation shifts the data linearly and effectively moves the shift-able window forward by one pixel. After the shift operation, the shift-able window is targeting a new pixel, even though half the surrounding pixels for the algorithm will be retained and located in their correct relative location. Only new data that is needed but not within the shift-able window is shifted into the shift-able window. After the shift operation, the logic devices 706 automatically calculate a result for the new set of operands and output a result. This process is continually repeated as data is serially input through the series of shift-able registers 604.
In another embodiment, direct memory access channels may be added between the linear shift registers and the micro-code engine to further accelerate image processing operations. In yet another embodiment to enhance performance, additional instructions may be added for the shift-able window to perform more than one shift operation per instruction and/or cycle.
In one embodiment of a digital camera having a central processing unit and a micro-code engine that includes a linear shift register to perform a shift-able window operation, the shift-able window is used to execute specific operations on pixel data such as a filter operation or an interpretation operation. In other embodiments, the shift-able window may be used to perform compression algorithms, demosaicing algorithms, or any other type of algorithm using pixel data as an operand.
For example, in an embodiment using a shift-able window to employ a demosaicing algorithm, the shift-able window surrounds a targeted pixel and the pixels nearby the targeted pixel that are needed to create a three-color per pixel image from a one-color per pixel image. When a digital camera uses a color filter array (“CFA”) such as a Bayer CFA, red, green, and blue pixels are arranged in a predetermined pattern so that each color pixel is adjacent to the two other color pixels. Demosaicing algorithms are used to create a three-color per pixel images from one-color per pixel images using processes such as bilinear interpolation. In this process, a one-color targeted pixel and its surrounding one-color pixels are used to create a single three-color pixel. A shift-able window accelerates the demosaicing algorithm by quickly shifting new pixel data into the shift-able window after each bilinear interpolation is complete on a targeted pixel and its nearby pixels.
It is therefore intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that it is the following claims, including all equivalents, that are intended to define the spirit and scope of this invention.
The present patent document claims the benefit of the filing date under 35 U.S.C. §119(e) of Provisional U.S. Patent Application Ser. No. 60/549,620, filed Mar. 2, 2004, which is hereby incorporated by reference.
Number | Date | Country | |
---|---|---|---|
60549620 | Mar 2004 | US |