The invention relates to a universal processor architecture for a microprocessor, and more particularly, a branch controller in a microprocessor including a lookup table.
Currently, hardware intellectual property cores (IP cores) are costly to fix if a critical bug is found after the design has been committed to a chip. A semiconductor intellectual property core (IP core or IP block) is a reusable unit of logic, cell, or chip layout design that is the intellectual property of one party. IP cores can be used as building blocks within ASIC chip designs. A disadvantage of current bug-fixing techniques is that they necessitate a full respin (re-processing) of the chip, including a new mask set. To avoid this, the chip undergoes extensive functional verification before it is turned into hardware, and this verification may consume more development cost than the design itself. Software, although much more flexible and maintainable in terms of fixing bugs, runs on a processor that, in essence, executes one small task at a time. The processor's serial nature makes it difficult to handle high-speed hardware events that require a nearly instant response. Although multiple processors help increase processing throughput, the interconnect and overhead involved can also be expensive, and a respin of the hardware may still be required to fix a functional bug.
Further, typical real-time function queues need to interrupt processing to add a function to the queue. In addition, once the interrupt has been serviced, the processor has to return from the interrupt, get the next function in the queue, and jump to that function. Thus, undesirable processing time is required to implement such functions.
It would therefore be desirable to have a single processor on a chip that was able to service hardware events by their deadline, and would reduce development and verification costs. It would also be desirable to provide a much shorter process to fix, release, and distribute bug fixes.
In an aspect according to the present invention, a functionally programmable branch controller system for a microprocessor comprises an instruction execution controller including a branch handler lookup table (LUT). A programmable logic block is embedded in an input-output (I/O) interface of the microprocessor to provide instruction address decode data to the branch handler when the programmable logic block receives a programmable event from a microprocessor.
In a related aspect, the programmable logic block may include a field programmable gate array (FPGA), and the controller system may further include a mask communicating with the programmable logic block and the execution controller, where the execution controller ignores an event specified by the mask.
In a related aspect, the microprocessor includes an execution unit which remains idle until the event from the execution controller is communicated to the execution unit.
In a related aspect, the execution unit jumps to an address of the event without saving a state of the event.
In a related aspect, the instruction execution controller further includes a state queue register communicating with the branch handler LUT for storing a plurality of events for execution by the LUT.
In a related aspect, the state queue register stores a plurality of events for sequential execution by the LUT in the order received.
In a related aspect, at least one of the plurality of events is preempted such that the preempted event is not executed in the order received.
In another aspect according to the invention, a method to enable a CPU to drive a series of tightly constrained hardware events comprises driving a functionally programmable event with a plurality of system inputs; executing a fast instruction branch in a CPU to a dedicated state machine to process the functionally programmable event; and idling a main program loop of the microprocessor without saving states when the functionally programmable event is complete and another functionally programmable event is not available.
In a related aspect, the method further comprises, before the step of idling the main program loop, servicing a plurality of events in their order of arrival.
In a related aspect, the method further comprises, before the step of idling the main program loop, servicing a plurality of events in their order of arrival unless preempted by an interrupt command.
In a related aspect, the method further comprises, before the step of idling the main program loop, servicing and storing a plurality of events in their order of arrival.
In a related aspect, the method further comprises preempting at least one of the plurality of events such that the preempted event is not executed in the order received.
In a related aspect, the method further includes masking bits in the dedicated state machine to prevent execution of a specified functionally programmable event.
In a related aspect, the method further comprises jumping to an address of the functionally programmable event without the execution unit saving a state of the event.
In another aspect according to the invention, a computer system includes a microprocessor to drive tightly constrained hardware events, the microprocessor having a set of system inputs to drive a functionally programmable event. A fast branch in the microprocessor includes a state handler to execute instructions from the microprocessor to process the event, and a queue in the microprocessor stores a plurality of event triggers such that non-preempted event triggers will be serviced in the order they are received.
In a related aspect, the state handler includes a lookup table (LUT).
In a related aspect, the fast branch in the microprocessor includes a programmable logic block communicating with the system inputs.
In a related aspect, the programmable logic block is a field programmable gate array (FPGA).
In a related aspect, the computer system further includes a specialized execution unit communicating with the queue in the microprocessor for executing the non-preempted event triggers.
An IP (intellectual property) core is generally a block of logic or data that may be used in making a field programmable gate array (FPGA) or application-specific integrated circuit (ASIC) for a product. Universal asynchronous receiver/transmitters (UARTs), central processing units (CPUs), Ethernet controllers, and PCI (Peripheral Component Interconnect) interfaces are examples of IP cores. An IP core library typically contains a multitude of unique designs that are costly to design, maintain, and migrate between technology nodes. However, an IP core library may serve a useful and vital role in an ASIC design function. In general, the present invention includes using a processor or multiple processors as a core or cores. The system and method of the present invention provide the original IP core library with a small set of generic software-based microprocessor (uP) cores that are configurable to meet multiple core IP functions.
Referring to
Generally, an event signal will cause the fast branch 24 in the CPU 14 to communicate with a state handler (execution controller 36 shown in
Referring to
A specialized execution unit 52, which is part of the CPU 14, receives input from the program counter (PC) 48 communicating an address for executing a command. The specialized execution unit 52 executes the command and generates an output 56.
Referring to
More specifically, referring to
Referring to
The next step of the method is to proceed to step 162 to determine if the last instruction was received. If yes 163a, the method proceeds to step 166 to determine if other events are in the queue. Time and processing savings are obtained using the method of the present invention because when the processor returns from an event handler, if there is another event in the queue to handle 167a, instead of returning to the idle loop, the processor jumps directly to the PC address (step 174) without going back to idle, and without saving context. The PC address is loaded 174 and executed 158. If no 163b, the method loops back via path 170 to step 158 to continue executing the current event until the last instruction is received. Once the last instruction is received 163a and there are no more events in the queue 166, the method proceeds via path 167b to step 178, which is the same as step 154, for the execution unit 52 to remain idle until a new PC is received via path 149 from steps 120 and 112.
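The control flow described above can be sketched as a small software simulation. This is a minimal sketch under stated assumptions, not the claimed hardware: each event handler is modeled as a list of Python callables in a hypothetical `program` mapping keyed by PC address, and the names `run_execution_unit`, `program`, and `pending` are illustrative only. Step numbers in the comments refer to the method steps named in the text.

```python
from collections import deque


def run_execution_unit(program, pending):
    """Simulate the execute / check-queue / idle flow for queued PC addresses.

    program: hypothetical mapping of PC address -> list of instructions
             (each instruction is a zero-argument callable).
    pending: deque of PC addresses waiting to be serviced, oldest first.
    Returns a trace of the states the execution unit passed through.
    """
    trace = []
    while pending:                       # step 166: another event in the queue?
        pc = pending.popleft()           # step 174: load the next PC address
        trace.append(f"branch:{pc:#x}")
        for instruction in program[pc]:  # steps 158/162: run to the last instruction
            instruction()
        # No context is saved or restored here; the loop chains straight
        # to the next queued address instead of returning to idle first.
    trace.append("idle")                 # step 178: idle until a new PC arrives
    return trace


log = []
program = {
    0x100: [lambda: log.append("A1"), lambda: log.append("A2")],
    0x200: [lambda: log.append("B1")],
}
trace = run_execution_unit(program, deque([0x100, 0x200]))
```

In this run the unit services both handlers back to back and enters the idle state only once, after the queue is empty.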
An advantage of the method according to the present invention is that it uses less processing power and time than traditional processing techniques. The overhead of polling, masking, or calculating a particular condition, and of multiple context switches, is avoided with this method, making hardware applications with a single CPU more feasible. Thus, real-time events can be handled with software on a single CPU. This allows traditional hardware designs to be run with software, and can also accelerate the hardware design schedule because a substantial amount of the verification can be done after the hardware skeleton is created.
Another advantage of the present invention is, in real-time operating systems, a “function queue” does not conflict with the “branch queue” of the present invention. Further, the branch queue of the present invention does not need to return from interrupt, does not need a context save, and can branch directly to the next state without returning to the main loop. Thus, several cycles of overhead are avoided. More specifically, the branch queue holds event addresses to which the CPU must branch, in order to handle the given event. When a new hardware event occurs, the address of the event handler is found in the LUT and goes directly to the top of the branch queue. The CPU looks at the top of this branch queue when it is idle, or returns from another event handler. If there is an event to handle (i.e., there is an address in the queue), the CPU will process the address from the queue and jump (branch) directly to that address.
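The relationship between the lookup table and the branch queue might be modeled as follows. This is a minimal sketch that assumes events are serviced strictly in order of arrival (a first-in, first-out queue); the class name `BranchQueue`, its methods, and the event names and addresses are hypothetical and do not come from the source.

```python
from collections import deque


class BranchQueue:
    """Hypothetical model of a handler lookup table feeding a branch queue."""

    def __init__(self, lut):
        self.lut = dict(lut)   # event id -> address of its event handler
        self.queue = deque()   # handler addresses, in order of arrival

    def on_event(self, event_id):
        """Look up the handler address for a hardware event and queue it."""
        self.queue.append(self.lut[event_id])

    def next_branch(self):
        """Return the next address to branch to, or None if the CPU should idle."""
        return self.queue.popleft() if self.queue else None


# Illustrative event names and addresses (assumptions, not from the source).
bq = BranchQueue({"rx_ready": 0x100, "tx_done": 0x200})
bq.on_event("tx_done")
bq.on_event("rx_ready")
order = [bq.next_branch(), bq.next_branch(), bq.next_branch()]
```

The CPU model would call `next_branch()` when it is idle or when it returns from a handler; a `None` result corresponds to staying idle.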
The branch controller system of the present invention emulates hardware behavior with software on a processor. The processor in the specialized execution unit 52 runs significantly faster than the frequency of the hardware events arriving at the inputs 18. For example, if the external hardware bus 18 runs at 1 GHz, the processor inside the specialized execution unit 52 may run at 10 GHz. Assuming one instruction is executed per CPU clock cycle, this tenfold difference between the processor and the hardware events gives the specialized execution unit 52 a deadline of 10 cycles in which to finish any work before the next event is inputted.
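The cycle budget in this example follows directly from the ratio of the two rates. As a worked form, using symbols introduced here only for illustration ($f_{\text{CPU}}$ for the processor clock, $f_{\text{event}}$ for the hardware event rate, IPC for instructions executed per cycle, and $N$ for the instructions available per event):

```latex
N = \frac{f_{\text{CPU}}}{f_{\text{event}}} \times \text{IPC}
  = \frac{10\ \text{GHz}}{1\ \text{GHz}} \times 1
  = 10
```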
Generally in a microprocessor, response time is critical to bus transactions, and thus it is necessary to respond to an event by the deadline associated with it. In the embodiment shown in
Further, for states that are not subject to a strict, short deadline, the lookup table 40 can also hold a value to indicate whether the state/event can be preempted by another routine. However, to avoid the overhead of context switching, all preemption may be disallowed by default.
In another example of an error, an event handler may fail to respond to an event and return by the deadline. This could happen, for example, because of hardware issues (e.g., power states, clocking errors) or programmer errors. This type of error may be severe and unrecoverable. However, if it is detected that a deadline will be missed, it is possible that the root problem can be fixed and normal operation can resume. To detect a deadline miss, another hardware device determines how long a particular event handler has been processing, and compares that time to the predetermined deadline for the response. This predetermined deadline is loaded into a deadline detector from the same lookup table at the start of a state branch. An exemplary lookup table, Table 1, is shown below.
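A software model of the deadline check described above might look like the following. It is a minimal sketch and not a reproduction of Table 1: the table layout (handler address paired with a deadline in cycles), the class name `DeadlineDetector`, and the `bus_read` entry are all assumptions made for illustration.

```python
class DeadlineDetector:
    """Hypothetical cycle counter that flags a handler exceeding its deadline."""

    def __init__(self):
        self.deadline = None   # cycles allowed for the current handler
        self.elapsed = 0       # cycles spent in the current handler

    def start(self, deadline_cycles):
        """Load the deadline at the start of a state branch and reset the count."""
        self.deadline = deadline_cycles
        self.elapsed = 0

    def tick(self, cycles=1):
        """Advance the counter; return True once the deadline has been exceeded."""
        self.elapsed += cycles
        return self.elapsed > self.deadline


# Assumed table layout: event id -> (handler address, deadline in cycles).
lut = {"bus_read": (0x100, 10)}
address, deadline = lut["bus_read"]

detector = DeadlineDetector()
detector.start(deadline)                        # loaded when the branch begins
missed = [detector.tick() for _ in range(12)]   # a handler that runs 12 cycles
```

Here the detector stays quiet for the first 10 cycles and reports a miss on cycles 11 and 12; what the system does in response to a miss is outside the scope of this sketch.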
Another advantage of the present invention is the ability to make software implementation of hardware more practical. For example, in the present invention, the software is easily maintainable, for example in the case of fixing errors, which reduces the costs of support and version redelivery. More specifically, designing hardware for manufacturing on a chip is a long and expensive process. If a defect is found in the product after it has been manufactured, the entire process (e.g., verification, synthesis, layout, checking, mask fabrication, photolithography, etc.) needs to start again, which could cause the product to miss its market window. To avoid this, extensive logic verification of the design is completed before it is released to manufacturing, to remove as many bugs (problems/defects) as possible. The cost of verification is roughly twice the cost of designing the chip, and with increasing complexity the cost of verification is increasing exponentially. If the product's logic function, however, is implemented in software, in accordance with the present invention, fixes for defects can simply be delivered directly to the customer (for example, downloaded into a computer's RAM). This solution to a bug is thereby rapid and cost efficient, compared to a full hardware respin.
Additionally, the method according to the present invention differs from an interrupt operation. An interrupt operation is used by an operating system and has the ability to handle multiple interrupts concurrently. The method of the present invention differs by having a queue of next states, where each state is in a branch location. The method further includes an automatic return to idle, and a branch to register. No preemption is allowed: a sequence that is running must finish, and then the next task in the queue is run. The queue may include a hardware task queue which is built on demand.
Moreover, function queues according to the present invention include an interrupt service routine (ISR) which adds a function pointer to the function queue in real-time embedded systems. The pointer simply points to a function that services the interrupt. The main loop of the program grabs the latest function from the queue and calls that function.
While the present invention has been particularly shown and described with respect to preferred embodiments thereof, it will be understood by those skilled in the art that changes in forms and details may be made without departing from the spirit and scope of the present application. It is therefore intended that the present invention not be limited to the exact forms and details described and illustrated herein, but fall within the scope of the appended claims.