The present invention generally relates to processors and processing systems. More particularly, it relates to synthesized assertions in a self-correcting processor, and applications thereof.
Functional verification in chip design involves verifying that a chip conforms to specification. This is a complex task, and it takes the majority of time and effort in most processor and electronic system design projects.
Techniques for performing functional verification in chip design exist. These techniques include logic simulation, emulation, and formal verification. While these techniques are useful, functional verification in chip design is becoming increasingly difficult as processor and electronic system complexity increases. As a result, it is likely that a chip will be sold before a problem can be detected using existing functional verification techniques. More than likely, a problem will first be detected by a customer running an application using the chip. Faulty chips in the field can result in recalls of thousands to millions of chips, resulting in heavy financial losses and inconvenience to both the manufacturer and the customer.
What are needed are new processors, systems, and techniques that overcome the above-mentioned deficiencies.
The present invention provides one or more synthesized assertions in a self-correcting processor, and applications thereof. In an embodiment, a synthesized assertion detects a mismatch between actual processor behavior and specified or expected processor behavior. When unexpected processor behavior is encountered, the synthesized assertion alters operation of the processor and causes the processor to behave in the specified or expected manner.
In one embodiment, a synthesized assertion is used to determine whether exceptions are being processed by the processor according to a predetermined order of priority. If the processor attempts to process exceptions in an unexpected order, the synthesized assertion overrides the current operation of the processor and forces the processor to process pending exceptions in a specified order.
In an embodiment, a synthesized assertion detects and corrects instruction address errors that can cause the processor to fetch instructions from incorrect addresses.
In an embodiment, a synthesized assertion detects and corrects instruction opcode errors.
In an embodiment, a synthesized assertion detects and corrects errors that can cause the processor to stall.
In one embodiment, a synthesized assertion alters operation of the processor by overriding and/or asserting control value(s) that cause the processor to behave in the specified or expected manner.
In one embodiment, a synthesized assertion alters operation of the processor by overriding and/or asserting data value(s) that cause the processor to behave in the specified or expected manner.
Further embodiments, features, and advantages of the present invention, as well as the structure and operation of the various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.
The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.
The present invention is described with reference to the accompanying drawings. The drawing in which an element first appears is typically indicated by the leftmost digit or digits in the corresponding reference number.
The present invention provides one or more synthesized assertions in a self-correcting processor, and applications thereof. In the detailed description of the invention that follows, references to “one embodiment”, “an embodiment”, “an example embodiment”, etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Execution unit 102 preferably implements a load-store, Reduced Instruction Set Computer (RISC) architecture with single-cycle arithmetic logic unit operations (e.g., logical, shift, add, subtract, etc.). In one embodiment, execution unit 102 includes 32-bit general purpose registers (not shown) used for scalar integer operations and address calculations. Optionally, one or more additional register file sets can be included to minimize context switching overhead, for example, during interrupt and/or exception processing. Execution unit 102 interfaces with fetch unit 104, floating point unit 106, load/store unit 108, multiply/divide unit 120 and coprocessor 122.
Fetch unit 104 is responsible for providing instructions to execution unit 102. In one embodiment, fetch unit 104 includes control logic for instruction cache 112, a recoder for recoding compressed format instructions, dynamic branch prediction, an instruction buffer (not shown) to decouple operation of fetch unit 104 from execution unit 102, and an interface to a scratch pad (not shown). Fetch unit 104 interfaces with execution unit 102, memory management unit 110, instruction cache 112, and bus interface unit 116.
Floating point unit 106 interfaces with execution unit 102 and operates on non-integer data. As many applications do not require the functionality of a floating point unit, this component of processor 100 need not be present in some embodiments of the present invention.
Load/store unit 108 is responsible for data loads and stores, and includes data cache control logic. Load/store unit 108 interfaces with data cache 114 and other memory such as, for example, a scratch pad and/or a fill buffer. Load/store unit 108 also interfaces with memory management unit 110 and bus interface unit 116.
Memory management unit 110 translates virtual addresses to physical addresses for memory access. In one embodiment, memory management unit 110 includes a translation lookaside buffer (TLB) and may include a separate instruction TLB and a separate data TLB. Memory management unit 110 interfaces with fetch unit 104 and load/store unit 108.
Instruction cache 112 is an on-chip memory array organized as a multi-way set associative cache such as, for example, a 2-way set associative cache or a 4-way set associative cache. Instruction cache 112 is preferably virtually indexed and physically tagged, thereby allowing virtual-to-physical address translations to occur in parallel with cache accesses. In one embodiment, the tags include a valid bit and optional parity bits in addition to physical address bits. Instruction cache 112 interfaces with fetch unit 104.
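The parallelism enabled by a virtually indexed, physically tagged cache can be sketched as a behavioral model. The page size, line size, and set count below are illustrative assumptions, not taken from the text; any geometry whose index bits fall within the page offset behaves the same way:

```python
# Behavioral sketch (not RTL) of a virtually indexed, physically tagged
# (VIPT) cache lookup. Geometry values are illustrative assumptions.
LINE_BITS = 5      # assumed 32-byte cache lines
NUM_SETS = 128     # assumed 4-way, 16 KB cache -> 128 sets
SET_BITS = 7       # log2(NUM_SETS)

def set_index(vaddr):
    # The index bits come from the virtual address but lie within the
    # page offset (assumed 4 KB pages), so they equal the corresponding
    # physical bits. The cache set can therefore be selected in
    # parallel with the TLB translation.
    return (vaddr >> LINE_BITS) % NUM_SETS

def tag(paddr):
    # The tag is drawn from the translated physical address.
    return paddr >> (LINE_BITS + SET_BITS)

def lookup(cache, vaddr, paddr):
    # cache: dict mapping set index -> list of (valid_bit, tag) ways.
    ways = cache.get(set_index(vaddr), [])
    return any(valid and t == tag(paddr) for valid, t in ways)
```

Because the index never depends on translated bits, the tag comparison is the only step that must wait for the physical address.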
Data cache 114 is also an on-chip memory array. Data cache 114 is preferably virtually indexed and physically tagged. In one embodiment, the tags include a valid bit and optional parity bits in addition to physical address bits. In embodiments of the present invention, data cache 114 can be selectively enabled and disabled to reduce the total power consumed by processor 100. Data cache 114 interfaces with load/store unit 108.
Bus interface unit 116 controls external interface signals for processor 100. In one embodiment, bus interface unit 116 includes a collapsing write buffer used to merge write-through transactions and gather writes from uncached stores.
Power management unit 118 provides a number of power management features, including low-power design features, active power management features, and power-down modes of operation.
Multiply/divide unit 120 performs multiply and divide operations for processor 100. In one embodiment, multiply/divide unit 120 preferably includes a pipelined multiplier, result and accumulation registers, and multiply and divide state machines, as well as all the control logic required to perform, for example, multiply, multiply-add, and divide functions. As shown in
Coprocessor 122 performs various overhead functions for processor 100. In one embodiment, coprocessor 122 is responsible for virtual-to-physical address translations, implementing cache protocols, exception handling, operating mode selection, and enabling/disabling interrupt functions. Coprocessor 122 interfaces with execution unit 102.
Assertion logic 124 represents one or more synthesized assertions in accordance with the present invention. In embodiments, assertion logic 124 detects and/or corrects unexpected behavior of processor 100. Unexpected behavior can include, for example, any behavior that deviates from a specified architectural or a specified micro-architecture behavior.
In one embodiment, assertion logic 124 is used to determine whether exceptions are being processed according to a predetermined order of priority. If it is determined that the current or intended order of exception processing is not according to specification, assertion logic 124 overrides the current order of exception processing and forces processor 100 to process the exception as specified or expected.
In still other embodiments, assertion logic 124 is used to detect and/or correct, for example, errors in instruction opcodes that can result in the processor attempting to execute an illegal or reserved instruction, errors in instruction addresses that can result in fetch unit 104 fetching instructions incorrectly and/or a variety of other possible errors.
In an embodiment, assertion logic 124 is used to detect and fix address errors for branch instructions. During processing of a branch instruction, fetch unit 104 sends a branch hit/miss signal to execution unit 102 that indicates whether the branch was predicted taken or not taken. When the branch instruction is resolved by execution unit 102, it is determined whether fetch unit 104 was accurate in its prediction by checking the hit/miss signal from fetch unit 104. If the branch was correctly predicted by fetch unit 104, execution continues as normal. If it is determined that the branch was incorrectly predicted, however, execution unit 102 redirects fetch unit 104 to fetch from the resolved address and flushes the pipeline of instructions fetched from the mis-predicted branch address. In a case where the branch was predicted correctly, but due to some error the address of the instruction after the branch instruction is not the expected predicted address, assertion logic 124 causes execution unit 102 to redirect fetch unit 104 to fetch from the resolved address and flush the pipeline of instructions fetched from the wrong address.
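The two-level check described above can be sketched as a behavioral model. The function and signal names are illustrative assumptions; the second comparison is the one contributed by the synthesized assertion:

```python
# Behavioral model (not RTL) of branch resolution with the assertion
# check layered on top. Names are illustrative assumptions.
def resolve_branch(predicted_taken, branch_taken,
                   next_fetch_addr, resolved_addr):
    """Return (redirect_to, flush). redirect_to is None when execution
    may continue; otherwise fetch must restart from that address."""
    if predicted_taken != branch_taken:
        # Ordinary misprediction recovery by the execution unit.
        return resolved_addr, True
    if next_fetch_addr != resolved_addr:
        # Prediction was correct, but an error (e.g., a corrupted bit)
        # sent fetch down the wrong path: the synthesized assertion
        # forces the redirect and flush.
        return resolved_addr, True
    return None, False
```

Note that without the second comparison, a correctly predicted branch followed by a corrupted fetch address would go undetected.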
In an embodiment, assertion logic 124 is used to identify and/or prevent intellectual property theft. For example, in an embodiment, assertion logic 124 is set to react to a specific sequence of software events. Executing this specific sequence of software events then triggers assertion logic 124. As a result, a particular error or theft detection code may be written to debug register 502.
In order to more fully appreciate the present invention and how assertion logic 124 operates, consider an example in which assertion logic 124 is used to detect and correct instruction address errors.
Contrary to conventional chip designs, selected assertions are synthesized onto a chip in accordance with the present invention and used to detect and/or correct errors during operation of the chip. For example, a hardware manufacturing error or stray radiation may corrupt a bit value in a processor. In accordance with the present invention, however, a synthesized assertion can be used to detect the corrupt value and assert the correct value.
In embodiments of the present invention, synthesized assertion logic 124 monitors the actual behavior of processor 100 and compares that actual behavior to expected behavior. When there is a mismatch between the actual behavior of processor 100 and the expected behavior of processor 100, assertion logic 124 forces or asserts the expected behavior. In embodiments of the present invention, synthesized assertions occupy approximately one percent of a chip's total die area and can potentially prevent the recall of millions of chips by self-correcting the behavior of processor 100 in the case of an error.
As shown in
Assertion logic 124 includes storage 208. Assertion logic 124 is coupled to fetch unit 104 and execution unit 102. As shown in
Also shown in
The instructions illustrated in program pseudo-code 204 cause processor 100 to perform in a manner that would be known to persons skilled in the relevant art. For example, the BNE instruction causes execution unit 102 to compare two values stored in two distinct registers in register file 212. If the values are unequal, the branch is taken. The JAL (address) instruction causes a jump to a subroutine starting at the address specified in parentheses. In the example of program pseudo-code 204, the JAL (B) instruction causes a jump to the Mult instruction at address B. When the JAL instruction is executed, a return address (i.e., A+8) is computed by processor 100 and stored in a specified register (e.g., register $31) of register file 212. The JR instruction causes a jump to an instruction pointed to by an address stored in a specific register (e.g., register $31).
As shown in program pseudo-code 204, the JAL instruction and the JR instruction each have a paired delay slot or delay instruction. The delay slot is used with certain instructions because processor 100 implements a pipelined architecture and there are data dependencies among pipeline stages. The delay slots allow for an extra cycle that is used to fetch the targets of the JAL and JR instructions from instruction cache 112. Although not shown, the BNE instruction also would have a paired delay slot or delay instruction. As would be known to persons skilled in the relevant art, however, the JAL and JR instructions, for example, of program pseudo-code 204 can be replaced with jump and link register compact (JALRC) and jump register compact (JRC) instructions, respectively, which do not have paired delay slots or delay instructions, without departing from the intended scope of the present invention. Thus, it is to be appreciated that although the JAL and JR instructions are illustrated in
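The return-address arithmetic implied above (A+8 for a JAL at address A) can be sketched as follows, assuming a fixed 32-bit instruction width:

```python
# Sketch of return-address computation for linking jumps, with and
# without a delay slot. The 4-byte instruction width is an assumption
# consistent with the A+8 return address in the example.
INSTR_BYTES = 4

def jal_return_address(jal_addr):
    # JAL has a paired delay slot: the return address skips both the
    # JAL itself and the delay-slot instruction (A -> A + 8).
    return jal_addr + 2 * INSTR_BYTES

def jalrc_return_address(jalrc_addr):
    # Compact variants (JALRC) have no delay slot, so the return
    # address is simply the next sequential instruction.
    return jalrc_addr + INSTR_BYTES
```

In the pseudo-code example, a JAL at address A thus stores A+8 in register $31, which is exactly the address of the Mult instruction to be executed after the subroutine returns.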
In operation, fetch unit 104 sends an instruction stored in instruction buffer 200 along with its associated instruction address to execution unit 102. The instruction is sent to execution unit 102 via bus 218. The instruction address is sent to execution unit 102 via bus 220. For JR (or JRC) instructions, for example, fetch unit 104 also sends a predicted address, retrieved from prediction buffer 202, to execution unit 102 via bus 222. The predicted address is the address used by fetch unit 104 to pre-fetch instructions before the JR (or JRC) instruction is resolved by execution unit 102.
A predicted address stored in prediction buffer 202 is initially calculated and stored in a return prediction stack (RPS) 206 during processing of a JAL instruction. During processing of the JR instruction, execution unit 102 checks the predicted address on bus 222 sent along with the JR instruction on bus 218 against the address stored in the appropriate return address register, i.e., register $31 of register file 212 for this example. If there is a mismatch between the predicted address on bus 222 and the address stored in register $31, execution unit 102 redirects fetch unit 104 to fetch instructions from the address stored in register $31 and flushes the pipeline of processor 100 of instructions fetched from the incorrect address.
As noted above, assertion logic 124 includes storage 208. Storage 208 is used to store data such as a predicted address, read from bus 222, that is sent to execution unit 102 together with a JR instruction. If execution unit 102 determines that the predicted address on bus 222 and the address in register $31 match, assertion logic 124 stores the predicted address in storage 208 and uses the stored predicted address to verify proper operation of fetch unit 104, as described in more detail below.
During processing of the JAL instruction, the address of the instruction to be fetched following return from the subroutine (i.e., A+8, which corresponds to the Mult instruction) is calculated and stored in return prediction stack 206. In an embodiment, return prediction stack 206 is four entries deep. As shown in
The time diagram in
During processing of the JR instruction, the address A+8 is retrieved from return prediction stack 206 and stored in prediction buffer 202. In an embodiment, prediction buffer 202 is two entries deep. The address A+8 is also provided to both execution unit 102 and assertion logic 124 via predicted address bus 222. In an embodiment, arithmetic logic unit 210 compares the address received on predicted address bus 222 with the address value stored in register $31 of register file 212. If a match occurs, execution unit 102 provides a signal to assertion logic 124 to store the predicted address value in storage 208. Because the predicted address and the address in register $31 of register file 212 matched, the predicted address, A+8, is known to be the correct address of the instruction to be executed upon return from the subroutine call (i.e., the next instruction to be executed following the delay instruction at memory address B+16).
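The JAL/JR handshake with return prediction stack 206 and the JR-time check against register $31 can be sketched behaviorally. The class and function names, the overflow policy, and the dictionary used to stand in for storage 208 are illustrative assumptions:

```python
# Behavioral sketch of the return prediction stack (RPS 206) and the
# JR-time predicted-address check. Names and the policy of dropping
# the oldest entry on overflow are illustrative assumptions.
class ReturnPredictionStack:
    def __init__(self, depth=4):          # four entries deep, as stated
        self.depth = depth
        self.stack = []

    def push(self, return_addr):          # on JAL: push A + 8
        if len(self.stack) == self.depth:
            self.stack.pop(0)             # assumed: oldest entry dropped
        self.stack.append(return_addr)

    def pop(self):                        # on JR: pop the prediction
        return self.stack.pop() if self.stack else None

def check_jr(predicted_addr, reg31, assertion_storage):
    """Compare the predicted address against register $31.
    Return (redirect_to, flush)."""
    if predicted_addr == reg31:
        # Match: latch the known-good address into assertion storage
        # for later verification of fetch unit behavior.
        assertion_storage["expected"] = predicted_addr
        return None, False
    # Mismatch: redirect fetch to the architected return address.
    return reg31, True
```

On a match, the latched address is what assertion logic 124 later compares against the instruction address actually delivered by the fetch unit.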
In the embodiment of processor 100, shown in
The time diagram shown in
As described above with reference to
As shown in the time diagram in
Assertion logic 124 reads the address value A+40 from bus 220 when it is passed to execution unit 102 and compares this address to the A+8 address stored in storage 208. Based on this comparison, assertion logic 124 detects the mismatch between the stored address in storage 208 and the instruction address on bus 220. In response to this detected mismatch, assertion logic 124 generates one or more control signals, which are provided to execution unit 102 via bus 302. These one or more control signals cause execution unit 102 to generate signals 224 that redirect fetch unit 104 to fetch from instruction address A+8 and to flush the pipeline of instructions fetched from address A+40 onwards.
As illustrated by the above example, it is a feature of the present invention that synthesized assertions (represented for example by assertion logic 124 in
As shown in
In the present embodiment, assertion logic 124 includes an instruction type decoder 308, an adder 306, a multiplexer 310, storage 208 and a comparator 312. Assertion logic 124 receives as inputs instruction addresses on bus 220, instructions on bus 218 and target addresses on a bus 314. The target addresses are provided to assertion logic 124 from arithmetic logic unit 210.
Assertion logic 124 checks the address of an instruction coming in on bus 220. If the address does not match the expected instruction address, assertion logic 124 generates a redirect/flush signal 302. This signal is provided to redirect/flush logic 300. Arithmetic logic unit 210 also provides a redirect/flush signal 304 to redirect/flush logic 300. If either redirect/flush signal 302 or redirect/flush signal 304 is asserted, redirect/flush logic 300 generates redirect and flush signals 224, which as described above redirect fetch unit 104 to fetch from a specified address and flush certain stages of the pipeline of processor 100.
In an embodiment, for non jump/branch instructions, assertion logic 124 computes the address of the next expected instruction by using adder 306 to add a value of four to the address of the current instruction address on bus 220. This new address value is stored in storage 208. For jump/branch instructions, assertion logic 124 receives the target address of the next instruction on bus 314 from arithmetic logic unit 210. If a jump or branch instruction has an associated delay slot instruction, assertion logic 124 accounts for the delay slot instruction and uses the target address on bus 314 for the instruction following the delay slot instruction.
The target address on bus 314 or the address on bus 316, computed by adder 306, is selected using multiplexer 310 as the expected address for the next instruction. The select signal 318 for multiplexer 310 is provided by instruction type decoder 308. Instruction type decoder 308 receives instruction 218 as an input and determines, for example, whether the instruction is a jump instruction or a branch instruction. If the instruction is a jump/branch instruction, instruction type decoder 308 accounts for any delay slot associated with the jump/branch instruction. In embodiments, instruction type decoder 308 determines the type of an instruction (e.g., whether an instruction is a jump instruction or a branch instruction, with or without a delay slot) using selected bits of the instruction that indicate instruction type. The expected instruction address for the next instruction is stored in storage 208.
In an embodiment, the instruction address on bus 220 is compared using comparator 312 against the expected address stored in storage 208. If there is a mismatch between the expected address stored in storage 208 and the instruction address on bus 220, comparator 312 causes redirect/flush logic 300 to place redirect and flush signals on bus 224. These signals cause fetch unit 104 to re-fetch from the expected address stored in storage 208 and flush stages of the pipeline of processor 100 that have been filled using an incorrect address. If there is a match between the expected address stored in storage 208 and the instruction address on bus 220, execution continues normally. In embodiments, assertion logic 124 accounts for stalls and bubbles in the pipeline of processor 100 when computing the address of the next expected instruction and/or storing that address in storage 208.
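The datapath just described, adder 306 computing address+4, multiplexer 310 selecting between the adder output and the arithmetic logic unit target, and comparator 312 checking each incoming address against storage 208, can be sketched as a simplified behavioral model. The model assumes compact (delay-slot-free) control transfers and omits stall/bubble handling:

```python
# Behavioral sketch of assertion logic 124's expected-address check.
# Delay-slot handling, stalls, and bubbles are deliberately omitted;
# this models the adder/mux/storage/comparator path only.
class AddressAssertion:
    def __init__(self):
        self.expected = None          # storage 208 (empty at reset)

    def step(self, instr_addr, is_jump_or_branch=False, target_addr=None):
        """Check one incoming instruction address, then compute and
        store the next expected address. Return (redirect_to, flush)."""
        redirect, flush = None, False
        if self.expected is not None and instr_addr != self.expected:
            # Comparator 312 fires: force re-fetch from the expected
            # address and flush the wrongly fetched instructions.
            redirect, flush = self.expected, True
            instr_addr = self.expected    # continue from the re-fetch
        if is_jump_or_branch and target_addr is not None:
            self.expected = target_addr   # mux selects the ALU target
        else:
            self.expected = instr_addr + 4  # mux selects adder (+4)
        return redirect, flush
```

A short usage trace mirrors the A+40 example: sequential fetches pass, a jump updates the expectation to its target, and a corrupted fetch address triggers the redirect and flush.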
In an embodiment, the synthesized assertions represented by assertion logic 124a-n monitor the interactions between respective caches 114a-n and processors 100a-n. In an embodiment, one or more of the synthesized assertions has a built-in timer. If a particular cache 114 fails to respond to a request by an associated processor 100 for data, for example, within a certain number of cycles, assertion logic 124 resets system 400 or a portion thereof, such as the requesting processor, as appropriate. In an embodiment, assertion logic 124 restarts the cache and resends the request for data to the cache. In another embodiment, assertion logic 124 monitors a bus 408 connecting a processor 100 with a cache 114. If the processor fails to make a request for data from the cache, for example, for a specific number of cycles, assertion logic 124 resets the processor, takes an exception, or causes the processor to fetch instructions from a particular address in instruction memory.
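The timer-based assertion can be sketched as a simple watchdog counter. The timeout value and the class name are illustrative assumptions; the recovery action taken when the watchdog fires (reset, retry, or exception) is selected per the embodiments above:

```python
# Sketch of a timer-based assertion watching a processor/cache bus.
# The timeout value is an illustrative assumption.
class BusWatchdog:
    def __init__(self, timeout_cycles=64):
        self.timeout = timeout_cycles
        self.counter = 0

    def tick(self, activity_seen):
        """Advance one cycle. Return True when the watchdog fires,
        i.e., no expected bus activity for `timeout_cycles` cycles."""
        if activity_seen:
            self.counter = 0      # activity observed: restart the timer
            return False
        self.counter += 1
        # Firing triggers the configured recovery: reset the system or
        # processor, restart the cache and resend the request, or
        # vector fetch to a recovery address.
        return self.counter >= self.timeout
```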
In an embodiment, assertion logic 124o monitors bus 404 for data requests. If a data request to main memory 420 does not yield data, for example, in a specific number of cycles, assertion logic 124o may resend the request to memory 420 or cause system 400 to reset.
In an embodiment, assertion logic 124q monitors bus 406 for interactions between custom hardware 430 and processors 100. For example, if custom hardware 430 sends a request to a processor 100 and does not receive a reply, for example, within a specific number of cycles due to a hung processor, assertion logic 124q can cause system 400 or the hung processor to reset. In an embodiment, assertion logic 124q may resend the request before causing a system reset in order to verify, for example, that the processor is hung.
In an embodiment, assertion logic 124p shown in
As described herein, processors 100a-n may include assertion logic 124a1-n1, main memory 420 may include assertion logic 124r and custom hardware 430 may include assertion logic 124s to monitor actual behavior, compare actual behavior to expected behavior, and correct actual behavior if there is a mismatch.
In embodiments of the present invention, assertion logic 124 both generates the control signals and/or values illustrated in
In the example shown above in Table 1, the pre-programmed table of fixes has the options of stalling a pipeline, flushing the pipeline, inserting a no-op in the pipeline, and/or jumping execution to a first or second correction code. In an embodiment, the generated values and the associated fixes may be programmed by an end user via firmware. For example, a match on values 1 and 2 generates a stall, a match on value 3 results in flushing of the pipeline, a match on value 4 causes a no-op to be inserted along with a jump to a first correction code and a match on value 5 causes a jump to a second correction code.
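The match-to-fix mapping of Table 1 can be sketched as a small lookup structure of the kind firmware might program. The action encodings below are illustrative assumptions that mirror the example:

```python
# Sketch of the firmware-programmable table of fixes. Match values and
# action names are illustrative assumptions mirroring the example:
# values 1-2 stall the pipeline, 3 flushes it, 4 inserts a no-op and
# jumps to a first correction code, 5 jumps to a second correction code.
FIX_TABLE = {
    1: {"stall"},
    2: {"stall"},
    3: {"flush"},
    4: {"insert_noop", "jump_correction_1"},
    5: {"jump_correction_2"},
}

def fixes_for(matched_value):
    """Return the set of corrective actions for a matched error value;
    an unmatched value yields no action."""
    return FIX_TABLE.get(matched_value, set())
```

Because the table is data rather than logic, an end user can reprogram both the match values and the associated fixes in the field, as the embodiment above describes.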
Processor 802 is any processor that includes features of the present invention described herein and/or implements a method embodiment of the present invention. In one embodiment, processor 802 includes an instruction fetch unit, an instruction cache, an instruction decode and dispatch unit, one or more instruction execution unit(s), a data cache, a register file, and a bus interface unit similar to processor 100 described above.
Memory 804 can be any memory capable of storing instructions and/or data. Memory 804 can include, for example, random access memory and/or read-only memory.
Input/output (I/O) controller 806 is used to enable components of system 800 to receive and/or send information to peripheral devices. I/O controller 806 can include, for example, an analog-to-digital converter and/or digital-to-analog converter.
Clock 808 is used to determine when sequential subsystems of system 800 change state. For example, each time a clock signal of clock 808 ticks, state registers of system 800 capture signals generated by combinatorial logic. In an embodiment, the clock signal of clock 808 can be varied. The clock signal can also be divided, for example, before it is provided to selected components of system 800.
Custom hardware 810 is any hardware added to system 800 to tailor system 800 to a specific application. Custom hardware 810 can include, for example, hardware needed to decode audio and/or video signals, accelerate graphics operations, and/or implement a smart sensor. Persons skilled in the relevant arts will understand how to implement custom hardware 810 to tailor system 800 to a specific application.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant computer arts that various changes can be made therein without departing from the scope of the invention. For example, the features of the present invention can be selectively implemented as design features. Furthermore, it should be appreciated that the detailed description of the present invention provided herein, and not the summary and abstract sections, is intended to be used to interpret the claims. The summary and abstract sections may set forth one or more but not all exemplary embodiments of the present invention as contemplated by the inventors.
For example, in addition to implementations using hardware (e.g., within or coupled to a Central Processing Unit (“CPU”), microprocessor, microcontroller, digital signal processor, processor core, System on Chip (“SOC”), or any other programmable or electronic device), implementations may also be embodied in software (e.g., computer readable code, program code and/or instructions disposed in any form, such as source, object or machine language) disposed, for example, in a computer usable (e.g., readable) medium configured to store the software. Such software can enable, for example, the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++), hardware description languages (HDL) including Verilog HDL, VHDL, SystemC Register Transfer Level (RTL) and so on, or other available programs, databases, and/or circuit (i.e., schematic) capture tools. Such software can be disposed in any known computer usable medium including semiconductor, magnetic disk, optical disk (e.g., CD-ROM, DVD-ROM, etc.) and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., carrier wave or any other medium including digital, optical, or analog-based medium). As such, the software can be transmitted over communication networks including the Internet and intranets.
It is understood that the apparatus and method embodiments described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL) and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.