Domain-specific hardware accelerators (HAs) are becoming increasingly crucial for high-throughput and energy-efficient digital systems. Today's digital systems, often referred to as Systems-on-Chip (SoCs), contain many HAs spanning various application domains. Each HA implements a set of functions referred to as actions in this paper. HAs may be tightly coupled, e.g., integrated within a processor's pipeline. More commonly, though, HAs are loosely coupled, interacting with other SoC components (other HAs, processor cores, memory) via on-chip networks. Given their pervasiveness, loosely-coupled HAs (LCAs) are the focus of this paper, although our presented techniques can be applied to tightly-coupled HAs as well.
Every HA must be verified for correctness both thoroughly and quickly to meet the time-to-market demands of the diverse applications they support. HA formal verification is challenged by: (1) the tremendous effort required to craft highly thorough design-specific properties and full functional specifications, and (2) the limited scalability of off-the-shelf formal tools. Beyond being time-consuming and error-prone, producing thorough properties and specifications is an uphill battle because HAs evolve rapidly to keep pace with the rapidly evolving applications they support.
A recent verification technique, Accelerator Quick Error Detection (A-QED), overcomes the above challenges for a class of HAs that are non-interfering, i.e., HAs that produce the same output for a given action independent of its context within a sequence of actions. A-QED uses formal verification based on Bounded Model Checking (BMC). Unlike conventional BMC-based verification, A-QED does not require extensive design-specific properties or a full functional specification. Instead, A-QED uses self-consistency checks on a given HA. Specifically, A-QED checks for functional consistency (FC), the property that actions with identical inputs always produce the same outputs.
While non-interfering HAs readily capture a range of fixed-function designs, interfering HAs are becoming more and more prevalent. This is partly due to the rise of programmable HAs. In fact, traditional processors may be viewed as an extreme case of interfering HAs where each instruction is an HA action. Interfering HAs contain interfering actions whose outputs are dependent on the outputs of other actions, inherently violating A-QED's FC checking. To complicate matters even further, an HA action might read the outputs produced by another action (or write its outputs to be consumed by another action) at clock cycles that depend on the execution of various other concurrent actions active in the HA. Thus, there is an urgent need for a new and general formal verification methodology for interfering (and non-interfering) HAs that preserves the benefits of A-QED (i.e., provably sound and complete verification without requiring extensive design-specific properties or full functional specifications) while reasoning about interfering actions (not possible using A-QED)—a highly difficult challenge.
Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures. It is noted that, in accordance with the standard practice in the industry, various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.
The following disclosure provides many different embodiments, or examples, for implementing different features of the provided subject matter. Specific examples of components, values, operations, materials, arrangements, or the like, are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. Other components, values, operations, materials, arrangements, or the like, are contemplated. For example, the formation of a first feature over or on a second feature in the description that follows may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features may be formed between the first and second features, such that the first and second features may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
Further, spatially relative terms, such as “beneath,” “below,” “lower,” “above,” “upper” and the like, may be used herein for ease of description to describe one element or feature's relationship to another element(s) or feature(s) as illustrated in the figures. The spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. The apparatus may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein may likewise be interpreted accordingly.
The components are utilized by a computer modeling system capable of executing computer models of hardware processing circuits (HPCs) in a bounded model checker (BMC). In this manner, computer models are utilized to verify the correctness of a hardware processing circuit design in a pre-silicon environment. A non-limiting example of a computer modeling system is Very Large Scale Integration (VLSI), and non-limiting examples of the computer models include Register Transfer Level (RTL) models. Non-limiting examples of HPC designs that can be verified using the techniques disclosed herein include processor cores, hardware accelerators, and/or the like. In some embodiments, methods employed by the components check for functional consistency (FC), the property that actions with identical inputs always produce the same outputs. In addition to FC, the method also performs single-action checking (SAC) and response bound (RB) checking. In some examples, the G-QED techniques described herein are provably sound and complete, meaning that no bugs are missed and there are no false positives. In some examples, G-QED is not fully general but can verify a large class of digital designs.
In particular, each processing circuit that is being modeled herein is capable of performing a set of functions which are herein referred to as actions. Examples of actions that may be performed herein include updating registers, performing addition functions, performing subtraction functions, decoding messages, or performing any other type of definable function that may be implemented by any type of processing circuitry. Processing circuitry can perform interfering actions and non-interfering actions. Non-interfering actions produce the same output independent of the context of the action within a sequence of actions. In other words, non-interfering actions only depend on the current state of independent variables and do not depend on the states of other actions. However, interfering actions have outputs that do depend on the context of the action within the sequence of actions. In other words, the output of an interfering action depends not only on the current state of independent variables but also on the state of other actions in the sequence of actions. The components shown in
While processing circuitry that only performs non-interfering actions can readily capture a range of fixed-function designs, processing circuitry that performs interfering actions is becoming more and more prevalent in the industry. This is partly due to the rise of programmable HAs. In fact, traditional processors may be viewed as an extreme case of interfering HAs, where each instruction is an HA action. Interfering HAs contain interfering actions whose outputs are dependent on the outputs of other actions, thereby significantly complicating FC checks. To complicate matters even further, an HA action might read the outputs produced by another action (or write its outputs to be consumed by another action) at clock cycles that depend on the execution of various other concurrent actions active in the HA. The component shown in
As shown in
The components shown in
Additionally, there is a second sequence of action inputs [Ia, Ib], which is also referred to herein as the action pair. It should be noted that the second sequence of action inputs is ordered with respect to time so that the second sequence begins at action input Ia and ends at action input Ib. It is the outputs generated by the action pair that are being checked for FC. The second sequence of action inputs [Ia, Ib] has at least two inputs that result in at least one output. In some examples, the second sequence of action inputs [Ia, Ib] results in at most two outputs. As explained in further detail below, at least two inputs are used because not all inputs generate an observable output. For example, some non-interfering (or interfering) actions simply update the architectural state of relevant state registers (explained in further detail below). These actions do not generate observable action outputs. Thus, in order to check the action output resulting from a non-interfering action, the non-interfering action should be followed by another action that does generate an observable output. That way, the non-observable action output of the non-interfering action can be inferred from the observable output of the other action. Thus, the second sequence of action inputs [Ia, Ib] has at least two members, Ia and Ib, to cover the case in which action Ia does not generate an observable output.
There are other sequences of action inputs that may be utilized to implement the methodology of the component shown in
Each action input defines an action to be taken and one or more variable inputs for the action (i.e., this can be represented as <Action, variable input1 . . . variable inputx>). The variable inputs are the inputs needed to perform the action. In some embodiments, at least some of the variable inputs for the action are independent variables. For example, the variable input may be a data field that is updated every clock cycle and that is independent of any other action. If all of the variable inputs for the action are independent variables, the action defined by the action input is a non-interfering action because the action does not depend on any other action. In other embodiments, at least one of the variable inputs is the output of another action. For example, a previous action input may update the architectural state of a relevant state register. One of the variable inputs of the subsequent action may be the architectural state of this relevant state register. In this case, the subsequent action is defined by the subsequent action input as an interfering action because the interfering action depends on the output of the previous action input.
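The structure of an action input described above can be illustrated with a minimal sketch. The class name, field names, and the action names SET_FACTOR and SCALE are hypothetical and chosen only for illustration; the sketch assumes that an interfering action can be recognized by the relevant state registers it reads.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ActionInput:
    """One action input: <Action, variable input1 ... variable inputx>."""

    action: str                         # which action to perform (hypothetical name)
    variable_inputs: Tuple[int, ...]    # independent variables supplied with the input
    reads_state: Tuple[str, ...] = ()   # relevant state registers read by the action

    def is_interfering(self) -> bool:
        # Assumption: an action is interfering when at least one of its
        # inputs is produced by another action, modeled here as a state
        # register that a previous action may have updated.
        return len(self.reads_state) > 0


# A register update that depends only on its own data field.
set_factor = ActionInput("SET_FACTOR", (3,))
# An action whose result also depends on the register written above.
scale = ActionInput("SCALE", (7,), reads_state=("Factor",))

print(set_factor.is_interfering())  # False
print(scale.is_interfering())       # True
```

In this sketch, the first action input is non-interfering because all of its variable inputs are independent, while the second is interfering because one of its inputs is the architectural state left behind by an earlier action.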
To perform FC checking, a bounded model checker (BMC) runs the first computer model 102, the second computer model 104, and the third computer model 106. The first computer model 102 is implemented on the first sequence of action inputs [I0 . . . Ik]. The first computer model 102 is allowed to idle until it is done processing the first sequence. After the first computer model 102 is allowed to idle and is done processing the first sequence, the architectural states of the first computer model 102 are recorded. Note that by allowing the first computer model 102 to idle, determining the exact clock cycle at which the architectural states are updated is unnecessary. Instead, the first computer model 102 is simply allowed to idle and process the first sequence. In some examples, the BMC can symbolically choose all possible [I0 . . . In], k and n. This means the BMC will exhaustively check FC for all possible values of k and n and for all possible action inputs [I0 . . . In], starting with k=1, n=1 and incrementally increasing k and n until the exploration space is too large. In some examples, the BMC chooses inputs symbolically. In some examples, G-QED is run on the BMC, and in other examples G-QED is run on another software tool with similar functionality.
Once finished, the architectural states of the first computer model 102 are recorded and will be used with the third computer model 106, as explained in further detail below. In some embodiments, at least some of the architectural states that are recorded are the architectural states (i.e., values) of the relevant state registers modeled by the first computer model 102. The architectural states are recorded in a non-transitory computer-readable medium. Thus, the architectural states are digital representations.
Additionally, the second computer model 104 of the hardware processing circuit design is implemented on the first sequence of action inputs [I0 . . . Ik] directly followed by the second sequence of action inputs [Ia, Ib] such that the second sequence results in the first action outputs [Oa, Ob]. The methodology implemented by the components in
In this embodiment, an ancillary sequence of action inputs [Im+1 . . . In] is initially implemented by the third computer model 106. These action inputs are ancillary and not relevant to checking FC. Subsequently, the third computer model 106 is set up with the recorded architectural states obtained by implementing the first sequence of action inputs [I0 . . . Ik] with the first computer model 102. Once the third computer model 106 is set up with the recorded architectural states, the third computer model 106 implements the second sequence of action inputs [Ia, Ib], thereby resulting in second action outputs [Oa, Ob]. The first action outputs [Oa, Ob] and the second action outputs [Oa, Ob] are then compared to determine whether the hardware processing circuit design is FC. If the first action outputs [Oa, Ob] and the second action outputs [Oa, Ob] are the same, the hardware processing circuit design is FC in accordance with some embodiments. If the first action outputs [Oa, Ob] and the second action outputs [Oa, Ob] are different, the hardware processing circuit design is not FC in accordance with some embodiments.
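The three-model check described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: ToyModel, its SET and SCALE actions, and the single register named factor are hypothetical, each action completes in one step (so idling is trivial), and exhaustive enumeration over a small input alphabet stands in for the symbolic exploration performed by a BMC.

```python
from itertools import product


class ToyModel:
    """Hypothetical HPC model with one relevant state register (RSR)."""

    def __init__(self):
        self.factor = 1  # architectural state of the only RSR

    def step(self, action_input):
        action, data = action_input
        if action == "SET":      # updates the RSR, no observable output
            self.factor = data
            return None
        if action == "SCALE":    # output depends on the RSR (interfering)
            return data * self.factor
        raise ValueError(f"unknown action: {action}")

    def run(self, sequence):
        return [self.step(action_input) for action_input in sequence]


def fc_check(prefix, pair, ancillary):
    # Model 102: process the first sequence, then record the RSR state.
    model_102 = ToyModel()
    model_102.run(prefix)
    recorded_factor = model_102.factor

    # Model 104: first sequence directly followed by the action pair.
    model_104 = ToyModel()
    model_104.run(prefix)
    first_outputs = model_104.run(pair)

    # Model 106: ancillary inputs, then set up with the recorded state,
    # then the same action pair.
    model_106 = ToyModel()
    model_106.run(ancillary)
    model_106.factor = recorded_factor
    second_outputs = model_106.run(pair)

    return first_outputs == second_outputs


# Small input alphabet; a BMC would choose these values symbolically.
ALPHABET = [(action, data) for action in ("SET", "SCALE") for data in range(3)]
ANCILLARY = [("SET", 2)]

all_consistent = all(
    fc_check(list(prefix), list(pair), ANCILLARY)
    for k in (1, 2)                               # incrementally longer prefixes
    for prefix in product(ALPHABET, repeat=k)
    for pair in product(ALPHABET, repeat=2)
)
print(all_consistent)  # True
```

Because the toy design has no hidden state beyond the recorded register, every comparison passes. A design whose outputs also depend on unrecorded internal state would make the two output sequences diverge for some choice of inputs.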
In some embodiments, the action variable of the action input Ia defines a non-interfering action that does not generate an observable output Oa. In other words, reading the output Oa during the implementation of the computer models 104, 106 is not practical and therefore the output Oa is not observable. Thus, the action input Ib is provided which directly follows the action input Ia. The action input Ib has an action variable that defines an interfering action where an input variable of the interfering action is the output Oa of the action input Ia. In this manner, the output Oa can be inferred from the output Ob. In this manner, the output Oa can be checked for FC.
In this disclosure, when an action input does not result in an observable action output, the action input is referred to as not resulting in an action output. Thus, since the action output Oa is not observable, the action input Ia is not considered to result in an action output. Instead, the action output Oa has to be inferred from the action output Ob. In some embodiments, the non-interfering action defined by the action input Ia does not result in an action output such that implementing the second computer model 104 of the hardware processing circuit design on the action input Ia does not result in any action output and implementing the third computer model 106 of the hardware processing circuit design on the action input Ia does not result in any action output, since these action outputs Oa are unobservable. For example, the non-interfering action of the action input Ia can result in an update to at least one architectural state of at least one relevant state register in the hardware processing circuit design. As such, implementing the second computer model 104 on the action input Ia does not result in an action output because the architectural state of the relevant state register is not observable during the implementation of the second computer model 104. Likewise, implementing the third computer model 106 on the action input Ia does not result in an action output because the architectural state of the relevant state register is not observable during the implementation of the third computer model 106.
However, as mentioned above, the interfering action (in other cases, a non-interfering action) of the action input Ib results in an action output because the action output Ob is observable. One of the variable inputs of the interfering action of the action input Ib is the architectural state of the relevant state register updated by the action input Ia. Thus, implementing the action input Ib by the second computer model 104 results in an observable action output Ob. Additionally, implementing the action input Ib by the third computer model 106 results in an observable action output Ob. In this manner, the action output Oa of the action input Ia when implementing the second computer model 104 can be inferred from the observable action output Ob. Additionally, the action output Oa of the action input Ia when implementing the third computer model 106 can be inferred from the observable action output Ob. In this manner, the non-observable action output Oa resulting when implementing the second computer model 104 and the non-observable action output Oa resulting when implementing the third computer model 106 can be compared and checked for FC.
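The inference of a non-observable output Oa from an observable output Ob can be illustrated with a short sketch. The model below is hypothetical, and the injected bug (every third register update is silently dropped because of a hidden counter) is an assumption made only to show the mechanism; it is not taken from the design discussed in this disclosure.

```python
class BuggyModel:
    """Hypothetical HPC model whose register-update logic has a hidden bug."""

    def __init__(self):
        self.factor = 1        # relevant state register (architectural state)
        self._set_count = 0    # hidden, non-architectural state

    def step(self, action_input):
        action, data = action_input
        if action == "SET":                  # Ia-style action: no observable output
            self._set_count += 1
            if self._set_count % 3 != 0:     # injected bug: every third update is lost
                self.factor = data
            return None
        if action == "SCALE":                # Ib-style action: reads the register
            return data * self.factor
        raise ValueError(f"unknown action: {action}")


prefix = [("SET", 5), ("SET", 6)]
pair = [("SET", 2), ("SCALE", 10)]           # [Ia, Ib]

# Model 102: process the prefix and record the register value.
model_102 = BuggyModel()
for action_input in prefix:
    model_102.step(action_input)
recorded_factor = model_102.factor

# Model 104: prefix directly followed by the action pair.
model_104 = BuggyModel()
for action_input in prefix:
    model_104.step(action_input)
oa_104, ob_104 = [model_104.step(action_input) for action_input in pair]

# Model 106: set up with the recorded register value, then the action pair.
model_106 = BuggyModel()
model_106.factor = recorded_factor
oa_106, ob_106 = [model_106.step(action_input) for action_input in pair]

print(oa_104, oa_106)  # None None -> Ia alone reveals nothing
print(ob_104, ob_106)  # 60 20     -> Ob exposes the inconsistent update
```

Comparing Oa directly is uninformative in this sketch because neither model produces an observable output for Ia. The mismatch in Ob shows that the register update performed by Ia differed between the two models.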
In some embodiments, operating the computer hardware processing circuit (HPC) computer models, inputs provided to each of the HPC computer models, and output generated by the HPC computer models in order to verify the HPC design as described in
The computer model 200 represents an HPC (e.g., an HA). To ease the discussion, the modeled components of the computer model 200 are referred to in this disclosure with respect to the hardware components they represent. However, it should be understood that what is actually being discussed are computer models of the hardware components, and not the physical hardware components themselves. This eases the discussion of the hardware processing circuit design. By providing an example of an actual hardware processing circuit design being modeled by the computer model 200, the challenges overcome by the methodology discussed in
The HPC modeled by the computer model 200 is connected to other SoC components (e.g., processor, memory) via a handshake protocol similar to the one discussed in Singh et al., “A-QED Verification of Hardware Accelerators,” in 57th ACM/IEEE Design Automation Conference, DAC 2020, San Francisco, CA, USA, Jul. 20-24, 2020. IEEE, 2020, pp. 1-6, which is hereby incorporated by reference in its entirety (hereinafter “Singh”). The HPC only reads valid inputs (in_valid asserted) from the network when it is ready (rdy_out asserted). The network reads HPC-generated outputs (out_valid asserted) when it is not blocked by other components (rdy_in asserted).
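The handshake rule above can be sketched as a per-cycle filter over signal traces. The function names and the trace format are hypothetical; only the signal names in_valid, rdy_out, out_valid, and rdy_in come from the description.

```python
def accepted_inputs(trace):
    """Return the data words the HPC reads from the network.

    `trace` holds one (in_valid, rdy_out, data) tuple per clock cycle.
    An input is read only when in_valid and rdy_out are both asserted.
    """
    return [data for in_valid, rdy_out, data in trace if in_valid and rdy_out]


def delivered_outputs(trace):
    """Return the data words the network reads from the HPC.

    `trace` holds one (out_valid, rdy_in, data) tuple per clock cycle.
    An output is read only when out_valid and rdy_in are both asserted.
    """
    return [data for out_valid, rdy_in, data in trace if out_valid and rdy_in]


input_trace = [
    (True, True, 10),    # valid and ready: accepted
    (True, False, 11),   # HPC not ready: input must wait
    (False, True, 12),   # no valid input this cycle
    (True, True, 11),    # the waiting input is accepted now
]
output_trace = [
    (True, False, 20),   # network blocked: output held
    (True, True, 20),    # network ready: output delivered
    (False, True, 0),    # nothing to deliver
]

print(accepted_inputs(input_trace))     # [10, 11]
print(delivered_outputs(output_trace))  # [20]
```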
The HPC modeled by the computer model 200 implements 3 actions {A1, A2, A3} as follows:
The HPC modeled by the computer model 200 includes fast operational HA circuitry 202 and slow operational HA circuitry 204. F( ) and Scaler( ) take multiple cycles to compute, and pending inputs are stored in the FIFOs. If neither FIFO is full, the HA utilizes the slow operational HA circuitry 204 to implement F( ). If either FIFO is full, the HA utilizes the fast operational HA circuitry 202 to implement F( ). If any of the Scaler( ) inputs is 0, the unit is designed to skip computation for better power and performance. Thus, when the Scaler( ) unit is bypassed, the HA updates the Factor register with 0.
Consider the following bug (adapted from an actual bug): the FIFO 1 full signal goes high only when the write pointer reaches 15 (starting from 0), but the FIFO can hold at most 15 entries. Hence, the 16th A3 input overwrites its predecessor. This bug is only triggered if rdy_in is low long enough. It can be detected by checking A3 for FC. However, to perform FC checking, we need to constrain the relevant state registers (RSRs) (i.e., Bypass, Factor) to prevent false fails.
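The effect of such an off-by-one full signal can be illustrated with a simplified sketch. The Fifo class, the stand-in function f, and the input values are hypothetical, and the sketch assumes one possible reading of the bug, namely that the full flag is asserted one entry too late so that the extra write lands on the last slot. No entry is drained while the FIFO fills, which stands in for rdy_in being held low.

```python
class Fifo:
    """Hypothetical FIFO whose full flag threshold can be misconfigured."""

    def __init__(self, depth, full_threshold):
        self.slots = [None] * depth
        self.depth = depth
        self.full_threshold = full_threshold
        self.writes = 0  # number of accepted writes

    @property
    def full(self):
        return self.writes >= self.full_threshold

    def push(self, value):
        if self.full:
            return False  # input is not accepted while full is asserted
        # A write past the last slot overwrites the last slot.
        index = min(self.writes, self.depth - 1)
        self.slots[index] = value
        self.writes += 1
        return True

    def drain(self):
        return self.slots[: min(self.writes, self.depth)]


def f(x):
    return x + 1  # hypothetical stand-in for F( )


def run(full_threshold, inputs):
    fifo = Fifo(depth=15, full_threshold=full_threshold)
    for value in inputs:
        fifo.push(value)  # outputs are blocked, so nothing drains meanwhile
    return [f(value) for value in fifo.drain()]


# 16 inputs: the 1st and 15th carry the same data (7), the 16th differs (9).
inputs = [7] + list(range(100, 113)) + [7, 9]

good_outputs = run(full_threshold=15, inputs=inputs)
buggy_outputs = run(full_threshold=16, inputs=inputs)

print(good_outputs[0], good_outputs[14])    # 8 8  -> identical inputs, identical outputs
print(buggy_outputs[0], buggy_outputs[14])  # 8 10 -> FC violation
```

In the buggy configuration, two actions with identical input data produce different outputs because the 16th input overwrote the 15th entry, which is the kind of inconsistency an FC check is intended to flag.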
Challenge 1: The exact clock cycles when RSRs are read depend on the internal state and can be different for different RSRs. For example, consider action A3. When the result of F( ) is fed to the Scaler( ) the value stored in Factor is also read. It is very important to precisely specify the clock cycle during which this value should be read. That is not an easy task because it depends on the latency of F( ) (which in turn depends on the FIFO 1 state). Incorrect timing information can result in false fails.
Challenge 2: Consider the same bug in FIFO 2. If we constrain Factor to a fixed value, A3 will always read the same value from the Factor register and pass FC check.
Challenge 3: Checking FC on A2 is a non-trivial problem since an update to the Factor register can come either from A2 or from the HA itself, which sets the register to 0 when the Scaler( ) is bypassed. Thus, the ith action input does not necessarily produce the ith update to the Factor register, and we need to understand the design implementation to figure out when an action input updates the Factor register.
The FC approach described in
We assume that the RSR values used to calculate {Oa, Ob} in the second computer model 104 are the same as those saved from the first computer model 102. The third computer model 106 uses the same recorded RSR values as the second computer model 104.
This is elucidated in the following design constraint:
With regard to Challenge 3, the Factor value updated by A2 can be propagated as an output of a future A3 action. Thus, we address Challenge 3 by checking FC for the action pair {A2, A3} instead of checking FC for the A2 action alone. Pairwise checking of actions allows us to find bugs in the RSR updating logic, since checking the RSRs directly is non-trivial for a general HA, as discussed in Challenge 3. However, this is not the case for processors, so we consider all the RSRs as the output of every instruction. Thus, we do not have to check instructions in pairs for processors but instead can check a single output of a single action input.
To catch the FIFO 2 bug, BMC will run:
The bug in FIFO 1 can be caught in a similar manner.
Section IV of the appendix formalizes concepts discussed with respect to
Flow chart 300 includes procedures 302 to 312. In some embodiments, procedures 302 to 312 are performed by the components illustrated in
At procedure 302, a first computer model of a hardware processing circuit design is implemented on a first sequence of one or more first action inputs followed by a second sequence of one or more action inputs such that the second sequence results in at least one first action output. An example of the first computer model is the second computer model 104 shown in
An example of the first sequence of one or more first action inputs is the first sequence of action inputs [I0 . . . Ik]. The exemplary first sequence of action inputs [I0 . . . Ik] has more than one action input. Each of the first action inputs in the first sequence has a first data variable that varies for the first action inputs along the first sequence and a first action variable that varies for the first action inputs along the first sequence. Some or all of the first action inputs along the first sequence may also define other variable inputs, including the architectural states of RSRs. The first action variable defines an action for the action input, which varies along the first sequence of action inputs [I0 . . . Ik]. The first data variable may be an independent variable that defines input data for each action input along the first sequence of action inputs [I0 . . . Ik].
An example of the second sequence of one or more action inputs is the second sequence of action inputs [Ia, Ib]. The exemplary second sequence of action inputs has at least two action inputs Ia, Ib. Each of the second action inputs Ia, Ib in the second sequence includes a data variable that varies for the second action inputs along the second sequence and a second action variable that varies along the second sequence. Some or all of the second action inputs along the second sequence may also define other variable inputs, including the architectural states of RSRs. For example, in some embodiments, the action input Ia defines a non-interfering action that does not generate an observable output Oa. The action input Ib defines an interfering action, wherein one of the variable inputs of the interfering action is the non-observable output Oa of the action input Ia. In this manner, the observable output Ob of the action input Ib is used to infer the non-observable output Oa of the action input Ia. Action outputs [Oa, Ob] of the second computer model 104 are examples of the at least one first action output. In some embodiments, the sequence of action outputs includes a non-observable action output and an observable action output. The non-observable action output may be the result of a non-interfering action that updates the architectural state of at least one RSR. The observable action output may be the result of an interfering action that utilizes the architectural state of the at least one RSR as one or more variable inputs. Flow then proceeds to procedure 304.
At procedure 304, a second computer model is implemented of the hardware processing circuit design on the first sequence of one or more first action inputs and allowing the second computer model to idle until the second computer model is done processing the first sequence of one or more action inputs. An example of the second computer model is the first computer model 102 shown in
At procedure 306, architectural states of the second computer model are recorded after the second computer model is allowed to idle and is done processing the first sequence. In some embodiments, the architectural states are the architectural states of RSRs of the hardware processing circuit design being modeled. Flow then proceeds to procedure 308.
At procedure 308, a third computer model of the same hardware processing circuit design is set up with the architectural states recorded in procedure 306. An example of the third computer model is the third computer model 106 shown in
At procedure 310, the third computer model set up with the architectural states is implemented on the second sequence of one or more action inputs such that the second sequence results in at least one second action output. An example of the at least one second action output is the sequence of action outputs [Oa, Ob] of the third computer model 106. In some embodiments, the sequence of action outputs includes a non-observable action output and an observable action output. The non-observable action output may be the result of a non-interfering action that updates the architectural state of at least one RSR. The observable action output may be the result of an interfering action that utilizes the architectural state of the at least one RSR as one or more variable inputs. Flow then proceeds to procedure 312.
At procedure 312, the at least one first action output and the at least one second action output are compared to determine whether the hardware processing circuit design is functionally consistent. With respect to the exemplary non-observable action output and observable action output discussed above with respect to procedures 302 and 310, performing procedure 312 includes comparing the first one of the at least one first action outputs and the second one of the at least one second action outputs to infer whether the first update to the at least one architectural state and the second update to the at least one architectural state are functionally consistent. In some embodiments, in order for the first update and the second update to be functionally consistent, the first update and the second update have to be the same.
In some embodiments, implementing the G-QED techniques described by the method in
The processor-based system also includes a memory system 404 that includes one or more memory arrays that each include multiple memory banks and include an integrated serialization circuit configured to convert parallel data streams of read data received from separately switched memory banks into a single, serialized, read data stream to be provided on the output bus in a burst read mode, and/or a de-serialization circuit configured to convert a received, serialized write data stream on an input bus for a write operation into separate, parallel write data streams to be written simultaneously to the memory banks in a burst write mode. The memory system 404 in this example includes an instruction cache 406, a data cache 408, and a system memory 410.
With continuing reference to
The processor 402 and the system memory 410 are coupled to the system bus 412 and can intercouple peripheral devices included in the processor-based system 400. As is well known, the processor 402 communicates with these other devices by exchanging address, control, and data information over the system bus 412. For example, the processor 402 can communicate bus transaction requests to a memory controller 414 in the system memory 410 as an example of a slave device. Although not illustrated in
Other devices can be connected to the system bus 412. As illustrated in
The processor-based system 400 in
While the non-transitory computer-readable medium 432 is shown in an exemplary aspect to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the processing device and that cause the processing device to perform any one or more of the methodologies of the aspects disclosed herein. The term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical medium, and magnetic medium.
In some aspects, in response to executing the computer executable instructions stored in the computer readable medium 432, the processor 402 is configured to perform the methodology described with respect to
The aspects disclosed herein include various steps. The steps of the aspects disclosed herein may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps. Alternatively, the steps may be performed by a combination of hardware and software.
The aspects disclosed herein may be provided as a computer program product, or software, that may include a machine-readable medium (or computer-readable medium) having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the aspects disclosed herein. A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes: a machine-readable storage medium (e.g., ROM, random access memory (“RAM”), a magnetic disk storage medium, an optical storage medium, flash memory devices, etc.); and the like.
The foregoing outlines features of several embodiments so that those skilled in the art may better understand the aspects of the present disclosure. Those skilled in the art should appreciate that they may readily use the present disclosure as a basis for designing or modifying other processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.