Constrained random verification (CRV) is a de facto standard in industrial design verification. Central to this process is the design of a testbench that applies pseudorandom stimulus to a downstream design under test (DUT). The testbench typically includes parameterized tests that are manually crafted to verify functionality. Each parameter acts as a high-level knob to control stimulus generation, and the testbench then generates a family of related stimuli based on these configurable parameters (and a pseudorandom generator controlled by a seed).
Examples of such test parameterizations are shown in
2^16 × (2^3)^15 = 2^61
This represents a very large search space that may be difficult for verification engineers to fully explore.
Verification engineers typically rely on increased coverage over time simply through variations in the random seed. With increasing design complexity, the number of parameters grows, making it harder for a human to effectively reason about the higher-dimensional parameter space. For example, human engineers may miss trends among parameter values or draw incorrect conclusions, resulting in sub-optimal coverage and test runtime. These problems add to the cost of verification and coverage closure, an inefficient process that may take multiple person-years for industrial designs.
This document discloses systems and methods for implementing automatic test parameter tuning in constrained random verification. In aspects, a method receives a first set of parameters for testing a design under test, performs a first regression on the design under test using the first set of parameters, and analyzes the results of the first regression, including determining a coverage percentage for the first regression. The method then generates an optimized set of parameters based on the analysis of the results of the first regression and performs a subsequent regression on the design under test using the optimized set of parameters. The method may be repeated using the iteratively optimized set of parameters until a coverage percentage is reached (e.g., based on a threshold) or, in some implementations, until full coverage is reached. Some implementations of the method may utilize black-box optimization through use of a Bayesian optimization algorithm. In aspects, black-box optimization is an optimization method that optimizes a function without the function being visible to the optimizer. Specifically, the function inputs and outputs may be analyzed, but the function itself is unknown to the optimizer. In some implementations, the method is implemented as computer-readable instructions stored in computer-readable storage media (CRM) and executed by a processor. For example, the black-box optimization may be implemented in a hardware description language (HDL). Alternatively, the method may be implemented by a system containing one or more processors that execute computer-readable instructions of a CRM to implement the described aspects.
This Summary is provided to introduce simplified concepts for implementing automatic test parameter tuning in constrained random verification. The simplified concepts are further described below in the Detailed Description. This Summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining the scope of the claimed subject matter.
The details of one or more aspects of the described systems and methods are described below. The use of the same reference numbers in different instances in the description and the figures indicates similar elements.
The systems and methods for automatic test parameter tuning in constrained random verification described herein may provide an improved approach to automatically configuring test parameters that leads toward increased test coverage. Aspects of the described systems and methods are useful in testing various hardware devices, such as central processing units (CPUs), machine learning processors, memory, cache controllers, and other processing devices. Described systems and methods are also useful in testing different types of algorithms and software applications.
Aspects of a Smart Regression Planner (SRP) framework discussed herein may automatically configure multiple tunable test parameters to better explore an input space and converge quickly towards a desired verification coverage. The SRP framework may formulate test parameter configuration as a black-box optimization problem and solve it using Bayesian optimization or other (non-Bayesian) optimization methods with analyzed coverage results fed back from regression iterations. Particular examples discussed herein refer to regression iterations or “nightly regressions,” which refers to running a testing process on a nightly basis. An iteration or nightly regression is an example of continuous integration (CI) with automated testing. In alternate implementations of the systems and methods discussed herein, the testing process may be repeated at any interval and is not limited to nightly testing.
As discussed herein, random perturbation of human-defined test parameters may provide improved functional coverage (and coverage in general) of the testing systems and methods. For example, in a RISC-V verification platform, the described systems and methods for automatic test parameter tuning in constrained random verification may result in an approximately 60% difference in functional coverage between the best and worst configurations. The fine-grained parameter tuning described herein provides a unique opportunity for increasing functional and code coverage with no additional effort from the verification engineers beyond setting up the system.
SRP framework 200 may formulate test parameter configuration as a black-box optimization problem, which is implemented by a black-box optimizer 202. The tractable input dimensions allow the application of powerful ML-based (machine learning-based) Bayesian optimization methods to test parameter configuration. Bayesian optimization is agnostic to structure and flexible enough to adapt to changes in an evolving design. In addition to Bayesian optimization, the described systems and methods may also employ a simple random search (random perturbation of parameters) as a comparative algorithm. In alternate implementations, other algorithms based on an evolutionary search may be used instead of Bayesian optimization. While random search relies purely on exploration, Bayesian optimization exploits learning through feedback. In some aspects, the systems and methods also explore common use cases of: simultaneously minimizing runtime and maximizing coverage using multi-objective Bayesian optimization; and transfer learning, or the ability to transfer learned heuristics from one set of parameters to another through design evolution.
As shown in the example of
For example, a Verilog simulator 222 can be used to simulate a test 212 on a DUT 216. The test 212 and the DUT 216 are configured from a parameterized test 206 and DUT information 204 provided by the verification engineer. In aspects, the Verilog simulator 222 analyzes an iteration or nightly regression performed on the DUT 216 using an input stimulus 214 configured from the default test configuration 208 (set of parameters). The analyzed results of the first nightly regression contain a coverage 218 (point-in-time coverage), which may be input to the black-box optimizer 202 to determine an optimized test configuration 220 (set of parameters) from the domain of parameters 210, or input space, provided by the verification engineer. The optimized test configuration 220 may be used to perform a subsequent nightly regression, the results of which may be analyzed and again provided to the black-box optimizer 202. In some implementations, the DUT 216 is a representation of a hardware system, for example, using a hardware description language, a very-high-speed integrated circuit HDL (VHDL), Verilog, SystemC, or the like. In aspects, the DUT 216 can be represented as a logical representation of a hardware system of logical components for fabrication as an integrated circuit. As such, the SRP framework 200 can be used to validate hardware systems, such as silicon devices, processor cores, co-processors, or SoCs, in whole or in part. In aspects, the logical components of the hardware system may include a processor, a memory, an arithmetic unit, a data bus, a register, a logic gate array, a logic gate map, a configurable logic block, a look-up table (LUT), input/output (I/O) logic, a selection tree, a sense amplifier, a memory cell, or the like.
In some aspects, SRP framework 200 may be implemented in an industrial nightly regression flow. Through that implementation, it was found that the approach fits into the flow with minimal overhead. Furthermore, when regressions are run in the cloud, SRP framework 200 can opportunistically exploit idle capacity to increase exploration, leading to faster coverage improvement. Certain SRP framework 200 experiments were evaluated on two sets of designs: open-source designs (RISC-V, Ibex) and a larger industrial design, MLChip. MLChip is an artificial intelligence accelerator implemented, for example, using an application-specific integrated circuit (ASIC) developed specifically for neural network machine learning. In some aspects, the accelerator is a Tensor Processing Unit (TPU). In particular aspects, the MLChip is the main block inside an artificial intelligence chip or system. As shown in detail below, the comprehensive set of experiments illustrated that test parameter optimization may consistently provide significant value to constrained random verification (CRV). In practical settings, where even a 1-2% coverage improvement is considered significant, use of the SRP framework 200 can result in substantial savings in human effort and time to closure.
In some aspects, the systems and methods measure coverage computed per iteration (point-in-time coverage) and the cumulative coverage over many iterations (accumulated coverage). During the SRP framework 200 experiments, the framework consistently achieved the maximal point-in-time coverage percentage (up to 9.52% higher than baseline) as well as the maximal accumulated coverage percentage (up to 3.89% higher than baseline) across 100 nights on all designs. In some aspects, SRP framework 200 converges much faster than the human baseline on the last 5% and 0.5% accumulated coverage margins, showing its value in the difficult coverage closure phase. Over all designs, it takes the SRP framework 200 approach up to 81 fewer nights to reach the last 5% and at least 25 fewer nights to reach the last 0.5% of maximum attainable coverage. In comparison, the baseline did not reach the maximum coverage even in 100 nights for the MLChip. Additionally, the SRP framework 200 is able to detect more issues (6.86 failures per 1000 tests) than the human baseline (5.94 failures per 1000 tests) on the industrial design MLChip. This ability to detect more issues allows for faster debugging of the design under test. In aspects, multi-objective optimization using SRP framework 200 improves the runtime of simulations by 15% without a loss of coverage.
Although Bayesian optimization consistently outperforms random search and the human baseline, even simple random search achieves much higher accumulated coverage than the human baseline. This approach could provide an easy advantage in many industrial settings. Based on the experiments performed using SRP framework 200, the following advantages were identified. In some implementations, SRP framework 200 provides a high-return, minimal-overhead opportunity for faster coverage convergence by automatically configuring test parameters in constrained random verification. Some experiments illustrate that black-box Bayesian optimization is ideally suited to configure test parameters to achieve higher per-iteration coverage and higher total coverage. In aspects, the experiments demonstrate, through comprehensive evaluation in an industrial setting, that test parameter optimization provides a consistent increase in point-in-time coverage and fast convergence of accumulated coverage, which are highly valuable during coverage closure and bug detection. Further, the experiments demonstrate that even random perturbation of test parameter configurations may offer a simple yet powerful method of achieving higher accumulated coverage. Additionally, the experiments show value in Bayesian multi-objective optimization that trades off runtime for coverage and in transfer learning of heuristics between tests. In some implementations, the experiments address a problem space in verification that is practically valuable yet tractable enough to apply advanced machine learning and white-box optimization algorithms in the future.
In some aspects, the parameterized test 206 is defined as T(p1, p2, . . . pn) where pi is a test parameter that takes values in the given domain of parameters 210 (e.g., range) and test parameters can be numerical, categorical, or Boolean. In some instances, the parameterized test 206 includes hardware parameters (e.g., for a hardware design or processing system), for example, one or more of a bus width, data width, register depth, memory depth, voltage, clock frequency, timing variables, delays, etc. that can be used to test the hardware of a DUT 216. Given an assignment of values vi to each pi, simulation returns the (point-in-time) coverage 218 CPIT, which is a real number between 0 and 100%. A test configuration for T is the set of the test parameter values v=(v1, v2, . . . vn). The objective of regression planning may be to find a test configuration v* that maximizes CPIT. Since the function f, which maps a test configuration v to CPIT, does not have any obvious structure that may be exploited for optimization (such as convexity or smoothness), the systems and methods for implementing automatic test parameter tuning in constrained random verification consider black-box techniques.
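For illustration only, the following Python sketch shows one way a parameterized test configuration and the black-box objective ƒ may be represented in software. The parameter names, domains, and simulator hook are hypothetical and are not part of the disclosed implementation.

```python
# Illustrative sketch: a parameterized test T(p1, ..., pn) represented as a
# dictionary of named parameter domains, and the black-box objective f(v) that
# maps a configuration to point-in-time coverage. All names and domains below
# are hypothetical examples.
import random

PARAMETER_DOMAIN = {
    "num_instructions": range(100, 10001),                 # numerical
    "enable_interrupts": [True, False],                    # Boolean
    "privilege_mode": ["machine", "supervisor", "user"],   # categorical
}

def sample_configuration(domain):
    """Draw one test configuration v = (v1, ..., vn) from the parameter domain."""
    return {name: random.choice(list(values)) for name, values in domain.items()}

def point_in_time_coverage(configuration):
    """Black-box objective f(v): run one regression iteration with the given
    configuration and return the reported point-in-time coverage (0-100)."""
    # Placeholder: in practice this would launch the simulator with the
    # configuration and parse the resulting coverage report.
    raise NotImplementedError("connect to the simulator and coverage tooling")
```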
Even though optimizing the point-in-time coverage 218 CPIT may be the main objective of the problem, some aspects also measure the accumulated coverage CACC for a sequence of test configurations (also a real number between 0 and 100%) since it is a critical metric for coverage closure and sign-off.
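As a simple illustration of the distinction between the two metrics, the sketch below computes accumulated coverage over the union of coverage bins hit across iterations; representing coverage as sets of bin identifiers is an assumption made for the example, not a statement about any particular coverage database format.

```python
# Illustrative sketch: CPIT is the percentage of bins hit in one iteration,
# while CACC is computed over the union of bins hit across all iterations so
# far. The bin-set representation is an assumption for this example.
def accumulated_coverage(per_iteration_bins, all_bins):
    """per_iteration_bins: list of sets of covered bin identifiers, one per iteration."""
    covered_so_far = set()
    cacc_history = []
    for bins in per_iteration_bins:
        covered_so_far |= bins                      # accumulate coverage
        cacc_history.append(100.0 * len(covered_so_far) / len(all_bins))
    return cacc_history
```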
As mentioned above,
In some aspects, for efficiency in the setup, the first configuration suggested by black-box optimizer 202 is set up to be the default test configuration 208 provided by the verification engineer (instead of a random configuration), though this can be changed in alternate implementations. Additionally, suggested test configurations of the black-box optimizer 202 may be automatically checked in to source control alongside the parameterized tests to track and diagnose regression failures with the existing tools.
The described systems and methods may not expose the random seed parameter to black-box optimizer 202 as a test parameter, which consequently sees a stochastic optimization problem. In some aspects, black-box optimizer 202 operates by formulating regression planning as an optimization problem. The goal is to find v* = argmax_v ƒ(v), where v* is a test configuration that maximizes CPIT. Several classes of algorithms have been proposed, from a simple RANDOM-SEARCH to more powerful techniques like Gaussian Process Bandits (GP-BANDIT) that are fundamentally Bayesian in nature. Both of these techniques are discussed herein.
RANDOM-SEARCH selects vt uniformly at random at time step t independent of the previous points selected. As such, the RANDOM-SEARCH technique may arbitrarily change the test configuration in every simulation.
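A minimal sketch of RANDOM-SEARCH is shown below; the domain and simulate arguments are hypothetical stand-ins for the domain of parameters 210 and the regression flow.

```python
# Minimal RANDOM-SEARCH sketch (illustrative): at each time step t, the
# configuration v_t is drawn uniformly at random, independent of earlier steps.
import random

def random_search(domain, simulate, num_iterations=100):
    best_config, best_coverage = None, -1.0
    for _ in range(num_iterations):
        v_t = {name: random.choice(list(values)) for name, values in domain.items()}
        coverage = simulate(v_t)                 # point-in-time coverage in [0, 100]
        if coverage > best_coverage:
            best_config, best_coverage = v_t, coverage
    return best_config, best_coverage
```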
In contrast, GP-BANDIT aims to sequentially optimize function ƒ by getting feedback from the environment. GP-BANDIT uses a Bayesian optimization technique, where it maintains a Gaussian prior GP(μ, Σ) over ƒ and updates it with samples drawn from ƒ to get a posterior from previous regressions that better approximates the objective. This algorithm may perform exploration and exploitation techniques to choose a test configuration. In an example, the strategy to pick a new configuration vt is:
v_t = argmax_v (μ_{t-1}(v) + β_t σ_{t-1}(v))

where β_t is a constant that implicitly negotiates the exploration-exploitation tradeoff. In the exploitation phase, GP-BANDIT may pick argmax_v μ_{t-1}(v) to maximize the expected reward (the coverage 218 CPIT) based on the posterior from previous regressions. In the exploration phase, however, it acquires new information by choosing a new test configuration v where ƒ is uncertain (σ_{t-1}(v) is large).
For example, a first simulation is run with parameter value 0.4. As a result, the predicted coverage 302 value is updated and the uncertainty at parameter x=0.4 is reduced. It should be noted that the uncertainty may increase as distance from a sampled value increases. The acquisition function determines the optimal value for parameter x based on the updated predicted coverage 302 and the updated uncertainty at x=0.4. As a result, a value of 1 is used for the next iteration. Again, the simulation is run with the given parameter value (e.g., x=1), and the predicted coverage 302 value and uncertainty are updated. The acquisition function is again used to determine an optimal value for parameter x to run the next simulation. In the example 300, this process is repeated for 10 iterations, producing the final predicted coverage 302.
It should be noted that the example 300 is simplified for a single parameter. Additionally, though the acquisition function and statistical model are shown as specific models, any number of acquisition functions or statistical models may be appropriate. The acquisition function may include a negotiation constant that negotiates the tradeoff between exploitation of previous results (e.g., a smaller negotiation constant) and exploration of new parameter values (e.g., a larger negotiation constant). In an aspect, the black-box optimization described above utilizes a complex variation of the example 300 to perform parameter optimization for multiple parameters of the parameterized test. Results of the black-box optimization as described are further illustrated below.
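For illustration, the sketch below implements the GP-UCB selection rule v_t = argmax_v (μ_{t-1}(v) + β_t σ_{t-1}(v)) for numerically encoded configurations using scikit-learn's Gaussian process regressor. It is a simplified stand-in for the described GP-BANDIT flow, not the production optimizer; the candidate grid, kernel choice, and synthetic coverage values are assumptions for the example.

```python
# Illustrative GP-UCB sketch: fit a Gaussian process posterior on past
# (configuration, coverage) observations and pick the candidate that maximizes
# mu + beta_t * sigma. Not the production GP-BANDIT implementation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def select_next_configuration(observed_X, observed_y, candidate_X, beta_t=0.01):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(observed_X, observed_y)
    mean, std = gp.predict(candidate_X, return_std=True)
    ucb = mean + beta_t * std                    # exploitation term + exploration bonus
    return candidate_X[int(np.argmax(ucb))]

# Single-parameter usage example analogous to the example above: past
# observations of x in [0, 1] with synthetic coverage values (demo only).
rng = np.random.default_rng(0)
X_seen = rng.uniform(0, 1, size=(5, 1))
y_seen = 60 + 30 * np.sin(3 * X_seen[:, 0])
candidates = np.linspace(0, 1, 101).reshape(-1, 1)
next_x = select_next_configuration(X_seen, y_seen, candidates)
```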
The following experiments utilize the GP-BANDIT implementation discussed above with a β_t value of 0.01. Further, the test results discussed herein relate to three different designs: RISC-V, Ibex, and MLChip. As mentioned above, RISC-V is a processor design adopting the RISC-V ISA. RISC-V-DV is used to verify the design, and it reports functional coverage. Ibex is a 32-bit RISC-V CPU core. Code coverage is reported on Ibex. MLChip is the main block inside an artificial intelligence chip or system. In some aspects, it may be a proprietary deep-learning accelerator featuring systolic arrays of multiply-accumulate units. Total coverage is reported for MLChip, which includes code and functional coverage.
The tests evaluate the performance of SRP framework 200 with both RANDOM-SEARCH and GP-BANDIT against the baseline, which consists of human-generated tests with fixed parameters. For black-box optimizer 202, the tests may repurpose the open-source implementations of these algorithms from a hyperparameter tuning platform (e.g., Google Vizier). To mitigate the impact of randomness on the results, the tests ran each experiment five times and reported the average coverage across the five runs.
As discussed herein,
Returning to
For example, on RISC-V, to reach 95% of max CACC, it takes GP-BANDIT only 2 nights, while it takes RANDOM-SEARCH 5 nights and baseline 11 nights. To reach 99.5% of max CACC, these numbers grow to 55, 56, and 83 nights, respectively. Like with CPIT, these numbers show the value of the SRP framework 200 process for accumulated coverage during coverage closure.
In some aspects, the SRP framework 200 algorithms might end up with isolated CPIT values that are lower than the human baseline with fixed parameter values. When it is important to ensure that the results do not dip below the human baseline on any given night, the systems and methods propose a new use case for regression testing: running the SRP framework 200 algorithms in addition to the original baseline flow. For example, the systems and methods may merge the coverage of the two runs and report it as the coverage for every iteration, as sketched below. For a fair comparison, the systems and methods may likewise report the merged coverage of two runs for the baseline.
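One simple way to realize such merged reporting, assuming coverage is exposed as sets of covered bins (an assumption for this sketch rather than a description of any particular coverage database), is shown below.

```python
# Illustrative sketch of merging coverage from an SRP-driven run and a baseline
# run of the same iteration. The bin-set inputs are an assumption; real flows
# would merge coverage databases with the simulator's own tooling.
def merged_coverage_percentage(srp_covered_bins, baseline_covered_bins, all_bins):
    covered = set(srp_covered_bins) | set(baseline_covered_bins)
    return 100.0 * len(covered) / len(all_bins)
```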
As shown in
Optimizing for high coverage can sometimes lead to unacceptably high simulation runtimes. In some situations, a verification engineer might want to trade off one for the other at different points in the verification phase. The described systems and methods explore multi-objective optimization (MO) in Bayesian optimization to simultaneously minimize simulation runtime and maximize coverage.
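As one simplified illustration of the trade-off (not the multi-objective Bayesian optimization method itself), a scalarized objective could weight coverage against runtime; the weight below is a hypothetical value an engineer might tune.

```python
# Illustrative scalarization of the runtime/coverage trade-off. The described
# approach uses multi-objective Bayesian optimization; this sketch only shows
# how a single scalar objective could encode the same trade-off.
def scalarized_objective(coverage_percent, runtime_seconds, lambda_runtime=0.001):
    """Higher is better: reward coverage, penalize long simulations."""
    return coverage_percent - lambda_runtime * runtime_seconds
```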
In some aspects, SRP framework 200 was deployed into real production for the MLChip design (under active development) and ran for 30+ days with the GP-BANDIT algorithm. The systems and methods then collected the unique failure signatures found in the final three days. This bug detection study yielded the following results: 29 signatures were found from 4230 tests (6.86 failures per 1000 tests) driven by GP-BANDIT, while baseline found 26 signatures over 4378 tests (5.94 failures per 1000 tests). Tests that were failing due to infrastructure issues were not counted. Given that each unique signature manifests a bug, this experiment demonstrates not just higher coverage but better bug detection capability with fewer tests.
One type of learning is GP-BANDIT-based learning.
Another type of learning is transfer learning. An example use case is the addition of new test parameters and design features as the design evolves. Instead of re-training the black-box algorithms in this case, the black-box optimizer may transfer learned heuristics and improve sample efficiency. In the transfer-learning experiments, the test held out 5 of the 11 parameters during initial optimization, then added them back to simulate new parameters being added to the test. This experiment was repeated for 100 random subsets of the parameters to account for the possibility that some parameters may have an outsized influence on coverage. In some aspects, transfer learning involves running regressions prior to the first regression. The prior regressions may use all parameters of the first regression or a subset. Transfer learning may include analyzing the results from the previous regressions and using the analyzed results to generate test parameters for the first regression. Additionally, the analyzed results may be used in all subsequent regressions.
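A hedged sketch of one possible transfer-learning mechanism is shown below: prior (configuration, coverage) observations are padded with default values for newly added parameters so they can seed the optimizer over the enlarged search space. The padding scheme is an assumption made for illustration and is not asserted to be the disclosed mechanism.

```python
# Illustrative sketch of carrying prior observations forward when new test
# parameters are added to an evolving design. The padding-with-defaults scheme
# is an assumption made for this example.
def pad_prior_observations(prior_observations, new_parameter_defaults):
    """prior_observations: list of (config_dict, coverage) pairs from earlier regressions."""
    seeded = []
    for config, coverage in prior_observations:
        padded = dict(new_parameter_defaults)   # defaults for newly added parameters
        padded.update(config)                   # keep learned values for shared parameters
        seeded.append((padded, coverage))
    return seeded
```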
The tests also included an ablation study with the same number of instructions: to make sure the learning is not limited to a trivial parameter like the number of instructions, the test fixed the number of instructions in all experiments to the default value and re-ran the experiments with random seeds.
In another test, a control experiment was performed with a fixed seed to examine if the learning in SRP proceeds better when the randomness in feedback is eliminated. The test ran 100 nights with the same fixed seed for all algorithms, repeated the experiment 5 times with 5 different seeds, and reported the results shown in
In some aspects, method 1900 is performed by any of the systems discussed herein, such as SRP framework 200. At 1902, a system receives information related to at least one of a design under test, a parameterized test, a default test configuration, or a domain of parameters from a verification engineer. At 1904, the system performs a regression iteration based on the received information. In aspects, the regression iteration utilizes the default test configuration provided by the verification engineer. The results of the regression iteration are analyzed at 1906. For example, a Verilog simulator may be used to determine a point-in-time coverage percentage. At 1908, an optimized test configuration is generated based on the analysis of the results of the regression iteration. For example, the optimized test configuration may include an optimized value determined from the acquisition function for each of the parameters in the parameterized test. At 1910, the system performs a subsequent regression iteration based on the optimized test configuration. In some implementations, the method will begin again at 1906 using the optimized test configuration. For example, the method may continue until total coverage, a coverage threshold, or a predetermined number of iterations is reached.
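For illustration, the sketch below mirrors the flow of 1902-1910 as an iterative loop; the run_regression and propose_configuration hooks are hypothetical stand-ins for the simulator flow and the black-box optimizer.

```python
# Illustrative sketch of the iterative tuning loop: perform a regression with
# the current configuration, analyze its coverage, generate an optimized
# configuration, and repeat until a coverage target or iteration limit is hit.
def tune_test_parameters(default_config, domain, run_regression,
                         propose_configuration, coverage_target=100.0,
                         max_iterations=100):
    history = []                                   # (configuration, coverage) pairs
    config = default_config                        # start from the engineer's defaults
    for _ in range(max_iterations):
        coverage = run_regression(config)          # perform regression, analyze results
        history.append((config, coverage))
        if coverage >= coverage_target:            # threshold or full coverage reached
            break
        config = propose_configuration(history, domain)  # optimized test configuration
    return history
```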
The computing system 2000 includes one or more processors 2002 (e.g., any of microprocessors, microcontrollers, or other controllers) that can process various computer-executable instructions to control the operation of the computing system 2000 and to enable the methods discussed herein. Alternatively, or additionally, the computing system 2000 may be implemented with any one or combination of hardware, firmware, or fixed logic circuitry that is implemented in connection with processing and control circuits. Although not shown, the computing system 2000 may include a system bus or data transfer system that couples the various components within the device. A system bus may include any one or a combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures.
The computing system 2000 may include any number of computer-readable storage media 2004 (CRM). The computer-readable storage media 2004 may be implemented within the computing system 2000. Alternatively, the computer-readable storage media 2004 may be external but associated with the computing system 2000, for example, stored in a cloud or on an external hard drive. The computer-readable storage media 2004 may include volatile memory and non-volatile memory, which may include any suitable type, combination, or number of internal or external memory devices. Each memory of the computer-readable storage media 2004 may be implemented as an on-device memory of hardware or an off-device memory that communicates data with the computing system 2000 via a data interface or bus. In one example, volatile memory includes random access memory. Alternatively, or additionally, volatile memory may include other types of memory, such as static random access memory (SRAM), synchronous dynamic random access memory (DRAM), asynchronous DRAM, double-data-rate RAM (DDR), and the like. Further, non-volatile memory may include flash memory or read-only memory (ROM). Other non-volatile memory not shown may include non-volatile RAM (NVRAM), electrically-erasable programmable ROM, embedded multimedia card (eMMC) devices, single-level cell (SLC) flash memory, multi-level cell (MLC) flash memory, and the like.
The computing system 2000 may execute operating systems or applications from any suitable type of computer-readable storage media 2004, including volatile memory and non-volatile memory. Alternatively or additionally, operating systems or applications may be embodied as firmware or any other computer-readable instructions, binaries, or code. The computing system 2000 may include a user interface provided by operating systems or applications to allow access to specific functionality or services of the computing system 2000.
Computer-readable storage media 2004 may also include a testing component 2006 which may be implemented through machine-readable instructions executable by the processor(s) 2002. The testing component 2006 may include instructions to facilitate the testing operation of the DUT. For example, the testing component 2006 may include a Verilog simulator to analyze the results of the parameterized test of the DUT. Further, the testing component 2006 may be implemented through any combination of hardware, software, or firmware. In aspects, the testing component 2006 is implemented through a hardware description language (HDL). In other implementations, the testing component 2006 may be implemented on the hardware device itself as logic gates, one-time programmable (OTP) memory, fuses, and the like. The computer-readable storage media 2004 may additionally include a machine-learning component 2008 to determine optimized sets of parameters for the nightly regression of the DUT. The machine-learning component 2008 may store previous sets of parameters and the corresponding results of nightly regression using the previous sets of parameters. The machine-learning component 2008 may be configured to provide machine-readable instructions that, when executed by the processor(s) 2002, enable generation of the optimized set of parameters. Additionally, the machine-learning component 2008 may utilize transfer of learned heuristics from previous regression iterations (e.g., conducted prior to the first regression iteration) to better optimize subsequent regression iterations.
The computing system 2000 may also include I/O ports 2010. I/O ports 2010 may allow the computing system to interact with other devices or users. I/O ports 2010 may include any combination of internal or external ports, such as USB ports, audio ports, Serial ATA (SATA) ports, PCI-express based ports or card-slots, secure digital input/output (SDIO) slots, and/or other legacy ports. Various peripherals may be operatively coupled with I/O ports 2010, such as human-input devices (HIDs), external computer-readable storage media, or other peripherals. For example, the I/O ports 2010 may be utilized to receive manually created inputs such as information related to at least one of a design under test, a parameterized test, a default test configuration, or a domain of parameters. In an aspect, the I/O ports 2010 may input or output information through a wired or wireless connection.
Examples of automatic test parameter tuning in constrained random verification are provided below, including examples implemented as a computer-readable storage medium or performed by a system containing one or more processors:
Although aspects of the described systems and methods for implementing automatic test parameter tuning in constrained random verification have been described in language specific to features and/or methods, the subject of the appended claims is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as example implementations of automatic test parameter tuning in constrained random verification, and other equivalent features and methods are intended to be within the scope of the appended claims. Further, various aspects of automatic test parameter tuning in constrained random verification are described, and it is to be appreciated that each described aspect can be implemented independently or in connection with one or more other described aspects.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2021/058302 | 11/5/2021 | WO |
Number | Date | Country
---|---|---
63111995 | Nov 2020 | US