This disclosure relates to integrated circuitry to efficiently generate software-defined test streams to use on a device under test.
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it may be understood that these statements are to be read in this light, and not as admissions of prior art.
Integrated circuits are found in numerous electronic devices and provide a variety of functionality. Before they may be operated, many integrated circuits undergo a variety of tests. These include tests while the integrated circuit is being designed, after the integrated circuit has been manufactured, or after the integrated circuit is in use in a product. Depending on the functionality provided by the integrated circuit, different tests may be performed. As the bandwidth or throughput supported by many integrated circuits has grown, generating test signals to sufficiently test these integrated circuits may be untenable using existing solutions. Moreover, as new vulnerabilities of integrated circuit devices are discovered, such as row-hammer vulnerabilities, new test signals may be desired to more fully test the integrated circuit.
Various aspects of this disclosure may be better understood upon reading the following detailed description and upon reference to the drawings in which:
One or more specific embodiments will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
When introducing elements of various embodiments of the present disclosure, the articles “a,” “an,” and “the” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements. Additionally, it should be understood that references to “one embodiment” or “an embodiment” of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features.
To test a variety of different integrated circuit devices under test (DUT), a testbench processor is provided. The testbench processor may allow a software programmer to write code that runs on the testbench processor to test a variety of different conditions on an integrated circuit device under test. For example, the testbench processor may be used to generate a variety of traffic scenarios, including sequential patterns, random patterns, pseudo-random patterns, row-hammer address patterns, or new test patterns that may be of interest in the future. The testbench processor may also generate read-heavy traffic interleaved with writes and traffic with different burst lengths (e.g., alternating between two burst lengths). Moreover, because the testbench processor may avoid using nested finite state machines, intertwined control signals may be avoided, and the testbench processor may be extended relatively easily by writing new code to run on it. The testbench processor may be used to test a variety of different integrated circuit devices under test, including different FPGA system designs, even on the same integrated circuit device as the testbench processor, without additional and sometimes complex tasks such as customizing the register transfer language (RTL) of test logic circuitry for different devices under test. In this way, the testbench processor may be used to test a wide variety of integrated circuit devices, including but not limited to DDR3/4 memory, QDR-IV memory, DDR-T memory, and high-bandwidth memory (HBM).
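The traffic scenarios listed above (sequential, pseudo-random, row-hammer, read-heavy with interleaved writes, alternating burst lengths) can be expressed as ordinary software. The following is a minimal sketch under stated assumptions: the function names `sequential`, `lfsr16`, `row_hammer`, and `read_heavy` are hypothetical and are not the testbench processor's actual API, addresses are plain integers, and the pseudo-random source is a standard 16-bit maximal-length LFSR chosen only for illustration.

```python
"""Hypothetical software-defined traffic patterns (illustrative names only)."""
from itertools import cycle, islice
from typing import Iterator, List, Sequence, Tuple


def sequential(base: int, stride: int, count: int) -> List[int]:
    """Sequential pattern: base, base + stride, base + 2 * stride, ..."""
    return [base + i * stride for i in range(count)]


def lfsr16(seed: int, count: int) -> List[int]:
    """Pseudo-random pattern from a 16-bit Fibonacci LFSR (x^16 + x^14 + x^13 + x^11 + 1)."""
    if seed & 0xFFFF == 0:
        raise ValueError("an all-zero seed locks the LFSR at zero")
    state = seed & 0xFFFF
    out = []
    for _ in range(count):
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        out.append(state)
    return out


def row_hammer(victim_row: int, row_bytes: int, count: int) -> List[int]:
    """Row-hammer pattern: alternate between the two rows adjacent to a victim row."""
    aggressors = [(victim_row - 1) * row_bytes, (victim_row + 1) * row_bytes]
    return list(islice(cycle(aggressors), count))


def read_heavy(addresses: Sequence[int], reads_per_write: int,
               bursts: Sequence[int] = (4, 8)) -> Iterator[Tuple[str, int, int]]:
    """Yield (op, address, burst_length): N reads per write, alternating burst lengths."""
    burst = cycle(bursts)
    for i, addr in enumerate(addresses):
        op = "write" if i % (reads_per_write + 1) == reads_per_write else "read"
        yield op, addr, next(burst)
```

For example, `list(read_heavy(sequential(0, 64, 8), 3))` produces three reads followed by one write, twice, with burst lengths alternating between 4 and 8. Supporting a new pattern in this style means writing a new function rather than changing RTL, which is the property the paragraph above describes.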
To do so, the testbench processor may include a traffic generator that is stitched together from pipelined RTL blocks designed to be latency-insensitive and to generate a high-throughput (e.g., 90%, 95%, 100%) stream. The RTL blocks include control units to fetch and stream instructions, and ALU generators to stream complex patterns. Various drivers are then integrated along with clocking and reset circuitry and remote access paths (e.g., Joint Test Action Group (JTAG) access paths) to build the testbench processor. Because the testbench processor may be instantiated in programmable logic circuitry (e.g., FPGA circuitry), the testbench processor may be described as a software-driven synthesizable testbench. The reusable RTL blocks make it easy to build software-driven hardware testbenches for new IPs, and such testbenches may have a comparatively high maximum frequency (Fmax). Because the testbench processor is software-driven, the traffic pattern it generates may be customized via application programming interfaces of suitable programming languages (e.g., Python APIs).
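One way to picture a latency-insensitive pipeline block is a small cycle-level model in which a producer and a consumer exchange data through a FIFO using a valid/ready handshake. This is an assumption made for illustration: the disclosure does not specify the handshake, and `simulate`, `ready_pattern`, and `depth` are hypothetical names. The sketch shows that stalls change how many beats get through per cycle but never reorder or drop them.

```python
"""Cycle-level sketch of a latency-insensitive (valid/ready) pipeline stage.

Assumption: a FIFO with a valid/ready handshake stands in for the disclosure's
latency-insensitive RTL blocks; the exact scheme is not specified there.
"""
from collections import deque
from typing import List, Sequence


def simulate(ready_pattern: Sequence[bool], cycles: int, depth: int = 2) -> List[int]:
    """Return the beats delivered to the consumer over `cycles` clock cycles."""
    fifo: deque = deque()
    delivered: List[int] = []
    next_beat = 0
    for cyc in range(cycles):
        consumer_ready = ready_pattern[cyc % len(ready_pattern)]
        # A transfer happens only when valid (FIFO non-empty) and ready are both high.
        if fifo and consumer_ready:
            delivered.append(fifo.popleft())
        # The producer sees "ready" whenever the FIFO has space and pushes a new beat.
        if len(fifo) < depth:
            fifo.append(next_beat)
            next_beat += 1
    return delivered
```

With the consumer always ready, the model delivers one beat per cycle after a one-cycle fill (99 beats in 100 cycles). If the consumer is ready only every other cycle, throughput roughly halves, yet the delivered beats remain an in-order, gap-free sequence.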
With the foregoing in mind,
In a configuration mode of the integrated circuit 12, a designer may use an electronic device 13 (e.g., a computer) to implement high-level designs (e.g., a system user design) using design software 14, such as a version of INTEL® QUARTUS® by INTEL CORPORATION. The electronic device 13 may use the design software 14 and a compiler 16 to convert the high-level program into a lower-level description (e.g., a configuration program, a bitstream). The compiler 16 may provide machine-readable instructions representative of the high-level program to a host 18 and the integrated circuit 12. The host 18 may receive a host program 22 that may be implemented by the kernel programs 20. To implement the host program 22, the host 18 may communicate instructions from the host program 22 to the integrated circuit 12 via a communications link 24 that may be, for example, direct memory access (DMA) communications or peripheral component interconnect express (PCIe) communications. In some embodiments, the kernel programs 20 and the host 18 may enable configuration of programmable logic 26 on the integrated circuit 12. The programmable logic 26 may include circuitry and/or other logic elements and may be configurable to implement arithmetic operations, such as addition and multiplication.
The designer may use the design software 14 to generate and/or to specify a low-level program, such as the low-level hardware description languages described above. Further, in some embodiments, the system 10 may be implemented without a separate host program 22. Thus, embodiments described herein are intended to be illustrative and not limiting.
Turning now to a more detailed discussion of the integrated circuit 12,
Programmable logic devices, such as the integrated circuit 12, may include programmable elements 50 with the programmable logic 26. For example, as discussed above, a designer (e.g., a customer) may program (e.g., configure) or reprogram (e.g., reconfigure, partially reconfigure) the programmable logic 26 to perform one or more desired functions. By way of example, some programmable logic devices may be programmed or reprogrammed by configuring programmable elements 50 using mask programming arrangements that are performed during semiconductor manufacturing. Other programmable logic devices are configurable after semiconductor fabrication operations have been completed, such as by using electrical programming or laser programming to program programmable elements 50. In general, programmable elements 50 may be based on any suitable programmable technology, such as fuses, antifuses, electrically programmable read-only-memory technology, random-access memory cells, mask-programmed elements, and so forth.
Many programmable logic devices are electrically programmed. With electrical programming arrangements, the programmable elements 50 may be formed from one or more memory cells. For example, during programming (i.e., configuration), configuration data is loaded into the memory cells using input/output pins 44 and input/output circuitry 42. In one embodiment, the memory cells may be implemented as random-access-memory (RAM) cells. The use of memory cells based on RAM technology as described herein is intended to be only one example. Further, since these RAM cells are loaded with configuration data during programming, they are sometimes referred to as configuration RAM cells (CRAM). These memory cells may each provide a corresponding static control output signal that controls the state of an associated logic component in programmable logic 26. For instance, in some embodiments, the output signals may be applied to the gates of metal-oxide-semiconductor (MOS) transistors within the programmable logic 26.
Keeping the discussion of
The device under test (DUT) 84 may be any suitable integrated circuit device or system. For example, the device under test (DUT) 84 may be a memory device (e.g., DDR3/4 memory, QDR-IV memory, DDR-T memory, high-bandwidth memory (HBM)), networking circuitry, or a processor, to provide a few examples. The device under test (DUT) 84 may be a circuit component on the same die as the integrated circuit 12, a different die in the same package, or a different package. Indeed, additionally or alternatively, the device under test (DUT) 84 may be circuitry of a system design configured into the programmable logic circuitry 26 or another component of the integrated circuit 12 (e.g., memory of the integrated circuit 12).
The testbench processor 82 may be programmed by a developer to carry out a variety of test patterns. For example, the testbench processor 82 may be used to send test signals on an Advanced eXtensible Interface (AXI) bus or a peripheral component interconnect express (PCIe) bus. The AXI bus protocol allows multiple traffic streams to be interleaved on a single bus. Each traffic stream is tagged with a unique ID, allowing the DUT to return responses out of order across the traffic streams. A current version of AXI allows 512 IDs, so the testbench processor 82 may be used to generate 512 unique traffic streams. Future versions of AXI or other protocols may support even more unique traffic streams. One approach, visualized in a software construct of a testbench processor 82A shown in
If the software construct of the testbench processor 82A shown in
The device under test (DUT) (not shown in
The testbench processor 82 may avoid significant overhead using an instruction set architecture (ISA) referred to as Distributed Instruction Graph (DIG). The DIG ISA allows for very efficient instruction memory usage by distributing instructions across several different memories. These memories are shown in
The DIG ISA is helpful because a software traffic pattern can define a single traffic stream (for a single ID) or 512 traffic streams (for 512 IDs), so the instruction size can vary widely. Using a very long instruction word (VLIW) ISA instead of DIG would involve allocating for the largest instruction size of ~100,000 bits, which would consume 3,125 32-bit-wide 20 kB memories (100,000 bits / 32 bits per memory), which is infeasible. Distributed Instruction Graph (DIG) may be used to split the variable-length portions of the instruction into separate RAMs and to split the time-multiplexed portions across separate rows. While this introduces overhead to store pointers, RAM usage is proportional to code complexity (~200 bits per traffic stream, which consumes roughly 6 32-bit-wide 20 kB memories). RAM usage can be further reduced by removing replicated entries and editing the pointers to refer to the remaining copies.
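The memory arithmetic and the pointer-based layout described above can be sketched in Python. This is a hypothetical model, not the disclosure's encoding: the table and field names (`MainEntry`, `IssueEntry`, `WorkerEntry`, `fetch`, `dedup`) are assumptions loosely following the main, issue, worker, and ALU instruction memories described in this disclosure. Variable-length lists (the issue order, each worker's ALU instructions) live in their own tables and are reached through a start pointer plus a length, and `fetch` walks the tables with fixed, bounded loops only.

```python
"""Hypothetical sketch of DIG-style distributed instruction memories."""
from collections.abc import Hashable
from typing import Iterator, List, NamedTuple, Sequence, Tuple


def memories_needed(bits: int, width: int = 32) -> int:
    """Number of `width`-bit-wide memories placed side by side to hold one word."""
    return -(-bits // width)  # ceiling division


class MainEntry(NamedTuple):
    op: str          # operation, e.g. "read" or "write"
    repeat: int      # times to repeat this operation
    issue_ptr: int   # first row of this op's issue list
    issue_len: int   # variable-length part: number of context IDs issued


class IssueEntry(NamedTuple):
    ctx_id: int      # context (worker / AXI ID) issued in this slot
    worker_ptr: int  # row in the worker memory


class WorkerEntry(NamedTuple):
    alu_ptr: int     # first row of this worker's ALU instructions
    alu_len: int     # variable-length part: number of ALU instructions


def fetch(main: Sequence[MainEntry], issue: Sequence[IssueEntry],
          worker: Sequence[WorkerEntry], alu: Sequence[Tuple[str, int]]
          ) -> Iterator[Tuple[str, int, List[Tuple[str, int]]]]:
    """Walk the tables with fixed, bounded loops (no branch instructions)."""
    for m in main:
        for _ in range(m.repeat):
            for slot in issue[m.issue_ptr:m.issue_ptr + m.issue_len]:
                w = worker[slot.worker_ptr]
                yield m.op, slot.ctx_id, list(alu[w.alu_ptr:w.alu_ptr + w.alu_len])


def dedup(rows: Sequence[Hashable], pointers: Sequence[int]) -> Tuple[list, List[int]]:
    """Drop duplicate rows and redirect pointers to the surviving copy."""
    unique: list = []
    first_index: dict = {}
    remap: List[int] = []
    for row in rows:
        if row not in first_index:
            first_index[row] = len(unique)
            unique.append(row)
        remap.append(first_index[row])
    return unique, [remap[p] for p in pointers]
```

`memories_needed(100_000)` returns 3125, matching the VLIW figure above. For the ~200-bit DIG case, a strict ceiling gives 7, so the "roughly 6" figure is best read as 200 / 32 ≈ 6.25. In the usage tested below, a write and a read operation share the same issue rows, illustrating how pointers let one stored entry serve several operations.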
Any suitable programming language may be used to define an AXI traffic pattern. One example of code to define an AXI traffic pattern as instructions 108 in the Python API is provided in
Where “Op” corresponds to any suitable operation (e.g., read, write), “issue” corresponds to the order of worker IDs that are to execute ALU instructions, and “worker” corresponds to the instructions to be carried out by different worker IDs. Metadata “A” and “B” correspond to any suitable metadata relating to the operation, and metadata “U”, “V”, “X”, “Y”, “W”, and “Z” correspond to any suitable metadata relating to executing a particular worker instruction. The ALU instructions “I”, “J”, and “K” correspond to any suitable ALU instructions that may be executed by the generator circuitry 128. From a different perspective, equivalent RISC pseudocode for an AXI traffic generator may take the format:
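As a rough, hypothetical illustration of the nested structure described above (an operation repeated a fixed number of times, worker IDs issued in a fixed order, and each worker's ALU instructions updating that worker's own state), the following Python sketch uses invented names `Op`, `Worker`, `ALU_OPS`, and `run`. It is neither the disclosure's Python API nor its RISC pseudocode. Each worker ID keeps its own address register, standing in for the per-context state that is switched on every issue slot.

```python
"""Hypothetical nested-loop view of an Op / issue / worker traffic pattern."""
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

MASK32 = 0xFFFF_FFFF

# Hypothetical ALU instructions, stand-ins for "I", "J", and "K".
ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda reg, x: (reg + x) & MASK32,
    "xor": lambda reg, x: reg ^ x,
    "mov": lambda reg, x: x & MASK32,
}


@dataclass
class Worker:
    alu: List[Tuple[str, int]]                          # ALU instructions run after each issue
    meta: Dict[str, int] = field(default_factory=dict)  # e.g. "U", "V"


@dataclass
class Op:
    name: str                                           # e.g. "read" or "write"
    repeat: int                                         # loop count for this operation
    issue: List[int]                                    # order in which worker IDs issue
    workers: Dict[int, Worker]                          # instructions per worker ID
    meta: Dict[str, int] = field(default_factory=dict)  # e.g. "A", "B"


def run(ops: List[Op], init: Dict[int, int]) -> List[Tuple[str, int, int]]:
    """Emit (op, worker_id, address) in issue order using only bounded loops."""
    regs = dict(init)                                   # per-worker address registers (contexts)
    stream = []
    for op in ops:
        for _ in range(op.repeat):
            for wid in op.issue:
                stream.append((op.name, wid, regs[wid]))
                for alu_name, operand in op.workers[wid].alu:
                    regs[wid] = ALU_OPS[alu_name](regs[wid], operand)
    return stream
```

For instance, a write operation repeated twice with workers 0 and 1 striding by 64 and 128 bytes from base addresses 0x0 and 0x1000 yields addresses 0x0, 0x1000, 0x40, 0x1080 in that order. Because every loop bound comes from the instruction data, no branch instructions are needed.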
In essence, the DIG ISA implements a fixed loop structure without the use of branch instructions. In the example of
For responses, the compiler may distribute aspects of the instructions based on worker ID rather than operation order, since responses from a device under test (DUT) to the test signals may return in a different order than the test signals were sent. Thus, in the example of
The high efficiency of Distributed Instruction Graph (DIG) thus may allow a 100% throughput instruction stream and, consequently, a 100% throughput data stream for AXI protocol test streams, an ideal property for test traffic generation. The testbench processor 82 of this disclosure may also be used to test non-memory protocols such as content-addressable memory (CAM), PHY-Lite, and Mobile Industry Processor Interface (MIPI). Additionally, the remote access path may leverage PCIe or JTAG for faster access speeds. Moreover, the software may execute on-chip, either in a hardened processor system (HPS) or in a soft processor (e.g., a NIOS processor configured onto an FPGA), rather than remotely on a host PC.
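The per-ID handling of AXI test streams and their responses (tagging each stream with an ID, accepting responses that arrive out of order across IDs, and checking them by context) can be sketched as follows. This is a hypothetical model: `interleave`, `dut_reorder`, `pattern`, and `ResponseChecker` are illustrative names. The DUT is assumed to simply echo each beat's data, and responses are assumed to stay in order within a single ID, as AXI requires for transactions that share an ID. Rather than storing every command, the checker regenerates the expected data from a per-context sequence counter.

```python
"""Hypothetical model of ID-tagged test streams and per-ID response checking."""
import random
from collections import defaultdict, deque
from typing import Dict, List, Tuple

MAX_IDS = 512  # ID count stated for the AXI bus in this disclosure


def pattern(ctx_id: int, seq: int) -> int:
    """Deterministic 32-bit data for the seq-th beat of a context (illustrative)."""
    return ((ctx_id << 16) ^ (seq * 0x9E37)) & 0xFFFF_FFFF


def interleave(beats_per_id: Dict[int, int]) -> List[Tuple[int, int]]:
    """Round-robin the streams onto one bus as (ctx_id, data) beats."""
    for ctx_id in beats_per_id:
        if not 0 <= ctx_id < MAX_IDS:
            raise ValueError(f"ID {ctx_id} out of range")
    sent = {i: 0 for i in beats_per_id}
    bus = []
    while any(sent[i] < n for i, n in beats_per_id.items()):
        for i, n in beats_per_id.items():
            if sent[i] < n:
                bus.append((i, pattern(i, sent[i])))
                sent[i] += 1
    return bus


def dut_reorder(bus: List[Tuple[int, int]], rng: random.Random) -> List[Tuple[int, int]]:
    """Model a DUT that reorders responses across IDs but not within one ID."""
    per_id: Dict[int, deque] = defaultdict(deque)
    for ctx_id, data in bus:
        per_id[ctx_id].append(data)
    out = []
    while per_id:
        ctx_id = rng.choice(sorted(per_id))
        out.append((ctx_id, per_id[ctx_id].popleft()))
        if not per_id[ctx_id]:
            del per_id[ctx_id]
    return out


class ResponseChecker:
    """Regenerates expected data per context instead of storing commands."""

    def __init__(self) -> None:
        self.seq: Dict[int, int] = defaultdict(int)  # per-context progress
        self.errors: List[Tuple[int, int]] = []       # (ctx_id, seq) of mismatches

    def check(self, ctx_id: int, data: int) -> bool:
        n = self.seq[ctx_id]
        self.seq[ctx_id] = n + 1
        if data != pattern(ctx_id, n):
            self.errors.append((ctx_id, n))
            return False
        return True
```

Keying the checker's state by context ID means the order in which different IDs respond does not matter. Only the per-ID order has to match, which mirrors the reason responses are organized by worker ID rather than by operation order.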
The integrated circuit system 12 may be a component included in a data processing system, such as a data processing system 500, shown in
The data processing system 500 may be part of a data center that processes a variety of different requests. For instance, the data processing system 500 may receive a data processing request via the network interface 506 to perform encryption, decryption, machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern identification, spatial navigation, digital signal processing, or other specialized tasks.
While the embodiments set forth in the present disclosure may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and have been described in detail herein. However, it should be understood that the disclosure is not intended to be limited to the particular forms disclosed. The disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure as defined by the following appended claims.
The techniques presented and claimed herein are referenced and applied to material objects and concrete examples of a practical nature that demonstrably improve the present technical field and, as such, are not abstract, intangible or purely theoretical. Further, if any claims appended to the end of this specification contain one or more elements designated as “means for [perform]ing [a function] . . . ” or “step for [perform]ing [a function] . . . ”, it is intended that such elements are to be interpreted under 35 U.S.C. 112(f). However, for any claims containing elements designated in any other manner, it is intended that such elements are not to be interpreted under 35 U.S.C. 112(f).
An integrated circuit device comprising:
memory comprising instructions to generate a plurality of test streams to send to a device under test; and
a testbench processor to generate the plurality of test streams based on the instructions using thread execution circuitry that switches context based on context identifiers corresponding to respective test streams of the plurality of test streams.
The integrated circuit device of example embodiment 1, wherein the memory comprises a plurality of memories over which components of the instructions are distributed to implement a fixed loop structure without branch instructions.
The integrated circuit device of example embodiment 1, wherein the memory comprises at least three memories over which components of the instructions are distributed.
The integrated circuit device of example embodiment 1, wherein the memory comprises:
an arithmetic logic unit (ALU) instruction memory to store entries comprising ALU instructions of operations to be executed in one or more ALUs of the testbench processor;
a worker instruction memory to store entries per context identifier and operation and pointers to the ALU instruction memory;
an issue instruction memory to store entries per issue order of context identifiers and pointers to the worker instruction memory; and
a main instruction memory to store entries per operation comprising pointers to the issue instruction memory.
The integrated circuit device of example embodiment 4, wherein respective entries of the main instruction memory comprise:
an indication of a number of times to repeat the operation corresponding to that entry of the main instruction memory; and
metadata corresponding to the operation corresponding to that entry of the main instruction memory.
The integrated circuit device of example embodiment 1, wherein the testbench processor comprises a single instance of the thread execution circuitry.
The integrated circuit device of example embodiment 1, wherein the testbench processor comprises a command region to generate the plurality of test streams and a response region to analyze responses from the device under test in response to the test streams.
The integrated circuit device of example embodiment 7, wherein the response region is to analyze responses by context.
The integrated circuit device of example embodiment 1, wherein the plurality of test streams are to be sent to the device under test via an Advanced eXtensible Interface (AXI) bus.
The integrated circuit device of example embodiment 1, wherein the testbench processor is formed at least in part in field programmable gate array (FPGA) programmable logic circuitry.
A system comprising:
a device under test to receive a plurality of test streams associated with respective identifiers and respond with respective response signals associated with the respective identifiers; and
an integrated circuit to generate the plurality of test streams using an instance of thread execution circuitry that switches context based on the respective identifiers.
The system of example embodiment 11, wherein the plurality of test streams conform to the Advanced eXtensible Interface (AXI) protocol.
The system of example embodiment 11, wherein the plurality of test streams conform to a non-memory protocol.
The system of example embodiment 11, wherein the integrated circuit comprises a plurality of disjunct memories storing instructions to cause the integrated circuit to generate the plurality of test streams, wherein the instructions are distributed over the plurality of memories to implement a fixed loop structure without branch instructions.
The system of example embodiment 14, wherein the plurality of disjunct memories comprise:
an arithmetic logic unit (ALU) instruction memory to store entries comprising ALU instructions of operations to be executed in one or more ALUs of the integrated circuit;
a worker instruction memory to store entries per context identifier and operation and pointers to the ALU instruction memory;
an issue instruction memory to store entries per issue order of context identifiers and pointers to the worker instruction memory; and
a main instruction memory to store entries per operation comprising pointers to the issue instruction memory.
The system of example embodiment 11, wherein the integrated circuit comprises a command region to generate the plurality of test streams and a response region to analyze responses from the device under test in response to the test streams.
The system of example embodiment 16, wherein the response region is to analyze responses by context.
The system of example embodiment 11, wherein the device under test comprises DDR3 memory, DDR4 memory, QDR-IV memory, DDR-T memory, or high-bandwidth memory (HBM).
Thread execution circuitry comprising:
command control circuitry to issue command instructions associated with a context identifier;
first generator circuitry to generate a dynamic component of a test stream associated with the context identifier based on the command instructions;
output circuitry to send the test stream to a device under test;
input circuitry to receive a response from the device under test based on the test stream, wherein the response comprises the context identifier;
recovery control circuitry to issue recovery instructions based on the context identifier of the response;
second generator circuitry to generate an expected response from the device under test based on the test stream; and
analysis circuitry to compare the response from the device under test to the expected response.
The thread execution circuitry of example embodiment 19, wherein the command control circuitry comprises static data or metadata and wherein the static data or metadata is combined with the dynamic component of the test stream before the test stream is sent to the device under test.
The thread execution circuitry of example embodiment 19, wherein the thread execution circuitry is formed at least in part in field programmable gate array (FPGA) programmable logic circuitry.
This application claims priority to U.S. Provisional Application No. 63/409,648 filed Sep. 23, 2022, entitled “Software-Defined Synthesizable Testbench,” which is incorporated herein by reference in its entirety for all purposes.
Number | Date | Country
---|---|---
63409648 | Sep 2022 | US