The present disclosure relates generally to the field of electronic device testing systems and more specifically to the field of electronic device testing equipment for testing devices under test (DUTs).
Automated test equipment (ATE) can be any testing assembly that performs a test on a semiconductor device or electronic assembly. ATE assemblies may be used to execute automated tests that quickly perform measurements and generate test results that can then be analyzed. An ATE assembly may be anything from a computer system coupled to a meter, to a complicated automated test assembly that may include a custom, dedicated computer control system and many different test instruments that are capable of automatically testing electronics parts and/or semiconductor wafer testing, such as system-on-chip (SOC) testing or integrated circuit testing. ATE systems both reduce the amount of time spent on testing devices to ensure that the device functions as designed and serve as a diagnostic tool to determine the presence of faulty components within a given device before it reaches the consumer.
One of the drawbacks of conventional ATE is that it typically reports only pass/fail results. In other words, the ATE only reports whether one or more devices under test (DUTs) passed or failed the respective test being executed. The ATE is not configured to identify root causes of device failures that occur during qualification testing. In a typical testing environment, the technicians operating the ATE need to identify the root cause of failure manually by collecting data logs and analyzing them by hand. This approach is labor intensive, error prone and not scalable. It may also not yield the desired result since there may not be enough information available to the technicians to determine which data logs to analyze or how to find the root causes of device failure within the data logs.
Accordingly, a need exists for an ATE that automatically parses through detailed logs generated by the ATE during testing and provides relevant information to the user. Further, a need exists for a log post-processor tool that can sift through extensive log information and, based on information regarding the methodology by which the logs are generated, can extract meaningful information regarding root causes of device failure from within the logs.
In one embodiment, a method for diagnosing a root cause of failure using automated test equipment (ATE) is disclosed. The method comprises identifying a failing device under test (DUT). Further, the method comprises opening a test program log associated with the failing DUT and determining a time of failure by parsing through the test program log to find an identifier and timestamp associated with the failure. Finally, the method comprises displaying the test program log in a window within a graphical user interface, wherein a relevant section of the test program log associated with the failure is displayed in the window.
In one embodiment, a computer-readable storage medium having stored thereon computer-executable instructions that, if executed by a computer system, cause the computer system to perform a method for diagnosing a root cause of failure using automated test equipment (ATE) is disclosed. The method comprises highlighting a failing device under test (DUT) and opening a test program log associated with the failing DUT in response to executing a script associated with a log post-processor. Further, the method comprises determining a time of failure by parsing through the test program log to locate an identifier and timestamp associated with the failure and displaying the test program log in a window within a graphical user interface, wherein a relevant section of the test program log associated with the failure is displayed in the window.
In another embodiment, a system for performing a method for diagnosing a root cause of failure using automated test equipment (ATE) is disclosed. The system comprises a memory comprising a test program and a log post-processor script stored on a tester operating system, a communicative interface operable to connect to one or more devices under test (DUTs) and a processor coupled to the memory and the communicative interface. The processor is configured to operate in accordance with the log post-processor script to: (a) execute the test program; (b) identify a failing device under test (DUT), wherein the failing DUT produces an error condition in response to executing the test program; (c) open a test program log associated with the failing DUT in response to executing the log post-processor script; (d) determine a time of failure by parsing through the test program log to find an identifier and timestamp associated with the failure; and (e) display the test program log in a window within a graphical user interface, wherein a relevant section of the test program log associated with the failure is displayed in the window.
The following detailed description together with the accompanying drawings will provide a better understanding of the nature and advantages of the present invention.
Embodiments of the present invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements.
In the figures, elements having the same designation have the same or similar function.
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. While the embodiments will be described in conjunction with the drawings, it will be understood that they are not intended to limit the embodiments. On the contrary, the embodiments are intended to cover alternatives, modifications and equivalents. Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding. However, it will be recognized by one of ordinary skill in the art that the embodiments may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
A Log Post-Processor for Identifying Root Causes of Device Failure During Automated Testing
In conventional testers, the diagnostic process in case of device failure is time-consuming and tedious because there are several steps an expert or technician needs to take in order to interpret the test logs generated by a tester to find the root cause of failure. Additionally, in some cases, conventional testers do not provide detailed test results at all. Instead, they simply provide pass/fail results. Moreover, because all testing protocols are unique, a technician or test engineer may not have the necessary information to sift through the various test logs to determine the root cause of failure.
Test throughput can usually be improved in a number of ways. One way of improving test throughput is by providing a tool that will automatically parse through detailed logs generated by a tester during testing and provide relevant information to the user. Further, test throughput can be improved by providing a log post-processor tool within the tester that can sift through extensive log information and, based on information regarding the methodology by which the logs are generated, can extract meaningful information regarding root causes of device failure from within the logs in real-time. The log post-processor of the present invention advantageously conserves time and labor resources during testing and, further, can be scaled to analyze results from one or more testers simultaneously. Instead of a human analyzing logs manually, embodiments of the present invention analyze the debug data in real-time, identify suspicious conditions and flag potential error-related conditions for a human or software to examine.
Embodiments of the present invention automatically identify root causes of failures during device testing by sifting through all the data generated by a tester to first identify test logs that relate to the failure. Most testers will generate numerous logs, e.g., data logs, snap logs, data capture logs, etc. in the course of testing, not all of which will be pertinent to debugging. The log post-processor will, therefore, be configured to identify the relevant test logs. For example, the identification may be based on naming conventions, e.g., the log post-processor may identify certain file names most commonly associated with critical data related to device failures.
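By way of a non-limiting illustration, selection of the relevant logs by naming convention might be sketched in Python as follows. The pattern table, the log-type labels and the example file names are hypothetical and are not taken from any particular tester; an actual implementation would use whatever conventions its tester software follows.

```python
import re

# Hypothetical naming conventions; a real tester would define its own.
RELEVANT_LOG_PATTERNS = {
    "test_program": re.compile(r"^tp_.*\.log$"),
    "snap": re.compile(r"^snap_.*\.log$"),
    "tlp_capture": re.compile(r"^tlp_capture_.*\.log$"),
}


def classify_logs(file_names):
    """Group file names by the log type whose naming pattern they match.

    Files that match no pattern are treated as not pertinent and dropped.
    """
    relevant = {kind: [] for kind in RELEVANT_LOG_PATTERNS}
    for name in file_names:
        for kind, pattern in RELEVANT_LOG_PATTERNS.items():
            if pattern.match(name):
                relevant[kind].append(name)
                break  # each file is assigned to at most one log type
    return relevant
```

Keeping the patterns in a single table means that supporting a new log type only requires adding one entry, without changing the selection logic.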
Subsequently, the log post-processor of the present invention identifies areas of interest within the test logs that are generated. In other words, the log post-processor is configured to identify areas within the test logs that are most likely to contain information regarding the root cause of device failure.
Embodiments of the present invention are further configured to translate and correlate timestamps generated by the logs and further translate and correlate identifiers generated by the logs. For example, the log post-processor may be configured to extract timestamps associated with all error message identifiers from within the logs.
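As one possible sketch of such translation and correlation, assuming purely for illustration that two log types stamp their lines in different textual formats, the timestamps can be converted to a common representation before being compared. The two formats and the function names below are assumptions made for this example only.

```python
from datetime import datetime

# Hypothetical formats: each log type is assumed to stamp lines differently.
TIMESTAMP_FORMATS = {
    "test_program": "%Y-%m-%d %H:%M:%S",
    "snap": "%d/%m/%Y %H:%M:%S.%f",
}


def to_common_time(log_kind, raw_timestamp):
    """Translate a log-specific timestamp string into a datetime object."""
    return datetime.strptime(raw_timestamp, TIMESTAMP_FORMATS[log_kind])


def seconds_apart(kind_a, stamp_a, kind_b, stamp_b):
    """Return the absolute time difference, in seconds, between two log entries."""
    delta = to_common_time(kind_a, stamp_a) - to_common_time(kind_b, stamp_b)
    return abs(delta.total_seconds())
```

Once every timestamp is expressed as the same type, an entry in one log can be matched to the entries of another log that fall within a chosen tolerance of it.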
Additionally, embodiments of the present invention can be configured to perform rules checking using the data generated by the tester and adding any information collected from the test logs into a knowledge database. For example, the log post-processor may be configured to inspect a log generated by a DUT and perform rule checking on the test data generated by the DUT. The rules can be predetermined and programmed into the log post-processor script. Any new information obtained from the testing and rule checking can subsequently be added to a knowledge database.
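A minimal sketch of such rule checking is given below, assuming that the test data for a DUT has already been parsed into a dictionary. The field names, the two rules and the in-memory dictionary standing in for the knowledge database are all hypothetical.

```python
def check_rules(record, rules):
    """Return the names of all rules that the record violates."""
    return [name for name, predicate in rules.items() if not predicate(record)]


# Hypothetical predetermined rules; each maps a name to a pass/fail predicate.
RULES = {
    "link_width_is_x4": lambda r: r.get("link_width") == 4,
    "no_replay_timeouts": lambda r: r.get("replay_timeouts", 0) == 0,
}


def update_knowledge(knowledge, dut_id, violations):
    """Record each violated rule against the DUT that exhibited it."""
    for name in violations:
        knowledge.setdefault(name, []).append(dut_id)
    return knowledge
```

In this sketch the knowledge database simply accumulates, for each rule, the DUTs that violated it; a deployed system might instead persist the same information in a file or a database server.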
Accordingly, embodiments of the present invention advantageously save the time that would otherwise be spent performing diagnostic operations manually. Ordinarily, it would take a technician or test engineer several hours to diagnose failures by manually analyzing data logs generated by the testing. The log post-processor of the present invention can extract and provide all the information a technician would need to identify the failures within seconds or minutes of the test completing.
If the devices under test, for example, are PCIe devices, the technician may then get the transaction layer packet (TLP) capture time from either the snap log or the test program log as shown in
As seen in
In a typical embodiment, the log post-processor will provide details regarding one failing DUT at a time. In other words, the log post-processor will be executed separately for each instance of a failing DUT. However, it should be noted that in other embodiments the log post-processor may be configured to provide details regarding several DUTs at a time as well.
In a typical embodiment, the tester software responsible for interacting with and testing the DUTs would automatically execute the script or program associated with the log post-processor after the DUTs have been tested. Before executing the log post-processor, however, the tester software will first wait for the test program to finish testing all the DUTs and for all the various log files generated by the DUTs during the testing process to be available. For each failing DUT, the log post-processor will execute and automatically scan the various logs associated with the failing DUT (or DUTs) and determine the locations in each of the log files that contain information relevant to the failure. In order to determine the locations in each of the log files with relevant failure-related information, the log post-processor can, for example, be programmed to perform a keyword search of the log files. Note that each DUT being tested will generate various log files. Certain log files will be particular to a given DUT while other log files will contain information from several DUTs in the same log file.
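The keyword search described above might be sketched as follows. Because some log files contain information from several DUTs, the sketch keeps only lines that mention both the failing DUT and at least one keyword. The keyword list and the DUT identifier format are assumptions made for illustration.

```python
import re


def find_relevant_lines(lines, dut_id, keywords=("FAIL", "ERROR", "TIMEOUT")):
    """Return 1-based numbers of lines that mention the DUT and any keyword."""
    # Word boundaries keep "DUT3" from also matching "DUT30".
    dut_pattern = re.compile(r"\b" + re.escape(dut_id) + r"\b")
    hits = []
    for number, line in enumerate(lines, start=1):
        if dut_pattern.search(line) and any(word in line for word in keywords):
            hits.append(number)
    return hits
```

The returned line numbers are the "locations" that a later step can use to open each log at the relevant section.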
The log post-processor will generate a batch file that can be executed to bring up windows displaying the various relevant log files, where each log file will have sections highlighted pertaining to the failing DUT. Note that the log post-processor will typically be configured to generate a separate batch file for each of the failing DUTs. The batch file may, for example, contain commands to a particular software, e.g., notepad++ (as shown in window 290 in
For example, if the tester software determines that there are 20 failing DUTs, the log post-processor, in one embodiment, will be executed separately for each failing DUT automatically by the tester software. The log post-processor will then parse through the various log files, determine the locations of the pertinent information, and generate a separate batch file for each of the 20 failing DUTs. In other words, the log post-processor will generate 20 separate batch files. The user can then execute each of the batch files separately to obtain diagnostic information for each of the failing DUTs. For example, when the user runs the batch file 290 shown in
In one embodiment, instead of the tester software executing the log post-processor automatically, a user can execute the log post-processor manually, e.g., by running the batch file (or shell script) associated with the log post-processor from a command line. In one embodiment, the log post-processor may be executed using a batch file or shell script so that the program can be executed with options, e.g., parameters specifying file names, file locations etc. However, in one embodiment, the log post-processor can also be executed directly on the command line without using a batch file or script.
As shown in
Further, the log post-processor (subsequent to the execution of the batch file) is able to automatically bring up the snap log with the logical to physical mapping of the DUT in window 211 (as discussed in connection with
As mentioned above, in one embodiment, the log post-processor can, upon execution, automatically create a batch file (or shell script) that executes to bring up logs in viewers to lines of interest as shown in
As mentioned above, the log post-processor can be programmed to execute directly from the tester software or from the command line manually. Further, in one embodiment, the log post-processor itself can be prepared using a scripting language and be executed by running an associated script (e.g., a Unix shell script) using a command line interface. In one embodiment, the script associated with the log post-processor can be prepared using the Python language, for instance. In other embodiments, the script can be written using any other scripting language, e.g., Perl, Ruby etc. The log post-processor program may also be developed with a language such as C, C++ etc. As mentioned above, the script for the log post-processor can be run as part of the batch file (for MS-DOS) or shell script (for Unix/Linux systems). In one embodiment, the file paths, directories and filenames that the log post-processor searches to look for relevant logs can be programmed into the log post-processor script using regular expressions. A regular expression is a sequence of characters that defines a search pattern.
The shell script associated with executing the log post-processor can be configured by first copying the script to a desired work folder. Subsequently, a technician or test engineer would need to edit a batch file or shell script associated with the execution of the log post-processor. Thereafter, the technician would run the shell script from a command window or terminal. In one embodiment, the log post-processor may take several parameters as input and, therefore, a batch file (or shell script) is convenient because it allows the user to input several parameters at the command line.
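For illustration only, a Python implementation of the log post-processor might accept its parameters through the standard `argparse` module, so that a batch file or shell script can pass them on one command line. The option names below are hypothetical; they merely mirror the kinds of inputs discussed in this description (the failing DUT, the test program log, the TLP directory, the snap directory and the syslog path).

```python
import argparse


def build_parser():
    """Build a command-line parser for a hypothetical log post-processor script."""
    parser = argparse.ArgumentParser(
        description="Locate failure-related sections in tester logs."
    )
    parser.add_argument("--dut", required=True, help="identifier of the failing DUT")
    parser.add_argument("--tp-log", required=True, help="path to the test program log")
    parser.add_argument("--tlp-dir", default=".", help="directory holding TLP capture logs")
    parser.add_argument("--snap-dir", default=".", help="directory holding snap logs")
    parser.add_argument("--syslog", default=None, help="path to the system log, if any")
    return parser
```

A wrapper batch file or shell script would then contain a single line that invokes the script with the site-specific values, which is the part a technician edits before running it.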
In one embodiment, the log post-processor can be programmed to collect all the information produced by running the shell script and interpret the information to determine the root cause of failure. In other words, instead of leaving it to the user to review all the information from the various logs to determine the root cause of failure manually, the log post-processor can be configured to collect and synthesize the information automatically and provide the user with a prediction as to the root cause of failure. In one embodiment, the log post-processor comprises a rule-checker that can parse through all the failure-related information to identify some possible causes of the failure. In this embodiment, the user would still be allowed the option to view all the log files and review the log files manually to get further details regarding the problems. In one embodiment, the log post-processor is configured to display a summary of the test results in an on-screen display for the user to view.
In one embodiment, the technician would simply need to type the batch file name (associated with executing the log post-processor) at the MS-DOS (or Linux) command prompt in order for the log post-processor to run. Executing the log post-processor may then generate another batch file that is associated with bringing up the logs in viewers with relevant sections highlighted. In one embodiment, the batch file generated by the log post-processor can be configured to execute automatically once the log post-processor is done parsing through all the log files. In a different embodiment, the user can run this batch file generated by the log post-processor from the command line interface.
In one embodiment, the log post-processor can comprise a GUI intermediary between the user and the underlying script.
The TLP directory points to where the capture logs are saved for the DUTs. In other words, the transaction layer packets pertaining to the protocol, e.g., PCIe, are stored in the TLP directory. As mentioned above, the PCIe protocol communicates using transaction layer packets. Further, the PCIe protocol may be implemented using an FPGA with a state machine that executes the protocol. The FPGAs can capture information during the protocol execution, including the TLPs, which can be used by a technician to figure out any problems associated with the test. This information is typically contained in a TLP capture log in the TLP directory. As mentioned above, the TLP log may contain state-machine related packets, e.g., LTSSM packets for PCIe, or equalization information. The TLP log can be inspected, for example, to determine if the state machine associated with the PCIe protocol is functioning correctly. For example, a technician would be able to review the TLP log to determine if any particular state is out of order. Further, in one embodiment, during debugging, a technician may be able to inject errors intentionally into the TLPs to cause failures to determine if the errors are captured by the FPGA and flagged correctly during post-processing.
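As a non-limiting sketch of checking whether any state is out of order, assume that the sequence of LTSSM state names has already been extracted from the capture log. The transition table below is a deliberately simplified subset of the PCIe link training state machine, chosen for illustration; the PCIe specification remains the authoritative source, and a table this small would flag legitimate transitions that it omits.

```python
# Simplified, illustrative subset of LTSSM transitions (not the full PCIe table).
ALLOWED_NEXT = {
    "Detect": {"Polling"},
    "Polling": {"Configuration", "Detect"},
    "Configuration": {"L0", "Detect"},
    "L0": {"Recovery"},
    "Recovery": {"L0", "Configuration", "Detect"},
}


def first_out_of_order(states):
    """Return (index, previous, current) for the first disallowed transition.

    Returns None when every consecutive pair of states is allowed.
    """
    for index in range(1, len(states)):
        previous, current = states[index - 1], states[index]
        if current not in ALLOWED_NEXT.get(previous, set()):
            return index, previous, current
    return None
```

A rule-checker built this way can report the exact position of the unexpected transition, which tells the technician where in the capture log to look.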
It should be noted that the invention disclosed herein is not limited to simply capturing TLPs, LTSSM packets or equalization information. There may be many different types of information that are captured by the TLP and other logs that may be relevant to a technician.
The snap directory 506 points to where various software-level snap logs are saved. For example, a snap log would contain information regarding a logical to physical mapping of the DUT.
Finally, the syslog path 508 points to the location where the system log is saved, e.g., a Linux system log. The syslog will typically contain more details or log information pertaining to the software that is controlling the hardware testing. For example, if a test is accidentally started while the DUT is missing, the software will need to be programmed to recognize that a device is missing and the manner in which to handle that exception. The syslog will typically contain a detailed trace of the test execution, including a software level trace.
At step 902, the tester software identifies the failing DUT and automatically executes the log post-processor. In one embodiment, the technician may have to manually identify the failing DUT and provide it as an input to the log post-processor using a command line interface prior to execution.
At step 904, the log post-processor, when executed, opens a test program (TP) log for the identified failing DUT.
At step 906, the log post-processor is programmed to go to the failure point in the TP log to determine the time of failure. The failure point may be identified on the basis of certain identifiers in the log that signal failure, e.g., a “FAILURE” message accompanied with a timestamp indicating the time of failure.
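Step 906 might be sketched as follows, assuming for illustration that each line of the TP log begins with a timestamp of the form `YYYY-MM-DD HH:MM:SS` and that a failure is signalled by the word FAILURE on the same line. Both the line layout and the function name are assumptions; real TP logs may be formatted differently.

```python
import re
from datetime import datetime

# Hypothetical line layout: leading timestamp, then text containing FAILURE.
FAILURE_LINE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*\bFAILURE\b"
)


def find_failure_point(tp_log_lines):
    """Return (line_number, time_of_failure) for the first failure line, or None."""
    for number, line in enumerate(tp_log_lines, start=1):
        match = FAILURE_LINE.search(line)
        if match:
            stamp = datetime.strptime(match.group("stamp"), "%Y-%m-%d %H:%M:%S")
            return number, stamp
    return None
```

The line number identifies the section of the TP log to display, and the timestamp is what the subsequent steps use to locate the corresponding entries in the other logs.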
At step 908, the log post-processor can be configured to open the snap log. From the snap log, the log post-processor can determine the logical to physical mapping of the DUT (also known as a device map) at step 910. At step 912, the log post-processor can be configured to go to the time of failure in the snap log using the time identified from the test program log and analyze the snap log around the time of failure for possible causes of failure. In one embodiment, the log post-processor may use a rule-checker to analyze the snap log to determine possible root causes of failure.
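Step 912 might be sketched as selecting the snap log lines whose timestamps fall within a window around the time of failure. The sketch assumes, for illustration only, that each snap log line begins with a 19-character timestamp of the form `YYYY-MM-DD HH:MM:SS`; lines without such a timestamp are skipped. The default window of five seconds is likewise an arbitrary assumption.

```python
from datetime import datetime, timedelta


def lines_near_failure(snap_lines, failure_time, window_seconds=5):
    """Return (line_number, line) pairs stamped within the window of the failure."""
    window = timedelta(seconds=window_seconds)
    nearby = []
    for number, line in enumerate(snap_lines, start=1):
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # line does not start with a recognizable timestamp
        if abs(stamp - failure_time) <= window:
            nearby.append((number, line))
    return nearby
```

The selected lines can then be handed to a rule-checker or highlighted for the technician, rather than presenting the snap log in its entirety.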
At step 914, the log post-processor can be programmed to get the transaction layer packet (TLP) capture time from either the snap log or the test program log. For example, transaction layer packets are exchanged between a host and a client (or between a tester and a device under test) using the PCIe protocol and the tester may capture these TLPs for further inspection and to collect failure related information. Information related to the TLPs may be collected in a TLP log, for example. At step 916, once the TLP capture time is obtained from either the snap log or the test program log, the log post-processor would open the pertinent TLP log (based on the time of failure and the DUT). At step 918, the log post-processor can be configured to analyze the TLP log to ascertain a root cause of failure, e.g., the log post-processor may use a rule checker to determine the cause of failure.
At step 920, the log post-processor can be programmed to generate a batch file, which is configured to open various windows displaying the log files with relevant sections highlighted. This batch file can either be programmed to execute automatically after the log post-processor has finished executing or can be executed manually by the user. For example, executing the batch file shown in window 290 in
For example, the batch file generated at step 920 can be executed to open a window 212 showing the snap log around the time of failure so that the technician can inspect it for further clues related to the failure. In one embodiment, the log post-processor can highlight the relevant lines in the snap log to clearly indicate which lines in the snap log need to be inspected. In a different embodiment, as indicated above, the log post-processor can be programmed to automatically parse through the relevant lines in the log file and identify a possible cause of failure to the technician.
Further, by way of example, executing the batch file can also bring up the pertinent TLP log in window 213 for the technician to inspect the captured TLP in the TLP log. In a different embodiment, the log post-processor automatically parses through the relevant lines in the TLP log and identifies a possible cause of failure to the technician. Executing the batch file can also pop open a window with the TP log (e.g. window 210) for a user to examine the error related identifiers.
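A minimal sketch of generating such a batch file is shown below. It assumes a Windows command environment in which Notepad++ is reachable as `notepad++` and relies on that editor's `-n` command-line option to open a file at a given line; the default command template and the function name are otherwise hypothetical and can be replaced to suit a different viewer. Note that this option only positions the viewer at the line of interest; any highlighting would have to be provided by other means.

```python
def build_viewer_batch(locations, viewer_template='start "" notepad++ -n{line} "{path}"'):
    """Return batch-file text that opens each log at its line of interest.

    locations is an iterable of (path, line_number) pairs.
    """
    lines = ["@echo off"]
    for path, line_number in locations:
        lines.append(viewer_template.format(line=line_number, path=path))
    # Batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

Writing the returned text to a separate file for each failing DUT yields one batch file per DUT, consistent with the per-DUT execution described above.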
At step 922, the log post-processor is configured to generate summary results for all the failure-related information and display the results in an on-screen display for the user to view.
Processor 1114 generally represents any type or form of processing unit capable of processing data or interpreting and executing instructions. In certain embodiments, processor 1114 may receive instructions from a software application or module. These instructions may cause processor 1114 to perform the functions of one or more of the example embodiments described and/or illustrated herein.
System memory 1116 generally represents any type or form of volatile or non-volatile storage device or medium capable of storing data and/or other computer-readable instructions. Examples of system memory 1116 include, without limitation, RAM, ROM, flash memory, or any other suitable memory device. Although not required, in certain embodiments control system 1110 may include both a volatile memory unit (such as, for example, system memory 1116) and a non-volatile storage device (such as, for example, primary storage device 1132).
Tester control system 1110 may also include one or more components or elements in addition to processor 1114 and system memory 1116. For example, in the embodiment of
Memory controller 1118 generally represents any type or form of device capable of handling memory or data or controlling communication between one or more components of control system 1110. For example, memory controller 1118 may control communication between processor 1114, system memory 1116, and I/O controller 1120 via communication infrastructure 1112.
I/O controller 1120 generally represents any type or form of module capable of coordinating and/or controlling the input and output functions of a computing device. For example, I/O controller 1120 may control or facilitate transfer of data between one or more elements of control system 1110, such as processor 1114, system memory 1116, communication interface 1122, display adapter 1126, input interface 1130, and storage interface 1134.
Communication interface 1122 broadly represents any type or form of communication device or adapter capable of facilitating communication between example control system 1110 and one or more additional devices. For example, communication interface 1122 may facilitate communication between control system 1110 and a private or public network including additional control systems. Examples of communication interface 1122 include, without limitation, a wired network interface (such as a network interface card), a wireless network interface (such as a wireless network interface card), a modem, and any other suitable interface. In one embodiment, communication interface 1122 provides a direct connection to a remote server via a direct link to a network, such as the Internet. Communication interface 1122 may also indirectly provide such a connection through any other suitable connection.
Communication interface 1122 may also represent a host adapter configured to facilitate communication between control system 1110 and one or more additional network or storage devices via an external bus or communications channel. Examples of host adapters include, without limitation, Small Computer System Interface (SCSI) host adapters, Universal Serial Bus (USB) host adapters, IEEE (Institute of Electrical and Electronics Engineers) 1394 host adapters, Serial Advanced Technology Attachment (SATA) and External SATA (eSATA) host adapters, Advanced Technology Attachment (ATA) and Parallel ATA (PATA) host adapters, Fibre Channel interface adapters, Ethernet adapters, or the like. Communication interface 1122 may also allow control system 1110 to engage in distributed or remote computing. For example, communication interface 1122 may receive instructions from a remote device or send instructions to a remote device for execution.
As illustrated in
As illustrated in
As illustrated in
In one example, databases 1140 may be stored in primary storage device 1132. Databases 1140 may represent portions of a single database or computing device, or they may represent multiple databases or computing devices. For example, databases 1140 may represent (be stored on) a portion of control system 1110 or on connected network devices. Alternatively, databases 1140 may represent (be stored on) one or more physically separate devices capable of being accessed by a computing device, such as control system 1110 and/or portions of network architecture.
Continuing with reference to
Many other devices or subsystems may be connected to control system 1110. Conversely, all of the components and devices illustrated in
The computer-readable medium containing the computer program may be loaded into control system 1110. All or a portion of the computer program stored on the computer-readable medium may then be stored in system memory 1116 and/or various portions of storage devices 1132 and 1133. When executed by processor 1114, a computer program loaded into control system 1110 may cause processor 1114 to perform and/or be a means for performing the functions of the example embodiments described and/or illustrated herein. Additionally or alternatively, the example embodiments described and/or illustrated herein may be implemented in firmware and/or hardware.
The communicator bus 1215 provides a high-speed electronic communication channel between the system controller and the tester hardware. The communicator bus can also be referred to as a backplane, a module connection enabler, or system bus. Physically, communicator bus 1215 is a fast, high-bandwidth duplex connection bus that can be electrical, optical, etc. System controller 1201 sets up the conditions for testing the DUTs 1211-1214 by programming the tester hardware through commands sent over the communicator bus 1215.
Tester hardware 1202 comprises the complex set of electronic and electrical parts and connectors necessary to provide the test stimulus to the devices under test (DUTs) 1211-1214, measure the response of the DUTs to the stimulus, and compare that response against the expected response.
A test program or test plan comprises all user-defined data and control flows that are necessary to perform a semiconductor device test on an ATE system. It typically runs on the system controller 1201. The main control flow in a test program, which dictates the sequence of individual tests to be applied to the DUTs, and the order in which the tests will be applied (which is dependent on the results of individual tests), is referred to as the test program flow.
At block 1210, the DUTs generate several logs containing, among other things, results of the testing. At block 1230, the log post-processor can be executed, manually using a shell script or automatically from the tester software. The log post-processor parses through the various logs and determines the locations of interest in the various log files.
The log post-processor generates a batch file at block 1240. The batch file can be executed at block 1250 to display the various log files with their relevant sections pertaining to the failure highlighted on-screen for the user. Subsequently, the logs are presented on screen for the user to view and inspect at block 1260. Also, the log post-processor can generate a summary of the results on the screen for a user to ascertain the root causes of failure.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as may be suited to the particular use contemplated.