The present description relates to software testing and, more specifically, to a method of injecting faults in order to test error responses.
Despite the explosive growth in code complexity, modern software is actually more reliable than its predecessors. This is due in no small part to advances in testability and fault-recovery. Improved testbeds allow designers to investigate software responses to a wide array of error conditions including system panics, hangs, deadlocks, and livelocks. By studying these faults, designers are able to construct graceful responses to common and uncommon errors.
Fault injection is one technique commonly used to verify software. Fault injection involves modifying code behavior in order to produce a deliberate error. For example, a fault injection routine may modify a function to read an undefined variable or modify the function to perform an incorrect mathematical operation. The effects of the resulting error can then be studied as it propagates throughout the code.
Fault injection can be thorough, but it is not without drawbacks. For example, conventional fault injection is highly iterative. To achieve comprehensive coverage, code must be run and rerun until each statement is executed and each branch is traversed. Conventional fault injection can also be labor-intensive. Often, inserting injection points to verify each statement and branch is a manual process requiring a programmer to write a substantial amount of fault-injection code. This runs the risk of introducing errors if the fault-injection code is removed and dramatically increasing program size if it is left in place. While compile-time injection can be used to keep fault-injection instructions out of the release code, it may require the code to be recompiled every time a new fault is tested. Hours of compile time per test for a hundred thousand tests is often unacceptable. Run-time injection can be used to avoid recompiling, but often requires the fault-injection code to be permanently added to the functional code.
Conventional fault injection has been generally successful. However, for these reasons and others, conventional methods have inefficiencies that have become increasingly significant in light of the increasing number of code paths. Accordingly, a need exists for a streamlined testing process that delivers increased coverage with fewer iterations and less manual intervention.
The present disclosure is best understood from the following detailed description when read with the accompanying figures.
All examples and illustrative references are non-limiting and should not be used to limit the claims to specific implementations and embodiments described herein and their equivalents. For simplicity, reference numbers may be repeated between various examples. This repetition is for clarity only and does not dictate a relationship between the respective embodiments. Finally, in view of this disclosure, particular features described in relation to one aspect or embodiment may be applied to other disclosed aspects or embodiments of the disclosure, even though not specifically shown in the drawings or described in the text.
Various embodiments include systems, methods, and computer programs that verify the operation of a software product by injecting faults into the code of the software product. In one example, a testbed system analyzes the software product to determine the error-signifying return codes of various functions. Fault injection points are inserted in the software product to monitor calls to these functions, and when calls are detected, an error return code is returned to the calling statement instead of executing the function. By stepping through the error return codes, the testbed system progressively tests the error domain of the functions in a structured fashion. As merely one advantage, this technique of verification can be easily automated to reduce time spent manually inserting fault injection points. It also improves thoroughness and test coverage and reduces the number of fault injection points compared to conventional fault injection techniques. In addition, it allows users to select alternate error domains to test the most common or most severe faults.
Various program structures can be formed by grouping statements 102. For example, an arbitrary number of statements 102 can be grouped into a function 104. Functions 104 are often used to contain statements 102 that perform related tasks. Compartmentalizing statements 102 into functions 104 improves readability and testability, and protects variables within the function 104 from being modified by unrelated instructions. Statements 102 that initiate a function 104 are said to call the function 104. Conceptually, the calling statement 102 can be thought of as pausing to wait for the function 104 to complete before proceeding. The calling statement 102 may pass values (often referred to as parameters) to the called function 104. Similarly, the function 104 may return values to the calling statement 102. In one such example, a calling statement 102 provides a function 104 with a set of parameters, the function 104 performs one or more instructions on the received parameters, and the function 104 returns a value to the calling statement 102 based on the results. Functions 104 may be written to return particular values that indicate the function 104 did not complete successfully. To provide additional diagnostic information, a function 104 may define a set of error return codes that classify particular failures. Based on the error return code and the associated fault, a calling statement 102 may take corrective action.
In the organizational diagram of
Fault injection deliberately causes some portion of the software product 100 to fail in order to evaluate the effect and any corrective responses. In the ideal case, enough fault injection is performed on the software product 100 so that each statement 102 is executed (100% statement coverage) and each branch is traversed (100% branch coverage).
A method of fault injection is described with reference to
As described in detail below, the method 200 determines a set of error return codes for a function 104 within a software product 300. Fault injection points 302 are added to the software product 300 to detect calls to the functions 104. The fault injection points 302 can execute the function 104 normally and can provide error return codes to the respective calling statements 102 as an alternative to executing the function 104. The software product 300 is then run for multiple iterations. In each iteration, when a function call is detected, a different error return code is provided. By stepping through each of the error return codes, the method 200 can test the response of the software product 300 to the error return codes and determine if the associated faults are handled correctly. During some iterations, the fault injection points 302 allow the functions to execute naturally and provide the appropriate return values to the respective calling statements 102. When the error domain of a given function 104 has been tested, the fault injection process may be repeated for other functions 104 in the software product 300.
Referring first to block 202 of
Referring to block 204 of
In an exemplary embodiment, the characterization of block 204 parses the compiled and/or uncompiled source code of the functions 104 of the software product 300 to determine instructions within the functions 104 that exit the respective functions 104 and return a value. From the instructions, the characterization determines which returns correspond to errors and further determines the associated error return codes. In a further exemplary embodiment, the characterization of block 204 parses a table separate from the software product 300 that lists functions and their corresponding error return codes.
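As a non-limiting illustration, the source-parsing characterization of block 204 might be sketched as follows. The sketch assumes, purely for illustration, that the functions are written in Python and that any negative integer literal in a return instruction signifies an error; the name `error_return_codes` and the sample function are hypothetical and do not appear in the embodiments.

```python
import ast

def error_return_codes(source):
    """Map each function name to the negative integer literals it returns.

    Assumption for this sketch: a negative integer literal in a return
    instruction is an error return code. Returns inside nested functions
    are attributed to the enclosing function as well.
    """
    domains = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        codes = set()
        for sub in ast.walk(node):
            if isinstance(sub, ast.Return) and sub.value is not None:
                try:
                    value = ast.literal_eval(sub.value)
                except (ValueError, TypeError, SyntaxError):
                    continue  # the returned expression is not a literal
                is_int = isinstance(value, int) and not isinstance(value, bool)
                if is_int and value < 0:
                    codes.add(value)
        domains[node.name] = sorted(codes, reverse=True)
    return domains

SAMPLE = '''
def function_a(x):
    if x is None:
        return -1
    if x < 0:
        return -2
    return x * 2
'''

print(error_return_codes(SAMPLE))  # prints {'function_a': [-1, -2]}
```

A characterization that instead parses a separate table of functions and error return codes would replace the parsing loop with a lookup and leave the resulting mapping unchanged.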
Referring to block 206 of
Referring to block 208 of
Referring to
In the illustrated embodiment, each function 104 has a table entry 402 for each combination of calling statement 102 address and error return code plus one additional table entry 402 corresponding to a natural return and having the value “null”. The error code table entries 402 cause a fault injection point 302 to return the corresponding error code instead of executing the function 104, while the natural table entry 402 causes the fault injection point to allow the function 104 to execute normally. For example, function A has an error domain of (−1, −2, −3, −4, −5). Correspondingly, function A has, for each calling statement 102, five table entries 402 corresponding to error return codes and one table entry 402 corresponding to a natural return. As shown in
In the illustrated embodiment, the fault table data structure 400 includes table entries 402 for each error return code within an error domain. However, in some embodiments, fault injection testing is limited to a subset of the error domain. The subset may be determined based on commonality of a fault, severity of a fault, known errors, user identified faults, and/or other suitable criteria. In some such embodiments, the fault table data structure 400 includes only a subset of the error domain. In further such embodiments, the table entries 402 of the fault table data structure 400 include an identifier of the subset or subsets to which they belong. The identifier may be incorporated into the key and/or the value of the table entries 402.
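By way of a hedged example, a key-value form of the fault table might be populated as sketched below. The key layout (function identifier, calling statement address, progression identifier), the helper name `build_fault_table`, and the addresses are assumptions made for illustration; `None` stands in for the "null" value of the natural-return entry.

```python
def build_fault_table(error_domains, call_sites):
    """Build one entry per (function, calling address, progression) key.

    error_domains: maps a function identifier to its error return codes.
    call_sites: maps a function identifier to the addresses of its callers.
    """
    table = {}
    for function, codes in error_domains.items():
        for site in call_sites.get(function, []):
            for progression, code in enumerate(codes):
                # Error entry: return this code instead of executing.
                table[(function, site, progression)] = code
            # Natural entry: allow the function to execute normally.
            table[(function, site, len(codes))] = None
    return table

# Hypothetical error domain and calling statement addresses.
error_domains = {"function_a": [-1, -2, -3, -4, -5]}
call_sites = {"function_a": [0x1000, 0x2000]}

fault_table = build_fault_table(error_domains, call_sites)
print(len(fault_table))  # prints 12 (six entries per calling statement)
```

Restricting testing to a subset of the error domain could be modeled by passing only the selected codes in `error_domains`.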
Referring to
In the illustrated embodiment, each function 104 has a terminal node 508 for each combination of calling statement 102 address and error return code plus one additional terminal node 508 corresponding to a natural return. The error code terminal nodes 508 cause a fault injection point 302 to return the corresponding error code instead of executing the function 104, while the natural terminal node 508 causes the fault injection point to execute the function 104 normally. As in the example above, function A has an error domain of (−1, −2, −3, −4, −5). Correspondingly, function A has, for each calling statement 102, five terminal nodes 508 corresponding to error return codes and one terminal node 508 corresponding to a natural return. As shown in
In the illustrated embodiment, the fault table data structure 500 includes terminal nodes 508 for each error return code within an error domain. However, in some embodiments, fault injection testing is limited to a subset of the error domain. The subset may be determined based on commonality of a fault, severity of a fault, known errors, user identified faults, and/or other suitable criteria. In some such embodiments, the fault table data structure 500 includes only a subset of the error domain. In further such embodiments, the nodes of the fault table data structure 500 include an identifier of the subset or subsets to which they belong. The identifier may take the form of another level of nodes between the root node 501 and the terminal nodes 508 or may be incorporated into the values of other nodes.
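As one possible sketch of a tree-structured fault table, nested dictionaries can stand in for the levels of nodes between the root node and the terminal nodes. The choice of levels (function, then calling statement address, then progression identifier) and the names `build_fault_tree` and `lookup` are assumptions for illustration only.

```python
def build_fault_tree(error_domains, call_sites):
    """Return a root node whose leaves hold error codes or None (natural)."""
    root = {}
    for function, codes in error_domains.items():
        function_node = root.setdefault(function, {})
        for site in call_sites.get(function, []):
            site_node = function_node.setdefault(site, {})
            for progression, code in enumerate(codes):
                site_node[progression] = code  # terminal node: error code
            site_node[len(codes)] = None       # terminal node: natural return
    return root

def lookup(root, function, site, progression):
    """Walk from the root to a terminal node and return its value."""
    return root[function][site][progression]

# Hypothetical error domain and calling statement address.
tree = build_fault_tree({"function_a": [-1, -2, -3, -4, -5]},
                        {"function_a": [0x1000]})
print(lookup(tree, "function_a", 0x1000, 2))  # prints -3
```

A subset identifier could be modeled as one more dictionary level inserted between the root and the terminal values.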
Finally, referring to
Referring to block 210 of
In an exemplary embodiment, a fault injection point 302 is added by modifying the source code of the software product 300 and replacing a call to a function 104 with a call to the respective fault injection point 302. In a further exemplary embodiment, a fault injection point 302 is added by incorporating the fault injection point 302 into a wrapper surrounding the function 104 and/or the associated library 108. As can be seen, these techniques are easily automated and easily reversible. Other techniques are both contemplated and provided for.
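The wrapper technique might look like the following minimal sketch, which assumes a Python function and a caller-supplied callable that decides, per call, whether to inject an error return code. The names `fault_injection_point`, `next_code`, and `read_block` are hypothetical.

```python
import functools

def fault_injection_point(function, next_code):
    """Wrap `function` so that calls can be answered with an error code.

    next_code() returns an error return code to provide in lieu of
    executing the function, or None to let the function run naturally.
    """
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        code = next_code()
        if code is not None:
            return code                    # injected fault; function not run
        return function(*args, **kwargs)   # natural execution
    return wrapper

def read_block(count):
    """Hypothetical function under test."""
    return count * 512

pending = [-1, None]  # inject -1 on the first call, then run naturally
wrapped = fault_injection_point(read_block, lambda: pending.pop(0))

print(wrapped(2))  # prints -1
print(wrapped(2))  # prints 1024
```

Because `functools.wraps` records the original function as `wrapped.__wrapped__`, the wrapper can be removed by restoring that reference, which illustrates why the technique is easily reversible.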
As described in blocks 212-226 of
Referring to block 216, when a function call is detected in block 214, the testbed system determines, based on the fault table data structure 400, whether to provide an error return code in lieu of executing the function or to execute the function and pass the “natural” return code to the calling statement 102. In various embodiments, this includes querying the fault table data structure 400 based on an identifier of the called function 104, an address of the calling statement 102, and/or a progression identifier. In the illustrated embodiment, each error return code is tested before allowing the function to complete naturally, although it is understood that the natural execution of the function may take place at any time before, during, and/or after testing the error domain.
If it is determined in block 216 that an error return code is to be provided instead of executing the function, in block 218, the error return code to be provided is determined from the fault table data structure 400 and is provided by the testbed system to the calling statement 102. In various embodiments, this includes querying the data structure 400 based on an identifier of the called function 104, an address of the calling statement 102, and/or a progression identifier. If it is determined in block 216 that the function is to be allowed to execute naturally, the function is executed in block 220. This may include passing a return code (which may signify an error or a successful completion) produced by the function 104 to the calling statement 102. Referring to block 222, the software product 300 is monitored to determine one or more responses to the provided error return code or the natural execution of the function.
Referring to block 224, the testbed system determines whether the final fault injection test has been performed for the current calling statement 102 and the current function 104. In some embodiments, each error code in the error domain is tested for the calling statement 102 and at least one natural return is performed before proceeding to the next calling statement 102. In some embodiments, the natural return is omitted. In some embodiments, only a subset of the error domain is tested for each combination of calling statement 102 and function 104. The subset may be determined based on commonality, severity, known errors, user identified faults, and/or other suitable criteria. If it is determined that the final fault injection test has not been performed, execution may be restarted in block 212 and another iteration of blocks 214-224 may be performed. If the final injection test has been performed, in block 226, the next combination of calling statement 102 and function 104 is selected for monitoring, and execution may be restarted in block 212. If the selection of block 226 indicates that all specified fault injection tests have been run for all specified combinations of calling statements 102 and functions 104, testing may be completed.
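The iteration of blocks 212-226 for a single combination of calling statement and function can be sketched as a loop, under the simplifying assumption that the software product is modeled as a Python callable that is restarted from scratch on every iteration. All names here (`run_progressive_injection`, `program`, `open_resource`) are hypothetical.

```python
def run_progressive_injection(program, function, error_codes):
    """Run `program` once per error code, then once with a natural return.

    program(fn) models the software product: it calls fn where it would
    normally call the function under test and returns its response.
    """
    results = []
    for code in list(error_codes) + [None]:   # None selects natural execution
        calls = []

        def injection_point(*args, _code=code):
            calls.append(args)                # a call was detected
            if _code is not None:
                return _code                  # provide the error return code
            return function(*args)            # execute the function

        response = program(injection_point)   # restart and monitor
        results.append((code, response, len(calls)))
    return results

def open_resource(name):
    """Hypothetical function under test; 0 signifies success."""
    return 0

def program(open_fn):
    """Hypothetical calling code that handles only error code -1."""
    status = open_fn("disk0")
    if status == 0:
        return "ok"
    if status == -1:
        return "retry scheduled"
    return "unhandled error %d" % status

for row in run_progressive_injection(program, open_resource, [-1, -2]):
    print(row)
```

In this sketch, the second iteration exposes an error return code that the calling code does not handle, which is the kind of response the monitoring of block 222 is intended to surface.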
In the illustrated embodiment, the orderly progression through the error domain described in blocks 212-226 provides the error return codes in sequential order. However, this example is presented merely for clarity. In various embodiments, the error return codes provided in block 218 are provided in ascending order, descending order, and/or in a pseudo-random order where each error return code is provided at least once. Embodiments utilizing a pseudo-random order allow a user to determine whether the sequence of error return codes affects code behavior.
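The alternative orderings can be illustrated with a short sketch. It assumes numeric error return codes and uses a seeded shuffle so that a pseudo-random progression is repeatable while still providing each code exactly once; the function name `injection_order` and its parameters are hypothetical.

```python
import random

def injection_order(error_codes, mode="ascending", seed=None):
    """Return the error codes in the order they are to be provided."""
    codes = sorted(error_codes)
    if mode == "ascending":
        return codes
    if mode == "descending":
        return codes[::-1]
    if mode == "pseudo-random":
        rng = random.Random(seed)  # seeded so that a run can be reproduced
        rng.shuffle(codes)         # a permutation: every code appears once
        return codes
    raise ValueError("unknown mode: %r" % mode)

domain = [-1, -2, -3, -4, -5]
print(injection_order(domain, "ascending"))   # prints [-5, -4, -3, -2, -1]
print(injection_order(domain, "descending"))  # prints [-1, -2, -3, -4, -5]
```

Running the same test with several seeds and comparing the monitored responses is one way a user might check whether the sequence of error return codes affects code behavior.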
The method 200 improves verification efficiency over traditional fault injection techniques by reducing the number of fault injection points 302, by simplifying the process of adding fault injection points 302, and by an orderly progression through the errors of the error domain. However, the processes of method 200 may be further streamlined by utilizing retry points and early termination points to reduce the number of statements executed in each iteration. A method 700 of identifying and using retry points and early termination points in conjunction with progressive fault injection is described with reference to
As described in detail below, the method 700 identifies a statement 102 that calls a function 104 and determines a point in the flow preceding the calling statement 102 where execution can be restarted without affecting the behavior of the calling statement 102. This avoids restarting execution from scratch for every iteration of the fault injection process. The fewer statements between the retry point and the calling statement 102, the greater the performance benefits. However, efficiency may be balanced against idempotency and retriability. The method 700 may also determine a point in the flow following the calling statement 102 where execution may be terminated without impacting failure analysis. Early termination also improves efficiency and delivers increasing benefit as the number of iterations rises.
Referring first to block 702 of
Referring to block 704 of
Referring to block 706 of
Referring to block 708 of
Referring to block 710, a retry point 802 may be set based on the environmental analysis of block 706 and, if applicable, the memory snapshot of block 708. The retry point identifies a statement 102 within the software program 800 where execution may begin in order to reliably perform fault injection analysis on a calling statement. Beginning a fault injection process at a retry point 802 that is not the first statement 102 of the software program 800 reduces the number of statements 102 executed in each iteration, thereby reducing both runtime and computing resources. In various embodiments, a retry point 802 is selected to provide retriability and/or idempotency of a respective calling statement 102. In some such embodiments, the retry point is selected to minimize the number of statements 102 executed while still providing retriability and/or idempotency of the calling statement 102. Multiple retry points 802 may be set, each corresponding to a unique calling statement 102.
Referring to block 712, a testbed system may analyze the error handling of the calling statement and/or one or more subsequent statements 102 to determine how to reliably capture the effects of an injected fault without executing all of the subsequent statements 102. Referring to block 714, an early termination point 804 may be set based on the analysis of block 712. The early termination point 804 identifies a statement 102 within the software program 800 where execution may reliably be halted while still providing sufficient diagnostic information for analyzing an injected fault. Similar to a retry point 802, ending a fault injection process at an early termination point 804 that is not a terminal statement 102 of the software program 800 reduces the number of statements 102 executed in each iteration. Accordingly, the early termination point 804 may be selected to minimize the number of statements 102 executed while still capturing the effects of an injected fault. Multiple early termination points 804 may be set, each corresponding to a unique calling statement 102.
Referring to block 716, fault injection is performed using the retry point 802 and/or the early termination point 804. The method 700 of optimizing progressive fault injection is suitable for use with any method of fault injection. When used in conjunction with the progressive fault injection of
The present embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In that regard, in some embodiments, the testbed system 900 is programmable and is programmed to execute processes including those associated with fault injection testing such as the process of method 200 of
Thus, the present disclosure provides a system and method for verifying a software product using a progressive fault injection technique. In some embodiments, the method for validating software comprises: loading a software product into the memory of a testbed computing system, wherein the software product includes a function and a calling statement of the function; updating a data structure based on an error domain of the function; executing the calling statement for each of one or more error return codes of a subset of the error domain; and for each iteration of executing of the calling statement: detecting a call of the function by the calling statement during the iteration; in response to the detecting of the call of the function, providing an error return code of the one or more error return codes in lieu of executing the function, wherein the provided error return code corresponds to the iteration; and monitoring a response of the software product to the provided error return code. In one such embodiment, the method further comprises: further executing the calling statement; detecting a further call of the function by the calling statement during the further execution of the calling statement; in response to detecting the further call of the function, performing the function; and monitoring a response of the software product to performing the function.
In further embodiments, a computer system includes a processor and a non-transitory storage medium for storing instructions. The processor performs actions including loading a software product including a function and a calling statement of the function into a memory of the computer system. In further embodiments, the method for fault injection testing comprises: receiving a software product including a function and a calling statement of the function; determining a set of error return codes of the function; detecting a call of the function by the calling statement; in response to the detecting of the call of the function, providing an error return code of the set of error return codes without executing the function; and monitoring a response of the software product to the error return code. In one such embodiment, the providing of the error return code of the set of error return codes includes providing each error return code of the set of error return codes. Furthermore, in one such embodiment, the set of error return codes is determined based on at least one of commonality, severity, known errors, and user identified faults.
In yet further embodiments, the apparatus comprises: a non-transitory, tangible computer readable storage medium storing a computer program, wherein the computer program has instructions that, when executed by a computer processor, carry out: identifying a function of a software product and further identifying a calling statement of the function; creating a data structure based on an error domain of the function; iteratively executing the calling statement for each of one or more error return codes of a subset of the error domain; and for each iteration of the iterative executing of the calling statement: using a fault injection point, detecting a call of the function by the calling statement during the iteration; in response to the detecting of the call of the function, providing an error return code of the one or more error return codes by the fault injection point, wherein the provided error return code corresponds to the iteration; and monitoring a response of the software product to the provided error return code.
The foregoing outlines features of several embodiments so that those skilled in the art may better understand the aspects of the present disclosure. Those skilled in the art should appreciate that they may readily use the present disclosure as a basis for designing or modifying other processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.