In developing software, it is typical that errors or “bugs” in the code will be discovered. Hopefully, the errors are discovered during software testing before the software is released to avoid user frustration or the need to create and apply patches, fixes, or corrected versions. Software testing may involve simulated user or multi-user interaction with the software being developed, during which a script of test data is applied to the software to simulate actual use and, hopefully, identify errors in the software.
In testing software, a test script including a number of simulated user instructions is applied to the software. At the conclusion of the test script, it is determined if an error occurred. The results of the test generally are recorded to inform the developers of the software whether the software includes any bugs. At the conclusion of the test, the computer or computing environment used to perform the test may be reallocated to run another test of the same software or to run a test on a different software program.
Unfortunately, when software fails during a test, even if the test script included relatively few instructions, once the complete test script has been executed, it may prove difficult to determine what were the circumstances or causes of the failure. Even knowing what type of failure occurred during execution of the test still may result in a great number of possible problems that may have to be investigated and addressed in order to resolve the failure.
The difficulty in resolving errors or bugs may be particularly acute in the case of intermittent failures. Intermittent failures, by definition, do not occur each time that a software program is run, each time it is tested, or even each time the same software program is subjected to the same test. Such failures may be caused by a series of events that sometimes cause processes within a computing environment to conflict or by a coincidence of events that sometimes occur in a computing environment that result in an occurrence of failure in another computing environment. Because it may be difficult to identify the circumstances that result in the occurrence of such an intermittent failure, it may be more difficult to resolve than a failure that occurs regularly.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The present disclosure is directed to computer-implemented methods, computer-readable media and a system for facilitating debugging of a software program by breaking, and optionally holding, execution of the software program when a failure occurs. Implementations of break and hold preserve a state of a computing environment on which the software program has failed. Thus, by being able to examine the state of the computing environment existing upon the occurrence of the failure, including the condition of various processes and values, it may be easier to resolve the error or errors that resulted in the failure.
In one implementation, upon occurrence of a failure during the execution of a first software program in a first computing environment, execution of the first software program breaks. A first state of the first computing environment existing upon the breaking of the execution of the first software program is then held. A failure notification is generated to signal the failure to a monitoring system. The monitoring system accesses hold information to determine whether the first computing environment should hold its current state and whether one or more other computing environments interacting with the first computing environment should also hold their states.
Breaking of the execution and holding the state may occur during the performance of a test of the first software program. A test client may be operated in the first computing environment, in which the test client causes the break in execution, the holding of the state, and the generation of the failure notification. In one implementation, the test client is configured to cause the breaking of the execution of the first software program following execution of a test instruction resulting in the occurrence of the failure and prior to executing a next test instruction. In one implementation, a test support system is in communication with the test client. The test client may engage the test support system to perform the test of the first software program.
The holding of the state may continue until the monitoring system directs the test client to discontinue holding the state or until the end of a time interval, and the monitoring system may instead direct the test client to continue holding the state. The monitoring system may direct the test client to continue the holding of the first state based upon failure data accessible to the monitoring system. The failure data may include submission data, included in initiating the execution of the first software program, indicating that the execution of the first software program is to be held upon occurrence of a first failure. Alternatively, the failure data may include failure tag data indicating that the execution of the first software program is to be held upon occurrence of a selected failure type represented by the failure in the execution of the first software program.
A second software program executing in a second computing environment that interacts with the first software program also may be held. The monitoring system initiates a break in the execution of the second software program, and causes a second state of the second software program existing at the time of the break to be held. Software programs that interact may include, for example, a client application interacting with a server application, or multiple client applications that interact with a common server.
In another implementation, upon the break and hold in execution, execution of the first software program may be terminated, and the computing environment will log information about the existing state.
These and other features and advantages will be apparent from reading the following detailed description and reviewing the associated drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive. Among other things, the various embodiments described herein may be embodied as methods, devices, or a combination thereof. Likewise, the various embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. The disclosure herein is, therefore, not to be taken in a limiting sense.
In the drawings, like numerals represent like elements. The first digit in the reference numerals refers to the figure in which the referenced element first appears.
This detailed description describes implementations of breaking and optional holding on failure. Although implementations of breaking and holding would prove useful even in executing a software program in a single computing environment, examples referenced in the following discussion contemplate a testing environment in which a plurality of networked computing environments execute a plurality of tests of one or more software programs. As is understood by those skilled in the art, performing tests on a software program executing in a plurality of computing environments allows for the software program to be tested more thoroughly and efficiently.
Implementations of breaking and holding may be supported by a number of different computing environments on which software may be executed or tested.
Referring to
The computing device 110 may also have additional features or functionality. For example, the computing device 110 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in
The computing device 110 also includes one or more communication connections 180 that allow the device to communicate with other computing devices 190, such as over a network or a wireless network. The one or more communication connections 180 represent an example of communications media. Communications media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” may include a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. The term computer readable media as used herein includes both storage media and communication media.
The testing environment 200 includes a plurality of test machines 210. Each of the test machines 210 includes a computing environment including at least a portion of the attributes and functionality of the computing environment 100 described with reference to
The test machines 210 are in communication with one or more test servers 220 that administer the operation of the software tests. For example, the test servers 220 identify which of the test machines 210 will run which tests, initiate the tests, and report the results. The test servers 220 are in communication with a plurality of test workstations 230 used by personnel who desire to test software programs. When a user of a test workstation 230 submits a test, the test servers 220 prioritize, schedule, and prepare the test for execution. The test servers 220 also are in communication with an administrator workstation 240 that allows for control and management of the test servers 220 as well as the test machines 210 and the test workstations 230.
The test servers 220 and data stores 260-280 also may serve as a monitoring system for overseeing the testing executed on the plurality of test machines. Operation of the monitoring system and its interaction with the test machines is described in U.S. patent application Ser. No. ______, for “TRACKING DOWN ELUSIVE INTERMITTENT FAILURES,” filed on Jan. ______, 2007, the disclosure of which is incorporated in this application by reference.
The test servers 220 and, in turn, the test machines 210, are in communication with a plurality of data stores including test data 260, failure data 270, and error data 280. The test data 260 includes, for example, test scripts including the instructions used to provide input or commands to test the software being tested. The failure data 270 specifies programs or failure types the testing personnel wish to investigate, as is described further below. The error data 280 is a repository for storing information about failing programs and failures that occur, such as logs written by failing machines.
Implementation of a Test Client within a Test Machine
In one implementation of a testing environment 200 (
In one implementation, the software layers 310 also include a test client 340. The test client 340 is, in effect, a software harness executing under the control of the operating system 320. The software program 330 executes within the control of the test client 340. The use of the test client 340 provides at least two advantageous features. First, the test client 340 engages test scripts stored in the test data 260 and applies the simulated input included in the test scripts to the software program 330. Thus, the test client 340 in combination with the test data 260 (via the test servers 220) controls the execution of the tests performed on the software program 330.
Second, because the software program 330 executes within the control of the test client 340, the test client 340 can break the execution of the software program 330 when a failure occurs. Thus, the test client 340 can preserve the state of the computing environment existing on the test machine 210 at the occurrence of the failure. While preserving that state, the test client 340 can communicate with other systems within the testing environment 200 (
One skilled in the art will appreciate that, typically, once a test script is applied in testing a software program, the full test script is executed on the software program and any failure that occurs is reported at the conclusion of the test script's execution. As a result, the state of the computing environment at the point when the failure occurred, including the status of any processes, pointer and register values, and other conditions, may be changed as a result of instructions applied by the test script after the occurrence of the failure. By being able to hold the state of the computing environment upon the occurrence of the failure without executing further instructions in the test script, testing personnel may be able to more readily identify the causes of the failure.
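By way of illustration only, the following Python sketch shows one way a test client might apply a test script one instruction at a time and break at the first failing instruction, before the next instruction can alter the state. All names here (BreakResult, run_script_with_break, and the example instructions) are hypothetical and do not appear in the implementations described above. The sketch assumes that a failure is signalled by a raised exception and that the state of interest can be represented as a dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class BreakResult:
    """Outcome of applying a test script (hypothetical structure)."""
    failed_index: Optional[int]        # index of the failing instruction, or None
    error: Optional[Exception]         # exception treated as the failure signal
    state_snapshot: Dict               # shallow copy of the state at the break
    remaining: List[Callable] = field(default_factory=list)  # never applied


def run_script_with_break(state: Dict, script: List[Callable]) -> BreakResult:
    """Apply instructions in order; break before the next one on failure."""
    for index, instruction in enumerate(script):
        try:
            instruction(state)
        except Exception as exc:  # assumption: any exception counts as a failure
            # Break here: later instructions are returned, not executed,
            # so they cannot overwrite the state that led to the failure.
            return BreakResult(index, exc, dict(state), script[index + 1:])
    return BreakResult(None, None, dict(state), [])


# Hypothetical test instructions used only to exercise the sketch.
def set_counter(state: Dict) -> None:
    state["counter"] = 1


def divide(state: Dict) -> None:
    state["ratio"] = 10 / (state["counter"] - 1)  # divides by zero


def reset(state: Dict) -> None:
    state["counter"] = 0  # would hide the value that caused the failure


result = run_script_with_break({}, [set_counter, divide, reset])
```

In this sketch, the snapshot still shows the counter value that led to the failure, because the later reset instruction is never applied. Running the full script and reporting only at the end would have overwritten that value.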
At 420, it is determined if a failure has occurred during the execution of the test. If not, the flow diagram 400 loops to 410 to continue monitoring the execution of the software program. On the other hand, if it is determined that a failure has occurred, at 430, execution of the software program breaks. At 440, the state of the computing environment existing upon the breaking of execution is held. As is described further below, the state existing upon the occurrence of the failure can be held for a predetermined period of time or until the local computing environment receives instructions to continue holding the state or to resume execution.
At 450, a failure notification is generated to signal the occurrence of the failure to a monitoring system. For example, in the testing environment 200 of
At 510, the test client is invoked on the computing system on which the software program is being tested. At 520, the test client engages a test control system to test the software program. In the testing environment 200 of
At 530, it is determined if a failure has occurred. If it is determined that no failure has occurred, the flow diagram 500 loops to 520 to continue to monitor for the occurrence of a failure. On the other hand, if it is determined at 530 that a failure has occurred, at 540, execution of the software program breaks. At 540, the state of the computing environment existing at the point of breaking is held. As previously described, a suitable test client is configured to control the execution of the software program so that the test client can preserve the states of processes operating in the computing environment upon the occurrence of a failure.
At 550, a failure notification is generated to signal the occurrence of the failure. With reference to the testing environment 200 of
In the implementation described in
If a monitoring system determines that the computing environment for which a failure notification was generated at 550 should not be held, in one implementation the monitoring system directs the computing environment to resume execution of the software program. At 560, if it is determined that the computing environment has been directed to resume execution, at 590, the computing environment resumes execution.
If it is determined at 560 that the computing environment is not directed to resume execution, at 570, it is determined whether the state of the computing environment and the cause of the failure have already been investigated. In one implementation, the computing environment experiencing the failure may be held at the point of failure until the failure has been investigated, and then will continue execution. If it is determined that the state of the computing system and the cause of the failure have been investigated, at 590, execution of the software program resumes on the computing environment.
If it is determined at 570 that the state of the computing environment has not yet been investigated, at 580, it is determined whether a time limit for holding execution of the computing environment has been reached. In one implementation, once a computing environment is held upon the occurrence of failure, the computing environment will hold that state for a predetermined period unless otherwise instructed to continue to hold its state or to resume execution. By imposing a time limit, personnel potentially interested in investigating the occurrence of failure are given time to study the failure but, if they do not act within the allotted time, the computing environment will resume execution so as not to unnecessarily tie up the computing environment and keep it from being used for other purposes.
If it is determined at 580 that the time limit has not been reached, the flow diagram 500 loops to 560 to determine if the computing system has been directed to resume execution then to 570 to determine if the failure already has been investigated. In this manner, the flow diagram 500 continues to loop until the time limit is reached or the computing environment is directed to resume execution, whichever comes first. If it is determined at 580 that the time limit has been reached, at 590, execution resumes. After the computing environment resumes execution at 590, the flow diagram 500 loops to 520 to continue the test of the software program.
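The loop through 560, 570, and 580 may be sketched, for illustration only, as the following Python function. The function name, its parameters, and the returned strings are hypothetical. The sketch assumes the test client can poll two conditions (whether resumption has been directed and whether the failure has been investigated) through caller-supplied callables, and it does not model an instruction to continue holding beyond the time limit.

```python
import time
from typing import Callable


def hold_until_released(
    resume_directed: Callable[[], bool],
    is_investigated: Callable[[], bool],
    time_limit_s: float,
    poll_interval_s: float = 0.01,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hold until one of the three exit conditions is met.

    Returns the reason the hold ended: "directed", "investigated",
    or "timeout". The clock and sleep parameters are injectable so the
    loop can be exercised without real delays.
    """
    deadline = clock() + time_limit_s
    while True:
        if resume_directed():       # corresponds to the check at 560
            return "directed"
        if is_investigated():       # corresponds to the check at 570
            return "investigated"
        if clock() >= deadline:     # corresponds to the check at 580
            return "timeout"
        sleep(poll_interval_s)      # remain held, then check again
```

Whichever condition is met first ends the hold, after which the caller would resume execution as at 590.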
In the implementation described in
As previously described with reference to
At 610, the monitoring system waits to receive notification of the occurrence of a failure in one of the computing environments it monitors. At 620, it is determined whether a failure notification has been received. If not, the flow diagram 600 loops to 610 where the monitoring system continues to wait for notification of the occurrence of a failure. On the other hand, if it is determined at 620 that a failure notification has been received, at 630, failure data is consulted to determine if the computing environment reporting the failure should be held at its current state.
The failure data 270 (
The failure data 270 also may include failure tag information that identifies a type of failure that testing personnel wish to investigate. For example, a certain type of intermittent failure resulting in a particular failure type may occur frequently enough that resolving the failure becomes a priority. Thus, the failure data 270 may specify that, when a test machine 210 executing a software program reports the failure type of interest, that failing machine will be held for further investigation. Alternatively, the failure data 270 may specify that machines experiencing the selected failure not be held but instead instruct that the state of the test machine be stored in the error data 280 for later investigation. Also, the failure tag data may specify that a selected number of test machines on which an error of the identified failure type occurs be held for subsequent investigation.
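The consultation of failure data may be illustrated by the following Python sketch. The structures and names (Action, FailureTag, FailureData, decide) are hypothetical and are not the format of the failure data 270. The sketch assumes that a submission-data hold applies only to the first failure of a listed program, and that once the selected number of machines is already held for a failure type, further machines reporting that type are allowed to continue.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set


class Action(Enum):
    HOLD = "hold"            # keep the failing machine at its current state
    LOG_STATE = "log_state"  # store the state in error data, do not hold
    CONTINUE = "continue"    # let the machine resume execution


@dataclass
class FailureTag:
    action: Action
    max_held: Optional[int] = None  # cap on held machines; None means no cap


@dataclass
class FailureData:
    # Programs whose submission data asked for a hold on the first failure.
    hold_on_first_failure: Set[str] = field(default_factory=set)
    # Failure type -> what testing personnel want done with it.
    tags: Dict[str, FailureTag] = field(default_factory=dict)
    # Failure type -> number of machines currently held for it.
    held_counts: Dict[str, int] = field(default_factory=dict)


def decide(data: FailureData, program: str, failure_type: str) -> Action:
    """Decide what to do with a machine that reported a failure."""
    if program in data.hold_on_first_failure:
        # Assumption: the request covers only the first failure.
        data.hold_on_first_failure.discard(program)
        return Action.HOLD
    tag = data.tags.get(failure_type)
    if tag is None:
        return Action.CONTINUE
    if tag.action is Action.HOLD and tag.max_held is not None:
        held = data.held_counts.get(failure_type, 0)
        if held >= tag.max_held:
            return Action.CONTINUE  # assumption: enough machines are held
        data.held_counts[failure_type] = held + 1
    return tag.action
```

A monitoring system along these lines would call decide once per failure notification and then direct the reporting machine accordingly.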
Once the failure data is consulted at 630, at 640, it is determined if the computing environment is to be held. If not, at 650, the computing environment will be directed or caused to continue its execution. Causing the computing environment to continue execution may be implemented in a number of ways. For example, as previously described with reference to
On the other hand, if it is determined at 640 that the computing environment is to be held, at 660, the monitoring system causes the computing environment to hold its current state. Again, the monitoring system can cause the computing environment to hold in a number of ways. For example, the monitoring system may generate an instruction to the computing environment that instructs the computing environment to continue holding its state. On the other hand, the computing environment may be configured to continue to hold its state, indefinitely or for a predetermined interval as described with reference to
After causing the computing environment to hold at 660, at 670, it is determined if the failure indicates that one or more additional computing environments should also be held. For example, the software program being tested on the computing environment may be an e-mail client program that interacts with an e-mail server program executing in another computing environment or with other e-mail clients executing in still other computing environments. Because the failure may result from the interaction with these other systems, and the fault may actually lie with one or more of those other systems, it may be appropriate to hold those other computing environments as well as the computing environment for which the occurrence of failure was reported. Whether holding other computing environments is appropriate and which of those environments should be held may be described in the failure data 270 (
If it is determined at 670 that no additional computing environments are to be held, the flow diagram 600 loops to 610 to await notification of occurrence of another failure. However, if it is determined at 670 that one or more additional holds are appropriate, at 680, the monitoring system causes the additional indicated computing environments to also be held. The additional computing environments can be caused to hold in a number of ways. For example, if those other computing environments support a test client program as previously described, a hold message sent from the monitoring system will cause the test client to hold its current state at the point of receiving the hold message. Alternatively, when a number of software programs interact with one another, each of these software programs, as it executes, may check the status of a flag, recurring message, or other indicator maintained by the monitoring system indicating that each should continue execution. In such an environment, when an indication to continue is not communicated, each of the computing environments executing the interacting software programs may be configured to hold its execution until it is instructed to continue.
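The indicator-based alternative may be sketched in Python as follows, for illustration only. The class and function names are hypothetical, and the sketch assumes the indicator is a flag shared within a single process (a threading.Event). In a networked testing environment the indicator would instead be communicated between machines, which is not modeled here.

```python
import threading
from typing import List, Optional, Tuple


class ContinueIndicator:
    """Flag maintained by a monitoring system (hypothetical sketch)."""

    def __init__(self) -> None:
        self._may_continue = threading.Event()
        self._may_continue.set()  # programs may run until a hold is requested

    def hold_all(self) -> None:
        self._may_continue.clear()

    def release_all(self) -> None:
        self._may_continue.set()

    def wait_for_permission(self, timeout: Optional[float] = None) -> bool:
        # Returns True if continuation is indicated, False if the wait timed out.
        return self._may_continue.wait(timeout)


def run_interacting_program(
    name: str,
    steps: List[str],
    indicator: ContinueIndicator,
    log: List[Tuple[str, str, str]],
    hold_timeout_s: Optional[float] = None,
) -> int:
    """Run steps, checking the indicator before each one.

    With hold_timeout_s=None the program stays held until released.
    A finite timeout is used here only so the sketch can be exercised
    without blocking. Returns the number of steps completed.
    """
    completed = 0
    for step in steps:
        if not indicator.wait_for_permission(hold_timeout_s):
            log.append((name, "held before", step))
            break
        log.append((name, "ran", step))
        completed += 1
    return completed
```

Because every interacting program checks the same indicator before each step, clearing it once holds all of them at their next step boundary.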
When one or more computing environments are held for investigation, the held computing environment or environments are not available to continue the testing or the work for which the held environments were allocated. At 690, it is determined whether one or more replacement computing environments are available to replace the held computing environment or environments. Whether replacement computing environments are available to be allocated to replace the held computing environments may depend on whether there are unused computing environments to be allocated as replacements and/or whether the testing or work that was being performed by the held computing environments is of sufficient priority to merit allocation of replacement computing environments. Information about whether available computing environments should be allocated may be maintained in the failure data 270 (
If it is determined that replacement environments are not to be allocated, the flow diagram 600 loops to 610 to wait for notification of an occurrence of failure. However, if it is determined that replacement computing environments are to be allocated, at 695, one or more replacement computing environments are allocated to replace the computing environments being held.
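The determination at 690 and the allocation at 695 may be illustrated by the following Python sketch. The function name, the numeric priority scale, and the threshold parameter are hypothetical. The sketch assumes that replacements are drawn in order from a list of unused machines and that work below the threshold priority receives no replacement.

```python
from typing import Dict, List


def allocate_replacements(
    held: List[str],
    idle_pool: List[str],
    priority: int,
    min_priority: int,
) -> Dict[str, str]:
    """Map held machines to replacements taken from the idle pool.

    Returns an empty mapping when the work is not of sufficient priority.
    Held machines left without a replacement are simply absent from the
    mapping. The idle pool is modified in place as machines are taken.
    """
    if priority < min_priority:
        return {}
    assignments: Dict[str, str] = {}
    for machine in held:
        if not idle_pool:
            break  # no unused machines remain
        assignments[machine] = idle_pool.pop(0)
    return assignments
```

For example, with three held machines and two unused machines, only the first two held machines would receive replacements under these assumptions.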
The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.