Claims
- 1 A method for automatically managing a distributed software test system, wherein the test system includes a network of test computers for execution of a plurality of test jobs and at least one client computer for controlling the test computers, the method comprising the steps of:
(a) providing the test computers with a service program for automatically registering the availability of the computer and the attributes of the computer with the client computer;
(b) comparing execution requirements of each test job with the attributes associated with the available computers;
(c) dispatching the test jobs to the computers having matching attributes;
(d) providing the service programs with a heartbeat function so that the service programs transmit signals at predefined intervals over the network to indicate activity of each test job running on the corresponding computer;
(e) monitoring the signals from the service programs and determining a failure has occurred for a particular test job when the corresponding signal is undetected; and
(f) automatically notifying the user when a failure has been detected.
- 2 The method of claim 1 wherein the step of determining whether a failure has occurred further includes monitoring how long the test jobs wait to be executed by the service programs.
- 3 The method of claim 1 wherein the step of determining whether a failure has occurred further includes checking for run-time errors by comparing snapshots of test logs produced by the test jobs.
- 4 The method of claim 1 wherein the step of determining whether a failure has occurred further includes monitoring maximum and minimum time allowed for executing test jobs.
- 5 The method of claim 1 further including the step of recovering from the failure by automatically rescheduling the failed test job for execution.
- 6 The method of claim 5 further including the step of rescheduling the failed test job on a different computer.
- 7 The method of claim 6 further including the step of increasing the priority of the test job to ensure the service programs complete that test job in a timely manner.
- 8 The method of claim 1 wherein step (d) further includes the step of launching a test management system (TMS) from each of the service programs, and using the TMS to run the test jobs.
- 9 The method of claim 8 wherein the TMS generates and passes the heartbeat signals to the service program.
- 10 The method of claim 9 further including providing a client-service mode for TMS in which a TMS-server is started that invokes a server test program and notifies a TMS-client to start and invoke a client test program, whereby once the server and client test programs are started, the server test program and the client test program communicate with each other.
- 11 The method of claim 1 wherein step (f) further includes the steps of increasing the priority of the test job and rescheduling the test job to ensure that the service programs complete the scheduled test job in a timely manner.
- 12 The method of claim 1 wherein step (f) further includes the step of notifying the user if the percentage of test failures is greater than or equal to a predetermined maximum percentage rate of test failures.
- 13 An automated test management system for testing software applications, comprising:
multiple computers connected to a network wherein the computers have a variety of hardware and software computer attributes;
a lookup service accessible over the network for storing availability and attributes of the computers;
a service program running on each of the computers for registering with the lookup service and publishing the availability and the attributes of the corresponding computer;
at least one central database for storing executable versions of the test jobs, computer attributes required for each test job to run and results and logs produced during execution of these test jobs;
a client software running on at least one of the computers in the network for creating a client that controls and monitors the service programs; and
a communications protocol for allowing the client software, the service programs and the lookup service to communicate with one another over the network,
wherein when the client determines that test jobs in the central database need to be run, the client queries the lookup service, finds available computers having attributes matching the required attributes of the test jobs and dispatches the test jobs to the corresponding computers,
wherein once the service programs receive the test jobs, the service programs initiate execution of the test jobs and transmit heartbeat signals indicating activity of each running test job over the network such that the client can automatically detect test failures by monitoring the heartbeat signals and determine whether a failure has occurred for a particular test job when the corresponding heartbeat signal is not present,
wherein upon detecting the failure, the client automatically notifies the user of the failure and reschedules the test job for execution.
- 14 The system of claim 13 wherein when the client determines that one of the test jobs in the central database needs to be run, the client checks for starvation by starting a timer to keep track of how long the test job waits for one of the service programs to be available.
- 15 The system of claim 14 wherein if it is determined that a configurable maximum allowed service search time has elapsed, the user is notified, and the test job is rescheduled.
- 16 The system of claim 15 wherein after dispatching the test job, the client starts monitoring the test job to ensure that the test job does not take more than a predetermined time for execution.
- 17 The system of claim 16 wherein if the maximum execution time has elapsed, then execution of the test job is killed, the user is notified and the test job is rescheduled.
- 18 The system of claim 17 wherein if the heartbeat for one of the test jobs is present, then a current snapshot of the log for the test job is compared with a previous log snapshot, and if there is no difference then it is assumed that the test job is no longer making progress and the test job is killed.
- 19 The system of claim 18 wherein once the test job finishes execution and if the job execution time was shorter than a minimum time, it is assumed that there was an error and the user is notified and the test job is rescheduled.
- 20 The system of claim 13 wherein the client includes a graphical user interface, a lookup monitor, a task manager and a test manager.
- 21 The system of claim 20 wherein the graphical user interface (GUI) allows the user to create and update test jobs in the central database and initiates the process of dispatching test jobs to matching computers.
- 22 The system of claim 21 wherein the lookup monitor checks for the existence of the lookup service and monitors the lookup service to determine if any of the service programs on the network have been updated.
- 23 The system of claim 22 further including a local database that includes a task repository, an in-process-task repository and a completed task repository.
- 24 The system of claim 23 wherein the task manager manages the local database by scanning the central database for previous test jobs and any newly added test jobs and creates a file for each of the test jobs in the task repository, wherein each file includes the computer attributes required for the test job, a priority assigned to the test job, and a reference to the executable version of the test job stored in the central database.
- 25 The system of claim 24 wherein the in-process-task repository stores a reference for each test job currently executing, and the completed task repository stores a reference for each completed test job.
- 26 The system of claim 25 wherein when a user requests the status of any of the test jobs via the GUI, the local database is queried and the results are returned to the GUI for display.
- 27 The system of claim 26 wherein service programs are started on the computers as part of the boot process.
- 28 The system of claim 27 wherein each of the service programs creates an environment to run the test jobs and launches a test management system (TMS), which in turn, runs the test jobs.
- 29 The system of claim 28 wherein upon detection of a failure, the client reschedules the test job for execution on a different computer.
- 30 A computer-readable medium containing program instructions for managing and monitoring software test jobs running on a network of computers, the program instructions for:
(a) receiving from a user a plurality of test jobs, each requiring a particular set of computer attributes to run;
(b) providing at least a portion of the computers with a respective service program that automatically registers the computer's availability and attributes with a lookup service on the network;
(c) for each test job, searching the lookup service for a registered computer having attributes matching the attributes required by the test job, and dispatching the test job to that computer;
(d) using the service programs to start execution of the test jobs on the computers;
(e) storing test results and test logs for each test job in a database accessible over the network;
(f) determining if each test job is active during test execution by monitoring a heartbeat signal transmitted from the service programs for each test job and determining that a failure has occurred for a particular test job when the corresponding heartbeat signal is not present;
(g) notifying the user of the failure and rescheduling the test job for execution on a different computer; and
(h) allowing the user to monitor status and results of any of the test jobs from at least one of the computers on the network.
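By way of illustration only, the attribute matching, failure detection and rescheduling recited in the claims above could be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Machine`, `TestJob`, `find_match`, `check_running`, `check_finished` and `reschedule` are hypothetical, attributes are assumed to be simple key/value pairs that must match exactly, times are plain seconds, and the lookup service, network transport and user notification are omitted.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    attributes: dict          # hypothetical format, e.g. {"os": "linux"}
    available: bool = True


@dataclass
class TestJob:
    name: str
    required: dict            # attributes the job needs in order to run
    priority: int = 0
    min_seconds: float = 0.0  # finishing faster than this is treated as an error
    max_seconds: float = 3600.0


def find_match(job, machines, exclude=()):
    """Return the first available machine whose attributes satisfy the job."""
    for machine in machines:
        if not machine.available or machine.name in exclude:
            continue
        if all(machine.attributes.get(k) == v for k, v in job.required.items()):
            return machine
    return None


def check_running(job, now, started_at, last_heartbeat, heartbeat_timeout,
                  prev_log, curr_log):
    """Return None while a running job looks healthy, else a failure reason."""
    if now - last_heartbeat > heartbeat_timeout:
        return "heartbeat missing"
    if now - started_at > job.max_seconds:
        return "maximum execution time exceeded"
    # Heartbeat is present: an unchanged log snapshot suggests no progress.
    if prev_log is not None and curr_log == prev_log:
        return "log unchanged between snapshots"
    return None


def check_finished(job, elapsed):
    """Flag a job that completed suspiciously quickly."""
    if elapsed < job.min_seconds:
        return "finished faster than minimum time"
    return None


def reschedule(job, machines, failed_machine):
    """Raise the job's priority and look for a different matching machine."""
    job.priority += 1
    return find_match(job, machines, exclude={failed_machine})
```

In this sketch a client would call `find_match` before dispatching, call `check_running` each time it polls a running job, and call `reschedule` with the name of the failed computer when a failure reason is returned. The order of the checks inside `check_running` is a design choice of the sketch; the claims do not prescribe one.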
CROSS REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit under 35 USC 119(e) of provisional patent application Serial No. 60/318,432, filed Sep. 10, 2001.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60318432 | Sep 2001 | US |