For information management (IM) applications such as data protection, one major administrative task is to maintain schedules for jobs that have to be executed on a regular basis. In large enterprise or cloud environments, it can become difficult for administrators to ensure that the scheduled jobs will be processed as expected and that there are no conflicts with resources that are allocated.
Generally, within large enterprise or cloud environments, thousands of jobs may have to be scheduled. These jobs may utilize multiple resources during their execution, including, for example, devices, network bandwidth, and resources on the management server. Moreover, job schedules may be created and modified by multiple users in parallel. Due to such factors, it can become challenging to define new job schedules that do not cause resource conflicts with existing job schedules. As a result, more and more jobs are either delayed or even fail at the planned execution time due to a lack of the resources needed for their execution. Thus, job execution can become unpredictable and defined service level objectives may be violated. Furthermore, optimal utilization of an expensive hardware infrastructure cannot be ensured since no hint is given to the user about expected resource usage and how it may be improved. Moreover, the effect of temporary resource shortages (e.g. due to device or network failures) cannot be determined unless related jobs are actually affected during runtime. In addition, known job schedule solutions cannot simulate beforehand the effect of planned changes to the environment. The result is thus seen only once the changes are actually applied.
For example, a company's IT infrastructure may have thousands of server systems that have to be backed up using thousands of backup devices. Manually scheduling backup jobs within this environment can be complex and very inefficient. The scheduling may be performed on a trial and error basis, which can take an extensive amount of time and effort.
As discussed above, in today's environment, resource conflicts are generally discovered at job execution time. One way of dealing with these conflicts is to either queue up jobs until all needed resources become available or to cancel jobs in case resources do not become available within a predetermined time period. For example, if jobs process adequately, a user may conclude that the job scheduling was appropriate. However, if there is a conflict, the job may have to be adjusted at execution time, for example, by changing its timing or resource allocation. Either of these options can cause delay in job processing. Device utilization reports may be generated based on history data, but this also does not help in case existing job schedules have to be changed or new jobs have to be added.
The embodiments are described in detail in the following description with reference to the following figures.
For simplicity and illustrative purposes, the principles of the embodiments are described by referring mainly to examples thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the embodiments. It is apparent that the embodiments may be practiced without limitation to all the specific details. Also, the embodiments may be used together in various combinations.
A job plan verification system is described herein and provides a user with the ability to verify a job roster in a very flexible and automated manner before jobs are actually executed. The system provides verification of job schedules for a defined planning scope using a flexible rule-based approach. As a result, the service quality is improved while the labor time and effort for the job plan maintenance is minimized.
As described in greater detail below, the job plan verification system may generally receive data from an IM system that performs job scheduling and job execution. The IM system may include an IM application that includes job details such as assigned resources and defined schedules. The IM system may also include IM data about prior job executions. The job plan verification system may include a job analysis module to collect job details such as assigned resources and defined schedules from the IM application. A job history management module may collect data about prior job executions from the session history of the IM application, which feeds into the IM data. Based on this historical data, the estimated job duration may be calculated. An environment data module may collect environment specific data related to, for example, network topology, server capacity, and capacities of connections used by the job resources (e.g. devices). Job data, which includes the schedule information collected by the job analysis module and the job history management module, and environment data may be used by a facts creator module to generate planned execution object instances within a defined planning scope. Alternatively, the facts creator module may use just the job data to generate planned execution object instances within the defined planning scope. Each planned execution object instance may represent one execution of a job within the planning scope having a defined starting date, time and duration. The generated planned execution object instances and the job details collected by the job analysis module may be asserted into a rule-based verification module. The verification module may be executed to generate a verification report that provides details about the different jobs, possible conflicts and resource utilization.
The job plan verification system provides for detection of job resource conflicts before they actually happen. In other words, the job plan verification system simulates and models future job executions and possible conflicts within a defined planning scope. The rule-based verification of job plan schedules offers a flexible mechanism to automatically check existing job schedules for resource conflicts within a defined future time period. New constraints may be readily added by defining additional rules. This automated job schedule verification gives the user the opportunity to prevent possible resource conflicts proactively before they actually happen during job execution.
As discussed above, the job plan verification system may also be used to simulate what-if scenarios. For example, the positive effect of adding a new device on an anticipated future resource conflict may be simulated. The failure of one or more resources may also be simulated. Furthermore, the addition of new jobs may be simulated in order to ensure that no resource conflicts are generated. Certain time periods (e.g. year end processing) may be verified a long time ahead. For example, the job plan verification system may be used to determine which jobs would be affected by a failure within the next 24 hours or within another time period. Reports such as the expected device allocation, server load or network utilization may be readily generated for a specified future time period using the same rule-based approach. The job plan verification system thus provides flexibility through the rule language to check for resource conflicts and certain priority conditions, and to create reports on future job executions. These aspects also facilitate adaptation to new applications or extension with new rules by providing a new set of rules. A user may react to expected issues at a convenient time before job processing, rather than out of necessity based on job processing conflicts with existing jobs.
Referring to
The generated planned execution object instances and the job details collected by the job analysis module 104 may then be asserted into the rule-based verification module 108. The facts creator module 107 may then create additional helper objects. Helper object instances may provide additional information to the rule system that is not included in the planned execution objects themselves or that is derived from them. Thus the helper object instances may be related to the planned execution object instances and may be used by the rules to perform validations. The facts creator module 107 may create plan exclusion object instances (e.g. days excluded from the job schedule) based, for example, on the job schedule definition provided by the job analysis module 104. Environment specific object instances (e.g. server capabilities) may be created based on the data provided by the environment data module 106. Thereafter, the verification module 108 may execute different types of rules from rule set 109 according to the rule flow illustrated in
The validation rules 141 may perform the actual verification of the job schedules against constraints. Validation rules may check for all types of potential resource conflicts, as long as the data is made available to the rule system. For example, the validation rules 141 may check for device conflicts, for job internal schedule conflicts, and for too many jobs being active on one server in parallel. The validation rules 141 may also check whether device allocations are acceptable, or whether jobs fall within a certain backup window, for example, to make sure that certain backups happen during a certain time window (e.g. between 8 pm and 8 am on weekdays).
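As an illustration only, two of the checks named above (device conflicts and a backup window) may be sketched in Python as follows. The `PlannedExecution` class, the function names and the sample data are hypothetical and are not taken from the described system, which expresses such checks as rules in the rule set 109. The window check is simplified: it considers only the start hour and ignores the weekday condition.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class PlannedExecution:
    """Hypothetical stand-in for a planned execution object instance."""
    job: str
    device: str
    start: datetime
    end: datetime


def device_conflicts(executions):
    """Return pairs of executions that use the same device at overlapping times."""
    return [
        (a, b)
        for a, b in combinations(executions, 2)
        if a.device == b.device and a.start < b.end and b.start < a.end
    ]


def starts_outside_window(execution, opens_hour=20, closes_hour=8):
    """True if the execution starts outside a window that wraps past midnight."""
    hour = execution.start.hour
    return not (hour >= opens_hour or hour < closes_hour)


plan = [
    PlannedExecution("A", "tape-1", datetime(2024, 1, 1, 22), datetime(2024, 1, 2, 0)),
    PlannedExecution("B", "tape-1", datetime(2024, 1, 1, 23), datetime(2024, 1, 2, 1)),
    PlannedExecution("C", "tape-2", datetime(2024, 1, 1, 12), datetime(2024, 1, 1, 13)),
]
conflicts = device_conflicts(plan)
late_jobs = [e.job for e in plan if starts_outside_window(e)]
```

In this sample, jobs A and B overlap on the same device, and job C starts at noon, outside the assumed 8 pm to 8 am window.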
The reporting rules 142 may be used to generate reports based on the facts available. For example, based on the device allocation object instances generated by an initialization rule, a reporting rule may calculate the estimated device utilization within the defined planning scope. Using this data, a free device slots report may be generated, helping administrators schedule new jobs accordingly. The reporting rules 142 may also create device allocation statistics stating, for example, that a device is utilized for a certain percentage of a planning scope. This may also facilitate determination of which devices are used at capacity and which are not used at all, so that the unused devices can be used for future jobs.
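The utilization figure mentioned above may, for example, be computed as the fraction of the planning scope during which a device is allocated. The following Python sketch is one possible way to do this; the function name, the representation of allocations as `(start, end)` tuples and the sample data are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def device_utilization(allocations, scope_start, scope_end):
    """Fraction of the planning scope during which a device is allocated.

    `allocations` is a list of (start, end) datetimes. Overlapping
    allocations are counted once, and time outside the scope is ignored.
    """
    busy = timedelta(0)
    cursor = scope_start  # everything before the cursor is already counted
    for start, end in sorted(allocations):
        start = max(start, cursor)
        end = min(end, scope_end)
        if end > start:
            busy += end - start
            cursor = end
    return busy / (scope_end - scope_start)


scope_start = datetime(2024, 1, 1)
scope_end = scope_start + timedelta(days=1)
allocations = [
    (datetime(2024, 1, 1, 2), datetime(2024, 1, 1, 6)),
    (datetime(2024, 1, 1, 4), datetime(2024, 1, 1, 8)),   # overlaps the first
    (datetime(2024, 1, 1, 20), datetime(2024, 1, 2, 2)),  # runs past the scope
]
utilization = device_utilization(allocations, scope_start, scope_end)
```

Here the device is busy for 10 of the 24 hours in the scope, so the utilization is roughly 42 percent.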
Referring to
The rule-based definition of resource conflicts and other conditions provides several options. For example, the application data model may be available in a JAVA object structure using, for example, standard getter and setter methods. These data objects may be asserted as facts to the system 100 before the rules are executed. For example, if the rule system had access to network connectivity data, it could verify whether certain jobs running in parallel on devices attached to the same network connection would generate a potential resource issue on the corresponding network connection. Even cross-application resource conflicts may become detectable if job schedule data were shared with the system 100. For example, a source for such data may include HEWLETT PACKARD'S Universal Configuration Management Database (UCMDB) software. The UCMDB software may automatically maintain accurate, up-to-date information on the relationships between infrastructure, applications, and business services.
Depending on the length of the planning scope, the number of jobs and the schedule intervals, several planned execution objects may be generated, occupying a large amount of memory 406 (see
In addition to the verification of already scheduled jobs, the system 100 may also be used to simulate the effect of modifications to existing job schedules or additions of new job schedules to the environment. This may also include the simulation of resource outages. For example, in case of a device failure, a report may be generated showing all planned job executions within the next 24 hours that will be affected. The planning scope may also be implemented as a kind of moving window having a starting date defined somewhere in the future. For example, if combined with a user interface utilizing, for example, GANTT CHARTS to visualize the job roster, a user may verify specific time periods in the future, for example, when the year-end processing is done and resource conflicts are therefore more likely.
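The device failure report described above may be sketched as a filter over the planned executions. The following Python example is a minimal illustration; the `PlannedExecution` class, the `affected_by_outage` function and the job and device names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PlannedExecution:
    """Hypothetical stand-in for a planned execution object instance."""
    job: str
    device: str
    start: datetime
    end: datetime


def affected_by_outage(executions, failed_device, now, horizon=timedelta(hours=24)):
    """Return executions on the failed device that overlap [now, now + horizon)."""
    cutoff = now + horizon
    return [
        e for e in executions
        if e.device == failed_device and e.start < cutoff and e.end > now
    ]


now = datetime(2024, 1, 1, 9)
plan = [
    PlannedExecution("nightly-db", "tape-1", datetime(2024, 1, 1, 22), datetime(2024, 1, 2, 0)),
    PlannedExecution("weekly-full", "tape-1", datetime(2024, 1, 3, 22), datetime(2024, 1, 4, 2)),
    PlannedExecution("nightly-mail", "tape-2", datetime(2024, 1, 1, 23), datetime(2024, 1, 2, 0)),
]
affected = [e.job for e in affected_by_outage(plan, "tape-1", now)]
```

Only the first job is reported: the second uses the failed device but starts after the 24 hour horizon, and the third uses a different device.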
With regard to the foregoing rule based approach, the approach may be extended to additional resources. For example, constraints may be placed on application servers to limit the number of jobs that can be handled in parallel (e.g. 10 jobs in parallel). In this regard, a user can be preemptively warned if greater than a predetermined number of jobs (e.g. greater than 10 jobs) would be running in parallel.
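A constraint of this kind may be checked by computing the peak number of jobs running at the same time on a server. The Python sketch below uses a simple event sweep; the function name and the sample intervals are assumptions, and the described system would express the constraint as a rule rather than as a standalone function.

```python
from datetime import datetime


def peak_parallel_jobs(intervals):
    """Return the largest number of (start, end) intervals active at once."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a job starts
        events.append((end, -1))    # a job ends
    # At the same instant an end (-1) sorts before a start (+1), so
    # back-to-back jobs are not counted as running in parallel.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


jobs_on_server = [
    (datetime(2024, 1, 1, 1), datetime(2024, 1, 1, 3)),
    (datetime(2024, 1, 1, 2), datetime(2024, 1, 1, 4)),
    (datetime(2024, 1, 1, 3), datetime(2024, 1, 1, 5)),
]
peak = peak_parallel_jobs(jobs_on_server)
limit = 10  # example limit from the text above
warn_user = peak > limit
```

With the sample data at most two jobs overlap, so no warning would be raised against a limit of 10.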
An example of an implementation of the system 100 with HEWLETT PACKARD'S Data Protector is shown in
Referring to
At block 301, using the system 100, the job verification process may begin by defining a planning scope. This planning scope may begin at the current date or at any future date and end <n> days later. In an example, the verification may be performed just for all planned job executions within this defined planning scope, and all planned job executions outside of the planning scope may be ignored.
At block 302, the system 100 may receive data from the IM system 101 that performs job scheduling and job execution. Referring to
At block 303, the job history management module 105 may receive data about prior job executions from the session history of the IM application 102 which may be fed into the IM data. Based on this historical data, the estimated (i.e. expected) job duration may be calculated.
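The description does not specify how the estimate is derived from the session history. As one possible approach, the following Python sketch takes the mean of the past run durations and falls back to a default when no history exists; the function name, the default value and the sample history are assumptions.

```python
from datetime import timedelta
from statistics import mean


def estimate_duration(past_durations, default=timedelta(hours=1)):
    """Return the mean of past run durations, or `default` without history."""
    if not past_durations:
        return default
    seconds = mean(d.total_seconds() for d in past_durations)
    return timedelta(seconds=seconds)


history = [timedelta(minutes=50), timedelta(minutes=60), timedelta(minutes=70)]
estimate = estimate_duration(history)
```

Other estimators, such as a maximum or a high percentile, may be preferable where a conservative estimate is needed to avoid missed conflicts.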
At block 304, the schedule information collected by the job analysis module and the job history management module may then be combined as job data.
At block 305, the environment data 111 from the environment data module 106 may be obtained and asserted into working memory. For example, the working memory may be the memory 406 (see
At block 306, information from the job data may be used by the facts creator module 107 to generate planned execution object instances within a defined planning scope. Alternatively, information from the job data and the environment data may be used by the facts creator module 107 to generate planned execution object instances within the defined planning scope. The job data may describe the jobs including their schedules, expected duration, and resources that are being utilized. Based on this data, facts may be generated by the facts creator module 107.
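To illustrate how planned execution object instances might be generated, the Python sketch below expands a fixed-interval schedule into one object per execution that starts inside the planning scope. The `PlannedExecution` class, the `expand_schedule` function and the fixed-interval schedule model are assumptions; actual schedules may follow more complex definitions, including excluded days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PlannedExecution:
    """Hypothetical fact: one execution of a job within the planning scope."""
    job: str
    device: str
    start: datetime
    duration: timedelta

    @property
    def end(self):
        return self.start + self.duration


def expand_schedule(job, device, first_start, interval, duration,
                    scope_start, scope_end):
    """Return one PlannedExecution per start time in [scope_start, scope_end)."""
    executions = []
    start = first_start
    while start < scope_end:
        if start >= scope_start:
            executions.append(PlannedExecution(job, device, start, duration))
        start += interval
    return executions


scope_start = datetime(2024, 1, 1)
scope_end = scope_start + timedelta(days=3)  # planning scope of <n> = 3 days
executions = expand_schedule(
    "backup-db", "tape-1",
    first_start=datetime(2023, 12, 31, 22),
    interval=timedelta(days=1),
    duration=timedelta(hours=2),  # e.g. an estimate derived from history
    scope_start=scope_start, scope_end=scope_end,
)
```

The run on 31 December falls before the scope and is skipped, leaving three planned executions at 22:00 on 1, 2 and 3 January.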
At block 307, facts in the form of object instances that represent the different jobs may be presented to the verification module 108. Each planned execution object instance may represent one execution of a job within the planning scope having a defined starting date, time and duration. The generated planned execution object instances and the job details collected by the job analysis module 104 may be asserted into the rule-based verification module 108.
At block 308, the verification module 108 may obtain rules from the rule set 109. The rule set 109 may be divided into different groups as described above. For example, referring to
At block 309, upon execution, the verification module 108 may generate a verification report 110 that provides details about the different jobs, possible conflicts and resource utilization.
The computer system 400 includes a processor 402 that may implement or execute machine readable instructions performing some or all of the methods, functions and other processes described herein. Commands and data from the processor 402 are communicated over a communication bus 404. The computer system 400 also includes a main memory 406, such as a random access memory (RAM), where the machine readable instructions and data for the processor 402 may reside during runtime, and a secondary data storage 408, which may be non-volatile and stores machine readable instructions and data. The memory and data storage are examples of computer readable media. The memory 406 may include modules 420 including machine readable instructions residing in the memory 406 during runtime and executed by the processor 402. The modules 420 may include the modules 104-108 of the system 100 shown in
The computer system 400 may include an I/O device 410, such as a keyboard, a mouse, a display, etc. The computer system 400 may include a network interface 412 for connecting to a network. Other known electronic components may be added or substituted in the computer system 400.
While the embodiments have been described with reference to examples, various modifications to the described embodiments may be made without departing from the scope of the claimed embodiments.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2011/045383 | 7/26/2011 | WO | 00 | 10/7/2013 |